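jsoup 1.7.3 has just been released. It brings improved form handling, more reliable character-set detection, faster CSS selector matching and parsing, lower memory usage, and a number of bug fixes. jsoup is an HTML parser for Java that can parse a URL or a string of HTML directly. It offers a convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.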
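As a rough illustration of the selector-style API described above, the sketch below parses an HTML string and collects link targets with a CSS selector. The class name `JsoupSelectSketch`, the method `newsLinks`, and the sample markup are made up for this example; only `Jsoup.parse`, `select`, and `attr` come from jsoup itself, and the jsoup jar is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupSelectSketch {

    // Hypothetical helper: returns the href of every link inside <div class="news">.
    static List<String> newsLinks(String html) {
        Document doc = Jsoup.parse(html);                 // parse an HTML string into a DOM
        Elements links = doc.select("div.news a[href]");  // jQuery-like CSS selector
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            hrefs.add(link.attr("href"));                 // read the attribute as written
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Sample markup invented for the example.
        String html = "<div class=\"news\"><a href=\"/a\">A</a><a href=\"/b\">B</a></div>"
                    + "<a href=\"/c\">C</a>";
        System.out.println(newsLinks(html));
    }
}
```

To work from a live page instead of a string, `Jsoup.connect(url).get()` returns the same kind of `Document`, at the cost of a network request.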
jsoup is released under the MIT license, so it can be used in commercial projects without concern.

The detailed changes are as follows:

Improvements:
- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare one for submission with FormElement.submit().
- Improved the reliability of HTTP character-set recognition from response headers, particularly when servers return out-of-spec responses.
- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.
- Large decrease in the number of temporary objects created during parsing, leading to less GC load (particularly helpful on Android) and faster parsing.
- Improved the time to match elements with common CSS selectors by about 27%.

Bug Fixes:
- Fixed support for self-closing script tags.
- Fixed a crash when reading an unterminated CDATA section.
- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
- Fixed an issue when cloning a document with extremely nested elements that could cause a stack overflow.
- Fixed an issue when connecting or redirecting to a URL that contains a space.
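The new form support and Document.location() mentioned in the changelog can be tried without a network connection. The following is a minimal sketch, not the library's recommended usage: the class `JsoupFormSketch`, its two helper methods, and the sample form are hypothetical. It assumes that `FormElement.formData()` is available alongside `submit()` in the jsoup version on the classpath, and it relies on `Document.location()` returning the base URI when a document is parsed from a string.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsoupFormSketch {

    // Hypothetical helper: name/value pairs the first form in the page would send.
    static Map<String, String> firstFormData(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<FormElement> forms = doc.select("form").forms(); // Elements.forms()
        Map<String, String> data = new LinkedHashMap<>();
        if (forms.isEmpty()) {
            return data;
        }
        for (Connection.KeyVal kv : forms.get(0).formData()) {
            data.put(kv.key(), kv.value());
        }
        // For a real submission, forms.get(0).submit() returns a Connection
        // that can then be executed; that step needs network access.
        return data;
    }

    // Hypothetical helper: for a string-parsed document this is the base URI;
    // for a fetched document it is the URL after any redirects.
    static String locationOf(String html, String baseUri) {
        return Jsoup.parse(html, baseUri).location();
    }

    public static void main(String[] args) {
        // Sample form and base URI invented for the example.
        String html = "<form action=\"/search\">"
                    + "<input type=\"text\" name=\"q\" value=\"jsoup\">"
                    + "<input type=\"hidden\" name=\"lang\" value=\"en\">"
                    + "</form>";
        System.out.println(firstFormData(html, "http://example.com/"));
        System.out.println(locationOf(html, "http://example.com/"));
    }
}
```

Passing a base URI to `Jsoup.parse` matters here because a relative form action such as `/search` can only be resolved to an absolute URL for submission when the document knows where it came from.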