jsoup 1.11.3 released, an HTML parser for Java

2018-4-16 22:24 | Posted by: joejoe0332 | Original author: oschina | Source: oschina

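jsoup is an HTML parser for Java that can parse a URL or a piece of HTML text directly. It provides a very convenient API for extracting and manipulating data through DOM traversal, CSS selectors, and jQuery-like methods.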
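As a minimal sketch of that API, the snippet below parses an HTML string and collects link targets with a CSS selector. The class and method names (JsoupQuickStart, linkTargets), the sample markup, and the example.com base URI are illustrative assumptions and do not come from the release notes; the jsoup jar is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class JsoupQuickStart {

    /** Parses an HTML string and returns the absolute URL of every link in it. */
    static List<String> linkTargets(String html, String baseUri) {
        // The base URI lets jsoup resolve relative href values.
        Document doc = Jsoup.parse(html, baseUri);
        List<String> targets = new ArrayList<>();
        // "a[href]" is a CSS selector: every <a> element that has an href attribute.
        for (Element link : doc.select("a[href]")) {
            // The "abs:" prefix asks for the attribute resolved against the base URI.
            targets.add(link.attr("abs:href"));
        }
        return targets;
    }

    public static void main(String[] args) {
        String html = "<p>See the <a href=\"/docs\">docs</a>.</p>";
        System.out.println(linkTargets(html, "https://example.com/"));
    }
}
```

To work from a live page instead of a string, Jsoup.connect(url).get() returns the same kind of Document, so the selector loop would stay unchanged.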

The main features of jsoup are as follows:

jsoup is released under the MIT license and can safely be used in commercial projects.

Improvements

CDATA sections are now treated as whitespace preserving (regardless of the containing element), and are round-tripped into output HTML.
Added support for Deflate encoding.
When parsing <pre> tags, skip the first newline if present.
Support nested quotes for attribute selection queries.
Character references from Windows-1252 that are not valid Unicode are mapped to the appropriate Unicode replacement.
Accept a custom SSL socket factory in Jsoup.Connection. Note that Connection.validateTLSCertificates() will be removed in the next release; Connection.sslSocketFactory(SSLSocketFactory sslSocketFactory) provides a path to implement a workaround if you need to keep using a similar approach.
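To illustrate the Windows-1252 item in the list above: pages often contain numeric character references such as &#128; that only make sense as Windows-1252 bytes. The sketch below shows the general remapping idea using only the JDK. It is not jsoup's own implementation, the class and method names (Cp1252RefSketch, remap, decode) are hypothetical, and passing the five unassigned values through unchanged is an assumption of this sketch, since the release note does not say how jsoup treats them.

```java
// Hypothetical illustration of the remapping idea; not taken from jsoup's source.
final class Cp1252RefSketch {

    // Code points that Windows-1252 assigns to bytes 0x80..0x9F.
    // An entry equal to its own position (0x81, 0x8D, 0x8F, 0x90, 0x9D) means
    // Windows-1252 leaves that byte unassigned; this sketch passes it through.
    private static final int[] CP1252 = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
    };

    /** Returns the code point to emit for the numeric reference &#value;. */
    static int remap(int value) {
        if (value >= 0x80 && value <= 0x9F) {
            return CP1252[value - 0x80];
        }
        return value; // everything outside 0x80..0x9F is left untouched
    }

    /** Returns the remapped reference as a Java string. */
    static String decode(int value) {
        return new String(Character.toChars(remap(value)));
    }

    public static void main(String[] args) {
        // Print code points in hex so the output does not depend on console encoding.
        for (int value : new int[] {128, 147, 148, 153, 65}) {
            System.out.printf("&#%d; -> U+%04X%n", value, remap(value));
        }
    }
}
```

With this table, &#128; comes out as the euro sign (U+20AC) rather than an invisible control character, which is what most authors of such pages intended.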

Bug fixes

Bugfix: A "Mark has been invalidated" exception was thrown when parsing some URLs on Android <= 6.
Bugfix: The Element.text() for <div>One</div>Two was "OneTwo", not "One Two".
Bugfix: boolean attributes with empty string values were not collapsing in HTML output.
Bugfix: when using the XML Parser set to lowercase normalize tags, uppercase closing tags were not correctly handled.
Bugfix: when parsing from a URL, an end tag could be read incorrectly if it started on a buffer boundary.

For the full details, see the release page and the download page.
