LUPA Open Source Community


jsoup 1.10.2 released: the Java HTML parser

2017-1-5 22:39 | Posted by: joejoe0332 | Original author: oschina | Source: oschina

jsoup 1.10.2 has been released. This version brings faster startup, extended DOM tree traversal, improved HTTP compatibility, and a number of bug fixes.

Details:

Improvements

  • Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading the HTML entity files. About 1.72x faster in this area.

  • Added Element.is(query) to check whether an element matches the given CSS query.

  • Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select the next and previous element siblings of the current selection, optionally filtered by a selector.

  • Added Node.root() to get the topmost ancestor of a Node.

  • Added the new selector :containsData(), to find elements that hold data, like script and style tags.

  • Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the whitelist and does not include HTML errors. Also, the Jsoup.Cleaner.isValid(Document) method now makes sure the document includes only body HTML.

  • Whitelists now validate that a protocol exists before removing it.

  • The Jsoup.Connect thread can now be interrupted while reading the input stream. This helps when reading a long stream of data that never triggers a read timeout.

  • Jsoup.Connect now uses a desktop browser user agent by default. Many developers were caught out by not specifying a user agent and so sending the default Java one. That causes many servers to return different content from what they would send to a desktop browser, and from what the developer was expecting.

  • Increased the default connect/read timeout in Jsoup.Connect to 30 seconds.

  • Jsoup.Connect now detects whether a header value is actually encoded in UTF-8 rather than in ISO-8859-1 as the HTTP spec prescribes, and converts the header value accordingly. This improves compatibility with incorrectly configured servers.
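The header-charset change in the last item can be illustrated with a small stand-alone sketch. This is not jsoup's code. The class `HeaderCharsetFix` and its method are hypothetical names. The sketch also assumes the raw header bytes were first turned into a String one byte per char (that is, decoded as ISO-8859-1). It re-encodes the value to recover those bytes. Only if they contain non-ASCII bytes and form valid UTF-8 does it return the UTF-8 reading instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jsoup: re-reads an ISO-8859-1-decoded
// header value as UTF-8 when its bytes are valid UTF-8.
final class HeaderCharsetFix {

    static String fixHeaderEncoding(String value) {
        // Assumption: the value was decoded one byte per char, so every char is <= 0xFF.
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return value; // not an ISO-8859-1 decoding; leave it alone
            }
        }
        // ISO-8859-1 maps chars 0x00-0xFF back to the original bytes one-to-one.
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);

        // Pure ASCII reads the same in both charsets, so nothing to convert.
        boolean hasHighBytes = false;
        for (byte b : raw) {
            if ((b & 0x80) != 0) {
                hasHighBytes = true;
                break;
            }
        }
        if (!hasHighBytes) {
            return value;
        }

        // Strict decoder: malformed input raises an exception instead of
        // being replaced with U+FFFD.
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return value; // not valid UTF-8: keep the ISO-8859-1 reading
        }
    }

    public static void main(String[] args) {
        // "ü.txt" sent as UTF-8 but read as ISO-8859-1 arrives as "Ã¼.txt".
        String garbled = new String("\u00fc.txt".getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(fixHeaderEncoding(garbled));
    }
}
```

For example, a UTF-8 filename `ü.txt` read as ISO-8859-1 shows up as `Ã¼.txt`, and the sketch turns it back into `ü.txt`. A genuine ISO-8859-1 value such as `café` contains a lone 0xE9 byte, which is not valid UTF-8, so it is returned unchanged. The strict decoder is what makes that distinction. It remains a heuristic, though: ISO-8859-1 text that happens to form valid UTF-8 would be converted too.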

Fixes

  • Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not correctly encoded into URL-safe form.

  • Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed.

  • Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException.

  • Bugfix: the contents of Comment nodes were not returned by Element.data().

  • Bugfix: if the source was checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r character.
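The URL-encoding fix in the list above concerns turning a URL that contains spaces or non-ASCII characters into a form an HTTP client can send. The following is a minimal sketch of one common standard-library approach, not necessarily what jsoup does. `UrlEncodeSketch` and `encodeUrl` are hypothetical names. It splits the URL with `java.net.URL` and reassembles it with the multi-argument `java.net.URI` constructor, which quotes characters that are illegal in each component. It then calls `toASCIIString()` to percent-encode non-ASCII characters as UTF-8.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, not part of jsoup: makes a URL with unsafe characters URL-safe.
final class UrlEncodeSketch {

    static String encodeUrl(String rawUrl) {
        try {
            // java.net.URL splits the string into components without rejecting spaces.
            URL url = new URL(rawUrl);
            // The multi-argument URI constructor quotes characters that are illegal
            // in each component (a space becomes %20).
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            // toASCIIString() also encodes non-ASCII characters as UTF-8 octets.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + rawUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeUrl("http://example.com/a b/\u00fc?q=x y"));
        // prints "http://example.com/a%20b/%C3%BC?q=x%20y"
    }
}
```

One limitation of this design is that these `URI` constructors always quote the `%` character. A URL that is already partly percent-encoded would therefore be double-encoded (`%20` becomes `%2520`), and real code has to detect or decode existing escapes first.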

Download: https://jsoup.org/download
