Sunday, July 11, 2010

jsoup: A Java library for working with real-world HTML

jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string, and find and extract data using DOM traversal or CSS selectors. It can manipulate HTML elements, attributes, and text, and it can clean user-submitted content against a safe whitelist. jsoup is designed to handle every variety of HTML found in the wild, from pristine, validating markup to invalid tag soup; in each case it builds a sensible parse tree.
  • Licenses: MIT/X
  • Operating Systems: Java, cross-platform
  • Implementation: Java, Java 5, HTML
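
As a rough illustration of the parse-and-select workflow described above, here is a minimal sketch. It assumes the jsoup jar is on the classpath. The class name JsoupSketch, the two helper methods, the sample markup, and the example.com address are all made up for this example; only Jsoup.parse, select, first, attr, and text come from the jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of jsoup itself.
class JsoupSketch {

    // Collect the href value of every <a> element that has one.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);             // tolerates unclosed tags
        List<String> hrefs = new ArrayList<String>();
        for (Element link : doc.select("a[href]")) {  // CSS selector query
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    // Return the text of the first <p>, or an empty string if there is none.
    static String firstParagraphText(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        return p == null ? "" : p.text();
    }

    public static void main(String[] args) {
        // Sample input with unclosed <p> tags (an assumption for illustration).
        String html = "<p>Hello <b>world</b><p>See <a href='http://example.com/'>this link</a>";

        System.out.println(firstParagraphText(html));
        System.out.println(extractLinks(html));
    }
}
```

The sample input deliberately leaves both paragraphs unclosed, which is the kind of markup the description says jsoup is meant to cope with. Fetching from a URL or file and cleaning user-submitted content are left out to keep the sketch short.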
