- Licenses : MIT/X
- Operating Systems : Java, Cross Platform
- Implementation : Java, Java 5, HTML
Sunday, July 11, 2010
jsoup : A Java library for working with real-world HTML
jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string, and find and extract data using DOM traversal or CSS selectors. HTML elements, attributes, and text can be manipulated, and user-submitted content can be cleaned against a safe whitelist. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup; in each case it will create a sensible parse tree.
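As a rough illustration of the parse, select, and manipulate steps described above, here is a minimal sketch. It assumes the jsoup jar is on the classpath; the class name `JsoupSketch`, the helper `linkTexts`, and the sample HTML are made up for this example. Cleaning is left out because the class that holds the allowed tags is named differently across jsoup versions (`Whitelist` in older releases, `Safelist` in newer ones).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class JsoupSketch {

    // Hypothetical helper: parse an HTML string and collect the text
    // of every <a> element that carries an href attribute.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element link : doc.select("a[href]")) { // CSS selector
            texts.add(link.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Deliberately sloppy input: the <p> is never closed.
        String html = "<p>See <a href='http://example.com/'>the <b>example</b> page</a>";

        Document doc = Jsoup.parse(html);
        Element link = doc.select("a[href]").first();

        // Extract data from the selected element.
        System.out.println(link.attr("href"));
        System.out.println(link.text());

        // Manipulate an attribute, then print the repaired body markup.
        link.attr("rel", "nofollow");
        System.out.println(doc.body().html());

        System.out.println(linkTexts(html));
    }
}
```

To fetch a page over HTTP instead of parsing a string, `Jsoup.connect(url).get()` returns a `Document` that can be queried the same way.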