You can change the OutputSettings for this:
Example:
    final String html = ...;
    OutputSettings settings = new OutputSettings();
    settings.escapeMode(Entities.EscapeMode.xhtml);
    String cleanHtml = Jsoup.clean(html, "", Whitelist.relaxed(), settings);

This is possible with a Document parsed by Jsoup too:

    Document doc = Jsoup.parse(...);
    doc.outputSettings().escapeMode(Entities.EscapeMode.xhtml);
    // ...

Edit:
Removing tags:
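    doc.select("p:matchesOwn((?is)\u00A0)").remove();

Please note: the \u00A0 after (?is) is not a blank but char #160 (= nbsp); you can also type the character literally. This will remove all p tags whose own text is a &nbsp; (the pattern is unanchored, so own text that merely contains one should match as well). If you want to do the same with all other tags, replace the p: with *:.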
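To see why the nbsp matters, here is a small sketch of the regex on its own, using only java.util.regex and no Jsoup. It assumes the selector applies the pattern with find semantics (a match anywhere in the own text). The class name NbspCheck, its helper methods and the anchored ONLY_NBSP pattern are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class NbspCheck {
    // Same pattern as in the selector; the escape below stands for char #160 (nbsp).
    static final Pattern CONTAINS_NBSP = Pattern.compile("(?is)\u00A0");

    // Anchored variant (an assumption, not from the answer): the text is exactly one nbsp.
    static final Pattern ONLY_NBSP = Pattern.compile("^\u00A0$");

    // True if the text contains an nbsp anywhere.
    static boolean containsNbsp(String text) {
        return CONTAINS_NBSP.matcher(text).find();
    }

    // True only if the whole text is a single nbsp.
    static boolean isOnlyNbsp(String text) {
        return ONLY_NBSP.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsNbsp("\u00A0"));   // nbsp alone
        System.out.println(containsNbsp(" "));        // ordinary blank (char #32)
        System.out.println(containsNbsp("a\u00A0b")); // nbsp inside other text
        System.out.println(isOnlyNbsp("a\u00A0b"));   // anchored check on the same text
    }
}
```

An ordinary blank does not match, which is why the character after (?is) has to be a real nbsp. If you only want to remove elements that hold nothing but the nbsp, an anchored pattern like the second one is probably the safer choice.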
Read full article from java - alternative of JSoup or how to clean whitespaces - Stack Overflow