Any good Scala/Java solutions for sanitizing HTML?

I'd like to use the Scala XPath features, and it's quite possible some
of the HTML I'll be dealing with won't be properly formatted. Can
someone recommend a good sanitizer?

Thanks,
Ken

Re: Any good Scala/Java solutions for sanitizing HTML?

Kenneth McDonald wrote:
> I'd like to use the Scala XPath features, and it's quite possible some
> of the HTML I'll be dealing with won't be properly formatted. Can
> someone recommend a good sanitizer?

http://www.nabble.com/How-to-use-TagSoup-with-Scala-XML--td17575225.html

- Florian

Re: Any good Scala/Java solutions for sanitizing HTML?

I was looking into this recently, and I found an article that was helpful. The comments are worth reading too.

http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Cheers
Rich

On Wed, Jan 21, 2009 at 7:22 PM, Florian Hars <hars [at] bik-gmbh [dot] de> wrote:
Kenneth McDonald wrote:
> I'd like to use the Scala XPath features, and it's quite possible some
> of the HTML I'll be dealing with won't be properly formatted. Can
> someone recommend a good sanitizer?

http://www.nabble.com/How-to-use-TagSoup-with-Scala-XML--td17575225.html

- Florian

--
http://www.richdougherty.com/

Re: Any good Scala/Java solutions for sanitizing HTML?

Rich Dougherty wrote:
> I was looking into this recently, and I found an article that was
> helpful. The comments are worth reading too.
>
> http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Most of those are DOM parsers, while Scala wants SAX. I put up code for the
two that are usable without a DOM2SAX converter here:

http://www.hars.de/2009/01/html-as-xml-in-scala.html

- Florian
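
To make the DOM versus SAX distinction above concrete, here is a minimal Java sketch of a SAX consumer: it receives element events one at a time instead of walking a finished tree. It uses only the JDK's built-in XML parser, so the sample input has to be well-formed. The class name SaxLinkCollector and the method collectLinks are made up for this illustration and do not come from the linked posts. The assumption is that a lenient HTML parser exposing the standard org.xml.sax.XMLReader interface (TagSoup's parser class is one such, as far as I know) could be passed in place of the JDK reader to handle malformed HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not taken from the thread or the linked posts.
public class SaxLinkCollector {

    // Collects the href attribute of every <a> element, using SAX events only.
    // Any XMLReader works here; the JDK reader below requires well-formed input.
    public static List<String> collectLinks(XMLReader reader, String markup) throws Exception {
        List<String> links = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Called once per opening tag, in document order.
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attrs.getValue("href");
                    if (href != null) {
                        links.add(href);
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return links;
    }

    public static void main(String[] args) throws Exception {
        // JDK's built-in SAX parser; swap in a lenient HTML XMLReader for tag soup.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        String markup = "<html><body><a href=\"http://www.hars.de/\">one</a>"
                + "<p><a href=\"x.html\">two</a></p></body></html>";
        System.out.println(collectLinks(reader, markup));
    }
}
```

The handler never holds the whole document in memory, which is the property a SAX-driven consumer relies on. A DOM parser would first have to build its tree and then be replayed as events through a DOM2SAX converter.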
