Any good Scala/Java solutions for sanitizing HTML?

3 replies
Kenneth McDonald
Joined: 2009-01-11

I'd like to use the Scala XPath features, and it's quite possible some
of the HTML I'll be dealing with won't be properly formatted. Can
someone recommend a good sanitizer?

Thanks,
Ken

Florian Hars
Joined: 2008-12-18
Re: Any good Scala/Java solutions for sanitizing HTML?

Kenneth McDonald wrote:
> I'd like to use the Scala XPath features, and it's quite possible some
> of the HTML I'll be dealing with won't be properly formatted. Can
> someone recommend a good sanitizer?

http://www.nabble.com/How-to-use-TagSoup-with-Scala-XML--td17575225.html

- Florian

Rich Dougherty 2
Joined: 2009-01-19
Re: Any good Scala/Java solutions for sanitizing HTML?
I was looking into this recently, and I found an article that was helpful. The comments are worth reading too.

http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Cheers
Rich


--
http://www.richdougherty.com/
Florian Hars
Joined: 2008-12-18
Re: Any good Scala/Java solutions for sanitizing HTML?

Rich Dougherty wrote:
> I was looking into this recently, and I found an article that was
> helpful. The comments are worth reading too.
>
> http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/

Most of those are DOM parsers, whereas Scala's XML loader expects a SAX
parser. I have posted code for the two that are usable without a DOM2SAX
converter here:

http://www.hars.de/2009/01/html-as-xml-in-scala.html

- Florian
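
For readers who cannot reach the links above, here is a minimal sketch of the general idea: hand Scala's XML loader a lenient SAX parser so that malformed HTML still yields a scala.xml tree. It is not the code from the linked post. It assumes the TagSoup jar and the scala-xml module are on the classpath, and the names HtmlAsXml, load and linkTargets are made up for illustration.

```scala
import scala.xml.{Elem, XML}
import org.ccil.cowan.tagsoup.jaxp.SAXFactoryImpl

// Hypothetical helper object; only the TagSoup and scala.xml calls are real APIs.
object HtmlAsXml {

  // Parse possibly malformed HTML into a scala.xml tree.
  // TagSoup's JAXP factory supplies a SAX parser that repairs tag soup
  // (unclosed elements, missing html/body wrappers) while parsing.
  def load(html: String): Elem = {
    // Create a parser per call; SAX parsers are not guaranteed to be thread-safe.
    val parser = new SAXFactoryImpl().newSAXParser()
    XML.withSAXParser(parser).loadString(html)
  }

  // Example of the XPath-like projections: collect the href of every <a> element.
  def linkTargets(html: String): Seq[String] =
    (load(html) \\ "a").map(a => (a \ "@href").text).filter(_.nonEmpty)
}
```

With this in place, a call such as HtmlAsXml.linkTargets(pageSource) can be used on input whose paragraphs or list items are never closed. Note that this only repairs the markup so it can be queried; it does not strip scripts or other unsafe content.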

Copyright © 2012 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland