More elegant way of reading HTML from a URL than this?
Here's a bit of code I wrote to read the HTML from a URL, and return
it as a string. I was wondering if a Scala guru could show me the
"right" way to do this. I'm sure there's a more elegant solution.
-----------------------------
class URLLineReader(url:String) extends Iterator[String] {
  val reader = new java.io.BufferedReader(
    new java.io.InputStreamReader(new java.net.URL(url).openStream()))
  var line:String = null;

  def hasNext = {
    line = reader.readLine()
    line != null
  }

  def next = line
}

object Main {
  def main(args: Array[String]) {
    val reader = new URLLineReader("http://www.yahoo.com/")
    val html = (for (line <- reader) yield line).mkString("")
    println(html)
  }
}
------------------------------
Re: More elegant way of reading HTML from a URL than this?
Here is the URLLineReader using java.util.Scanner:
--------------------
class URLLineReader(urlstring:String) extends Iterator[String] {
  val url = new java.net.URL(urlstring)
  val scan = new java.util.Scanner(url.openStream)
  def hasNext = scan.hasNextLine
  def next = scan.nextLine
}
--------------------
And if you would like to read the text in one piece:
--------------------
def text(urlstring:String):String = {
  val url = new java.net.URL(urlstring)
  val scan = new java.util.Scanner(url.openStream)
  scan.useDelimiter("\\Z") /* End Of File */
  scan.next
}
--------------------
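Two caveats with the one-piece version: the stream is never closed, and `scan.next` throws `NoSuchElementException` when the response is empty. A minimal sketch of a variant that handles both, assuming a hypothetical helper object `ScannerText` (not from this thread or any library) and an assumed UTF-8 encoding. It swaps the `\\Z` delimiter for `\\A` (beginning of input), so the whole input comes back as a single token.

```scala
import java.io.InputStream
import java.util.Scanner

// Hypothetical helper for illustration only.
object ScannerText {
  // Reads everything from `in` as one string and always closes the stream.
  // The charset name is an assumption; real pages declare their own encoding.
  def readAll(in: InputStream, charset: String = "UTF-8"): String = {
    val scan = new Scanner(in, charset).useDelimiter("\\A")
    try {
      if (scan.hasNext) scan.next() else "" // empty input yields ""
    } finally {
      scan.close() // closing the Scanner also closes the underlying stream
    }
  }

  // For a URL: open the stream and delegate to readAll.
  def text(urlstring: String): String =
    readAll(new java.net.URL(urlstring).openStream())
}
```

Because `readAll` takes any `InputStream`, it can be tried on a `java.io.ByteArrayInputStream` without touching the network.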
Re: More elegant way of reading HTML from a URL than this?
InputStreamResource.url("http://...").readString
InputStreamResource.url("http://...").readLines
InputStreamResource.url("http://...").lines.foreach(println(_))
InputStreamResource is part of scalax.
BTW, I think InputStreamResource-like classes should be included in
the Scala standard library.
S.
Re: More elegant way of reading HTML from a URL than this?
Stepan Koltsov wrote:
> InputStreamResource.url("http://...").readString
>
> InputStreamResource.url("http://...").readLines
>
> InputStreamResource.url("http://...").lines.foreach(println(_))
>
> InputStreamResource is part of scalax.
>
> BTW, I think InputStreamResource-like classes must be included into
> the scala standard library.
I second that. Scala's included IO package is pretty anemic. At the very
least, it would be nice to have some wrappers similar to JCL to add some
nice scala-ish functionality to existing Java IO classes. Of course, I
don't personally have time to work on it so I can't complain too loudly ;)
Derek
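To give a feel for the kind of wrapper meant here, a minimal sketch in current Scala (2.10 or later, for `implicit class`). The names `JavaIoOps`, `RichBufferedReader`, `lineIterator` and `slurp` are hypothetical; they belong to neither scalax nor the standard library.

```scala
import java.io.BufferedReader

// Hypothetical enrichment for illustration only.
object JavaIoOps {
  implicit class RichBufferedReader(val reader: BufferedReader) {
    // Lazily yields lines until readLine() returns null (end of stream).
    def lineIterator: Iterator[String] =
      Iterator.continually(reader.readLine()).takeWhile(_ != null)

    // Reads all remaining lines, joins them with '\n', then closes the reader.
    def slurp: String =
      try lineIterator.mkString("\n") finally reader.close()
  }
}
```

With `import JavaIoOps._` in scope, any `BufferedReader` gains the two methods, for example `new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8")).slurp`, where `url` is a `java.net.URL` and the encoding is an assumption.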
Re: More elegant way of reading HTML from a URL than this?
class URLLineReader(url: String) {
  val reader = new java.io.BufferedReader(new java.io.InputStreamReader(new java.net.URL(url).openStream(), "US-ASCII"));
  def foldLeft[T](init: T)(f: (T, String) => T): T = reader.readLine match {
    case null => init
    case line => foldLeft(f(init, line))(f)
  }
}
object Main {
  def main(args: Array[String]) = println(new URLLineReader("http://www.yahoo.com/").foldLeft("")(_ + _))
}
Re: More elegant way of reading HTML from a URL than this?
At least the original not-so-precise version was almost linear. This is
quadratic. And pages tend to be quite lengthy these days, so beware.
Re: More elegant way of reading HTML from a URL than this?
How is what I showed quadratic?
2009/1/21 Dimitris Andreou <jim [dot] andreou [at] gmail [dot] com>
Re: More elegant way of reading HTML from a URL than this?
Maybe my scala-code-parsing brain neurons are still too weak, but I
think you wrote the equivalent of:
val lines: Seq[String] = ...
var output = ""
for (line <- lines) output += line
No?
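To make the cost concrete: each `output += line` builds a new string holding everything accumulated so far, so the total number of characters copied grows with the square of the number of lines. A small sketch that counts those copies, assuming a hypothetical `ConcatCost` object (not from this thread):

```scala
// Hypothetical helper for illustration only.
object ConcatCost {
  // Characters written when joining by repeated String concatenation:
  // every step allocates a new string containing all text seen so far.
  def copiedByConcat(lines: Seq[String]): Long = {
    var output = ""
    var copied = 0L
    for (line <- lines) {
      output += line
      copied += output.length // the whole new string is written again
    }
    copied
  }

  // Characters written when appending to one StringBuilder:
  // each line is copied once (occasional buffer growth is ignored here).
  def copiedByBuilder(lines: Seq[String]): Long = {
    val sb = new StringBuilder
    var copied = 0L
    for (line <- lines) {
      sb.append(line)
      copied += line.length
    }
    copied
  }
}
```

For 1,000 lines of 80 characters each, `copiedByConcat` returns 40,040,000 while `copiedByBuilder` returns 80,000.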
Ricky Clarkson wrote:
> How is what I showed quadratic?
Re: More elegant way of reading HTML from a URL than this?
object Main {
  def main(args: Array[String]) = println(new URLLineReader("http://www.yahoo.com/").foldLeft(new StringBuilder)(_ append _))
}
2009/1/21 Dimitris Andreou <jim [dot] andreou [at] gmail [dot] com>
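Re: More elegant way of reading HTML from a URL than this?
If performance is such an issue, couldn't you first get the
content-length from the HTTP headers and then allocate the initial
capacity of a StringBuilder with that content-length? StringBuilder's
append should be faster than String concatenation.
On Wed, Jan 21, 2009 at 12:00 PM, Dimitris Andreou <jim [dot] andreou [at] gmail [dot] com> wrote: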
Re: More elegant way of reading HTML from a URL than this?
Surely. It would be much faster even with the typically modest default
initial size.
I wanted to make the (obvious, in my opinion) point that making an
algorithm so much slower is inexcusable, whatever kind of elegance is
gained. (I had thought that Ricky had consciously chosen this kind of
'elegance' over performance, but it was probably a mistake, so it's OK.)
Bryan wrote:
> If performance is such an issue, couldn't you first get the
> content-length from the HTTP headers and then allocate the initial
> capacity of a StringBuilder with that content-length. StringBuilder's
> append should be faster than String concatenation.
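A minimal sketch of the content-length idea, assuming a hypothetical `Slurp` object (all names are illustrative). `URLConnection.getContentLength` returns -1 when the server sends no Content-Length header, and the header counts bytes rather than characters, so the value is only a capacity hint. The ISO-8859-1 default is an assumption as well.

```scala
import java.io.{InputStreamReader, Reader}
import java.net.URL

// Hypothetical helper for illustration only.
object Slurp {
  // Reads the whole reader into one string, presizing the builder from
  // sizeHint when it is positive, and closes the reader afterwards.
  def readAll(reader: Reader, sizeHint: Int): String = {
    val sb = new java.lang.StringBuilder(if (sizeHint > 0) sizeHint else 8192)
    val buf = new Array[Char](4096)
    try {
      var n = reader.read(buf)
      while (n != -1) {
        sb.append(buf, 0, n) // appends only the n chars just read
        n = reader.read(buf)
      }
    } finally reader.close()
    sb.toString
  }

  // Uses the Content-Length header (if any) as the capacity hint.
  def fromUrl(urlstring: String, charset: String = "ISO-8859-1"): String = {
    val conn = new URL(urlstring).openConnection()
    val hint = conn.getContentLength // -1 when unknown
    readAll(new InputStreamReader(conn.getInputStream, charset), hint)
  }
}
```

Even without a usable hint, a single growing builder copies each character only a small constant number of times, which is the point made above about the default initial size.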
Re: More elegant way of reading HTML from a URL than this?
Scala apart, it's quite bad style for hasNext() not to be idempotent.
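One way to make `hasNext` idempotent is to read ahead at most one line and hand it out only in `next`. A minimal sketch, assuming a hypothetical `LineIterator` class that takes any `java.io.Reader`, so it can be tried without a network connection:

```scala
import java.io.{BufferedReader, Reader}

// Hypothetical class for illustration only.
class LineIterator(source: Reader) extends Iterator[String] {
  private val reader = new BufferedReader(source)
  private var lookahead: String = null // line read ahead, null at end of input
  private var fetched = false          // true while lookahead is still unconsumed

  // Reads one line ahead, but only if nothing is buffered yet.
  private def fill(): Unit =
    if (!fetched) { lookahead = reader.readLine(); fetched = true }

  // Safe to call any number of times: it never consumes a line.
  def hasNext: Boolean = { fill(); lookahead != null }

  // Also works when the caller did not call hasNext first.
  def next(): String = {
    fill()
    if (lookahead == null) throw new NoSuchElementException("no more lines")
    fetched = false
    lookahead
  }
}
```

For a URL this could be used as `new LineIterator(new java.io.InputStreamReader(new java.net.URL(url).openStream())).mkString("\n")`. Joining with `"\n"` rather than `""` keeps the line breaks that `readLine` strips.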
Re: More elegant way of reading HTML from a URL than this?
So not only is it bad style, it's just plain wrong :)
On Wed, Jan 21, 2009 at 09:07, Dimitris Andreou <jim [dot] andreou [at] gmail [dot] com> wrote: