[Fwd: Re: continue keyword]

Detering Dirk wrote:
> I would be really, really interested in seeing a simplified
> (but not oversimplified!) fully functional prototype version
> of this use case written in your style, and then let us
> (well, not me, but the experts ;) ) show how that would be
> solved in The Scala Way.
>
> It seems a sufficiently complex, but not too complex,
> properly isolated problem for a showcase, and even coming
> out of real world necessities.
>

A hearty +1 to that!

I'm someone from the Java / traditional procedural OO world who has been
watching Scala and other functional hybrids with interest (though as yet
without time to devote to really getting my feet wet). Though by all rights
my bias should be for adding the continue and break (I've written plenty of
file parsing loops, and have used these constructs now and then), from what I've seen,
I find the alternative compelling, and actually find

    data.takeWhile(_.time < endTime).filter(_.time > startTime).map(line =>
      println("Found " + line.recordType + " at " + line.time))

quite comprehensible even though my grasp of Scala syntax is shaky at best.

As someone still deciding whether and to what degree to take the plunge into Scala
or FP in general, it would be EXTREMELY useful to see the non-simplified version of
this in several forms:

* The procedural without continue and break
* The procedural with continue and break (even if hypothetical)
* However many FP ways of doing it the various FP proponents think are better,
perhaps including one or more FP versions using continue and break
* Perhaps a version with a bit of DSL work to soup up the readability?

It doesn't even matter if there is remotely a consensus afterwards as to which
is more readable, it would still be valuable.

Eric

Re: [Fwd: Re: continue keyword]

Yeah, it returns a Seq[String] which is backed by a List[String].

And yeah, the for-comprehension is just a sequence of calls to filter, map, and flatMap, which preserve the laziness of ManagedSequence.
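
As a rough illustration of that translation (a sketch only, using a plain Iterator as a stand-in for ManagedSequence; the names ForDesugarSketch, source, viaFor and viaCalls are made up for this example), a guarded for-comprehension and the equivalent method chain behave the same way, and neither pulls anything from the source until the result is consumed. Current compilers emit withFilter rather than filter for the guard:

```scala
object ForDesugarSketch {
  // Counts how many elements the source has actually produced.
  var pulled = 0

  // Hypothetical unbounded source standing in for a lazy line reader.
  def source(): Iterator[Int] = Iterator.from(1).map { n => pulled += 1; n }

  // A for-comprehension with a guard and a yield...
  def viaFor(): Iterator[Int] =
    for (n <- source() if n % 2 == 0) yield n * 10

  // ...is rewritten by the compiler into roughly this chain of calls.
  def viaCalls(): Iterator[Int] =
    source().withFilter(_ % 2 == 0).map(_ * 10)
}
```

Building either iterator leaves pulled at 0; elements are only produced once something such as take(3).toList asks for them.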

And yes, your revised version works.

--j

On Fri, Mar 20, 2009 at 3:08 PM, Brett Knights <brett [at] knightsofthenet [dot] com> wrote:
Jorge Ortiz wrote:

   Doesn't this one still read all the lines?

 No. The 'lines' method returns a ManagedSequence[String], which is a lazy "collection". Evaluation (and thus, file access) will be delayed until it's needed. Because of the takeWhile condition, evaluation (and thus, file access) will terminate early. Thanks to lazy evaluation, it also processes the file in O(1) (constant) memory.

Your code, on the other hand, reads the entire file and holds all of it in memory (O(N)), because readLines returns a (strict) List[String]. For large files it will throw an OutOfMemoryError.
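
The difference can be made visible with a small counting source. This is a sketch under assumptions: CountingLines is a hypothetical stand-in for a file, not part of Scalax, and a plain Iterator plays the role of the lazy collection.

```scala
object LazyVsStrictSketch {
  // Hypothetical stand-in for a file: yields `total` lines and counts each read.
  final class CountingLines(total: Int) extends Iterator[String] {
    var reads = 0
    private var idx = 0
    def hasNext: Boolean = idx < total
    def next(): String = { reads += 1; idx += 1; s"line ${idx}" }
  }

  // Lazy: stop pulling from the source as soon as the predicate fails.
  def lazyReads(total: Int, stopAt: String): Int = {
    val src = new CountingLines(total)
    src.takeWhile(_ != stopAt).foreach(_ => ())
    src.reads
  }

  // Strict: materialise every line in a List first, then apply takeWhile.
  def strictReads(total: Int, stopAt: String): Int = {
    val src = new CountingLines(total)
    val all = src.toList
    all.takeWhile(_ != stopAt).foreach(_ => ())
    src.reads
  }
}
```

With 1000 lines and a stop marker on the third line, the lazy variant reads three lines and the strict variant reads all 1000 before takeWhile ever runs.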

--j

Actually, I thought I was using readLines from scalax, which produces a Seq[String]. Oh well. Using lines works just as well, so here is the revised version:


      import scalax.io.Implicits._
      import java.io.File

      val infile = new File(args(0))

      var go = true
      for (l <- infile.lines.takeWhile(_ => go)) {
        l.trim match { // no need for continue
          case "ted" => go = false // break handled
          case x     => println(x)
        }
      }

But what I think you are saying is that your original for-comprehension only pulls lines from the file for as long as the takeWhile predicate returns true. That's pretty cool if that is what's happening.

(for {l <- infile.toFile.lines
... guards etc
    } yield LineTime(line, time)).takeWhile

Re: [Fwd: Re: continue keyword]

On further reflection, another way to write it (without takeWhile) would be:

  var go = true
  for (line <- infile.toFile.lines if go) {
    ...
  }

--j

Re: [Fwd: Re: continue keyword]

Jorge Ortiz wrote:
> On further reflection, another way to write it (without takeWhile)
> would be:
>
> var go = true
> for (line <- infile.toFile.lines if go) {
> ...
> }
>
According to the article I cited, that would produce the correct output
but would still end up reading all the lines (and, I assume, discarding
them, so no OOM at least). The guard doesn't stop the generator, so once
go has been set to false the rest of the lines in the file will just be
read and skipped.

This has certainly been an educational thread.

Re: [Fwd: Re: continue keyword]

Ahh, yes, you're correct. Every line will be read and discarded, but not processed.
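
A small sketch makes the difference measurable. The input list and the "ted" stop marker are made up for illustration, and a plain Iterator stands in for the lazy line source. Both loops process the same lines, but the guard version keeps pulling from the source after the flag flips:

```scala
object GuardVsTakeWhileSketch {
  // Hypothetical input; "ted" is the stop marker.
  val input = List("alice", "bob", "ted", "carol", "dave")

  // Returns (lines processed, lines pulled from the source) using a guard.
  def withGuard(): (List[String], Int) = {
    var pulled = 0
    val lines = input.iterator.map { l => pulled += 1; l }
    val out = List.newBuilder[String]
    var go = true
    for (line <- lines if go) {
      if (line == "ted") go = false
      else out += line
    }
    (out.result(), pulled)
  }

  // Same contract, but takeWhile ends the iteration at the stop marker.
  def withTakeWhile(): (List[String], Int) = {
    var pulled = 0
    val lines = input.iterator.map { l => pulled += 1; l }
    val out = List.newBuilder[String]
    for (line <- lines.takeWhile(_ != "ted")) out += line
    (out.result(), pulled)
  }
}
```

Here withGuard() pulls all five lines while withTakeWhile() pulls only three, and both return List("alice", "bob") as the processed lines.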

--j


Re: [Fwd: Re: continue keyword]

For-comprehensions over streams should yield streams (they are just
translated into filter and flatMap calls), and therefore takeWhile should
cause early termination even in the original code.
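
A minimal sketch of that claim, assuming Scala 2.13 or later, where LazyList is the successor to Stream (the names LazyListForSketch, times and window are invented for this example). The source is unbounded, so the call can only return if takeWhile really does cut evaluation short:

```scala
object LazyListForSketch {
  // Counts how many elements of the source have been evaluated.
  var evaluated = 0

  // An unbounded lazy sequence of timestamps.
  def times(): LazyList[Int] = LazyList.from(0).map { t => evaluated += 1; t }

  // The for-comprehension yields another LazyList, so building it does not
  // run the loop; takeWhile then bounds how far the source is evaluated.
  def window(start: Int, end: Int): List[Int] = {
    val kept = for (t <- times() if t >= start) yield t
    kept.takeWhile(_ <= end).toList
  }
}
```

Calling window(3, 6) returns List(3, 4, 5, 6) after evaluating only the first few elements of the infinite source.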

On 3/20/09, Brett Knights wrote:
> Doesn't this one still read all the lines?

Re: [Fwd: Re: continue keyword]

I think the trick is that if there is a good idea or solution, you learn it only once, but you save a lot by applying it many times afterwards. And even if learning looks hard before you do it... it does not look so hard when you are done.

2009/3/19 Russ Paielli <russ [dot] paielli [at] gmail [dot] com>
Wow, that was fast! I doubt your first cut is completely correct, but it's probably "close enough for government work," as they say. You're obviously a very proficient Scala programmer.

But here's an important point. Your approach would have been much harder for me (and most other non-experts in Scala) to program than the simple break/continue version. Now, if this were critical production code, you could conceivably make a case that it shouldn't use break or continue (I don't think I would buy it, but you could make the case). But this is not critical production code. It is simply a utility for converting an input file for research purposes. So why should I be forced to do it the harder way, without break or continue? In this case, I just want something that looks reasonable and works so I can move on to more important issues.

Russ P.

On Thu, Mar 19, 2009 at 1:03 PM, Jorge Ortiz <jorge [dot] ortiz [at] gmail [dot] com> wrote:
This is my best shot at recreating that in Scala. I'm assuming Scalax's IO library, which is lazy.

The "skip" conditions have to be turned into "keep" conditions, but that's just a matter of applying logical NOT to them.

  import scalax.data.Implicits._
  import scalax.io.Implicits._

  case class LineTime(line: String, time: Float)

  (for {
    l <- infile.toFile.lines
    val line = l.trim
    if (line != "" && !line.startsWith("#"))

    val data = line.split(",").toList
    if data.length >= 3

    val ac = data(2)
    if (!acpair.isDefined || acpair.contains(ac) || line.startsWith("WND "))

    val time = data(1).toFloat * sec
    if (time >= startTime || data(0) != "TRK")
  } yield LineTime(line, time)).takeWhile(_.time <= endTime).map(_.line)

Now, that last line is arguably a bit ugly. It'd be the only thing I'd change if break/continue were made available for Scala collections. I might make the code look like so:

  for {
    l <- infile.toFile.lines
    val line = l.trim
    if (line != "" && !line.startsWith("#"))

    val data = line.split(",").toList
    if data.length >= 3

    val ac = data(2)
    if (!acpair.isDefined || acpair.contains(ac) || line.startsWith("WND "))

    val time = data(1).toFloat * sec
    if (time >= startTime || data(0) != "TRK")
    val _ = if (time > endTime) break
  } yield line

--j

On Thu, Mar 19, 2009 at 12:37 PM, Russ Paielli <russ [dot] paielli [at] gmail [dot] com> wrote:
Detering Dirk wrote:
> I would be really, really interested in seeing a simplified
> (but not oversimplified!) fully functional prototype version
> of this use case written in your style, and then let us
> (well, not me, but the experts ;) ) show how that would be
> solved in The Scala Way.
>
> It seems a sufficiently complex, but not too complex,
> properly isolated problem for a showcase, and even coming
> out of real world necessities.
>

OK, you asked for it, so here is a sample for you. This is a section from one of several Python scripts I use for processing the air traffic data files I mentioned. This is the simplest of those scripts, but it has the basics. Sorry, but this section of code by itself is not fully functional. First, here is the header comment just to give you a clue what it is doing:

"""
This Python script by Russ Paielli extracts data for specified flights
and a specified time window around an operational error from a TSAFE
input data file (default: TSAFE-all.in). It puts the data into another
smaller TSAFE input file (default: TSAFE.in). By default, the aircraft
IDs of the conflict pair are specified in the input file ACpair.dat. If
the --all option is used, all flights are extracted for the specified
time period.
"""

And here is the main looping portion of the script:

count = 0

for line in file(infile): # main loop

    line = line.strip()
    if not line or line.startswith("#"): print >> out, line; continue # skip blank line or comment

    data = line.split()
    if len(data) < 3: continue # bad data line
    AC = data[2] # aircraft ID

    if ACpair and AC not in ACpair and not line.startswith("WND "): continue

    time = float(data[1]) * sec

    if time < starttime and data[0] == "TRK": continue
    if time > endtime: break

    print >> out, line

    count += 1
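
For comparison, a rough Scala port of the loop above can use scala.util.control.Breaks, the library-level break that the standard library has provided since Scala 2.8. This is only a sketch: the function name extract and its parameters are hypothetical stand-ins for the script's globals, output is collected into a list instead of being printed, and "continue" is emulated with a second Breaks instance wrapped around the loop body.

```scala
import scala.util.control.Breaks

object BreakContinueSketch {
  // Hypothetical parameters standing in for the script's globals.
  def extract(lines: Iterator[String], acPair: Set[String],
              startTime: Double, endTime: Double, sec: Double = 1.0): List[String] = {
    val out = List.newBuilder[String]
    val loop = new Breaks // break: leave the whole loop
    val body = new Breaks // "continue": abandon the current iteration only

    loop.breakable {
      for (raw <- lines) body.breakable {
        val line = raw.trim
        // blank line or comment: echo it, then skip to the next line
        if (line.isEmpty || line.startsWith("#")) { out += line; body.break() }

        val data = line.split("\\s+")
        if (data.length < 3) body.break() // bad data line
        val ac = data(2)                  // aircraft ID

        if (acPair.nonEmpty && !acPair(ac) && !line.startsWith("WND ")) body.break()

        val time = data(1).toDouble * sec
        if (time < startTime && data(0) == "TRK") body.break()
        if (time > endTime) loop.break()

        out += line
      }
    }
    out.result()
  }
}
```

Each Breaks instance only catches its own break, which is what lets the inner one act as continue while the outer one ends the loop.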





--
http://RussP.us



--
Thanks,
-Vlad

Re: [Fwd: Re: continue keyword]

Oops, my split should split on whitespace, not commas. It's been a while since I did Python, sorry.
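
With that correction applied, a self-contained variant of the pipeline might look like the sketch below. Several things here are assumptions made so it can run without Scalax: the lines come from a plain Iterator[String], acpair is modelled as a Set[String] (empty meaning "all flights"), and sec, startTime and endTime are passed as parameters. Like the original Scala version, it drops blank and comment lines rather than echoing them.

```scala
object PipelineSketch {
  case class LineTime(line: String, time: Double)

  // All parameters are hypothetical stand-ins for values defined elsewhere.
  def extract(lines: Iterator[String], acpair: Set[String],
              startTime: Double, endTime: Double, sec: Double = 1.0): Iterator[String] =
    (for {
      l <- lines
      line = l.trim
      if line != "" && !line.startsWith("#")

      data = line.split("\\s+").toList // split on whitespace, not commas
      if data.length >= 3

      ac = data(2)
      if acpair.isEmpty || acpair.contains(ac) || line.startsWith("WND ")

      time = data(1).toDouble * sec
      if time >= startTime || data(0) != "TRK"
    } yield LineTime(line, time)).takeWhile(_.time <= endTime).map(_.line)
}
```

Because every stage is an Iterator operation, the first record past endTime stops the traversal and later lines are never pulled from the source.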

--j


Copyright © 2013 École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland