Day three and I’m now sort of getting the hang of Scala. Either that or the exercise is easier! I think that learning about the neat text processing tricks Scala allows helped it click with me a bit more than on previous days. I’ve also started to get more used to the type system. At least I didn't spend as many hours figuring that bit out today as on the last two days.
The material guides us through some of the cooler features of the language. XML is a first-class data type, which is pretty cool. Then there's a look at pattern matching and regexps. I really like the match feature that can be applied to functions and methods. The regexp implementation seems OK, though the three-quotes way of writing them is rather ugly to my eyes. The day ends with a look at concurrency, which is one of Scala's real strong points. I’ve grown rather to like Scala. Not in the same way that I like Ruby, but it's definitely more fun and, I can imagine, once you know it, more productive than churning out Java code. A good language to play with.
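To make the pattern matching and triple-quoted regexp points a bit more concrete, here is a minimal sketch. It is my own illustration rather than anything from the book: the names MatchDemo, Date and describe are made up for the example.

```scala
object MatchDemo {
  // Triple-quoted strings are raw, so regex backslashes need no doubling.
  val Date = """(\d{4})-(\d{2})-(\d{2})""".r

  // A Regex can be used as an extractor in a match expression.
  // It only matches when the whole string fits the pattern.
  def describe(s: String): String = s match {
    case Date(year, month, day) => "date: " + day + "/" + month + "/" + year
    case ""                     => "empty"
    case other                  => "text: " + other
  }

  def main(args: Array[String]): Unit = {
    println(describe("2011-03-14"))
    println(describe("hello"))
    // findAllIn scans a longer string for every match.
    println("""[A-Za-z]+""".r.findAllIn("XML, regexps and actors").toList)
  }
}
```

The capture groups of the regexp become the variables bound in the case clause, which is the bit I find nicest about combining the two features.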
sizer.scala
- Take the sizer application and add a message to count the number of links on a page
- Bonus problem: Make the sizer follow the links on a given page and load them as well. For example, google.com would compute the size for Google and all the pages it links to.
Nothing amazingly exciting about my implementation, but it wasn't too hard to write and it was kinda fun. Here is the code.
import scala.io._
import scala.actors._
import Actor._

object PageLoader {
  // Size of a page in characters
  def getPageSize(url: String) = Source.fromURL(url).mkString.length

  // Size of a page plus the size of every page it links to
  def getPageSizeWithLinks(url: String): Int = {
    val size = getPageSize(url)
    return getPageLinks(url).foldLeft(size) { (size, link) =>
      println("\t( " + url + " ) Fetching: " + link)
      ( size + getPageSize(link) )
    }
  }

  def getPageLinks(url: String): List[String] = {
    // nb naive: only matches absolute http(s) links written exactly as a href="..."
    // Triple-quoted strings are raw, so the quotes need no backslashes. The pattern
    // stops just before the closing quote ([^"]+ cannot run past it), so the literal
    // no longer ends in a run of quotes and needs no trailing space.
    val regex = """a href="(https?:)([^"]+)""".r
    regex.findAllIn(Source.fromURL(url).mkString).matchData.map { m =>
      m.group(1) + m.group(2)
    }.toList
  }
}

val urls = List("http://www.google.com","http://charlieharvey.org.uk/","http://amazon.com","http://base.ox4.org")

def timeMethod(method: () => Unit) = {
  val start = System.nanoTime
  method()
  val end = System.nanoTime
  println("Time [" + (end-start)/1000000000.0 + "s]")
}

def getLinks(url: String): List[String] = {
  return PageLoader.getPageLinks(url)
}

def getNumLinks(url: String): Int = {
  return PageLoader.getPageLinks(url).size
}

def getSizeOfPagesAndLinkedPagesConcurrently() = {
  val caller = self
  for(url <- urls) {
    actor { caller ! (url, PageLoader.getPageSizeWithLinks(url)) }
  }
  for(i <- 1 to urls.size) {
    receive {
      case (url,size) =>
        println("Size of " + url + ": " + size)
    }
  }
}

def getPageSizeSequentially() = {
  for(url <- urls) {
    println("Size of " + url + ": " + PageLoader.getPageSize(url))
  }
}

def getPageSizeConcurrently() = {
  val caller = self
  for(url <- urls) {
    actor { caller ! (url, PageLoader.getPageSize(url)) }
  }
  for(i <- 1 to urls.size) {
    receive {
      case (url,size) =>
        println("Size of " + url + ": " + size)
    }
  }
}

val break = "\n=======================================================================\n"
println("Sequential run:")
timeMethod { getPageSizeSequentially }
println(break)
println("Concurrent run:")
timeMethod { getPageSizeConcurrently }
println(break)
println("All links from a page (newint.org):")
println(getLinks("http://www.newint.org/").mkString("\n"))
println(break)
println("Number of links on a page(newint.org):")
println(getNumLinks("http://www.newint.org/"))
println(break)
println("Get size of linked pages too:")
timeMethod { getSizeOfPagesAndLinkedPagesConcurrently }
println(break)
As you can see, the methods that answer the tasks are getNumLinks and the snappily named getSizeOfPagesAndLinkedPagesConcurrently. Well, at least it does what it says on the tin!
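One caveat for anyone trying this on a newer Scala: the scala.actors library used above was deprecated from Scala 2.10 and does not ship with recent releases. As a rough sketch of the same fan-out-and-collect idea, here is a version using scala.concurrent.Future from the standard library. This is a swapped-in technique, not what the book teaches, and the names FutureSizer, fetchSize and sizesConcurrently are my own assumptions.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.io.Source

object FutureSizer {
  // Download a page and return its length in characters.
  def fetchSize(url: String): Int = {
    val src = Source.fromURL(url)
    try src.mkString.length finally src.close()
  }

  // Start one Future per URL, then wait for all of them together.
  // sizeOf is a parameter so the function can be tried without a network.
  def sizesConcurrently(urls: List[String],
                        sizeOf: String => Int = fetchSize): List[(String, Int)] = {
    val pending = urls.map(url => Future((url, sizeOf(url))))
    Await.result(Future.sequence(pending), 60.seconds)
  }

  def main(args: Array[String]): Unit = {
    val urls = List("http://www.google.com", "http://charlieharvey.org.uk/")
    for ((url, size) <- sizesConcurrently(urls))
      println("Size of " + url + ": " + size)
  }
}
```

Unlike the actor version, which prints results as they arrive, this one waits for every page and returns the sizes in the same order as the input list. The 60 second timeout is an arbitrary choice.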
Scala: My thoughts
Tate's writeup is as usual on the ball, at least from what I have seen so far. He points out Scala's strengths: concurrency, leveraging legacy Java code, DSLs, XML processing, and bridging the object and functional programming styles.
Then he goes on to point to some weaknesses. I have to agree with Tate that the static typing is a pain. He also points to the syntax and the introduction of mutability into the language as weaknesses. Although I struggled a bit with the syntax, I’m not sure I’ve got used to it enough to think it's especially troublesome. I don't write much Java nowadays, so perhaps the problem is more significant for Java coders. The mutability issue is something that it'd be hard not to hit if you're going to support OO. I guess Scala is stuck with that design compromise.
One weakness that I found was the lack of simple examples in the API reference. It would have been super useful to see how to construct a multidimensional array in cut and pasteable code. Documentation with lots of examples is one of the ways that a language can distinguish itself, at least to lazy people like me!
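For what it's worth, here is the sort of cut and pasteable snippet I was hoping to find, as a minimal sketch. It uses Array.ofDim, Array.fill and Array.tabulate from the standard library (available from Scala 2.8); GridDemo and the value names are just made up for the example.

```scala
object GridDemo {
  // 3 rows by 4 columns, every cell starts at 0 (the Int default).
  val zeros: Array[Array[Int]] = Array.ofDim[Int](3, 4)

  // 2 by 3 grid with an explicit starting value in every cell.
  val dots: Array[Array[Char]] = Array.fill(2, 3)('.')

  // 3 by 3 grid where each cell is computed from its row and column index.
  val table: Array[Array[Int]] =
    Array.tabulate(3, 3)((row, col) => (row + 1) * (col + 1))

  def main(args: Array[String]): Unit = {
    zeros(1)(2) = 7 // cells are mutable: this sets row 1, column 2
    println(zeros.map(_.mkString(" ")).mkString("\n"))
    println(table.map(_.mkString(" ")).mkString("\n"))
  }
}
```

A multidimensional array here is just an array of arrays, so a cell is read or written with two sets of brackets, as in zeros(1)(2).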