Created by bagwell on 2011-10-28. Updated: 2011-10-29, 11:33
It is not often we have the opportunity to be right on the crest of a breaking technology wave, especially one built from the ground up with Scala. zeebox has just launched a site that brings the power and excitement of social networks like Facebook, Twitter and Foursquare to watching TV. In this interview, Anthony Rose, CTO and co-founder of zeebox, and Kevin Wright, senior engineer, tell you all about it.
Anthony Rose, formerly the head of the highly successful iPlayer for the BBC, and his team have created zeebox, a new dimension in TV viewing. This promises to be as big a change as it was going from black and white TV to colour or to HD. zeebox gives an extra real time dimension, adding richness and depth to any program you watch. Read all about it in the interview.
Interview with Anthony Rose and Kevin Wright from zeebox.
What is zeebox?
zeebox is a companion application that you can download to your iPad or use as a web client. It brings a wealth of well-organised information to your fingertips about the programs you watch on the 60 or so UK TV channels, and soon thousands in the rest of the world: actor bios, words spoken, products, music played, related news items, subject tweets, people histories, what your friends are watching, what they think about it, what is popular and a whole lot more. It's all updated in real time, tracking the program content and your friends' activities while you watch and hang out in the ultimate TV party.
Once you have chosen a program to watch, zeebox brings together all the related information for you to browse while you watch. The principle is simple. We extract, or spider, all the salient information about a program second by second (subtitles, music, natural language processing, video fingerprinting, OCR'd screen text) to create a stream of zeetags. The zeetags are used to assemble a database of related information such as tweets, Wikipedia articles, results from Google and other search engines, news feeds, Facebook connections and so on. Then, based on popularity and significance profiles, we organise the zeetags into a priority order, which we present to the user in an elegant, continually updated browser page.
What would this be like?
Best to watch a demonstration. But imagine for the moment that you want to watch Top Gear. On your iPad you select zeebox and will see a program guide. This is where the magic starts. You can browse the guide by most popular program, programs about cars, Top Gear by name, what your friends are watching or programs you regularly watch at that time of day. You select Top Gear because you notice one of your friends is already watching it, another keen car enthusiast like yourself. The TV changes to the channel you selected.
How does it do that?
Over 80% of “smart” or connected TVs use a variant of the DLNA protocol that allows them to be controlled over Wi-Fi or a LAN. When you select the program, zeebox sends the appropriate request to your TV over the LAN and the channel changes just as if you had done it with the remote.
So you are now watching Top Gear. You glance at your iPad and see that zeetags are appearing and changing in real time: one each for Jeremy Clarkson, Richard Hammond and James May, and one for The Stig, who is driving the Bugatti Veyron round the test track with Stig music playing in the background. Zeetags appear for the Bugatti, for Dunsfold Aerodrome and for the music on iTunes. Stephen Fry is tweeting about Clarkson’s misuse of the English language.
Who is the Stig? You click on the tag and his bio is revealed. Click on iTunes and you can download the music from the store. Check out the Bugatti specifications. Click buy? Book a test drive? Well, perhaps not, dream on.
The thought is interrupted as a chat box appears: your friend is talking to you. He has noticed "60 Seconds" is just starting on another channel. You decide to watch it together and one click later both your TVs change channel, and a new set of zeetags starts appearing: actors, cars, places, director... You carry on chatting about the cars in the movie. Another friend joins you. "How about going for a drink after the movie?"
How does the friend know?
When you connect to zeebox you can include all your Facebook friends, for example, as shared viewers. Then whenever they watch a program they appear under a zeetag, and they are notified of what you watch. Of course you can always turn the sharing off if you wish.
The secret sauce is the extra information around the program. We call this 'augmentainment': an augmented experience. Some people use it as background, sometimes for drama, news or cooking. While you watch a cooking program, recipes, ingredients, chefs and other programs are all displayed in real time. When you watch drama, character names, actors, dialog meanings, literary references and other similar programs all pop up. For the news, it is maps, place references, people biographies, other news articles, tweets, statistics and sites. Eventually, there will even be real-time translation of content.
How do you do all that in real time?
It is all built on cloud services with Scala that ingest live TV, starting with 60 TV channels across the UK and rolling out to hundreds, perhaps thousands, across the world in due course. On each channel we ingest as much as possible, recognising second by second what people are saying on TV, what ads are playing, what music is playing and what program is playing. All the information goes into our cloud service.
We have to deal with all the incoming channel data, which arrives second by second for hundreds of channels, so the data scales with the number of channels. Next we (hope to!) have hundreds of thousands of users connected to our program guide and presence servers. The EPG (electronic program guide) not only shows you what is on TV but also what each of your friends is watching right now, the most popular programs across all viewers, and so on. This scales in a non-linear way. The presence service is multi-homed, to deal with downtime and potentially hundreds of thousands of connections. The real-time chat service is also multi-homed. Grouping is a challenge: users can be in multiple groups, and deciding on group homes is tricky. An engineering challenge. But it all works nicely.
What was the technology thinking?
Being a start-up, we chose the cloud because it lets us scale capability and cost. It is really important to use the money wisely. For the same reason we use open source and Scala.
With the cloud we can do neat things like replicate the live system and run high-volume destructive testing on the copy. We do not have to waste money buying a bunch of servers that we only need for full load testing.
Today I find that the hot developers use Scala. I am amazingly impressed by the team’s ability to turn around complicated requirements in a short time, and very impressed by how easy it is to adapt. Refactoring to meet new needs and adapting rapidly was a big problem. We needed to feel we had the flexibility to meet the unknown.
As we come up with new viewer-demanded features, the Scala team is very responsive. Scala gives the developers the power to say yes. For me it's faster to get things tested with viewers. I have to come up with some really extreme stuff before they say maybe. One of the most important decisions we took was to hire people who are fabulously good and right at the top of their game.
Perhaps Kevin is the best person to talk about how they do it.
How did you come to choose Scala?
As Anthony said, we decided to build a team of really good people. Some had already worked with him at the BBC. One of them, Kerry Jones, was the prime mover to use Scala (this was before I joined the company). Quite a bit of time was spent looking at Groovy and Clojure for conciseness and expressiveness. C++ was considered for pure speed. Finally zeebox settled on Scala: it gave the necessary expressiveness, with much more opportunity to optimise at the algorithm level instead of through low-level bit-twiddling. Moreover, we could find more developers in the market for Scala than for Clojure, and more developers interested in moving to Scala. As and when we need to grow, we know that we can easily find good people who have Java production experience, and it's also so much easier to cross-train a Java programmer to Scala than to Clojure.
Ultimately, the choice of Scala proved to be a good one for us. As we dug into the problem we found that an immutable functional style mapped beautifully onto the zeetag extraction, while Akka gave us a sound actor-style foundation for the extreme scaling needed for hundreds of thousands of user requests and huge social network graph growth.
It's hard to realize just how much work we do to process 60 TV channels, all in real time, knowing it will grow to thousands in the future. This is a massively parallel processing problem.
For every channel we take the subtitles, synopsis, cast and other feeds for that program. These are then processed using some very clever natural language analysis, supported in part by external services, to create potential zeetags. The whole project is about integrating data, so the initial tags are filtered and many concurrent tasks are triggered to gather related information. zeetags are mapped to Twitter hashtags, for which Twitter searches are kicked off, or to search terms that are fired off to Google and other search engines. The same data gathering process brings in news feeds, Wikipedia and many others. It's important to us that we're able to handle and co-ordinate all these asynchronous calls to external services while keeping the internal logic flow of the system very clean and maintainable.
How do you do the processing?
We use a dataflow concurrency model with immutable FP, but we're pragmatic enough to keep isolated sections of mutability around where it simplifies the job. It makes some of our complex stream processing problems so much more manageable and easier to reason about. Written this way, the code is easier to read, easier to write, and easier to maintain, which is vital given the speed at which we plan on adding new functionality.
The stream processing core logic boils down to composing functions that return either a successful transform or a failure. We pimped the + operator onto functions so that a set of transformation steps can be written as step1 + step2 + step3 and so on. Other operators allow two steps to run in parallel, or take the one that finishes first, or fail the entire block when the first one fails. This is very useful in some fail-fast modes where we want to abandon the slow task if the fast failure triggers.
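To make the idea of chaining steps with + concrete, here is a minimal sketch in plain Scala. It is not the zeebox code: the Step wrapper, the three example steps and the error messages are all invented for illustration, and for simplicity the + operator is defined on a small wrapper class rather than added to raw functions. Only sequential chaining is shown; the parallel and first-to-finish operators are left out.

```scala
object StepComposition {
  // A step either fails with a reason (Left) or yields a transformed value (Right).
  // Hypothetical wrapper type, assumed for this sketch only.
  final case class Step[A, B](run: A => Either[String, B]) {
    // Chain two steps; the second runs only if the first succeeded.
    def +[C](next: Step[B, C]): Step[A, C] =
      Step(a => run(a).flatMap(next.run))
  }

  // Hypothetical steps, invented for illustration.
  val nonEmpty: Step[String, String] =
    Step(s => if (s.trim.isEmpty) Left("empty subtitle line") else Right(s.trim))

  val tokenize: Step[String, List[String]] =
    Step(s => Right(s.split("\\s+").toList))

  // Keep only capitalised words as candidate tags; fail if none are found.
  val capitalised: Step[List[String], List[String]] =
    Step { words =>
      val names = words.filter(w => w.headOption.exists(_.isUpper))
      if (names.isEmpty) Left("no candidate tags") else Right(names)
    }

  // The whole pipeline reads as step1 + step2 + step3.
  val extractTags: Step[String, List[String]] = nonEmpty + tokenize + capitalised
}
```

Calling StepComposition.extractTags.run("the Stig drives a Bugatti") gives Right(List("Stig", "Bugatti")), while a blank line stops at the first step with Left("empty subtitle line") and the later steps never run.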
At the other end of the processing, once we've finally determined what zeetags we'll be delivering to our users, we then have the task of enriching each tag with other information. Parsing Wikipedia articles, for example, is really messy: we take them as HTML that's so convoluted it could be classed as 'malformed' by every standard going. Even apparently simple tasks like pulling the first paragraph from an article are not easy. Tasks like handling disambiguation pages that are full of useful content, and tweaking redirection based on the ontology of discovered tags, also come into play here. There's a certain amount of fuzzy logic involved, though I sometimes feel that Terry Pratchett had it right when saying that he preferred the term "wooly thinking". That's just the nature of processing Wikipedia content, so it's hardly a surprise that DBpedia also chose Scala to help them deal with the complexity of the problem.
Did you start from scratch in Scala?
No, a lot of the code base in zeebox started as Java code and has mostly now been converted to Scala. It's a lot easier working with a pure Scala code base than a mixed one, and you do not have to keep all the Java oddities in your head. The IDEs like it better too.
We are never afraid to use a Java library that is best in class, but when we find a Scala equivalent it has always been easier to use and easier to integrate. Idiomatic Scala tends to be immutable, which is great for concurrency, and free of nulls, which makes it much easier to reason about. Java libraries come with no such niceties.
Scala lets us use better practices, using Options instead of nulls for example. We got rid of all the null handlers. A lot of the annotation-driven stuff in Java goes away too. Everything gets smaller and easier to read. Going from Java to Scala we have seen a code reduction of around 75%: the code is about a quarter of the size. Part of it came from boilerplate reduction, while a lot came from the ability to use the functional paradigm. For example, when we make requests to multiple external services, each response comes back as a Future. It is ridiculously simple to combine those Futures with a for-comprehension. Now you have a nice way to think about the asynchronous tasks as a linear sequential problem.
We love Scala's (and Akka’s) concurrency capability. We have found that dataflow-style concurrent programming, implemented with chained futures, adds real clarity. Scala offers a great big a la carte set of solutions. As a programmer I can look at a problem and decide which is the best way to solve it. Pick the right tool from my mental tool box and know I can implement it in Scala. That just makes things fun.
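As a rough illustration of combining Futures with a for-comprehension, and of using Option where Java code would return null, here is a small sketch using the Scala standard library's Future. The service functions fetchTweets and fetchSummary, the Enriched type and the returned values are all hypothetical stand-ins, not zeebox APIs.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureCombination {
  // Hypothetical stand-ins for asynchronous calls to external services.
  def fetchTweets(tag: String): Future[List[String]] =
    Future(List(s"#$tag is trending"))

  // An Option makes "no summary found" explicit instead of returning null.
  def fetchSummary(tag: String): Future[Option[String]] =
    Future(if (tag == "TopGear") Some("A BBC motoring programme") else None)

  final case class Enriched(tag: String, summary: String, tweets: List[String])

  def enrich(tag: String): Future[Enriched] = {
    // Start both calls before the for-comprehension so they run concurrently.
    val tweetsF  = fetchTweets(tag)
    val summaryF = fetchSummary(tag)
    // The comprehension reads like sequential code but never blocks a thread.
    for {
      tweets  <- tweetsF
      summary <- summaryF
    } yield Enriched(tag, summary.getOrElse("no summary available"), tweets)
  }
}
```

In a test or a script the result can be inspected with Await.result(FutureCombination.enrich("TopGear"), 5.seconds); blocking like this is only for demonstration, and real code would keep composing the Future instead.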
How do you say yes to Anthony's 'crazy' requests?
Of course we eat, sleep and breathe agile. Every vertical surface in our office is covered in cards. But Scala is critical too. It's a very good fit for working in a lean agile style. I can't imagine any other way we'd pull it all together, especially not when so many of us have young families as well.
The real key to saying yes has been the passion for creating and maintaining as clean and readable a code base as possible. If the code base is clean and a request for change comes in, then it is so much easier to go in and make the necessary changes. We are heavy users of test-driven development. The test frameworks available in Scala are amazing. I often use them as a showcase for what you can do with a DSL if you put your mind to it.
What about bringing new people in?
Scala does have a lot of paradigms that Java programmers have to get used to. But that is balanced out by the fact that our code base is now so clean, and so well documented with simple DSL expressions, that anyone new can jump in and feel at home in the code base very quickly. No boilerplate to get in your way. What you see is what you get.
When a new person comes to the company they have to learn the language the company is using, and then they have to learn the company’s terminology and business domain. Using Scala makes learning about the business domain much faster, so it quickly pays back any time lost in learning Scala, three times over.
How do you use Akka?
We use the Akka actor model to support and scale all the user interaction.
As I explained, we create all the channel-related information, then we mix this information with presence info (which friends are online, what program they are watching) to build up a list of the most popular programs. We have potentially 1000s of thousands of individuals who are connected to programs, joining friends' groups, chatting, browsing zeetags and so on. We have millions of individual small transactions that we must handle quickly, and we must be able to scale to new services transparently. Akka's actor model was a beautiful fit. It can be scaled out effortlessly to any number of users as we grow, and it provides a robust, failure-tolerant environment. We get reliable massive horizontal scaling, and with cloud-based resources we are confident we can grow to any number of users.
We start with 60 channels today, and these will grow to many thousands, especially as we bring in other countries and video-on-demand content. With YouTube, for example, you may want to share a really cool viral video clip. Each YouTube clip is effectively a channel. We want to be able to give the same integration, with the same topic information popping up, as for any other program.
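The step of turning presence info into a list of the most popular programs can be sketched with plain immutable collections. This sketch deliberately leaves Akka out and only shows the aggregation itself; the Presence record and the mostPopular function are hypothetical names introduced here, and how the real system distributes this work across actors is not described in the interview.

```scala
object Popularity {
  // Hypothetical presence record: which user is watching which programme.
  final case class Presence(user: String, programme: String)

  // Count viewers per programme and return the top n, most watched first.
  // Ties are broken alphabetically so the result is deterministic.
  def mostPopular(presence: Seq[Presence], n: Int): List[(String, Int)] =
    presence
      .groupBy(_.programme)
      .map { case (programme, viewers) => programme -> viewers.size }
      .toList
      .sortBy { case (programme, count) => (-count, programme) }
      .take(n)
}
```

Because the function is pure and works on immutable data, it could be called from inside an actor's message handler without any locking, which is one reason this style sits comfortably next to an actor model.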
What is next?
This has been the single most fun company I have ever worked for. The product is amazing, the code base is amazing, the people are all at the top of their game and the future is sure to be exciting. With people like zeebox, Sky and the BBC using Scala, the next adoption vector for it has got to be media. It fits so well.