Wednesday 20 March 2013
This year the Scala team applied again for the Google Summer of Code program to work with enthusiastic students on challenging Scala projects and got accepted!
- What is Google Summer of Code
Google invites students to come up with interesting, non-trivial problems for their favourite open-source projects and work on them over the summer. Participants get support from the community, plus a mentor who makes sure you don’t get lost and that you meet your goals. Aside from the satisfaction of solving challenging problems, students get paid for their work. But wait, there’s more! Successful participants also receive an official Google Summer of Code t-shirt! This is an incredible opportunity to get involved in the Scala community and get helpful support along the way.
- Project Ideas
Below we have collected a list of project ideas. The suggestions are only a starting point for students. We expect students to explore the ideas in much more detail, preferably with their own suggestions and detailed plans on how they want to proceed. Don’t feel constrained by the provided list! We welcome any of your own challenging ideas, but make sure that the proposed project satisfies the main requirements mentioned here below.
- How to get involved
The best place to propose and discuss your proposals is our “scala-language” mailing list (scala-language @ Google Groups, instructions to subscribe are available at http://www.scala-lang.org/node/199#scala). This way you will get quickly responses from the whole Scala community.
- Previous Summer of Code
Please make sure to read carefully the instructions at the end of this page for requirements and submission details.
Here are some project ideas for you. The list is non-binding and any reasonable project related to Scala that is proposed by student will be thoroughly reviewed.
Prototype an extension of for-comprehensions for Scala
The to goal of this project is to extend the desugaring of for-comprehensions in the Scala compiler to support additional operations like grouping or sorting. Similar work has been done for Haskell (see http://research.microsoft.com/en-us/um/people/simonpj/papers/list-comp/). The optimal design for this is not clear yet. We have some written documentation of our past thoughts on the topic you can use as a basis for your own ideas (drafts SIP and SIP alternative). Your implementation along with a SIP (Scala Improvement Process) proposal, should serve as a proof of concept that could lead to the addition of this feature to the Scala language. We want to work with you in an agile, ticket-based development style with frequent communication and coordination.
Alternative / social Scala documentation
This project has two parts. First you will work on generating alternative api documentation formats for Scaladoc. You will need to hook into the scaladoc part of the scala compiler, to access and work with Scaladoc’s code model. We have a small prototype you can use as a starting point. You should then generate documentation in (non-framed, non-js) single/multi-page HTML, PDF, CHM formats. The result should make sensible use of the scaladoc @-commands and you should think about a good way to present apis designed as in cake patterns. HTML-Formats should support external linking to methods. You can use a documentation generator such as Sphinx to help generating different formats. The second part of this project concerns collaboration. You will need to come up with smart and simple ideas and an implementation how to make Scala api documentation more social, favorably in a way that can be shared between the different HTML doc formats. It can be as simple as adding GitHub Links and Disqus or more involved. Make sure to also review the work that has been done on Colladoc. We are very open for your own new or alternative ideas regarding any aspect of this project. Good knowledge of modern, accessible HTML/CSS and possibly JS will be helpful. We want to work with you in an agile, ticket-based development style with frequent communication and coordination.
Scala Parallel Collections are a data-parallel shared-memory programming framework that was integrated into the Scala Standard Library in version 2.9. http://docs.scala-lang.org/overviews/parallel-collections/overview.html http://infoscience.epfl.ch/record/150220/files/pc.pdf
Specializing parallel collections with customized work-stealing and Scala Macros
The goals of this project are twofold. First, the Parallel Collections scheduler must be extended with work-stealing customized for specific collections. This work-stealing should be based on the work-stealing tree scheduling which we currently have a prototype of. Second, the Scala Macro system should be integrated with the Parallel Collections to generate more efficient operation versions at the callsite where there is more information about the collection type, element type and other parallel operation arguments like higher-order functions. It is desirable that the student is knowledgeable in the Scala programming language and its collections model, has knowledge about concurrent lock-free programming and parallel programming, Java memory model, and has an interest in data-structures in general. Knowledge about the JVM, JIT compilation, benchmarking and the JVM performance model is a plus. Some of the tasks in this project are:
- replace parallel operation implementations with macro implementations that generate specialized operation instances at the callsite
- specialize parallel operation instances for the concrete data-structures
- tweak and verify performance on different architectures
- implement a comprehensive test suite
- implement a comprehensive microbenchmarking suite with performance regression tests
- implement several bigger benchmark applications
- investigate API improvements and missing parallel operations to support a wider range of problems more effectively (asynchronous execution, in-place bulk modification, etc.)
Background: Currently, the Scala Parallel Collections incur some abstraction penalties, that lead to boxing, virtual calls, and reliance on iterators. With the arrival of Scala Macros, there exists a plethora of optimization opportunities at the library level that allows us to eliminate these inefficiencies. Orthogonally, the scheduling based purely on the fork-join framework (http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html) suffers from several abstraction penalties, and needs to be specialized to achieve better performance and load-balancing.
Schema manipulations for Slick
The goal of this project is to extend Slick to support database schema manipulations (that could be used for writing migration scripts for example). You need to add an API for manipulating existing database schemas (alter columns, change constraints, etc. similar to Rails or http://code.google.com/p/scala-migrations/). In coordination with us, you will have to come up with a sensible set of manipulation operations to be supported and a strategy how to integrate them into Slick considering existing code and code, which is still in development. We want to work with you in an agile, ticket-based development style with frequent communication and coordination.
Data migration tool
The goal of this project is creating a tool for managing data migration scripts based on Slick. We have a rough proof of concept implementation you can base work on at https://github.com/cvogt/migrations . The idea is roughly similar to migrations/evolutions in Ruby on Rails or Scala Play 2. You will need to significantly improve (or re-write) the code of the proof of concept implementation to make it production ready, extend it by futher features and explore some exciting ideas like reliable migrations in a development scenario with branches (like git branches). Your ideas are important to this project. Here is how we see it right now:
Existing features (which all require revision)
- migration scripts written in Slick/SQL/Scala
- diagnostic features for reviewing scripts and database
- flexible Scala code-generation for case classes, Slick table objects and custom user code related to the data model
- make it work with all Slick backends (the proof of concept is hard-coded to h2)
- make it as production ready as possible
- improve it based on comparison with other tools and potential user feedback gathered via mailing list, etc.
New feature ideas
- dumping Slick migration scripts into SQL scripts (desired by potential users)
- integration with SQL-file-based migration tools like Play
- a version compatibility scheme between generated code and database schema versions
- support for a convenient work flow for working with databases and git branches
- support for common industry use cases (like pre, during, post shutdown migration scripts)
- < your ideas here >
What features you need to implement exactly is open and needs to re-evaluated as we go in discussion with you and based on potential user feedback. We want to work with you in an agile, ticket-based development style with frequent communication and coordination.
Scala Library as a Deep EDSL Module
Deep EDSLs intrinsically trade user experience for performance improvements. The Scala library Yin-Yang  allows both good user experience and high-performance by keeping parallel implementations of shallow and deep embeddings. Transparent and reliable transformation from the shallow to the deep embedding allows programmer friendly development with the shallow EDSL and good performance in production with the equivalent deep EDSL.
Duplication of code in the parallel shallow embedding requires additional user effort and can lead to diverging semantics of DSLs. In this project we would utilize the implementations of existing Scala libraries/DSLs to generate the adequate deep embeddings. Yin-Yang should be modified to generate high-level IR nodes from existing libraries/DSLs and generate the DSL IR transformers that will lower those high-level nodes into their implementations. The transformers would be generated from shallow implementations by the existing Yin-Yang transformation. Furthermore, the side-effect information  will be collected from the Scala annotations and introduced into the deep embedding. The generated DSLs would use the Lightweight Modular Staging  as the back-end compiler for the DSLs.
With this project the deep DSL embedding would be greatly simplified. The DSL author would provide the shallow embedding with minimal number of annotations and Yin-Yang >iii) code generation. Finally, the DSL author would provide optimizations on the high-level IR.
The deep embedding of run-time compiled DSLs can further improve performance by utilizing the run-time collected data for optimizations. However, when this data changes the costly recompilation must be performed greatly increasing delay. Another goal of this project would be to investigate the DSL author guided JIT compilation of DSLs. With JIT compilation the run-time compiled DSLs could be utilized in latency critical programs that operate on smaller data (e.g. web servers, compilers, GUI applications).
The grand vision for the project is to generate the DSL modules from the whole Scala Library and provide DSL optimizations for them. These modules could then be used in critical paths of all Scala projects. Due to the magnitude of the project the GSoC candidate would focus on one well defined part.
Skills required for this project are:
- Good knowledge of Scala:
- Scalable Components
- Implicit conversions
- Basic knowledge of DSL embedding in Scala
 Yin-Yang: Transparent Deep Embedding of DSLs - https://infoscience.epfl.ch/record/185832
 Lightweight Polymorphic Effects - http://infoscience.epfl.ch/record/175240
 Lightweight Modular Staging - http://scala-lms.github.com/
Constraint Solver in Scala
The goal of this project is to develop a reasonably efficient constraint solver in Scala.
You will build on a SAT solver we are currently releasing and will add incremental
SAT solving capabilities as well as reasoning about functional data structures.
Full project description: http://lara.epfl.ch/w/solver
- Kaplan paper: http://lara.epfl.ch/~kuncak/papers/KoeksalETAL12ConstraintsControl.html
Script Tracing in Kojo
This feature will allow users to trace programs within Kojo - to get a better understanding of what/how/when the programs do.
High Level Specification
- There are two new UI elements - a trace button and a trace window.
- You click on the trace button to trace a program. Trace output goes into the trace window in the form of a trace history.
- When a trace run is over (this happens when a program finishes or you stop it), you can go over to the trace history to see the different trace points in the program. These are things like the entry to and exit from every procedure (command or function).
- When you click on a trace point, the corresponding source code line is highlighted in the editor, the canvas reverts to its state at that point, and you can view the value of all the visible variables/values at that point in time.
- You can move back and forth across the trace history, and jump to any point in the history.
- Good working knowledge of Scala.
- Familiarity with Java and the JVM.
- Familiarity with Swing (the Java GUI framework).
- Familiarity with 2D graphics.
- Experience with JPDA.
- Experience with Debugger Implementation.
Sprite Enhancements in Kojo
This set of features will make it easier for users to write sprite based games, animations, and cartoons within Kojo.
Kojo has historically supported drawing and animation primarily through Vector graphics. This has included the command oriented Turtle graphics and the functional Picture graphics.
In recent times, Kojo has acquired good support for sprite based animation. This has involved:
- Refinements related to multiple turtles/sprites.
- Support for specifying sprite costumes.
- Support for cycling through a sequence of sprite costumes during an animation.
- Refinements to the translation, rotation, and scaling of sprites.
High Level Specification
The next set of required sprite features, which are the goal of this GSOC project, include:
Sprite Collision Detection. This involves:
- Implementing image edge detection for sprite images (to obtain a vector representation of the sprite image boundary)
- Leveraging the vector collision detection support in Kojo to do collision detection for sprites.
- Making this feature available via the Kojo API, so that it is available in games/animations.
- Sprite Messaging - to allow sprites to communicate with each other.
- Sprite ‘Speaking’ and ‘Thinking’ - to enable the display of speaking/thinking bubbles next to sprites for specified durations.
The Scratch project (http://scratch.mit.edu) is a good source of ideas in this area.
- Good working knowledge of Scala.
- Good working knowledge of 2D graphics.
- Familiarity with Java and the JVM.
Friendlier Error Messages in Kojo
This feature will make it easier for users to understand and recover from syntax errors in their scripts.
One of the areas that needs improvement within Kojo relates to the complexity of the error messages that show up for scripts with syntax errors.
There are two approaches to trying to solve this problem:
- Write custom Scala parsers for different subsets of Scala (Level 1, Level2, etc) that output suitable error messages for Kojo.
- Annotate the current error messages with helpful (in the context of Kojo) text. This approach can potentially be very powerful; it can analyze the current code to determine what the user is trying to do, the error location to determine the context of the error, and the error message to determine what went wrong. It can then combine these three elements to help the user understand and fix the error.
- Good working knowledge of Scala.
- Good working knowledge of Parser writing.
- Familiarity with Swing (the Java GUI framework).
SubScript is a way to extend common programming languages aimed to ease event handling and concurrency. Typical application areas are GUI controllers, text processing applications and discrete event simulations. SubScript is based on a mathematical concurrency theory named Algebra of Communicating Processes (ACP).
You can regard ACP as an extension to Boolean algebra with ‘things that can happen’. These items are glued together with operations such alternative, sequential and parallel compositions. This way ACP combines the essence of grammar specification languages and notions of parallelism.
Adding ACP to a common programming language yields a lightweight alternative for threading concurrency. It also brings the 50 year old but still magic expressiveness of languages for parser generators and compiler compilers, so that SubScript suits language processing. The nondeterministic style combined with concurrency support happens to be very useful for programming GUI controllers. Surprisingly, ACP with a few extras even enables data flow style programming, like you have with pipes in Unix shell language.
Currently a SubScript extension to Scala is available. This comes with a branch of the Scala compiler, a run-time library, support for the Scala-Swing reactive framework and example programs. The “C” part of ACP is not yet supported.
The project would involve some of the following tasks:
- develop use cases, both for client-side and server-side applications
- define a strategy to send over SubScript in HTML pagesand have it translated
Enhance Akka using SubScript
Akka is the Scala actor implementation, very useful for distributed functions. Typically an actor operates a state machine, which is programmed using state variables. This is relatively inconvenient to program and read. SubScript may provide a better alternative for programming actor internals. This project would involve:
- develop typical actors in two versions: just Scala and SubScript
- compare these versions in terms of clearness and conciseness
- measure the performance of these versions
- make a tutorial
More information on SubScriptActors is available at http://subscript-lang.org/subscript-actors/.
Generic Tasks & Deliverables In each project the student should
- investigate the problem area: what are current practices: inspect open source projects, how could SubScript help
- build software
- make a tutorial
- measure performance
- discuss the pros and cons, and make recommendations for future work
Each project has the following generic deliverables:
- a working plan
- a report
- a GitHub repository
- a Linux VM containing: a complete development environment with build scripts, all developed software in sources and executables, test data, a console with a command history for building and testing
Full project descriptions available at http://subscript-lang.org/student-projects/
Saddle is a new high-performance data manipulation library for Scala. Saddle provides array-backed, indexed, one- and two-dimensional data structures; vectorized numerical calculations; automatic data alignment; and robustness to missing values. Saddle aims to be the easiest and most expressive way to program with structured data on the JVM. Saddle is missing features in a few key areas that the interested student could help implement. The following ideas touch on all aspects of Saddle.
- CSV (comma-separated value) data output
- specialization on Float and Short numeric types
- implement sparse matrix data structure
- implement diehard random number tests for marsiglia xorshift prng
- extend test coverage using ScalaCheck
- prototype and implement data visualization using D3/NVD3
The student would learn about efficient numerical computing on the JVM, as well as advanced Scala features such as specialization, the typeclass pattern, Java-Scala integration, and D3/NVD3. The interested student would already have a decent working knowledge of Scala and the mechanics of the JVM.
Ensime Plugin for Sublime Text
The Sublime-Ensime plugin enhances the Sublime Text 2 editor with Scala-specific error highlighting, autocompletion and many other features. With about 1.5K downloads, and many users sucessfully using it, we are keen on introducing additional features to provide comprehensive integrated experience. We also expect that users will demand the plugin for Sublime Text 3, but this requires refactoring the plugin, preparing it for running in Python 3.3 and possibly integrating with more recent features such as symbol indexing.
The ideal candidate is fluent in both Python and Scala and currently uses the Sublime Text editor frequently. We also welcome candidates who are active and engage the community via issue reports, mailing lists and forums.
The project will have three phases:
- Develop several features for the plugin: a) finish the debugger, b) implement refactorings provided by Ensime,
c) add support for multiple projects. This phase of the project will target Sublime Text 2, so that all the users of the plugin
will benefit from your contributions.
Port the plugin to Python 3.3 and Sublime Text 3, possibly taking advantage of the features introduced in the new version. A guide can be found on the Sublime Text 3 webpage. There’s also the ST3 branch, which has already ported a bulk of functionality.
Create a github (markdown) for the project detailing the plugin installation and usage and a screencast of using the new plugin.
Before applying, please choose one of the bugs in the current plugin (https://github.com/sublimescala/sublime-ensime/issues?state=open) and try to fix it, so you see how the plugin works. We do accept partial fixes and simple workarounds, but please do try to fix it yourself.
Sublime Text 2 API Reference (http://www.sublimetext.com/docs/2/api_reference.html);
Sublime-Ensime repository (https://github.com/sublimescala/sublime-ensime) and the README (https://github.com/sublimescala/sublime-ensime/blob/master/README.md) with detailed instructions on installing;
Sublime-Ensime Issue tracker (https://github.com/sublimescala/sublime-ensime/issues?state=open);
A work in progress Sublime-Ensime hacker’s guide (https://github.com/VladUreche/sublimescala.github.com/blob/hacker-guide/hacker.md).
Contact details for the plugin project
Requirements and Guidelines
General Student Application Requirements
This is the second time the Scala project has applied to the Summer of Code, and from last years experience, increased popularity of the language and stories of other mentor organizations we expect a high number of applications. First, be aware of the following:
- Make sure that you understand, fulfill and agree to the general Google Summer of Code rules
- The work done during GSoC requires some discipline from the students as they have to plan their day-to-day activities by themselves. Nevertheless we expect regular contact with the mentors by the usual forms of communication (mail, chat, phone) to make sure that the development is going according to the plan and students don’t get stuck for weeks at a time (3 months may seem long, but in reality it is very easy to run out of time).
- The official SoC timetable mentions June 17th as the official start of coding, but if you have time you are encouraged to research your proposals even before that (and definitely learn the basics of Scala, if you haven’t done that already). Note that the official time for coding has now shifted more towards summer and autumn (rather than spring and summer).
Student Application Guidelines
- Student proposals should be very specific. We want to see evidence that you can succeed in the project. Applications with one-liners and general descriptions definitely won’t make the cut.
- Because of the nature of our projects students must have at some knowledge of the Scala language. Applicants with Scala programming experience will be preferred. Alternatively, experience with functional programming could suffice, but in your application we want to see evidence that you can quickly be productive in Scala.
- You can think of Google Summer of Code as a kind of independent internship. Therefore, we expect you to work full-time during the duration. Applicants with other time commitments are unlikely to be selected. From our previous experience we know that students’ finishing their studies (either Bachelor, Master of PhD) are likely to be overwhelmed by their final work, so please don’t be too optimistic and carefully plan your time for the project.
- If you are unsure whether your proposal is suitable, feel free to discuss it on our “scala-language” mailing list (registration instructions are at http://www.scala-lang.org/node/199#scala). We have many community members on our mailing list who will quickly answer any of your questions regarding the project. Mentors are also constantly monitoring the mailing list. Don’t be afraid of asking questions, we enjoy solving puzzles like that!
General Proposal Requirements
The proposal will be submitted via the standard web-interface at http://www.google-melange.com/gsoc/homepage/google/gsoc2013, therefore plain text is the best way to go. We expect your application to be in the range of 700-1500 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job.
Your proposal should contain at least the following information, but feel free to include anything that you think is relevant:
- Please include your name (weird as it may be, people do forget about it)
- Title of your proposal
- Abstract of your proposal
- Detailed description of your idea including explanation on why is it innovative (maybe you already have some prototype?), what contribution do you expect to make to the Scala community and why do you think your project is needed, a rough plan of your development and possible architecture sketches.
- Description of previous work, existing solutions (links to prototypes, bibliography are more than welcome!)
- Write us about yourself and convince us that you are the right person for the job (linking to your resume/CV is good but not sufficient)
- Mention the details of your academic studies, any previous work, internships
- Any relevant skills that will help you to achieve the goal (programming languages, frameworks)?
- Any previous open-source projects (or even previous GSoC) you have contributed to?
- Do you plan to have any other commitments during SoC that may affect you work? Any vacations/holidays planned? Please be specific as much as you can.
- Contact details (very important!)