Google Summer of Code 2014 Scala Projects

Google Summer of Code

This year the Scala team applied again for the Google Summer of Code program to work with enthusiastic students on challenging Scala projects

This page provides a list of project ideas. The suggestions are only a starting point for students. We expect students to explore the ideas in much more detail, preferably with their own suggestions and detailed plans on how they want to proceed. Don’t feel constrained by the provided list! We welcome any of your own challenging ideas, but make sure that the proposed project satisfies the main requirements mentioned below.

How to get involved

The best place to propose and discuss your proposals is our “scala-language” mailing list. This way you will get quickly responses from the whole Scala community.

Previous Summer of Code

We encourage you to have a look at our Summer of Code 2010, 2011, 2012 and 2013 page to get an idea on what we and you can expect while working on Scala.

Project Ideas

Here are some project ideas for you. The list is non-binding and any reasonable project related to Scala that is proposed by a student will be thoroughly reviewed.


Miniboxing for Breeze and Spire

A very tricky question in compilers is how to translate high level generic code, such as classes with type parameters to optimized low level code. Researchers have proposed many translations in the last 25 years, yet we can’t claim to have a perfect solution. This is because generic code is uniform at a high level, but its low level translation requires optimized non-uniform representations to obtain good performance. Thus we have an inherent tension between uniformity and optimality.

In the context of the Java Virtual Machine, miniboxing is a new translation that sidesteps the shortcomings of previous approaches, while maintaing the optimality of the low level non-uniform code. It builds on the specialization transformation, which is already included in the Scala compiler, but produces too much bytecode to be useful in practical applications. Although miniboxing is developed for the Scala language, the same transformation could also be implemented in Java and in other JVM languages, such as IBM’s X10, JetBrains’ Kotlin and RedHat’s Ceylon.

The miniboxing transformation matches optimal performance in microbenchmarks, but in order to make a significant impact in the Scala community, it needs to be prove itself on large benchmarks, such as the spire numeric abstractions library and the breeze numeric processing library. In this context, the tasks for this project are:

  • develop the necessary mechanisms around the miniboxing plugin to allow running the spire and breeze benchmarks
  • identify slowdowns caused by the miniboxing translation and
  • propose and implement solutions to improve the performance.

This project requires familiarity with compilers (requirement: having taken at least a basic Compilers course), the Scala programming language (requirement: having taken the Functional Programming in Scala course) and with Java bytecode. Note that acceptance for this project is conditioned by the successful completion of a challenge, which will be explained once you apply for the project. A big plus is having contributed to OSS software written in Scala.

Resources

Keywords: generic code translation, miniboxing, specialization

Mentored by Vlad Ureche and David Hall.

MacroGL library

OpenGL is a cross-language, multi-platform API for rendering 2D and 3D computer graphics. The API interacts with a GPU to achieve hardware-accelerated rendering. OpenGL is used heavily in the industry and is constantly evolving. On the JVM, namely Scala and Java, you can currently use OpenGL by relying either on the JOGL library (Java OpenGL) or LWJGL (Lightweight Java Game Library). Both libraries are low-level bindings on top of OpenGL - these libraries are ports of the C-like OpenGL API to Java.

While this is highly configurable, programming with these low-level constructs may be cumbersome.

  • OpenGL state has to be manually set up before performing the desired task, and then reverted to previous state once the task is done
  • Creating OpenGL objects like buffers, textures or shader programs requires a large number of API calls and imperative state changes, while it could be done declaratively for typical programs
  • Communicating values to shader programs requires imperative API calls and dealing with the layout of shader variables in the OpenGL programs
  • Error states have to be queried manually and there are not exceptions being thrown, let alone error messages other than numeric error codes.

All this results in a lot of boilerplate code that is hard to understand, maintain and debug.

The MacroGL library is a high-level API that allows writing more structured, yet efficient OpenGL code. Here are some of its features.

  • It provides structured constructs for setting transformation matrices, OpenGL settings, textures, programs, buffers, and in general setting up the rendering pipeline

      for {
        _ <- enabling(GL_CULL_FACE)
        _ <- using.matrix(scene.camera.projectionMatrix)
        _ <- using.matrix(scene.camera.modelviewMatrix)
        _ <- using.texture(GL_TEXTURE0, shadowChannel)
        b <- using.framebuffer(screenBuffer)
        _ <- b.attachTexture2D(GL_COLOR_ATTACHMENT_0, screenChannel, 0)
      } objectBuffer.render()

    These for-comprehensions use Scala Macros to inline the state-handling code and avoid object allocations for closures.

  • It prefers convention over configuration when declaratively instantiating OpenGL objects like shaders:

      val glowFilter = new Program("glowfilter")(
        Program.Shader.Vertex(resources("glowfilter.vert")),
        Program.Shader.Fragment(resources("glowfilter.frag"))
      )
  • It allows easy communication between the Scala program and the shaders:

      glowFilter.uniform.width = width
      glowFilter.uniform.height = height
  • Supports Vector and Matrix types

  • easier error querying

  • easy access to many rendering pipeline settings

The current MacroGL functionality is, however, restricted to core OpenGL features. We would like to extend the library to completely cover the OpenGL API. Some of the features missing:

  • texture types - 1D, 3D, cube maps, texture arrays, mipmapping
  • tessellation and geometry shaders
  • implementing missing buffer types and consolidating the existing buffer types
  • facilities efficiently uploading images to textures
  • we want to use the quasiquoting features of Scala 2.11 to simplify macros in MacroGL and bring them more up-to-date
  • optional exception throwing when an error state is detected, governed by ErrorPolicy typeclasses (must be turned off for performance after development)
  • structured constructs for geometry rendering
  • allow a richer set of uniform variable types for shaders (currently MacroGL API only supports integers, floats, float vectors and float matrices)
  • avoiding boxing in some places, such as setting uniform variables for shaders
  • implementing more Vector and Matrix types (e.g. integers), divide them into mutable and immutable versions
  • parsing common 3d model formats and producing buffers

The goal of this project is to provide a set of higher-level abstractions and a Scala-based API that works with JOGL, while in the same time retaining most of the performance and efficiency of JOGL. This should allow Scala programmer to write safer and more reliable OpenGL code which is also easier to understand and maintain.

The final goal is to produce and document a usable, concise and powerful open source library for writing OpenGL programs more efficiently in Scala. Deliverables are the following:

  • implemented most of missing features described above and merged them into the main distribution

  • MacroGL ported from Scala 2.10 to Scala 2.11, leaving 2.11 as the development version

  • detailed documentation in form of a GitHub page, containing API docs, overview of different features with descriptions, a getting started guide, and several example use cases

Mentored by Aleksandar Prokopec

Generic XML interpolator

Scala features XML literals which desugar to calls of the XML library. The support for XML literals is part of the parser and the compiler, and are bound to a single desugared API. Scala 2.10 added support for generic, customizable string interpolators. Standard ones are s"foo = $foo" or f"bar = $bar%.2f". Quasi-quotes, added in Scala 2.11, show that these interpolators can be implemented by macros to support compile-time checking of the syntax inside an interpolator.

The goal of this project is to design and implement a macro-based generic XML string interpolator. It should be primarily designed as a replacement for the XML literal syntax of Scala, and hence would provide at least all the features of that syntax. For example, the XML literals

val name = "John"
val span = <span>{ name }</span>
val node = <div>{ span }</div>
node match { case <div><span>{ inner }</span></div> => inner }

could be rewritten as

val name = "John"
val span = xml"<span>$name</span>"
val node = xml"<div>$span</div>"
node match { case xml"<div><span>$inner</span></div>" => inner }

As an additional requirement, this interpolator should be generic with respect to the actual API that is called by the generated code, called the “back-end”. As a possible example, on Scala.js we would like to have XML literals that create DOM elements directly instead of scala.xml objects. The details on how to make the interpolator generic are not specified, and you are encouraged to come up with a smart idea to solve that requirement. You are not required to actually provide a second back-end besides that targeting the scala.xml API, but your implementation should allow others to do so without modifying the code.

Required skills:

  • Good working knowledge of Scala
  • Good knowledge of XML
  • Basic knowledge of def macros

Mentored by Sébastien Doeraene

Slick: Type-checking of plain SQL

Slick supports the use of Plain SQL queries as a thin execution layer on top of JDBC. This requires that users not only write their own SQL code but also know the result type of the query and how to map that back to Scala types. There is also no compile-time checking for these kinds of queries.

The goal of this project is to alleviate both issues with the use of macros. Plain SQL queries based on string interpolation have all the necessary information (query string, parameter types) available at compile-time to allow a macro-based implementation to send the query to a database for verification, get the result type from the database, and translate that back into a Scala type. As an implementer of such a feature you will have to come up with practical solutions for configuring the database access for the macro, caching the database results for good performance, and ensuring that the solution would be usable in environments that cannot expand the macro (e.g. some IDEs). You will have to work towards these goals with the Slick team and possibly the Scala Macros implementers.

Related work has been done previously in sqlTyped.

Mentored by Stefan Zeiger

Runtime Expansion for Scala Macros

Scala macros are functions called by the compiler during program compilation. Since their inception in Scala 2.10, macros are widely used in libraries and domain-specific languages to analyze and generate code.

There’s always been a number of interesting extensions to our macro system that we planned to experiment with, and here’s one of them. Even though macros look and quack like normal methods, they are not first-class. This brings problems when compile-time information alone is not sufficient to perform macro expansion, preventing macros from being effectively used for runtime code generation. The main goal of this project is to remove this restriction and implement runtime macros - functions that produce executable code during program runtime using familiar macro APIs.

When working on the project you will:

  • Design syntax for designating runtime-expandable expressions in code
  • Figure out changes to the macro engine and reflection APIs that are necessary to enable runtime macro expansion
  • Experiment with and deliver runtime macro expansion for Scala macros

Beware! There’s already been a lot of applications for this project! Luckily there are other macro-related ideas to hack on during GSoC. Contact Eugene for more information.

Mentored by Eugene Burmako

Refactor Scaladoc’s Scalac dependencies

Scaladoc currently has strong dependencies on the Scala compiler’s internal classes. Large parts of it could be refactored to use reflection instead in order to loosen its ties with the compiler. Part of this task is identifying the smallest remaining required interface to the scala compiler classes.

Mentored by @cvogt http://people.epfl.ch/christopher.vogt

Refactor Scaladoc to allow generating generate other documentation format

Scaladoc currently only generates the default html format. Other formats could be several of single file plain HTML, multi file plain HTML, pdf, CHM, dash doclet… This requires cooperation with the other Scaladoc project. This could help: https://github.com/szeiger/extradoc This is Scala compiler related project.

Mentored by @cvogt http://people.epfl.ch/christopher.vogt

Projects with Breeze

Breeze is a high-performance linear algebra library for Scala that uses . The ideas here generally require a fairly advanced knowledge of Scala. In general, experience with frameworks like NumPy or languages like R, MatLab, and Julia would be immensely valuable as well. Other projects related to numerical computing and linear algebra are encouraged. Please introduce yourself on the group’s mailing list with any ideas.

All Breeze projects are mentored by David Hall

Multi-dimensional Arrays

Breeze currently only supports 1-dimensional vectors and 2-dimensional matrices. Frameworks like NumPy support arbitrarily dimensioned arrays. This is easier in dynamically typed languages that don’t enforce correctness at compile time, because Scala does not provide mechanisms for abstracting over the “arity” of types. Nevertheless, we’d like to extend support to arbitrarily dimensioned arrays. This project would involve the creation of a new high-order array type, with keys based on a flexible data type, like Shapeless HLists, as well as support functions for manipulating these arrays, with the ultimate goal being parity with NumPy’s multidimensional arrays.

More concretely, the student would create the new data structure, along with array creation (e.g. zeros, ones) and manipulation routines (e.g. roll, ravel), as well as operator (e.g. addition and subtraction) and universal function (e.g. exp, sum) support, following the design patterns used in Breeze.

The ideal student would have a strong command of Scala, familiarity with NumPy, and some familiarity with Shapeless. The student would work closely with the maintainers of Breeze. Interested students should introduce themselves on the group’s mailing list.

Spire/Algebird Interoperability

There are currently a number of different math libraries for Scala. Spire (and, to a lesser extent, Algebird) have fairly fleshed out algebraic hierarchies for Rings, Semirings, Monoids, and the like. Breeze has its own parallel hierarchy, but it is less developed. In this project, the student would bridge these different libraries, either by integrating Spire’s data types into Breeze, or by introducing implicit conversions to provide an interoperability layer. Students should have an intermediate knowledge of Scala (e.g. how to write implicits to provide implementation), and some familiarity with or interest in abstract algebra (i.e. what a ring is, and how it relates to a field). The deliverables for this project would include

  1. a proposed plan for either interoperability and integration with Spire and
  2. an implementation of the plan.

If time permits, we would then consider adding interoperability with other libraries likes Algebird. The student would work closely with the maintainers of Breeze. Interested students should introduce themselves on the group’s mailing list.

GPU Extensions for Breeze

GPUs have recently become popular for scientific computing due to their high performance and ubiquity. In this project, the student would extend Breeze to provide wrappers for third-party GPU computing libraries like JCublas or JavaCL. Breeze is designed to be flexible enough to support this kind of extension, so this project should not require changing too much of the core of Breeze, though of course it is a large undertaking nonetheless. The student would propose a design, backed by either of the two implementations, and then an implementation for common linear algebra operations, along with a design document for extending their work.

This project requires intermediate knowledge of Scala (e.g. how to write implicits to provide implementation), access to either a CUDA- or OpenCL-compatible device, and some familiarity with GPU computing. The student would work closely with the maintainers of Breeze. Interested students should introduce themselves on the group’s mailing list.

Visualization Library

Languages for scientific computing are often accompanied by a great library for visualizing data. In this project, the student would design and implement such a framework for Scala and Breeze. There is some work in this direction, but this project would require a strong design sense from the student. Students should have an intermediate knowledge of Scala, and experience using or implementing visualization libraries like matplotlib, ggplot2, or gadfly.

The deliverables include a specification of the overall design (e.g. API use cases, proposed architecture and backend), a proof-of-concept implementation that can be fleshed out into a more complete library, and a tutorial for using this library. The student would work closely with the maintainers of Breeze. Interested students should introduce themselves on the group’s mailing list.

Requirements and Guidelines

General Student Application Requirements

This is the fifth time the Scala project has applied to the Summer of Code, and from last years experience, increased popularity of the language and stories of other mentor organizations we expect a high number of applications. First, be aware of the following:

  • Make sure that you understand, fulfill and agree to the general Google Summer of Code rules
  • The work done during GSoC requires some discipline from the students as they have to plan their day-to-day activities by themselves. Nevertheless we expect regular contact with the mentors by the usual forms of communication (mail, chat, phone) to make sure that the development is going according to the plan and students don’t get stuck for weeks at a time (3 months may seem long, but in reality it is very easy to run out of time).
  • The official SoC timetable mentions May 19th as the official start of coding, but if you have time you are encouraged to research your proposals even before that (and definitely learn the basics of Scala, if you haven’t done that already).

Student Application Guidelines

  • Student proposals should be very specific. We want to see evidence that you can succeed in the project. Applications with one-liners and general descriptions definitely won’t make the cut.
  • Because of the nature of our projects students must have at some knowledge of the Scala language. Applicants with Scala programming experience will be preferred. Alternatively, experience with functional programming could suffice, but in your application we want to see evidence that you can quickly be productive in Scala.
  • You can think of Google Summer of Code as a kind of independent internship. Therefore, we expect you to work full-time during the duration. Applicants with other time commitments are unlikely to be selected. From our previous experience we know that students’ finishing their studies (either Bachelor, Master of PhD) are likely to be overwhelmed by their final work, so please don’t be too optimistic and carefully plan your time for the project.
  • If you are unsure whether your proposal is suitable, feel free to discuss it on our “scala-language” mailing list. We have many community members on our mailing list who will quickly answer any of your questions regarding the project. Mentors are also constantly monitoring the mailing list. Don’t be afraid of asking questions, we enjoy solving puzzles like that!

General Proposal Requirements

The proposal will be submitted via the standard web-interface at http://www.google-melange.com/gsoc/homepage/google/gsoc2014, therefore plain text is the best way to go. We expect your application to be in the range of 700-1500 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job.

Your proposal should contain at least the following information, but feel free to include anything that you think is relevant:

  • Please include your name (weird as it may be, people do forget about it)
  • Title of your proposal
  • Abstract of your proposal
  • Detailed description of your idea including explanation on why is it innovative (maybe you already have some prototype?), what contribution do you expect to make to the Scala community and why do you think your project is needed, a rough plan of your development and possible architecture sketches.
  • Description of previous work, existing solutions (links to prototypes, bibliography are more than welcome!)
  • Write us about yourself and convince us that you are the right person for the job (linking to your resume/CV is good but not sufficient) * Mention the details of your academic studies, any previous work, internships * Any relevant skills that will help you to achieve the goal (programming languages, frameworks)? * Any previous open-source projects (or even previous GSoC) you have contributed to? * Do you plan to have any other commitments during SoC that may affect you work? Any vacations/holidays planned? Please be specific as much as you can.
  • Contact details (very important!)