Time for a New Generation of ORM?
Thu, 2011-09-01, 15:48
I'm admittedly a Scala Noob, but I'm not a noob to ORMs or enterprise/internet/cloud development.
I think it's time for a new generation of ORMs. I reviewed the existing Scala ORMs, and while there are some good ideas, none of them really kick ass. Before people start chiming in with "such and such is good", let me say that I'm a big picture guy, and part of the problem with all of the ORMs is that they take too narrow a view. I don't really want an ORM or a persistence mechanism; I want something beyond that, a sort of generalized persistence+transactional+recovery architecture.
Right now, I'm just looking for people to chime in with "yeah, it would be nice if we had this..."
My minimum features for a new generation of ORM in addition to the basic idea of persisting objects somehow:
1. Don't assume relational.
This new generation should be as comfortable with a NoSQL store of any flavor (document, XML, key value, column) as it is with a relational database. Flat files should even be ok. If we remove relational, that makes it an OM, which is appropriate because I'm seeking coding nirvana.
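One way to picture "don't assume relational" is a mapper that only talks to a tiny storage interface. The following is a minimal sketch, not an existing library: `Store`, `InMemoryStore`, `Repository`, and `Item` are hypothetical names, and the in-memory map merely stands in for a document, key-value, column, or relational backend.

```scala
import scala.collection.mutable

// Hypothetical interface: the least a mapper needs from any backend.
trait Store[K, V] {
  def get(key: K): Option[V]
  def put(key: K, value: V): Unit
  def remove(key: K): Unit
}

// In-memory stand-in; a relational, document, or flat-file backend
// would implement the same three operations.
class InMemoryStore[K, V] extends Store[K, V] {
  private val data = mutable.Map.empty[K, V]
  def get(key: K): Option[V] = data.get(key)
  def put(key: K, value: V): Unit = data.update(key, value)
  def remove(key: K): Unit = { data -= key; () }
}

// Example domain object (assumed for illustration).
case class Item(id: Long, name: String)

// The mapper knows how to derive a key from an object, nothing more.
class Repository[K, V](store: Store[K, V], keyOf: V => K) {
  def save(v: V): Unit = store.put(keyOf(v), v)
  def find(k: K): Option[V] = store.get(k)
  def delete(k: K): Unit = store.remove(k)
}
```

Swapping the backend then means supplying another `Store` implementation; the code using `Repository` does not change.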
2. Support caching/cache invalidation on save from the get go: Terracotta, Infinispan, memcached.
It's easy to implement a cache; the hard part is implementing the invalidation. An ORM is itself inherently a cache, because the millisecond after you receive a record from the database someone else could have changed it. Still, everyone has a product table that's a lot smaller than their user table, so it makes sense to cache it and have a mechanism for invalidating it. The challenge is not just caching the individual objects, but caching the relationships and entire tables.
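The simplest form of "invalidate on save" might look like the sketch below. It is single-process and not thread-safe, and it does not touch Terracotta, Infinispan, or memcached; `CachingProductRepo`, `Product`, and the `load`/`store` functions are hypothetical names chosen for illustration. A distributed version would have to broadcast the invalidation to the other nodes, which is exactly the hard part.

```scala
import scala.collection.mutable

// Example domain object (assumed for illustration).
case class Product(id: Long, name: String, price: BigDecimal)

// `load` and `store` stand in for whatever backend is in use.
class CachingProductRepo(load: Long => Option[Product],
                         store: Product => Unit) {
  private val cache = mutable.Map.empty[Long, Product]
  var loads = 0 // counts trips to the backend, for demonstration only

  def find(id: Long): Option[Product] =
    cache.get(id).orElse {
      loads += 1
      val loaded = load(id)
      loaded.foreach(p => cache.update(id, p)) // populate on a miss
      loaded
    }

  def save(p: Product): Unit = {
    store(p)
    cache.remove(p.id) // invalidate so the next read reloads
    ()
  }
}
```

Note that this only covers individual objects. Cached relationships and whole-table caches would need their own invalidation rules.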
3. A new scalable paradigm for transactions/locking.
How we got here:
There's nothing new under the sun. Transactions on databases were created to solve a concurrency problem. The original intent was good, and for those systems it worked well. You can look at the TPC-E benchmark for how much you can do at the database layer. In essence, the old fashioned way was to move the code down to the datastore level, and expose a set of atomic operations at that level.
That is, for the classic "update the bank account balance" example of how to do transactions, you can do that with a stored procedure and expose an atomic interface to all. Which worked ok in the past, and the TPC-E benchmark above does that writ large: it's a stock brokerage implemented mostly in SQL.
The computing world has moved on though, and it can be almost impossible to code the services you need at the SQL level. We need objects! So people muddle about with "optimistic locking" and other paradigms that attempt to replicate compare-and-swap. If you get an optimistic locking failure, though, you have to catch it at the appropriate place, and generally you need to retry at the appropriate place.
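For readers who haven't run into it, the optimistic locking pattern boils down to "read a version, write only if the version is unchanged, otherwise retry". Here is a minimal in-memory sketch built on `java.util.concurrent.atomic.AtomicReference`; the `Versioned` and `OptimisticCell` names are hypothetical, and a real ORM would do the comparison against a version column in the datastore instead.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// A value plus the version it had when it was read.
case class Versioned[A](value: A, version: Long)

class OptimisticCell[A](initial: A) {
  private val ref = new AtomicReference(Versioned(initial, 0L))

  def read: Versioned[A] = ref.get

  // Succeeds only if nobody has written since `expected` was read.
  // compareAndSet compares references, so `expected` must be the
  // exact object previously returned by `read`.
  def tryWrite(expected: Versioned[A], newValue: A): Boolean =
    ref.compareAndSet(expected, Versioned(newValue, expected.version + 1))

  // The retry loop the caller otherwise has to write by hand.
  @tailrec
  final def update(maxAttempts: Int)(f: A => A): Boolean =
    if (maxAttempts <= 0) false
    else {
      val seen = read
      if (tryWrite(seen, f(seen.value))) true
      else update(maxAttempts - 1)(f)
    }
}
```

Even in this toy form the awkward part is visible: the caller has to decide how many attempts are acceptable and what to do when `update` returns `false`.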
The way forward:
I hinted at this above: to deal with concurrency we need to think in terms of exposing services on our data. EJB, JPA and all that encourage you to ignore concurrency problems rather than think about them up front. When you do think about them, you don't necessarily have a lot of solutions available; mostly it may just be "hey that EJB call I just did? Try that again, maybe we'll get lucky this time!"
But with NoSQL, failures may not happen until replication time.
Plus, in the real world, life is messy. Bank balances have "overdrafts", warehouses have "wastage" on their inventory. A bank account balance is really cache(sum(transactions.amount)). So our bank account balance example has always been somewhat fake.
So the solution I foresee to all this is the ability to encapsulate the data changes we want to make with the recovery steps, and to describe the changes we want at a higher level. Cassandra has added "counter" columns to deal with things like the bank balance problem above; the replication knows that the balance column is a sum of the transactions, so as transactions get replicated collisions are dealt with because both transactions go in, and the bank balance is calculated using both.
That is, we code the balance update as a command against the account balance and such commands are integrated into the OM. Such a command can operate in a completely transactionless fashion, because we're only adding the adjust_balance row to the account command table, which should always succeed. Reconciling colliding commands is an explicit step to creating a command, and part of the OM architecture for creating a command.
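As a rough sketch of what such a command table could look like, assume each command carries a unique id and each replica keeps an append-only log. `AdjustBalance` and `CommandLog` are hypothetical names, and the reconciliation shown here (union by command id) is only the simplest possible policy, not how Cassandra counters are implemented.

```scala
import java.util.UUID

// A command records an intended change, not a new absolute value.
case class AdjustBalance(id: UUID, account: String, amount: BigDecimal)

// Append-only command table, as seen by one replica.
case class CommandLog(commands: Map[UUID, AdjustBalance] = Map.empty) {

  // Appending never conflicts: it only adds a row under a fresh id.
  def append(c: AdjustBalance): CommandLog = copy(commands + (c.id -> c))

  // Reconciliation step: keep every command from both replicas once.
  // Union by id is commutative and idempotent, so merge order and
  // repeated merges do not matter.
  def merge(other: CommandLog): CommandLog =
    CommandLog(commands ++ other.commands)

  // The balance is derived data: cache(sum(transactions.amount)).
  def balance(account: String): BigDecimal =
    commands.valuesIterator.filter(_.account == account).map(_.amount).sum
}
```

Two replicas that each accept a withdrawal end up, after merging, with both withdrawals applied. Whether the resulting balance is acceptable (an overdraft, say) is then a business rule to run at reconciliation time rather than a reason to reject the write.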
What do you all think? Are you frustrated with your tools? What problems do you want to see solved?