
Time for a New Generation of ORM?

15 replies
Pierce Wetter

   I'm admittedly a Scala noob, but I'm not a noob to ORMs or enterprise/internet/cloud development. 
   I think it's time for a new generation of ORMs. I did a review of the existing Scala ORMs, and while there are some good ideas, none of them really kick ass. Before people start chiming in with "such and such is good", let me say that I'm a big-picture guy, and part of the problem with all of the ORMs is that they take too narrow a view. I don't really want an ORM or a persistence mechanism, I want something beyond that, sort of a generalized persistence+transactional+recovery architecture. 
  Right now, I'm just looking for people to chime in with "yeah, it would be nice if we had this..."
   My minimum features for a new generation of ORM in addition to the basic idea of persisting objects somehow:
      1. Don't assume relational. 
 This new generation should be as comfortable with a NoSQL store of any flavor (document, XML, key-value, column) as it is with a relational database. Flat files should even be OK. If we remove relational, that makes it an OM, which is appropriate because I'm seeking coding nirvana. 

      2. Support caching/cache invalidation on save from the get-go: Terracotta, Infinispan, memcached. 
            It's easy to implement a cache; the hard part is implementing the invalidation. An ORM is itself inherently a cache, because the millisecond after you receive a record from the database someone else could have changed it. But everyone has a product table that's a lot smaller than their user table; it makes sense to cache it and to have a mechanism for invalidating it. The challenge is not just caching the individual objects, but caching the relationships and entire tables.    
      3. A new scalable paradigm for transactions/locking. 
                  How we got here:
        There's nothing new under the sun. Transactions on databases were created to solve a concurrency problem. The original intent was good, and for those systems it worked well. You can look at the TPC-E benchmark for how much you can do at the database layer. In essence, the old-fashioned way was to move the code down to the datastore level and expose a set of atomic operations at that level. 
         That is, for the classic "update the bank account balance" example of how to do transactions, you can do that with a stored procedure and expose an atomic interface to all. That worked OK in the past, and the TPC-E benchmark above does it writ large: it's a stock brokerage implemented mostly in SQL. 
                What's Broken:
          The computing world has moved on, though, and it can be almost impossible to code the services you need at the SQL level. We need objects! So people muddle about with "optimistic locking" and other paradigms that attempt to replicate compare-and-swap. If you get an optimistic-locking failure, though, you have to catch it at the appropriate place, and generally you need to retry from there.
                 The way forward:
           I hinted at this above: to deal with concurrency we need to think in terms of exposing services on our data. EJB, JPA, and all that encourage you to ignore concurrency problems rather than thinking about them up front. When you do think about them, you don't necessarily have a lot of solutions available; mostly it may just be "hey, that EJB call I just did? Try that again, maybe we'll get lucky this time!"
           But with NoSQL, failures may not happen until replication time.
           Plus, in the real world, life is messy. Bank balances have "overdrafts", warehouses have "wastage" on their inventory. A bank account balance is really cache(sum(transactions.amount)). So our bank account balance example has always been somewhat fake. 
           So the solution I foresee to all this is the ability to encapsulate the data changes we want to make together with the recovery steps, and to describe the changes we want at a higher level. Cassandra has added "counter" columns to deal with things like the bank balance problem above; the replication knows that the balance column is a sum of the transactions, so when transactions get replicated, collisions are dealt with because both transactions go in and the balance is calculated using both. 
           That is, we code the balance update as a command against the account balance, and such commands are integrated into the OM (a minimal sketch of this idea follows the list below). Such a command can operate in a completely transactionless fashion, because we're only adding the adjust_balance row to the account command table, which should always succeed. Reconciling colliding commands is an explicit step in creating a command, and part of the OM architecture for creating a command. 
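
   A minimal sketch of that command idea in Scala (AdjustBalance and CommandLog are names I'm making up here, not an existing library): the balance is never updated in place; it's derived from an append-only log of commands, so two concurrent writers never have to lock anything.

import java.util.UUID

// Hypothetical sketch: balance changes are appended as commands instead of
// being applied with an UPDATE under a lock.
final case class AdjustBalance(
  id: UUID,          // identity of this command, so replays are idempotent
  accountId: UUID,
  amountCents: Long  // signed delta: credit > 0, debit < 0
)

// The "account command table": appending always succeeds, no locking needed.
final class CommandLog {
  private val log = scala.collection.concurrent.TrieMap.empty[UUID, AdjustBalance]

  def append(cmd: AdjustBalance): Unit = log.putIfAbsent(cmd.id, cmd)

  // The balance is cache(sum(transactions.amount)): derived, never stored and overwritten.
  def balance(accountId: UUID): Long =
    log.values.filter(_.accountId == accountId).map(_.amountCents).sum
}

object BalanceDemo extends App {
  val account = UUID.randomUUID()
  val cmds    = new CommandLog
  // Two "colliding" writers both append; neither needs to see the other first.
  cmds.append(AdjustBalance(UUID.randomUUID(), account, +10000))
  cmds.append(AdjustBalance(UUID.randomUUID(), account, -2500))
  println(cmds.balance(account)) // 7500
}
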
 What do you all think? Are you frustrated with your tools? What problems do you want to see solved? 
Pierce 

             
Raoul Duke
Re: Time for a New Generation of ORM?

On Thu, Sep 1, 2011 at 7:48 AM, Pierce Wetter wrote:
>       3. A new scalable paradigm for transactions/locking.

are you aware of:
STM
Eventual Consistency
Hewitt's push for Inconsistency Robustness
et al.
?

sincerely.

Pierce Wetter
Re: Time for a New Generation of ORM?

On Sep 1, 2011, at 10:37 AM, Raoul Duke wrote:

> On Thu, Sep 1, 2011 at 7:48 AM, Pierce Wetter wrote:
>> 3. A new scalable paradigm for transactions/locking.
>
> are you aware of:
> STM
> Eventual Consistency
> Hewitt's push for Inconsistency Robustness
> et al.
> ?

Yes. That's really exactly what I'm talking about. Transactions/locking I consider fundamentally broken in practice, so an ORM that was built on eventual consistency and on thinking ahead about collisions just makes sense. Even if collisions are rare, say 1 in 1,000, that means they happen every day in production...

Pierce

soc
Re: Time for a New Generation of ORM?

Hi,

I pretty much agree with your notion about needing a better solution.

While I think some stuff like the OTS from Gosu and Type Providers in F#
is pretty interesting, I always wonder when I have to work with
databases/ORMs:

Why can't I just use some random collection, which is not backed by a
linked list or array, but by some database table? E.g. "import
collection.db.{DatabaseSet, DatabaseMap}" and be done? Sure, there will
always be cases where people need to write SQL, but IMHO the easy cases
are not easy enough.
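
To make that concrete, something like the sketch below. The collection.db import doesn't exist; this stand-in is built on plain JDBC and assumes an H2 driver on the classpath (any JDBC database would do with minor changes to the SQL).

import java.sql.{Connection, DriverManager}

// Hypothetical "collection backed by a database table": every write goes
// straight to the table, every read comes straight from it.
final class DatabaseMap(conn: Connection, table: String) {
  conn.createStatement().execute(
    s"CREATE TABLE IF NOT EXISTS $table (k VARCHAR PRIMARY KEY, v VARCHAR)")

  def update(key: String, value: String): Unit = {
    // H2's MERGE acts as an upsert; writing to the map is writing to the table.
    val ps = conn.prepareStatement(s"MERGE INTO $table (k, v) KEY (k) VALUES (?, ?)")
    ps.setString(1, key); ps.setString(2, value); ps.executeUpdate(); ps.close()
  }

  def get(key: String): Option[String] = {
    val ps = conn.prepareStatement(s"SELECT v FROM $table WHERE k = ?")
    ps.setString(1, key)
    val rs = ps.executeQuery()
    val result = if (rs.next()) Some(rs.getString(1)) else None
    rs.close(); ps.close()
    result
  }
}

object DatabaseMapDemo extends App {
  val conn     = DriverManager.getConnection("jdbc:h2:mem:demo")
  val settings = new DatabaseMap(conn, "settings") // use it like a Map
  settings("theme") = "dark"
  println(settings.get("theme"))                   // Some(dark)
}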

Just some thoughts ...

Bye,

Simon

Raoul Duke
Re: Time for a New Generation of ORM?

On Thu, Sep 1, 2011 at 11:19 AM, Simon Ochsenreither
wrote:
> Why can't I just use some random collection, which is not backed by a linked
> list or array, but by some database table? E.g. "import
> collection.db.{DatabaseSet, DatabaseMap}" and be done? Sure, there will
> always be cases where people need to write SQL, but IMHO the easy cases are
> not easy enough.

linq?

soc
Re: Time for a New Generation of ORM?

> linq?
You still have to set up the entities, the database, the connection,
etc., right?

Alex Cruise
Re: Time for a New Generation of ORM?
On Thu, Sep 1, 2011 at 10:58 AM, Pierce Wetter <obastard [at] gmail [dot] com> wrote:
> On Sep 1, 2011, at 10:37 AM, Raoul Duke wrote:
>> are you aware of:
>> STM
>> Eventual Consistency
>> Hewitt's push for Inconsistency Robustness
>
> Yes. That's really exactly what I'm talking about. Transactions/locking I consider fundamentally broken in practice, so an ORM that was built on eventual consistency and on thinking ahead about collisions just makes sense. Even if collisions are rare, say 1 in 1,000, that means they happen every day in production...

I urge you to look into CQRS and look forward to your comments on it. It's not that I think it's your holy grail, but it's a very interesting way of looking at domain modelling and persistence that makes a lot of the challenges of ORM approaches disappear (and brings its own fresh challenges, of course!).
This should whet your appetite: 
http://www.infoq.com/presentations/Events-Are-Not-Just-for-Notifications
-0xe1a
H-star Development
Re: Time for a New Generation of ORM?

IMO, the perfect solution would behave just like a thread-safe collection (map, list, set, etc.), as if it were completely in memory. The concrete implementation should decide when to save what, and all data inside the collection should be immutable. Writing something to the collection would be the equivalent of a commit in SQL.

That should cover 95% of all use cases.
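
Roughly like this sketch (Persistence and CommittingMap are invented names): a put is both an in-memory write and a commit, and the values are immutable.

import scala.collection.concurrent.TrieMap

trait Persistence[K, V] {
  def commit(key: K, value: V): Unit // e.g. INSERT/UPDATE, or append an event
}

final case class User(name: String, email: String) // immutable value

// Thread-safe, behaves like an in-memory map, but every write is a commit.
final class CommittingMap[K, V](store: Persistence[K, V]) {
  private val mem = TrieMap.empty[K, V]

  def update(key: K, value: V): Unit = {
    mem.put(key, value)      // visible to other threads immediately
    store.commit(key, value) // "writing to the collection" == a commit
  }

  def get(key: K): Option[V] = mem.get(key)
}

object CommitDemo extends App {
  val logOnly = new Persistence[Int, User] {
    def commit(key: Int, value: User): Unit = println(s"commit $key -> $value")
  }
  val users = new CommittingMap[Int, User](logOnly)
  users(1) = User("Simon", "simon@example.org")
  println(users.get(1))
}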


Raoul Duke
Re: Time for a New Generation of ORM?

perhaps what we want is:
http://dl.acm.org/citation.cfm?id=615226
orthogonal persistence?

Pierce Wetter
Re: Time for a New Generation of ORM?

On Sep 2, 2011, at 10:41 PM, Alex Cruise wrote:
> I urge you to look into CQRS and look forward to your comments on it. It's not that I think it's your holy grail, but it's a very interesting way of looking at domain modelling and persistence that makes a lot of the challenges of ORM approaches disappear (and brings its own fresh challenges, of course!).

  So a funny thing happened on the way to posting to the list. I went and looked up CQRS and it didn't mean what I thought it meant, so I didn't mention it in my posting...
  All CQRS really implies is that the model for changing objects isn't the same as the model for asking about objects. People then take CQRS and do interesting things with it, like the presentation you found:
> This should whet your appetite: 
> http://www.infoq.com/presentations/Events-Are-Not-Just-for-Notifications

  Great find!

  This is a fantastic presentation about modeling data as a series of events. That's exactly the direction I'm talking about; it's what I thought CQRS meant... 
   If you watch the presentation, what I want is to be able to create structural models and then have the persistence mechanism do all the grunt work to persist them using events. That is, I can design objects and think about them as objects, but changes to those objects get recorded as events, persisted as events, and so on. 
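
   Something like this tiny sketch (ChangeEvent and EventLog are names I'm inventing for illustration): the object stays a plain case class, every change is recorded as an event, and the current state is just a fold over the history.

import java.time.Instant

final case class Product(id: Long, name: String, priceCents: Long)

sealed trait ChangeEvent { def at: Instant }
final case class PriceChanged(id: Long, newPriceCents: Long, at: Instant) extends ChangeEvent
final case class Renamed(id: Long, newName: String, at: Instant) extends ChangeEvent

final class EventLog {
  private var events = Vector.empty[ChangeEvent] // would be an append-only table

  def record(e: ChangeEvent): Unit = synchronized { events :+= e }

  // Current state is a fold over the history, so the audit trail comes for free.
  def replay(initial: Product): Product =
    events.foldLeft(initial) {
      case (p, PriceChanged(id, cents, _)) if id == p.id => p.copy(priceCents = cents)
      case (p, Renamed(id, name, _)) if id == p.id       => p.copy(name = name)
      case (p, _)                                        => p
    }
}

object EventDemo extends App {
  val log = new EventLog
  log.record(PriceChanged(1, 1999, Instant.now()))
  log.record(Renamed(1, "Widget Pro", Instant.now()))
  println(log.replay(Product(1, "Widget", 2499))) // Product(1,Widget Pro,1999)
}
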
   I don't even think it's that "radical" a change, because once your organization gets to the maturity level where it needs audit trails, you need to record events anyway. Since in practice that happens as soon as you have support staff, it makes more sense to start from that point to begin with. So you add something like Hibernate Envers to your code, which in the end is bolting events onto the data model after the fact. 
     So wouldn't it make more sense to start from an event-driven point of view? Then we could leverage all the things mentioned in the presentation. 
 Pierce
 
Raoul Duke
Re: Time for a New Generation of ORM?

On Sat, Sep 3, 2011 at 2:51 PM, Pierce Wetter wrote:
> http://www.infoq.com/presentations/Events-Are-Not-Just-for-Notifications
>   This is a fantastic presentation about modeling data as a series of
> events. That's exactly the direction I'm talking about, its what I thought
> CQRS meant...

yes. people tend to confuse DDD, CQRS, and Event Sourcing, among other
things in the mix. oh well. and/or believe that if you want one, you
want them all. etc.

Alex Cruise
Re: Time for a New Generation of ORM?
On Sat, Sep 3, 2011 at 2:51 PM, Pierce Wetter <obastard [at] gmail [dot] com> wrote:
> If you watch the presentation, what I want is to be able to create structural models and then have the persistence mechanism do all the grunt work to persist them using events. That is, I can design objects and think about them as objects, but changes to those objects get recorded as events, persisted as events, and so on. 

Congratulations, you've found an area of ongoing research: Graph differencing. :) 
-0xe1a
Grey
Re: Time for a New Generation of ORM?
CQRS is nothing more than a common technique that someone decided to "name" and take on the road, giving seminars about it.
The events-are-not-just-notifications idea is just a heuristic on top of the same ol' same ol'. I'm tempted to say it's 30% snake oil and, for the rest, a recycling of known approaches, admittedly a fresh one. I view it more as a testimonial to the successful application of a particular technique to a particular problem by a team of sharp people.

Jim Powers
Re: Time for a New Generation of ORM?
Scala Integrated Query looks promising: http://code.google.com/p/scala-integrated-query/

SIQ was inspired by Ferry: http://www.pathfinder-xquery.org/research/ferry (IIRC)

-- 
Jim Powers
Pierce Wetter
Re: Time for a New Generation of ORM?

  Wow, SIQ looks amazingly cool. It would be a fantastic building block for what I'm talking about. 


Pierce Wetter
Re: Time for a New Generation of ORM?



> Congratulations, you've found an area of ongoing research: Graph differencing. :) 

  Not at all. 
  I'm not talking about comparing two graphs; I'm talking about recording the changes to a graph as they happen in memory, then saving the results to the database. 
   Hibernate Envers already does this for Hibernate. Versioned entities record additions to and deletions from their to-many relationships, and save before-images of their previous revision to a per-table history table. 
   But if you're using something like Hibernate Envers so you can have auditing and history, doesn't it make sense to build that into the ORM at the foundation layer? And since you're already doing that, doesn't it make more sense to approach the database from a transactional, change-log point of view?
   The real challenge will be identifying the parts of the data model that require reconciliation if we want to go completely lockless and eventually consistent. 
   Thinking of the account balance as cache(sum(credits-debits)) is probably going to be easy. We record the UUID of the last transaction along with the balance in the database, and we store the previous transaction UUID with the old balance in the update event. If there's an N-way collision, the reconciler process can figure out which update events need to be combined. NULL could be used as a signal that the event log is out of date. 
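
   A rough sketch of that reconciliation, with all the names invented for illustration: each update event records the transaction it was based on and the balance it started from; events that share the same base collided, and the reconciler just folds every delta back in.

import java.util.UUID

final case class BalanceUpdate(
  tx: UUID,                 // this update's transaction id
  previousTx: Option[UUID], // what the writer believed was the latest transaction
  oldBalanceCents: Long,    // the balance the writer started from
  deltaCents: Long          // the change it wanted to make
)

object Reconciler {
  // Updates that share the same previousTx were written concurrently: an N-way collision.
  def collisions(events: Seq[BalanceUpdate]): Map[Option[UUID], Seq[BalanceUpdate]] =
    events.groupBy(_.previousTx).filter(_._2.size > 1)

  // Given the stored balance and a set of updates all based on that balance,
  // fold every delta back in so no update is lost.
  def reconcile(storedBalanceCents: Long, colliding: Seq[BalanceUpdate]): Long =
    storedBalanceCents + colliding.map(_.deltaCents).sum
}

object ReconcileDemo extends App {
  val base = UUID.randomUUID()
  // Two writers both started from a stored balance of 10000 cents and wrote concurrently.
  val a = BalanceUpdate(UUID.randomUUID(), Some(base), 10000, -2500)
  val b = BalanceUpdate(UUID.randomUUID(), Some(base), 10000, +4000)
  println(Reconciler.collisions(Seq(a, b)).size)  // 1 collision group
  println(Reconciler.reconcile(10000, Seq(a, b))) // 11500: both updates survive
}
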
    To-one relationships might be harder, because you can only sell an airline seat once. (Of course, my assumption in all this has been that in the real world they oversell, overdraft, and sell over inventory all the time.) In this case the reconciler would have to allocate the user a new seat from a pool of reserve seats. 
     Pierce
  P.S.
  Technically, Hibernate already does graph differencing.     
