Skip to content

Performance is Good: ActiveObjects vs ActiveRecord

14
Aug
2007

So ActiveObjects is a fairly cool ORM.  However, coolness alone does not an enterprise ORM make.  In fact, the real qualifications for an enterprise-ready framework are as follows:

  • Stability
  • Performance

I’m sure there are other questions which factor into design decisions on whether or not to use a library, but those are the two which I look at most closely.  Stability is usually a hard metric to find, since it usually depends on a lot of adopters hammering the library until it breaks, is fixed and then hammered again.  However, performance numbers are almost always easy to come by, since all that is required are a few simple benchmark tests to just get a ballpark-number.

Since benchmarks are so fun, I’ve decided to do a few for ActiveObjects.  Or rather, I’ve decided to run a simple (read, very simple) benchmark test with ActiveObjects as well as a number of other ORMs.  At the moment, I’ve only been able to run the test with ActiveRecord (sorry guys, Hibernate’s a really complex framework), but I think the numbers are still worth looking at.

ActiveRecord claims only a 50% overhead compared to manual database access (that number is actually listed as a feature).  There has been some dispute over whether the test used to obtain that particular figure was valid or not, but that’s besides the point.  ActiveObjects should be able to do at least that well, right?

Well, as it turns out, it can.  Here are the numbers from my reasonably simple benchmark:

ActiveObjects
==============
Queries test: 55 ms
Retrieval test: 68 ms
Persistence test: 55 ms
Relations test: 154 ms

ActiveRecord
=============
Queries test: 154 ms
Retrieval test: 6 ms
Persistence test: 76 ms
Relations test: 75 ms

Surprisingly close numbers actually.  I had assumed that there would be some significant disparity, one way or another.  However, as you can see ActiveObjects is fairly comparable to ActiveRecord on a set of extremely trivial tests.  There are some jumps and obvious areas of strength/weakness in both frameworks, but on average they’re pretty similar in performance.

As my friend Lowell Heddings pointed out, ORM benchmarks are far more useful if you actually examine the SQL generated to see how efficient it really is from a theoretical standpoint.  So, to make things easier I sed/grepped the logs and arrived at the following SQL outputs for each respective ORM.

Details

Now, I will be the first to admit that this is hardly at even test to begin with.  Obviously there are different strengths and weaknesses in every library, and though I tried to be impartial in the designing of the benchmarks, I probably accidentally favored one ORM over the other.  Also, there are inherent performance advantages to Java over Ruby, especially in the area of database access.  In short, ActiveObjects probably had a sizeable advantage coming right out of the gate, so take my numbers with a grain of salt.

The test itself consisted of four phases, each involving three entities: Person, Profession and Workplace.  Person has a many-to-many relation with Profession through a fourth entity, Professional.  Workplace has a one-to-many relation with Person.  These relations were exploited directly in the relations benchmark (e.g. Person#getProfessions(), Workplace#getPeople(), etc).  Each entity had a number of fields, including one CLOB (or TEXT, as MySQL refers to them) in the Person entity.  The tables for each respective schema were pre-populated with the same data, which involved several rows with different values (except for the CLOB, which was a roughly 4000 character paragraph and the same for every row).  In the ActiveObjects Person entity, I used the @Preload annotation to eagerly load firstName and lastName.

For the retrieval test, the benchmark iterates through every Person row and grabs firstName, lastName, age, alive, and bio.  Since ActiveObjects only preloaded firstName and lastName, it suffered a bit here. 

The persistence test iterates through every person row and changes the first and last name to one selected from a pool of names I populated with random names which came to mind.  It then goes through the same iteration again and sets the age, alive flag and the bio to our 4000 word Pulitzer-winning essay.  Each row is saved through each iteration, thus each row is saved exactly twice throughout the test.  ActiveObjects came out ahead here probably because of its use of PreparedStatements, as well as the more efficient UPDATE statement generation.

The relations test involved first finding all of the Professions associated with each individual Person and retrieving the Profession name.  Next, the Workplace for the Person is retrieved, then all of the Person(s) associated with that Workplace and their firstName and lastName values accessed.

The queries test was little more than getting all of the Person(s), all of the Workplace(s), all of the Professional mappings, along with all of the Profession(s).  ActiveObjects far outperformed ActiveRecord in this area since ActiveRecord uses SELECT * for everything and eagerly loads the row values.  This means (especially with a CLOB thrown into the mix) that ActiveRecord’s initial query time will be very long, while it’s field access time will be very quick.  Most ORMs function in this way, and it can be a very good thing at times (our benchmark is one of those times).

Lessons Learned

  • Eager loading can be a good thing
  • ActiveObjects generates some weird SQL for relations access

Obviously I can only do so much about the eager loading issue.  I believe pretty strongly that ActiveObject’s approach (in lazy loading most things) is the right one for most use-cases.  However, the second lesson to be learned here is one which I think I need to take a bit more to heart: keep it simple SQL.

Normally, ActiveObjects will generate a query something like the following for accessing a one-to-many relation:

SELECT DISTINCT a.outMap AS outMap FROM (
    SELECT ID AS outMap,workplaceID AS inMap FROM people 
       WHERE workplaceID = ?) a

Yuck!  For obvious reasons, this is an incredibly inefficient bit of querying.  Actually, not only is it inefficient, but needlessly so.  You and I of course know that we could replace the above query with the much simpler:

SELECT ID FROM people WHERE workplaceID = ?

So why doesn’t ActiveObjects do that?  Frankly, I was lazy in my coding of the EntityProxy#retrieveRelations method, so a lot of ugly SQL slipped through the cracks in cases where it really wasn’t necessary.  I’ve spent a bit of time on this, and I think I’ve got the issue resolved.  The problem is that ActiveObjects was assuming that any relation (one-to-many or many-to-many) can have multiple mapping fields, thus requiring a wrapping DISTINCT outer query around a subquery SELECT which is UNIONed with an arbitrary number of other SELECTs, corresponding to the other mapping fields.  Obviously, it is almost never the case that we have to deal with multiple mapping paths, so I added a short-circuit to the logic which creates far simpler queries if at all possible.  As a result, the benchmark numbers for the relations test in ActiveObjects are between 80 and 100 ms.  Still slower than ActiveRecord, but much improved.

It’s worth noting that if we ran each benchmark twice, we would see a marked improvement in the ActiveObjects performance the second time through.  Not just because a lot of the values would be cached, but also because the prepared statements in question would have been compiled and stored.  This is a fairly major area in which ActiveRecord falls short since it doesn’t utilize prepared statements, thus having a constant runtime for its queries and remaining unable to take advantage of cached, compiled queries.

So in short, ActiveObjects may be really neat, but it’s performance numbers don’t seem all that superior to those of ActiveRecord, a Ruby ORM with numerous known shortcomings in this area.  I guess I need to work on things a bit more.  :-)  Next up, either manual JDBC code or Hibernate running the same benchmark, depending on how soon I’m able to figure out Hibernate’s crazy XML mapping schema.

Note: I forgot to mention this… You can get the source for my benchmark from the ActiveObjects SVN repository: svn co https://activeobjects.dev.java.net/svn/activeobjects/trunk/Benchmarks

An Easier Java ORM

18
Jul
2007

There is no doubt that Hibernate is the dominant ORM for Java applications. There are Hibernate conferences and Hibernate user groups. It even has its own shelf at Barnes and Nobel. However, I have always found it suffering from a fatal flaw: it’s incredibly complex. Even accomplishing the simplest of functionality requires hundreds of lines of code, reams of boiler-plate POJO definitions, and a fair share of XML. Granted, it’s probably simpler and more scalable than using JDBC in its raw form, but I still think it makes things harder than they have to be.

This has been a common theme throughout most ORMs in any language. Even various Python ORMs, touted for simplicity and power, still remained burdened with either cryptic syntaxes, or mountains of boiler-plate. This general failing is probably the single most important factor which has contributed to the success of Ruby on Rails.

Rails is nothing without its flagship component, ActiveRecord. AR provides an incredibly simple and powerful database wrapper layer which makes working with almost any schema simple and intuitive. Granted, migrations are a bit weird and it’s performance is far from satisfactory, but the fact remains that it’s much easier, quicker (and thus cheaper, at least in the short term) to develop a database-reliant application using ActiveRecord than it is to build the same application on top of Hibernate or iBatis. This has been carried to such an extreme that developers have taken to using ActiveRecord on top of JDBC through JRuby just to handle the database layer of their Java application. Clearly a better solution is required.

Much of ActiveRecord’s power is drawn from the fact that very little code is required to interact with the database. For example, extracting data from a hypothetical people table in a database is as simple as:

class Person < ActiveRecord::Base
end
 
puts Person.find(1).first_name   # roughly == SELECT first_name FROM people WHERE id = 1

Obviously, the first_name method doesn’t exist in the Person class. This code is using a feature of Ruby which allows for methods to be defined at runtime (or rather, not defined and just handled). It should be apparent how much this could simplify database access, and how unfortunately inapplicable it is to Java.

Fortunately, the principles of ActiveRecord are not entirely irrelevant. The Java reflection API contains a little-known feature called interface proxies. It allows code to define a dynamic implementation of a given interface which handles method invocations in a single, dynamic location. It’s not quite dynamic method definitions, but it is dynamic method handling and it eliminates a lot of boiler-plate.

So what good does this do us? Enter ActiveObjects!

ActiveObjects is a Java ORM based on the concept of interface proxies. The whole idea behind it is that you shouldn’t have to write any more code than absolutely necessary. In fact, ActiveObjects was designed from the ground up to be dead-easy to use. Here’s a rough approximation of the previous ActiveRecord example, translated into Java using the ActiveObjects ORM:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
}
 
EntityManager em = new EntityManager(jdbcURI, username, password);
 
Person p = em.get(1);
System.out.println(p.getFirstName());    // executes "SELECT firstName FROM person WHERE id = 1"

Isn’t that refreshingly simple? With the exception of placing the appropriate JDBC driver in the classpath, that’s all there is to using ActiveObjects. AO automatically detects the database type from the JDBC URI and loads the appropriate driver. If you have placed a pooling library on the classpath, it will be used to pool the database connections. You’ll notice the Person implementation is created by the EntityManager instance corresponding to the (presumably existing) database row, and the SQL is automatically generated and executed, typing the result and returning it from the method. AO even uses PreparedStatement(s) for maximum performance (currently unsupported by ActiveRecord). And if we were to call the setFirstName(String) method, an SQL UPDATE statement would be automatically executed based on the data specified.

Of course, simple CRUD operations do not an ORM make. For one thing, it’d be really nice to be able to automatically generate database-specific DDL statements, ala Hibernate and ActiveRecord. Coolly enough, ActiveObjects allows for this too:

System.out.println(Generator.generate("../classes", "jdbc:mysql://localhost/db", "test.package.Person"));

This would output:

CREATE TABLE person (
    ID INTEGER AUTO_INCREMENT,
    firstName VARCHAR(45),
    PRIMARY KEY(ID)
);

With a different JDBC URI, the DDL would be different. ActiveObjects completely insulates the developer from any database inconsistencies. In fact, the only steps needed to switch to a different database are to place the appropriate driver on the classpath, and to change the URI.

To round up our little tour of ActiveObjects, it’s worth looking at relations a bit. In ActiveRecord, relations are specified using the has_many and belongs_to methods. Hibernate uses annotations/custom javadoc syntax to accomplish roughly the same thing. In ActiveObjects, things are nice and clean:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    public House getHouse();
    public void setHouse(House house);
}
 
public interface House extends Entity {
    public String getAddress();
    public void setAddress(String address);
 
    @OneToMany
    public Person[] getPeople();
}

With just the above entity definitions, calls can be made to instances of Person, passing the House instance to store. Likewise, House#getPeople() can be called, which will return an array of all Person instances associated with that House. Obviously, the Java instance itself isn’t being stored in the database, but rather the houseID INTEGER value is persisted, through the restriction of the appropriate foreign key.

It almost seems magical when you first start using it. :-) I’ve written two test applications (not downloadable, but included within the ActiveObjects SVN) which illustrate the framework’s potential more fully. Also, I’ve been using ActiveObjects as the basis for a reasonably complex Wicket application deployed in the “real world”, and I must say it’s performed beautifully.

Now here comes a moment of bad news: it’s not fully documented yet. Since it’s only on its 0.3 release, I’ve been more interested in squashing bugs and adding features than writing javadoc, though I assure you it is on my agenda. There are a few brief (and somewhat dated) examples on the website, as well as partial javadoc for the EntityManager class. Thankfully, the framework itself is remarkably intuitive and very easy to pick up.

There’re so many more cool features I would love to have shown here, but unfortunately all bloggers have sworn in blood that they will not extend their posts beyond 1500 words. If you are interested, I’ll write a follow-up post going into further detail about the framework (including the more performant SaveableEntity super-interface, which does not execute an UPDATE on every setter call).

Do yourself a favor and take a look at the API and examples. Once you get used to the simplicity of Entity interfaces and non-configuration of database connections, you’ll never look back.

https://activeobjects.dev.java.net

JRuby: The Future of Scalable Rails?

14
Jun
2007

So I was talking earlier today with my good friend, Lowell Heddings, regarding certain annoyances we had with web frameworks. The conversation started talking about the difficulties of developing PHP applications due to the lack of a debugger, but (as conversations on web frameworks are wont to do) eventually the migrated to Rails.

I mentioned how I’d always been a bit distrustful of a web framework which ran on the one-request-per-persistent-process model. My reasons for distrusting this sort of framework were mainly related to performance and scalability, but Lowell brought up an interesting point that I hadn’t considered before: process-shared sessions.

See, because Rails runs as a parallel share-nothing process (i.e. the mongrel instances don’t share memory state with one-another), trivial in-memory data can be a bit of a problem. Also, caching to disk can be a bit problematic since there are multiple processes attempting to access the on-disk data simultaneously. I’m sure many clever solutions have been mooted to solve this problem, but (I think) a new one occurred to me as we were discussing the problem. (caveat: I haven’t fleshed out this solution at all with any code. I’m posting it because Lowell thought it was an idea worth sharing) :-)

My solution to the problem drops back into one of the hottest topic in Ruby today: JRuby on Rails. JRuby allows you to run Rails applications in a Java-based and integrated environment, even to the extent of using existing Java tools, libraries and process containers. An ancillary project to JRuby even allows you to package up your Rails application within a WAR and host it directly within a Java application server like Tomcat or Glassfish.

Packaging Rails as a WAR obviously necessitates a bit of configuration that wouldn’t normally go into the deployment of a Rails application. For example, if you want to serve multiple requests with a Rails app concurrently, you would run multiple Mongrel instances and use an Apache mod_proxy configuration which would proxy requests to available server processes. Java web application, while they do run on the persistent-process model, are designed to be multi-threaded, rather than multi-process. Thus, Java web applications automatically scale to involve concurrent requests since all that is required is the spawning of a new thread within the application server.

Rails however, is designed to be hosted as a separate process and would have to be extensively modified to support this kind of scaling directly. The solution found by the JRuby-extras project is to allow multiple Rails instances to be controlled by a single Rails app WAR. The number of instances is controlled by a configuration option within the web.xml config file within the WAR. Thus, the JRuby WAR will spawn a new instance of the JRuby interpreter for each Rails process (as Rails expects), all hosted in separate threads within the same JVM instance, controlled by the Java app server. Thus, instead of going to all the hassle of configuring a new Mongrel instance and adding a mod_proxy rule to scale your Rails app, all that is necessary is to change a value in an XML file and to redeploy.

This single-process encapsulation of Rails in this way allows us to provide a solution for Lowell’s shared data problem. Instead of storing shared data (like application sessions or cached values) within the Rails process itself, the Rails application should use a Java class (hosted within the same WAR) to store the data. Thus, all shared application data should be stored at lower level than Rails, within the Java process itself. Java has some very solid concurrency APIs which would allow this sort of shared state without data corruption.

JRuby-on-Rails Diagram

As the application scaled and the shared data requirements increased, the SharedCache class (as we’ll christen it) could be modified to cluster, using Terracotta. The Rails WAR itself is already transparently clusterable through Java application servers like JBoss. As Lowell put it, it’s like an infinitely scalable memcached, without all the fuss.

Well, it’s a thought anyway…