Even More ActiveObjects: Preloading

13 Aug 2007

There has been some talk recently regarding the ActiveObjects lazy-loading mechanism.  It’s starting to seem that what I thought was a great idea, and terribly innovative, when I designed the framework might not have been such a great idea after all.  :-)   That’s a good thing, though (finding my mistakes, that is); it just forces me to think a little harder about how to solve the problem.

One of the guiding ideas behind ActiveObjects is that nothing should be loaded until it’s needed.  Once a value is loaded, it should be cached and handed back on demand, obviating the need for multiple loads.  This technique, commonly known as “lazy-loading”, works really well if you’re in a memory-crunch situation, because even for tables with extremely large numbers of columns (think 50-100), none of the data in a row is loaded unless you need it.  Thus, you can work with a database-peered object without loading the entire row into memory, a potentially long and expensive operation.

The problem is that it tends to create large numbers of queries.  It can also be very inefficient for certain types of operations.  For example:

for (Person p : manager.find(Person.class)) {
    System.out.println(p.getName());
}

This will generate the following SQL (assuming 6 rows in the people table):

SELECT ID FROM people
SELECT NAME FROM people WHERE ID = ?
SELECT NAME FROM people WHERE ID = ?
SELECT NAME FROM people WHERE ID = ?
SELECT NAME FROM people WHERE ID = ?
SELECT NAME FROM people WHERE ID = ?
SELECT NAME FROM people WHERE ID = ?

Granted, it’s a prepared statement, so it is compiled once and reused for the remaining 5 executions.  However, this is still pretty inefficient, since every row costs its own round-trip to the database.  Imagine if there were 100,000 people in the database instead of 6 (not an unreasonable assumption).  That would be 100,001 separate queries, and this code could take a very long time to run.
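To make the cost concrete, here is a toy Java simulation (my own illustration, not ActiveObjects code) that counts the statements each strategy would issue against an in-memory stand-in for the people table. The names LazyLoadDemo, lazyNames() and eagerNames() are made up for this sketch; the point is only the arithmetic: lazily loading one column costs 1 + N queries for N rows, while fetching it up front costs one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy simulation, not ActiveObjects code: an in-memory "people" table and a
// log that records every simulated SQL statement.
public class LazyLoadDemo {
    static final Map<Integer, String> PEOPLE = new LinkedHashMap<>();
    static final List<String> queryLog = new ArrayList<>();

    static {
        String[] names = {"Alice", "Bob", "Carol", "Dave", "Eve", "Frank"};
        for (int i = 0; i < names.length; i++) {
            PEOPLE.put(i + 1, names[i]);  // ids 1..6
        }
    }

    // Lazy strategy: one query for the IDs, then one query per getName() call.
    static List<String> lazyNames() {
        queryLog.add("SELECT ID FROM people");
        List<String> result = new ArrayList<>();
        for (int id : PEOPLE.keySet()) {
            queryLog.add("SELECT NAME FROM people WHERE ID = ?");
            result.add(PEOPLE.get(id));
        }
        return result;
    }

    // Eager strategy: IDs and names come back together in a single query.
    static List<String> eagerNames() {
        queryLog.add("SELECT ID,NAME FROM people");
        return new ArrayList<>(PEOPLE.values());
    }

    public static void main(String[] args) {
        queryLog.clear();
        lazyNames();
        System.out.println("lazy queries:  " + queryLog.size());   // 1 + 6 rows = 7
        queryLog.clear();
        eagerNames();
        System.out.println("eager queries: " + queryLog.size());   // 1
    }
}
```

With 100,000 rows the lazy version would log 100,001 statements, while the eager one would still log a single statement.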

Now, if you were writing the JDBC code by hand, you’d probably do something like this (exception handling omitted):

Connection conn = getConnection();
PreparedStatement ps = conn.prepareStatement("SELECT name FROM people");
ResultSet res = ps.executeQuery();
while (res.next()) {
    System.out.println(res.getString("name"));
}
res.close();
ps.close();
conn.close();

One statement, that’s all that’s really required.  Paging through a result set is a pretty quick operation, so even with 100,000 rows this shouldn’t be an insanely slow piece of code.  In fact, the bottleneck here is probably how fast the console can print the text in question (not very fast, actually).

So, obviously we have very disparate performance between JDBC by hand and using ActiveObjects, and we really can’t have that.  The solution is to force ActiveObjects to somehow load all of the names for the people in the first query, like we did when we ran the SQL by hand.  For a while now, ActiveObjects has had this capability:

for (Person p : manager.find(Person.class, Query.select("id,name"))) {
    System.out.println(p.getName());
}

Now we just execute a single line of SQL:

SELECT ID,NAME FROM people

Much more efficient.  However, the code is now much uglier and a little unintuitive. (I mean, who’s going to think of Query.select(“…”) when looking to override lazy-loading?)  Also, we would have to use this cryptic syntax in every single query in which we want to override the lazy-loading.  This could be a bit of a pain, especially if you know at design time that every time you get a Person, you’ll probably need a “name” shortly thereafter.  So, for situations just like this one, I’ve now added the @Preload annotation (not in the 0.4 release; available in trunk/).

@Preload("name")
public interface Person extends Entity {
    public String getName();
    public void setName(String name);
 
    public int getAge();
    public void setAge(int age);
}
 
// ...
for (Person p : manager.find(Person.class)) {
    System.out.println(p.getName());
}

Just as we would expect, this now runs the following single-query SQL statement:

SELECT NAME,ID FROM people

If we were to add a call to p.getAge(), it would of course lazy-load that value, leading to another SQL statement.  However, we can just as easily add it to the @Preload clause like this:

@Preload({"name", "age"})
public interface Person extends Entity {
    // ...
}

Or, since this is really all of the properties in Person, we can use the following, shorter syntax:

@Preload
public interface Person extends Entity {
    // ...
}

So effectively, you can disable lazy-loading in ActiveObjects by adding the @Preload annotation without any parameters to every entity you use.  However, this is a little inefficient since it will pretty much turn any non-joining SELECT statement into a SELECT *.  For this reason, I suggest you only use @Preload for situations like our name-printing loop.  In other words: only for values you know will be queried every time you grab a bunch of entities of a given type.
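One way to picture what the annotation does is a reflection-based sketch that turns it into a SELECT list. This is a hypothetical stand-in, not the annotation or query builder in trunk/: the nested Preload type, the selectFor() helper, and the assumption that the no-argument form defaults to "*" are all mine. It covers the three cases above: no hint (IDs only), named columns (plus the ID), and no arguments (SELECT *).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class PreloadDemo {
    // Hypothetical stand-in for @Preload; assumed to default to "*" (all columns).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Preload {
        String[] value() default {"*"};
    }

    @Preload("name")
    interface Person {
        String getName();
        int getAge();
    }

    @Preload
    interface EagerPerson {
        String getName();
        int getAge();
    }

    interface LazyPerson {
        String getName();
    }

    // Builds the SELECT for a simple, non-joining find() on the given entity type.
    static String selectFor(Class<?> type, String table) {
        Preload preload = type.getAnnotation(Preload.class);
        if (preload == null) {
            return "SELECT ID FROM " + table;             // no hint: pure lazy-loading
        }
        if (Arrays.asList(preload.value()).contains("*")) {
            return "SELECT * FROM " + table;              // every column preloaded
        }
        Set<String> columns = new LinkedHashSet<>();
        for (String column : preload.value()) {
            columns.add(column.toUpperCase(Locale.ROOT));
        }
        columns.add("ID");                                 // the ID is always needed
        return "SELECT " + String.join(",", columns) + " FROM " + table;
    }

    public static void main(String[] args) {
        System.out.println(selectFor(Person.class, "people"));       // SELECT NAME,ID FROM people
        System.out.println(selectFor(EagerPerson.class, "people"));  // SELECT * FROM people
        System.out.println(selectFor(LazyPerson.class, "people"));   // SELECT ID FROM people
    }
}
```

The SELECT * branch is exactly the inefficiency described above: every column comes back whether or not the loop touches it.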

One more thing worth noting: this is a hint only.  It doesn’t mean that every Person instance will have a preloaded name value.  Any Query with a JOIN clause will ignore the @Preload annotation, to avoid accidentally running JOINs with SELECT *.  Also, quite a few Person instances won’t have any values cached by default.  For example, if you use EntityManager#create(), a new row will be INSERTed into the people table, but the resulting Person instance won’t have any value cached for name.  Likewise, a simple call to EntityManager#get(Class<? extends Entity>, int) returns the Entity instance which corresponds to that id value, but it may or may not have a cached name.  This is because the get() method still does not run any queries; it merely creates the object peers.
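The "hint only" behaviour can be modelled with a per-entity value cache. The sketch below is a toy model under my own assumptions (PreloadHintDemo, PersonPeer, and the JOIN against a hypothetical cars table are invented for illustration): get() creates a peer without querying, find() fills the cache only when there is no JOIN, and any getter that misses the cache falls back to a lazy SELECT.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not ActiveObjects' real classes: each entity peer carries a value
// cache, and the preload hint only decides whether find() fills it up front.
public class PreloadHintDemo {
    static final Map<Integer, String> NAMES = new LinkedHashMap<>();  // simulated table
    static final List<String> queryLog = new ArrayList<>();

    static {
        NAMES.put(1, "Alice");
        NAMES.put(2, "Bob");
    }

    static class PersonPeer {
        final int id;
        final Map<String, Object> cache = new HashMap<>();

        PersonPeer(int id) {
            this.id = id;
        }

        String getName() {
            if (!cache.containsKey("name")) {                 // cache miss: lazy-load it
                queryLog.add("SELECT NAME FROM people WHERE ID = ?");
                cache.put("name", NAMES.get(id));
            }
            return (String) cache.get("name");
        }
    }

    // Analogue of get(): builds the peer, runs no query, caches nothing.
    static PersonPeer get(int id) {
        return new PersonPeer(id);
    }

    // Analogue of find(): the preload hint is honoured only when there is no JOIN.
    static List<PersonPeer> find(boolean hasJoin) {
        boolean preload = !hasJoin;
        // "cars" and "ownerID" are hypothetical, purely to label the JOIN case.
        queryLog.add(preload ? "SELECT NAME,ID FROM people"
                             : "SELECT people.ID FROM people INNER JOIN cars ON cars.ownerID = people.ID");
        List<PersonPeer> result = new ArrayList<>();
        for (Map.Entry<Integer, String> row : NAMES.entrySet()) {
            PersonPeer peer = new PersonPeer(row.getKey());
            if (preload) {
                peer.cache.put("name", row.getValue());       // value arrived with the row
            }
            result.add(peer);
        }
        return result;
    }

    public static void main(String[] args) {
        PersonPeer single = get(1);
        System.out.println("after get():     " + queryLog.size());  // 0
        single.getName();
        System.out.println("after getName(): " + queryLog.size());  // 1
        queryLog.clear();
        for (PersonPeer p : find(false)) p.getName();
        System.out.println("find, no JOIN:   " + queryLog.size());  // 1
        queryLog.clear();
        for (PersonPeer p : find(true)) p.getName();
        System.out.println("find with JOIN:  " + queryLog.size());  // 1 + 2 = 3
    }
}
```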

Comments

  1. How does this work when we have a OneToMany association and I want to include the associated items in the query? E.g.:

    public interface Person extends Entity {
        public String getName();
        public void setName(String name);

        public int getAge();
        public void setAge(int age);

        @OneToMany
        public Cars[] getCars();
    }

    David Marko Tuesday, August 14, 2007 at 2:30 am
  2. Also, I ran the following code:
    Person p1=getEntityManager().create(Person.class);
    p1.setUsername("davidm");
    p1.setFirstname("davidm");
    p1.setLastname("Marko");
    p1.save();

    … and this is what I see. Why so many individual SELECT hits?
    14.8.2007 13:39:38 net.java.ao.DatabaseProvider executeInsertReturningKeys
    INFO: INSERT INTO person (id) VALUES (DEFAULT)
    14.8.2007 13:39:38 net.java.ao.EntityProxy invokeGetter
    INFO: SELECT username FROM person WHERE id = ?
    14.8.2007 13:39:38 net.java.ao.EntityProxy invokeGetter
    INFO: SELECT firstname FROM person WHERE id = ?
    14.8.2007 13:39:38 net.java.ao.EntityProxy invokeGetter
    INFO: SELECT lastname FROM person WHERE id = ?
    14.8.2007 13:39:38 net.java.ao.EntityProxy save
    INFO: UPDATE person SET username = ?,firstname = ?,lastname = ? WHERE id = ?

    David Marko Tuesday, August 14, 2007 at 4:41 am
  3. Oh crud, I know what’s doing that! No wonder I was so mystified…

    The dynamic proxy fires a PropertyChangeListener for every changed value. The problem is PropertyChangeEvent requires an old value as well. I’ll make that code conditional on the value already being cached, to avoid extra database hits. Thanks for finding this!

    In the case of the relations, they’re always lazily loaded, since I haven’t been able to figure out a way to cache them without opening the door to a stale cache and thus inaccurate values. So, in short, no @OneToMany or @ManyToMany relation is eagerly loadable right now. Incidentally, if you have any ideas on the caching I welcome them. :-)

    daniel Tuesday, August 14, 2007 at 9:43 am
  4. Daniel,

    This looks great. I am going to use AO as my ORM; hopefully, you will have a stable release by the time I’ll need to have a stable release myself. :)

    On to the blog post topic: pre-loading is great. My app needs it in spades to trade some start-up time for better performance later.

    However, what about batch saves? My app almost always gets new objects in batches of 10-1000. We also profiled behavior of our users, and they tend to batch their work and submit 50-100 items at a time, even though they can get each item processed immediately.

    Oleg Monday, September 24, 2007 at 10:37 am
  5. Batch saving is an interesting idea that I might look into later. The thing is, you really wouldn’t be saving much by doing your saves in batch. In fact, the only difference is that a single Connection would be opened, rather than dozens opened and closed in quick succession. The use of a connection pool works around this problem in a more transparent way.

    Batch creating is a more interesting thing to look at, since some databases support an “INSERT INTO blah (id,value) VALUES (?,?),(?,?)” syntax; allowing the insertion of multiple rows in a single statement. The problem is, this isn’t supported by very many databases (it’s not part of the ANSI standard), so you wouldn’t see a performance gain across the board.

    daniel Monday, September 24, 2007 at 10:43 am
  6. I’d like a way to switch between lazy-loading and preloading (with _all_ columns).

    Is there any way to omit (all) column names with Query.select? I tried Query.select(“*”) but it didn’t work.

    I think it is necessary to disable lazy-loading when combined with Wicket/DataView, because lazy-loading causes numerous SELECTs (one per column per row).

    tohkawa Thursday, October 18, 2007 at 12:08 am
  7. Adding @Preload("*") to every entity should effectively disable lazy-loading. Oddly enough, using Query.select("*") should have worked fine for that particular query. This probably means I need to fix a bug somewhere…

    Incidentally, if you’re working with DataView, you should probably also take advantage of Query’s pagination feature (Query#limit(int).offset(int)). This will add additional efficiency to the display algorithm. Eventually, this sort of functionality will be handled for you in the wicket integration project, but for the moment you’ll have to do it yourself.

    Daniel Spiewak Thursday, October 18, 2007 at 7:48 am
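The fix Daniel describes in comment 3 can be sketched with java.beans.PropertyChangeSupport. This is an illustration under assumptions, not the real EntityProxy code: ConditionalOldValueDemo, setNaive() and set() are hypothetical names, and the naive version simply pretends the uncached column was NULL. The fixed setter passes whatever is cached (possibly null) as the old value, so no extra SELECT is needed. Note that PropertyChangeSupport skips the event when the old and new values are equal and non-null.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch, not ActiveObjects' EntityProxy: a setter that fires a
// PropertyChangeEvent without querying the database for the old value.
public class ConditionalOldValueDemo {
    final List<String> queryLog = new ArrayList<>();
    private final Map<String, Object> cache = new HashMap<>();
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);

    void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    // Behaviour reported in comment 2: an uncached old value is fetched first.
    void setNaive(String property, Object newValue) {
        if (!cache.containsKey(property)) {
            queryLog.add("SELECT " + property + " FROM person WHERE id = ?");
            cache.put(property, null);   // pretend the fresh row's column was NULL
        }
        Object oldValue = cache.put(property, newValue);
        support.firePropertyChange(property, oldValue, newValue);
    }

    // Fixed behaviour: report the old value only if it is already cached.
    void set(String property, Object newValue) {
        Object oldValue = cache.put(property, newValue);   // null when not cached
        support.firePropertyChange(property, oldValue, newValue);
    }

    public static void main(String[] args) {
        ConditionalOldValueDemo naive = new ConditionalOldValueDemo();
        naive.setNaive("username", "davidm");
        naive.setNaive("firstname", "davidm");
        naive.setNaive("lastname", "Marko");
        System.out.println("naive queries: " + naive.queryLog.size());   // 3

        ConditionalOldValueDemo fixed = new ConditionalOldValueDemo();
        int[] events = {0};
        fixed.addPropertyChangeListener(e -> events[0]++);
        fixed.set("username", "davidm");
        fixed.set("firstname", "davidm");
        fixed.set("lastname", "Marko");
        System.out.println("fixed queries: " + fixed.queryLog.size()     // 0
                + ", events: " + events[0]);                              // 3
    }
}
```

Listeners still see every change; they just get a null old value for properties that were never loaded.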
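For the batch-creation idea in comment 5, a hypothetical helper that builds the multi-row INSERT text might look like the following. It only assembles the SQL string quoted in that comment (MultiRowInsert and build() are made-up names); whether a given database accepts the result is exactly the portability problem raised there.

```java
import java.util.StringJoiner;

// Sketch only: builds the multi-row INSERT text from comment 5. Not every
// database accepts this syntax, so a real caller would need a fallback.
public class MultiRowInsert {
    static String build(String table, String[] columns, int rowCount) {
        if (columns.length == 0 || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        // One "(?,?,...)" group with a placeholder per column.
        StringJoiner group = new StringJoiner(",", "(", ")");
        for (int i = 0; i < columns.length; i++) {
            group.add("?");
        }
        // Repeat the group once per row: (?,?),(?,?),...
        StringJoiner rows = new StringJoiner(",");
        for (int i = 0; i < rowCount; i++) {
            rows.add(group.toString());
        }
        return "INSERT INTO " + table + " (" + String.join(",", columns) + ") VALUES " + rows;
    }

    public static void main(String[] args) {
        System.out.println(build("blah", new String[] {"id", "value"}, 2));
        // INSERT INTO blah (id,value) VALUES (?,?),(?,?)
    }
}
```

The portable alternative in plain JDBC is to reuse one single-row PreparedStatement with addBatch() and executeBatch(); that works with any driver, though how much it actually saves depends on the driver.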
