
Polymorphic Relational Types in ActiveObjects

11 Dec 2007

Remember when I said that ActiveObjects would always attempt to take the simplest approach to any problem, even when it meant eschewing some more esoteric features?  Well, this is probably a bit of an exception.  Polymorphic types are most certainly not simple (at least, not from an ORM design standpoint) and I had to do a bit of hackery under the surface to make them work.  However, given the usefulness of this feature, I think that it was probably worthwhile.

Simply put, a polymorphic relational type is a table which has a relation with one or more other tables based on the value of a non-constrained field.  Got that?

Maybe a diagram would be helpful…

[Diagram: an insurancePolicies table mapped to either employees or managers, selected by a type field]

We’ve all seen this pattern at some time or other.  Rather than having insurancePolicies contain n mapping fields (e.g. “employeeID” and “managerID”), we make the mapping polymorphic and provide an ancillary type field which specifies which table is actually mapped.  This simplifies queries, not to mention making things far more extensible.  For example, if we wanted to add a janitors table here, we wouldn’t need to change insurancePolicies at all to allow mapping.  Rather, we just add the table and define (in our documentation) another type value which specifies janitors as opposed to employees as the mapped table.
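To make the pattern concrete, here is a minimal plain-Java sketch of how a (type, ID) pair on a policy row might be resolved to a target table. The class name, the personID field, the type strings and the lookup map are all assumptions made for illustration; none of it is ActiveObjects code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: one row of insurancePolicies, holding a
// non-constrained ID plus a type value naming the table it points into.
class PolicyRow {
    // Assumed type values; in practice these are whatever you documented.
    static final Map<String, String> TABLE_FOR_TYPE = new HashMap<>();
    static {
        TABLE_FOR_TYPE.put("employee", "employees");
        TABLE_FOR_TYPE.put("manager", "managers");
    }

    final int personID;
    final String personType;

    PolicyRow(int personID, String personType) {
        this.personID = personID;
        this.personType = personType;
    }

    // Builds the lookup query for whichever table the type value selects.
    String lookupQuery() {
        String table = TABLE_FOR_TYPE.get(personType);
        if (table == null) {
            throw new IllegalStateException("Unknown type value: " + personType);
        }
        return "SELECT * FROM " + table + " WHERE id = " + personID;
    }

    public static void main(String[] args) {
        System.out.println(new PolicyRow(7, "manager").lookupQuery());

        // Supporting janitors needs one new type value, not a schema change.
        TABLE_FOR_TYPE.put("janitor", "janitors");
        System.out.println(new PolicyRow(3, "janitor").lookupQuery());
    }
}
```

Note that insurancePolicies itself never changes in this sketch; only the set of recognized type values grows.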

So the concept itself is fairly straightforward.  The difficulties come when you try to map this into ORM-land.  At first glance, it seems like it should be a cakewalk.  Right away we can see some inline inheritance shared between employees and managers, so maybe our entities will look like this:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    public String getLastName();
    public void setLastName(String lastName);
 
    @OneToMany
    public InsurancePolicy[] getPolicies();
}
 
public interface Employee extends Person {
    public int getHourlyWage();
    public void setHourlyWage(int wage);
}
 
public interface Manager extends Person {
    public long getSalary();
    public void setSalary(long salary);
}
 
public interface InsurancePolicy extends Entity {
    public int getValue();
    public void setValue(int value);
 
    public Person getPerson();
    public void setPerson(Person person);
}

As with any other form of table inheritance in ActiveObjects, the supertype doesn’t correspond to a table.  There’s no multi-JOIN mapping going on.  The difference is that now we’re not only inheriting fields from the supertype, but the inheritance also allows other entities to treat the type polymorphically on the supertype.  We can assume that the personType field is auto-generated by the ORM during the migration.  Seems reasonable enough.

Unfortunately, as it stands right now, ActiveObjects will blissfully recurse into the InsurancePolicy entity, see the getPerson method and proceed to generate a table for Person, rather than ignoring Person in favor of its subtypes.  This is because AO has no way of knowing that Person even has subtypes.  Java doesn’t provide a convenient way of getting derived interfaces or anything so nice.  So as far as the migration process is concerned, Person is a totally valid entity which requires a peered table.
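The asymmetry is easy to see with plain reflection. The following is a small sketch using stand-in interfaces (not the ActiveObjects types): a Class object can list its super-interfaces and can be tested against a known candidate, but java.lang.Class offers no call that enumerates subtypes.

```java
import java.util.Arrays;

class SubtypeLookupDemo {
    // Stand-in types for this sketch only.
    interface Entity {}
    interface Person extends Entity {}
    interface Employee extends Person {}
    interface Manager extends Person {}

    // Walking up is easy: a Class knows its direct super-interfaces.
    static boolean directlyExtends(Class<?> type, Class<?> parent) {
        return Arrays.asList(type.getInterfaces()).contains(parent);
    }

    public static void main(String[] args) {
        System.out.println(directlyExtends(Employee.class, Person.class));

        // Checking a candidate we already know about also works.
        System.out.println(Person.class.isAssignableFrom(Manager.class));

        // Walking down is the problem: Person.class has no method that
        // returns Employee and Manager, so a schema generator handed only
        // Person cannot discover them on its own.
    }
}
```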

The solution here is fairly simple: just tack an annotation onto the Person type to indicate to the schema generator that any relations on the type are to be polymorphic.  I vacillated for a while between @Abstract and @Polymorphic, eventually choosing the latter.  However, if you have any strong preferences either way, let me know!

The new Person declaration looks something like this:

@Polymorphic
public interface Person extends Entity {
    // ...
}
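For readers curious how a schema generator could act on such a marker, here is a rough sketch. The annotation declared below is a local stand-in written for this example, and the needsTable rule is an assumption about the behavior described above, not the actual ActiveObjects implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class PolymorphicMarkerDemo {
    // Hypothetical stand-in for the marker; it must be retained at runtime
    // so that reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Polymorphic {}

    interface Entity {}

    @Polymorphic
    interface Person extends Entity {}

    interface Employee extends Person {}

    interface InsurancePolicy extends Entity {}

    // Assumed rule: a type marked directly gets no table of its own.
    // Annotations on interfaces are never inherited by sub-interfaces,
    // so Employee remains unmarked and keeps its table.
    static boolean needsTable(Class<?> type) {
        return !type.isAnnotationPresent(Polymorphic.class);
    }

    public static void main(String[] args) {
        System.out.println("Person: " + needsTable(Person.class));
        System.out.println("Employee: " + needsTable(Employee.class));
        System.out.println("InsurancePolicy: " + needsTable(InsurancePolicy.class));
    }
}
```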

Ok, one problem down.  Now we run into the issue of the type mapping value itself (e.g. “employee”, “manager”, etc).  This seems like it should be something AO could handle for us auto-magically, right?  After all, there’s already a hierarchy in place for generating table names from a given entity type; extending this to handle polymorphic mapping values should be trivial.

The problem is that the process of converting an entity type into a table name is non-invertible, meaning that you can’t just feed in a table name and get a valid entity type out the other end.  Information is actually lost in the transition between type and table.  Think about it; the process starts with a fully-qualified class name, strips off the package info, messes with case, special chars and (potentially) plurality.  By the time we get to the result, the table name is so mangled and transformed as to bear absolutely no resemblance to the original type (at least from the perspective of a generic algorithm).
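A toy converter shows the information loss. This is a deliberately naive sketch written for this post's argument, not the ActiveObjects name converter, and the two class names are made up.

```java
import java.util.Locale;

class TableNameDemo {
    // Naive, hypothetical conversion: strip the package, lowercase
    // everything, then pluralize crudely by appending "s".
    static String tableName(String qualifiedClassName) {
        String simple = qualifiedClassName.substring(qualifiedClassName.lastIndexOf('.') + 1);
        return simple.toLowerCase(Locale.ROOT) + "s";
    }

    public static void main(String[] args) {
        // Two distinct types collapse onto the same table name, so no
        // generic function could map the name back to a single class.
        System.out.println(tableName("com.example.hr.Employee"));
        System.out.println(tableName("org.example.payroll.EMPLOYEE"));
    }
}
```

Both calls produce the same string, and the package and original casing are gone for good.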

So the current table name generation hierarchy is insufficient for our purposes.  Potentially it could generate the values, but it certainly couldn’t retrieve the type which corresponds with those values.  To solve this problem, we need to introduce a whole new generator to the group: PolymorphicTypeMapper.

All that our type mapper implementation needs to do is define a process by which types are transformed into string values and back again.  We could just rely on storing the fully-qualified class name, but this is both rigid (hard to refactor) and ugly.  No, this is one place where I think we can do something a bit more sophisticated.

It is possible to simply force the users to specify mappings from type to String in the form of a Map<Class, String>, and in the end, this is what our process will boil down to.  However, I think we can add some syntactic sugar to the process which will allow it to default to a de-pluralized (if necessary) version of the table name:

EntityManager em = new EntityManager(...);
em.setPolymorphicTypeMapper(new DefaultPolymorphicTypeMapper(
     Employee.class, Manager.class));

This way, DefaultPolymorphicTypeMapper will auto-generate the mappings based on the classes we pass.  Since we’re statically specifying which subtypes will be used polymorphically, we’re still giving the system enough information to produce an invertible process.  We are coupling our EntityManager initialization a bit to our entity hierarchy.  However, if you’re using migrations, chances are you’ve already taken this plunge.  Anyway, I think it’s about as clean as the syntax can possibly become (with the possible exception of a less verbose class name).
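The reason a fixed list of classes is enough can be sketched in a few lines. The mapper below is a hypothetical stand-in: its name, its convert and invert methods and its lowercase-first-letter default are all assumptions for illustration, not the real DefaultPolymorphicTypeMapper.

```java
import java.util.HashMap;
import java.util.Map;

class TypeMapperDemo {
    interface Person {}
    interface Employee extends Person {}
    interface Manager extends Person {}

    // Hypothetical two-way mapper built from a known set of subtypes.
    static class SketchTypeMapper {
        private final Map<Class<?>, String> toValue = new HashMap<>();
        private final Map<String, Class<?>> toType = new HashMap<>();

        SketchTypeMapper(Class<?>... types) {
            for (Class<?> type : types) {
                // Assumed default: simple name with a lowercased first letter.
                String name = type.getSimpleName();
                String value = Character.toLowerCase(name.charAt(0)) + name.substring(1);
                toValue.put(type, value);
                toType.put(value, type);
            }
        }

        // Type to stored value, e.g. Employee.class to "employee".
        String convert(Class<?> type) {
            return toValue.get(type);
        }

        // Stored value back to type; null if the value was never registered.
        Class<?> invert(String value) {
            return toType.get(value);
        }
    }

    public static void main(String[] args) {
        SketchTypeMapper mapper = new SketchTypeMapper(Employee.class, Manager.class);
        System.out.println(mapper.convert(Employee.class));
        System.out.println(mapper.invert("manager") == Manager.class);
    }
}
```

Because both directions are filled from the same list, every generated value is guaranteed to have exactly one type to come back to.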

Thanks to the introduction of the type mapper, we can now use our polymorphic hierarchy in the following way:

// ...
Employee employee = em.get(Employee.class, 1);
InsurancePolicy[] policies = employee.getPolicies();
 
for (InsurancePolicy policy : em.find(InsurancePolicy.class)) {
    System.out.println("Found policy with value: " + policy.getValue());
 
    if (policy.getPerson() instanceof Employee) {
        System.out.println("Belongs to an employee");
    } else if (policy.getPerson() instanceof Manager) {
        System.out.println("Belongs to a manager");
    }
}

Okay, maybe a bad example, but the functionality is expressed.  This sort of mapping can be a very powerful tool for reducing code hassle and improving extensibility.  It even works with many-to-many and (the new) one-to-one relations.  With this functionality, ActiveObjects table inheritance is more or less feature-complete.  Unless I’m missing something obvious, this provides about all the reasonable functionality you could possibly want from an entity inheritance scheme.

Currently, this feature is just in the SVN trunk/ and slated for release in the upcoming 0.7 build.  There are still a few bugs to be worked out (specifically in the cache expiry mechanism for complex polymorphic many-to-many relations), but everything should be more or less stable and usable.  Enjoy!

Table Inheritance with ActiveObjects

19 Nov 2007

The main obstacle in building an ORM is deciding how to map between a class hierarchy and a database table set.  While this may seem fairly clear cut in many situations, it’s unfortunately not so easy once you get into more complex issues like inheritance.

At a basic level, tables are classes.  It makes sense, right?  A class defines a structure which is instantiated and populated with data.  A table defines a structure into which data is inserted, one set per row.  Superficially, these two concepts sound more-or-less identical.  The differences creep in when you look at things like a class hierarchy.  Classes have the ability to inherit attributes from a superclass.  Thus, if class A inherits from class B, class A has everything that class B has (simplistically put). 

Unfortunately, database tables have no such capability.  You can’t define a table which has everything a “supertable” has.  Nor can you select data out of a “subtable” as if it were a supertable (polymorphism).  This lack of capability creates a disconnect between database structure and object-oriented class structure.

As with all of these “little differences” (between languages and databases), many solutions have been tried, none of them very successfully.  One of the most popular ways to solve the problem is to use separate tables for the supertable and subtable.  Then, whenever one SELECTs from the subtable, a JOIN is used to include the inherited row from the supertable in the result set.  In principle, it sounds right.  After all, this is how C++ handles inheritance.  However, in practice it becomes rather unwieldy.

To start with, JOINing every time you perform a simple SELECT is both inefficient and annoying.  Keeping two separate rows in sync is also difficult.  Inevitably, the data drifts apart and the whole thing falls over.  Granted, a good ORM will prevent this from happening in the first place, but ORMs are not the exclusive means for accessing database schemata.  Oftentimes your schema will be in use not just by your application, but also by a web site, a demo utility and random DBAs who refuse to enter data in any way other than hand-written SQL.  In short, table inheritance using JOINs is clunky, unmaintainable and really messes you up down the line.

Another, more conservative way to map inheritance is inlining, where every subtable contains all of the fields which would be in a supertable.  Thus, if table A and B inherit from table C, table A and B are created with all of the fields that would have been in table C, and table C isn’t created at all.  Technically speaking, it’s not inheritance, just a slightly more centralized way to specify fields.  However, this technique is much more in line with how a database schema would normally work if designed by hand, and that’s what ORMs are supposed to simplify, right?

The ActiveObjects Approach

As you may have guessed from my comments, I really lean toward the inline strategy.  I think this works best in a practical environment and is far more maintainable down the line.  Thankfully, this strategy is considerably easier to implement than JOINing.  Syntax-wise, inheritance is used like this:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    public String getLastName();
    public void setLastName(String lastName);
}
 
public interface Employee extends Person {
    public String getTitle();
    public void setTitle(String title);
 
    public short getHourlyRate();
    public void setHourlyRate(short rate);
 
    public Manager getManager();
    public void setManager(Manager manager);
}
 
public interface Manager extends Person {
    @OneToMany
    public Employee[] getPeons();
 
    public long getSalary();
    public void setSalary(long salary);
}

A fairly standard, object-oriented interface hierarchy, right?  Logically it makes sense.  This is how ActiveObjects would represent such a hierarchy in the database:

employees: id, firstName, lastName, title, hourlyRate, managerID

managers: id, firstName, lastName, salary

So basically inheritance in ActiveObjects is just a mechanism which allows the definition of common fields in a single place.  There’s no real polymorphism (that is to say, you can’t INSERT an employee where a person is required, since there is no “people” table).  The idea is: keep it simple, stupid.  In fact, ActiveObjects would happily generate a table to correspond to Person if we asked it to, since as far as it’s concerned it’s just an entity like any other.
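Inlining falls out of Java reflection almost for free, which is part of why it is the easier strategy. The sketch below is not the ActiveObjects schema generator; it uses trimmed-down copies of the interfaces and ignores the id column and relation fields, and only shows that a sub-interface already exposes its inherited getters.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

class InlineColumnsDemo {
    // Trimmed-down copies of the entities, getters only.
    interface Person {
        String getFirstName();
        String getLastName();
    }

    interface Employee extends Person {
        String getTitle();
        short getHourlyRate();
    }

    // Collects a column name for every no-argument getter visible on the type.
    // getMethods() on an interface includes methods inherited from its
    // super-interfaces, so the supertype's fields appear with no extra work.
    static Set<String> columns(Class<?> type) {
        Set<String> names = new TreeSet<>(); // sorted, since reflection order is unspecified
        for (Method method : type.getMethods()) {
            String name = method.getName();
            if (name.startsWith("get") && name.length() > 3
                    && method.getParameterTypes().length == 0) {
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(columns(Person.class));
        System.out.println(columns(Employee.class));
    }
}
```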

Of course, there are quite a few serious disadvantages to this approach.  For one thing, without polymorphism or actually centralizing the fields in a common table, inheritance is just an illusion.  From a database standpoint, the tables aren’t really related in any interesting way.  You’re still duplicating fields in the database (though not duplicating data), though since the duplication is all handled by the ORM, it’s not too much of an issue.

For me though, it all comes back to the decision I made to not make the schema over-complicated.  To ensure that no matter what the object hierarchy, it could be represented in the database (assuming it’s valid) without undue weirdness.  Keep the schema simple, make it easy to do stuff outside of the ORM.  The ORM should be a liberating tool, not a constraining one.

I realize not everyone agrees with this particular style of table inheritance, so maybe at some point in the future ActiveObjects will allow configuration in this area.  But for the moment, the uncomplicated schema is the way to go.  :-)  Enjoy!

Wide World of Pool Providers: Side-by-Side Comparison

13 Nov 2007

It seems that for any conceivable functionality in Java, there exist a myriad of frameworks which accomplish the task in more-or-less the same way.  ORMs for example; I can count five different Java ORMs without even trying, and I’m sure that number would expand exponentially if I actually sat down and used Google to get a more precise estimate. 

Just like any other function, there seems to be a glut of frameworks which provide JDBC connection pooling.  Choosing between these frameworks can sometimes be a daunting task.  After all, what qualifications do you look at?  Performance?  Licensing?  Documentation?  Connection pooling is such an (apparently) small part of an application’s infrastructure that many development teams devote very little time to this vital selection process.

Due to this management shortsightedness, many projects will simply use whatever pooling library is the default for the ORM they’re using, or (more likely) the first Google hit when searching for “java connection pool”.  Of course, this will get you something which is usable, but rarely will you arrive at a pool provider which is optimal.

To help you make a more informed decision in this vital aspect of project design, I hereby present some of the lessons I have learned on the subject while working on ActiveObjects.  All benchmarks were run against MySQL 5.0 running on Windows Vista Premium, 2 GHz Intel Core 2 Duo, 2 GB DDR2, 7200 RPM SATA drive.  Each test was run five times (executing the DDL each time) with the average runtime taken as the result.  Any obviously poor results (several seconds above the mean) were dropped and re-tested.  All tests were run using the Eclipse JUnit 4 test runner.
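For anyone wanting to repeat the exercise, the averaging step can be sketched roughly as follows. This is an illustrative harness, not the one used for the numbers in this post: the class and method names are made up, the sample values in main are invented, and the outlier handling described above is left to the reader.

```java
class BenchmarkDemo {
    // Runs the task several times and returns the mean wall-clock seconds.
    static double averageSeconds(Runnable task, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    // Plain mean, handy for averaging run times that were recorded by hand.
    static double mean(double... values) {
        double sum = 0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real run would execute the test suite here.
        double seconds = averageSeconds(() -> Math.sqrt(12345.678), 5);
        System.out.println("average seconds per run: " + seconds);

        // Invented sample values, only to show the call.
        System.out.println("mean of recorded runs: " + mean(2.0, 4.0, 6.0));
    }
}
```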

commons-dbcp

This is quite possibly the most commonly used pool provider (by default backed by commons-pool), mainly because it used to come up first on Google.  Though, perhaps more important is the fact that commons-dbcp is an Apache sub-project.  This gives it both a less-restrictive license than some other projects, as well as a credibility that comes with being hosted at Apache.  Honestly, if I see a project’s URL contains “apache.org”, I immediately give it the benefit of the doubt, assuming that it will be of reasonable to high quality.  There’s certainly something to be said for reputation…

commons-dbcp is probably the easiest pool provider I’ve seen in terms of API and “just work”-ness.  It has two mechanisms for setting up connections: alternative JDBC URI and a JNDI DataSource implementation.  It’s interesting to note here that the JDBC javadocs state that DataSource is the preferred way to retrieve connections, while the commons-pool javadoc asserts that the alternative URI method passed directly to DriverManager is preferable.  In practice, I find myself using the DataSource method for connection pools, mainly because it reminds me that I’m not dealing with a normal connection creation, but something which is potentially pooled:

BasicDataSource ds = new BasicDataSource();
ds.setDriverClassName(jdbcDriver);
 
ds.setUsername(getUsername());
ds.setPassword(getPassword());
ds.setUrl(getURI());
 
ds.setMaxActive(20);
 
// get connection here
Connection conn = ds.getConnection();
 
// dispose of pool
ds.close();

It’s important to note here that the pool is explicitly disposed.  This is always a good idea, even for pools which don’t state the requirement in their documentation.  An undisposed pool can hold database connections open, tying up resources and dragging your database performance through the dirt.  Always, always, always dispose of your connection pools when you’re done with them.
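One way to make disposal hard to forget is to put it in a finally block. The sketch below uses a toy pool that merely counts "connections" so the effect is visible; it is an assumption-laden stand-in, and no real pooling library or database is involved.

```java
class PoolDisposalDemo {
    // Toy stand-in for a connection pool.
    static class ToyPool {
        private int open = 0;
        private boolean closed = false;

        // Hands out a "connection" and returns how many are now open.
        int borrow() {
            if (closed) {
                throw new IllegalStateException("pool is closed");
            }
            return ++open;
        }

        int openConnections() {
            return open;
        }

        // Releases everything the pool was holding.
        void close() {
            open = 0;
            closed = true;
        }
    }

    static int useAndDispose(ToyPool pool) {
        try {
            pool.borrow();
            pool.borrow();
            return pool.openConnections();
        } finally {
            // Runs even if the work above throws, so nothing is left open.
            pool.close();
        }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        System.out.println("open while in use: " + useAndDispose(pool));
        System.out.println("open after dispose: " + pool.openConnections());
    }
}
```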

So the API seems pretty intuitive here.  All of the methods do exactly what one would expect.  What’s more, the entire library is extremely well documented.  There’s quite a bit of material on the commons-pool project page discussing how to get started, what best practices to follow, etc.  The public API is javadoc’d, and there are a number of examples available.  I was up-and-running with the framework a few short minutes after I punched in the URL to my address bar.

One thing I haven’t addressed yet is performance.  It’s vitally important that the connection pool chosen run as efficiently as possible.  After all, its whole purpose is to optimize access and reduce the strain on the database in the form of connection creation as well as statement compilation.  Obviously all of the really interesting stuff is in this segment of the library.  If this code performs poorly, it would be a very bad idea to try to use the framework for any sort of serious project.

I just happen to have a reasonably comprehensive database benchmark handy in the form of the ActiveObjects JUnit test suite.  ActiveObjects uses a reasonable number of JDBC features (it doesn’t use many conventional Statement(s) or stored procedures).  Since neither the suite nor the library itself changes between benchmarks, we can test arbitrary connection pools easily and receive reasonably accurate results.

Continuing my recent obsession with HTML tables and their use in product reviews, here is the obligatory “five second rundown”:

Documentation: Excellent
API: Easy and intuitive
License: Apache License 2.0
AO Test Suite Run Time: 20.6302 seconds

C3P0

C3P0 is another very common connection pool framework, partially because it’s the default pool used with the ever-popular Hibernate ORM.  Unlike commons-dbcp, C3P0 is actually hosted at SourceForge, that ever-popular source of dead open-source projects and over-ambitious specs.  With that said, C3P0 is actually quite respectable as a framework and seems to have avoided the premature fate which befalls most open-source frameworks: developer boredom.

Unfortunately, like so many projects on SourceForge, C3P0 does not have a separate website.  The maintainers opted to stick with the SourceForge project interface as the sole source of “official” information.  Add to that the fact that they decided not to add anything to the “Documentation” section of the page and you arrive at one very frustrating first impression for a new user.  Fortunately there’s a lot of material on using C3P0 (both with and without Hibernate) available around the internet.  Always remember: Google is your friend.

ComboPooledDataSource cpds = new ComboPooledDataSource();
cpds.setDriverClass(jdbcDriver);
 
cpds.setJdbcUrl(getURI());
cpds.setUser(getUsername());
cpds.setPassword(getPassword());
 
cpds.setMaxPoolSize(20);
cpds.setMaxStatements(180);
 
// get connection here
Connection conn = cpds.getConnection();
 
// dispose of pool
try {
    DataSources.destroy(cpds);
} catch (SQLException e) {
    // cleanup failed; nothing sensible to do here since the pool is going away
}

The API is somewhat similar to that of commons-dbcp.  Both use the DataSource API as a foundation (which is the “right” approach according to the JDBC docs), and both allow roughly the same configuration options on a pool.  At face value, the APIs seem similar to the point that comparison between the two on such a level would be pointless.

One very important “feature” of the C3P0 library is its license: LGPL.  For those of you who don’t know, LGPL is basically identical to the famed GPL 2.0 without the so-called “viral clause”.  GPL pre-3.0 has some legal ambiguity relating to “derivative works” and what qualifies as such.  For this reason, many projects (especially commercial applications written using object-oriented languages) tend to shy away from libraries licensed as such.  LGPL of course doesn’t have this problem, so it has seen moderately better acceptance from the corporate gods.  Unfortunately, it is still a fairly restrictive license relating to other matters such as redistribution.  It is fairly common practice to perform what can be described as “static linking” of JAR files for an application (un-JARring dependencies and then re-JARring them into the main application JAR).  This is something which is prohibited under LGPL, thus restricting deployment options somewhat.  Also, if I remember correctly, any non-LGPL application or framework using an LGPL-licensed dependency must include a copy of LGPL somewhere in the application (the About or Help section springs to mind).  It was due to this licensing that a company I recently worked for decided against using C3P0 for its application.  Of course, everyone’s requirements are different, but you should still be aware of the possible consequences of using a restrictively licensed framework.

Documentation: Lousy
API: Easy and intuitive
License: LGPL
AO Test Suite Run Time: 17.277 seconds

Proxool

Like C3P0, Proxool is an open-source pooling library hosted on SourceForge.  Thankfully, unlike C3P0, Proxool’s maintainers actually took the time to build a full site for the project, containing documentation and examples.  Unfortunately for us, the examples don’t do much good.

Proxool’s documentation is obfuscated and hidden away, making it somewhat difficult to get started with the framework.  Unintuitively enough, the “Quick start” section is of very little help when trying to actually use the library.  Oh, it does contain samples, but in my tests I couldn’t get the samples to run successfully.  Add to this the fact that Proxool is a less well-known framework and you end up with some very frustrating experiences trying to get up and running.

To their credit, the Proxool maintainers have written quite a bit of documentation which covers a great deal of the framework functionality.  Organizing a project page intuitively is very hard; it’s just a shame that would-be adopters of the framework have to pay the penalty.

So to make things easier for others like myself looking to try the framework, here’s the basic setup code for a Proxool pool:

Class.forName(jdbcDriver);
 
Properties props = new Properties();
props.setProperty("proxool.maximum-connection-count", "20");
props.setProperty("user", getUsername());
props.setProperty("password", getPassword());
 
// Proxool URL format: proxool.<alias>:<driver class>:<underlying JDBC URL>
String driverUrl = getURI();
String url = "proxool.mypool:" + jdbcDriver + ":" + driverUrl;
 
ProxoolFacade.registerConnectionPool(url, props);
 
// get connection here
Connection conn = DriverManager.getConnection("proxool.mypool");
 
// dispose of pool
try {
    ProxoolFacade.removeConnectionPool("mypool");
} catch (ProxoolException e) {
    // cleanup failed; nothing sensible to do here since the pool is going away
}

Hardly intuitive I’d say.  Nevertheless, the above code seems to get the job done.

One of the debatable advantages to the Proxool library is that it allows developers to take advantage of connection pooling simply by using a special JDBC URI prefix (commons-dbcp allows this too).  I’m not using that syntax in the above example mainly because I think that developers are better served remembering when they are or are not using a pool.  Also, I never could quite get the syntax working (again, poorly structured documentation).

One interesting feature of Proxool that’s worth mentioning is that it allows developers access to things like pool stats, event listeners and so on.  I believe these features are completely unique to Proxool, and while they’re not very interesting in a small test application, imagine the power which can be unleashed in a real-world application.  Exposing this information through something like JMX could make tracing and debugging of database bottlenecks on a production server significantly easier.

Documentation: Frustrating
API: Poor
License: Apache License
AO Test Suite Run Time: 18.6406 seconds

Benchmark Comparison

In terms of raw performance, C3P0 comes out ahead by almost a second and a half.  For a short-running test suite like that of ActiveObjects, that’s a fairly impressive difference.  That translates into hours of clock time saved on a database-intensive application over the course of a few weeks.  In my book, that’s something seriously worth considering.

Proxool came in a solid second place, at eighteen and a half seconds.  It’s definitely slower than C3P0, but it’s about two seconds faster than commons-dbcp.  Considering that Proxool is licensed under the far less restrictive Apache License, it may be worth sacrificing the odd millisecond per query, depending on the opinion of your legal department.

commons-dbcp was the slowest of the three, benchmarked at a disappointing twenty and a half seconds.  I’m not entirely sure why DBCP is so much slower in its default, commons-pool backed implementation.  However, the fact remains that performance-wise, it isn’t even worth comparing with C3P0.  Seems I need to make some changes in the classpath of some of my projects…
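For reference, the gaps quoted above come straight from the three run times in the summaries. The snippet below just does the subtraction (class and method names are arbitrary); it works out to roughly 1.36 seconds between C3P0 and Proxool, 1.99 between Proxool and commons-dbcp, and 3.35 between C3P0 and commons-dbcp.

```java
import java.util.Locale;

class RuntimeGaps {
    // Run times in seconds, copied from the three summaries.
    static final double DBCP = 20.6302;
    static final double C3P0 = 17.277;
    static final double PROXOOL = 18.6406;

    static double gap(double slower, double faster) {
        return slower - faster;
    }

    // Fixed locale so the decimal separator is always a period.
    static String format(double seconds) {
        return String.format(Locale.ROOT, "%.2f s", seconds);
    }

    public static void main(String[] args) {
        System.out.println("Proxool behind C3P0:      " + format(gap(PROXOOL, C3P0)));
        System.out.println("commons-dbcp behind Proxool: " + format(gap(DBCP, PROXOOL)));
        System.out.println("commons-dbcp behind C3P0:    " + format(gap(DBCP, C3P0)));
    }
}
```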

Throughout the whole benchmarking process, I was constantly reminded why Vista is so notoriously difficult as a host OS for application benchmarks.  The results were constantly fluctuating dramatically up and down, based on how much Vista had superloaded, indexing state, open apps, etc.  In short, Vista was so frustratingly difficult to deal with in the testing process that the test results should be treated with some skepticism.  After all, it’s hard to say that this is empirical, hard evidence when I’m throwing away three quarters of the test results due to vast deviation from the mean.

Conclusion

I must (grudgingly) admit that C3P0 is probably the best choice for most projects.  I say grudgingly because the extreme lack of documentation really bothers me.  Granted, Proxool, the next closest in performance, only has an advantage in licensing; its documentation is no better than C3P0’s.  Proxool of course has the added disadvantage of having a difficult API, as well as less popularity, therefore fewer articles and samples available around the web.

So if you’re a license purist, and you want an intuitive API at the expense of performance, commons-dbcp is the way to go.  However, if you’re willing to work within the restrictions of the LGPL license and you know how to use Google effectively, C3P0 would be the preferred choice, given its higher performance and excellent configurability.

Update: I didn’t have time to run the benchmarks in any sort of rigorous way (see aforementioned whining about Vista’s benchmark flakiness), but preliminary runtimes indicate that DBPool is an even better framework, performance-wise.  It has a less restrictive license than C3P0, and seems to have a second to a second and a half edge in runtime. Again, these are just quick numbers I grabbed as I was adding support for the provider to ActiveObjects, but I thought it was worth mentioning.

ActiveObjects 0.6.1 Bug Fix Release

7 Nov 2007

Well, I did it again.  I pushed out 0.6 with a critical (and fairly obvious) bug.  Basically, it involved the way I was handling column names and MySQL in result sets.  Thus, 0.6 probably won’t work with the 5.1 version of the MySQL JDBC connector.  :-S  My bad.

Anyway I’ve fixed the bug (thanks to Zach Cox) and included the fix in a minor release on the site.  So if you’re interested in trying ActiveObjects, you really should use the (now available) 0.6.1 release rather than 0.6.  Enjoy!

ActiveObjects 0.6 Released

30 Oct 2007

As a minor side-bar in this (hopefully) noise-less blog, I’d like to announce the release of ActiveObjects 0.6.  If you couldn’t care less about ActiveObjects and/or random announcements about it, please feel free to completely ignore this post.

ActiveObjects 0.6 is the most stable release yet (hopefully).  With this release, we see the rise of RawEntity, a superinterface to Entity which allows for greater customization, particularly in the area of primary keys.  Most developers will never need to even be aware of this interface, but for those that have such requirements, it should be very helpful.  Likewise, this release also allows for arbitrary types to be persisted into the database, through the use of custom classes which manage the mapping between Java type and database type.  (hint: this even allows for database-specific types such as PostgreSQL’s MATRIX if you really want them)

Most importantly, 0.6 is the release where I actually buckled down and started writing some documentation.  What’s available on the project page right now is still a little sparse, but rest assured this will be rectified soon (not sure when, but soon).  The main focus for the moment has been javadocing the public API.  This is far from complete at the moment, but all the important (and lengthy) classes are done (specifically, everything in the net.java.ao package).  With this documentation, it should hopefully be somewhat easier to use ActiveObjects in a project without resorting to desperate Google searches at the wee hours of the morning.

Most of the interesting stuff in this release I’ve already covered in other posts on this blog, so I won’t bore you by repeating all of it.  Suffice it to say, if you’ve been waiting for a more stable release to start playing with ActiveObjects, this is it.  I won’t guarantee that the API won’t change at all leading up to version 1.0, but I can say that most of the earth-shattering stuff is behind us.  Documentation is in place, and we’ve got a large (and growing) number of tests which are run to ensure quality and stability in the core functionality.  Download it, try it out, break it, file bugs, you know the drill.  I welcome all suggestions, comments, questions and pro-Hibernate rants.

Download activeobjects-0.6 from java.net