
Table Inheritance with ActiveObjects

19 Nov 2007

The main obstacle in building an ORM is deciding how to map between a class hierarchy and a database table set.  While this may seem fairly clear cut in many situations, it’s unfortunately not so easy once you get into more complex issues like inheritance.

At a basic level, tables are classes.  It makes sense, right?  A class defines a structure which is instantiated and populated with data.  A table defines a structure into which data is inserted, one set per row.  Superficially, these two concepts sound more-or-less identical.  The differences creep in when you look at things like a class hierarchy.  Classes have the ability to inherit attributes from a superclass.  Thus, if class A inherits from class B, class A has everything that class B has (simplistically put). 

Unfortunately, database tables have no such capability.  You can’t define a table which has everything a “supertable” has.  Nor can you select data out of a “subtable” as if it were a supertable (polymorphism).  This lack of capability creates a disconnect between database structure and object-oriented class structure.

As with all of these “little differences” (between languages and databases), many solutions have been tried, none of them very successfully.  One of the most popular ways to solve the problem is to use separate tables for the supertable and subtable.  Then, whenever one SELECTs from the subtable, a JOIN is used to include the inherited row from the supertable in the result set.  In principle, it sounds right.  After all, this is how C++ handles inheritance.  However, in practice it becomes rather unwieldy.

To start with, JOINing every time you perform a simple SELECT is both inefficient and annoying.  Keeping two separate rows consistent is also difficult.  Inevitably, the data gets a bit out of sync and the whole thing falls apart.  Granted, a good ORM will prevent this from happening in the first place, but ORMs are not the exclusive means for accessing database schemata.  Oftentimes your schema will be in use not just by your application, but also by a web site, a demo utility and random DBAs who refuse to enter data in any way other than hand-written SQL.  In short, table inheritance using JOINs is clunky, unmaintainable and really messes you up down the line.

Another, more conservative way to map inheritance is inlining, where every subtable contains all of the fields which would be in a supertable.  Thus, if tables A and B inherit from table C, tables A and B are created with all of the fields that would have been in table C, and table C isn’t created at all.  Technically speaking, it’s not inheritance, just a slightly more centralized way to specify fields.  However, this technique is much more in line with how a database schema would normally work if designed by hand, and that’s what ORMs are supposed to simplify, right?

The ActiveObjects Approach

As you may have guessed from my comments, I really lean toward the inline strategy.  I think this works best in a practical environment and is far more maintainable down the line.  Thankfully, this strategy is considerably easier to implement than JOINing.  Syntax-wise, inheritance is used like this:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    public String getLastName();
    public void setLastName(String lastName);
}
 
public interface Employee extends Person {
    public String getTitle();
    public void setTitle(String title);
 
    public short getHourlyRate();
    public void setHourlyRate(short rate);
 
    public Manager getManager();
    public void setManager(Manager manager);
}
 
public interface Manager extends Person {
    @OneToMany
    public Employee[] getPeons();
 
    public long getSalary();
    public void setSalary(long salary);
}

A fairly standard, object-oriented interface hierarchy, right?  Logically it makes sense.  This is how ActiveObjects would represent such a hierarchy in the database:

employees
id
firstName
lastName
title
hourlyRate
managerID

 

managers
id
firstName
lastName
salary

So basically, inheritance in ActiveObjects is just a mechanism which allows the definition of common fields in a single place.  There’s no real polymorphism (that is to say, you can’t INSERT an employee where a person is required, since there is no “people” table).  The idea is: keep it simple, stupid.  In fact, ActiveObjects would happily generate a table to correspond to Person if we asked it to, since as far as it’s concerned it’s just an entity like any other.
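To make the inline strategy a little more concrete, here is a minimal sketch of how a column list could be derived by reflection.  This is not the actual ActiveObjects implementation; InlineColumnsSketch, columnsFor and the stand-in interfaces are hypothetical names used only for illustration, and the Entity stand-in is deliberately empty, so no id column shows up.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

final class InlineColumnsSketch {
    // Hypothetical stand-ins that mirror the shape of the example entities.
    interface Entity {}

    interface Person extends Entity {
        String getFirstName();
        void setFirstName(String firstName);

        String getLastName();
        void setLastName(String lastName);
    }

    interface Employee extends Person {
        String getTitle();
        void setTitle(String title);

        short getHourlyRate();
        void setHourlyRate(short rate);
    }

    /** Collects one column per getter, including getters inherited from superinterfaces. */
    static Set<String> columnsFor(Class<?> entity) {
        Set<String> columns = new TreeSet<>();
        // For an interface, getMethods() also returns methods declared by its superinterfaces,
        // which is exactly what "inlining" the inherited fields needs.
        for (Method m : entity.getMethods()) {
            String name = m.getName();
            if (name.startsWith("get") && name.length() > 3 && m.getParameterCount() == 0) {
                columns.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // The Employee column set includes the fields declared on Person.
        System.out.println(columnsFor(Employee.class));
    }
}
```

Because the inherited getters are simply folded into the subtype's own column set, no table for Person is needed unless it is asked for explicitly.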

Of course, there are quite a few serious disadvantages to this approach.  For one thing, without polymorphism or actually centralizing the fields in a common table, inheritance is just an illusion.  From a database standpoint, the tables aren’t really related in any interesting way.  You’re still duplicating fields in the database (though not duplicating data), though since the duplication is all handled by the ORM, it’s not too much of an issue.

For me though, it all comes back to the decision I made not to over-complicate the schema: to ensure that no matter what the object hierarchy, it could be represented in the database (assuming it’s valid) without undue weirdness.  Keep the schema simple, make it easy to do stuff outside of the ORM.  The ORM should be a liberating tool, not a constraining one.

I realize not everyone agrees with this particular style of table inheritance, so maybe at some point in the future ActiveObjects will allow configuration in this area.  But for the moment, the uncomplicated schema is the way to go.  :-)  Enjoy!

Custom Data Types in ActiveObjects

17 Oct 2007

ORMs really interest me, so naturally I read a lot of material regarding ORMs of all kinds, especially Hibernate and ActiveRecord.  One of the more interesting reviews I read recently complained about the rigidity of the type system in the Rails ORM.  According to the author’s examination of the code, ActiveRecord just uses a monolithic switch/case statement to determine the appropriate Ruby type from the SQL type in the result set.  This may make sense from a simplicity standpoint, but it may not be the best approach when it comes to flexibility.

The problem with this approach is that it’s impossible to easily add new types to the ORM.  Granted, the framework authors could do it by modifying the switch/case statement(s) - and the approach does usually require more than one statement - and releasing a whole new version of the framework.  This is not a significant issue as the framework authors already have access to the full library sources.  The real trial is with third-party developers who require custom data types.

An alternative approach (suggested in the article) is to implement a series of type delegates inheriting from a common superclass, or possibly using a mixin as allowed by Ruby.  These type classes would each be responsible for a single type, handling the mapping both to and from the language-native type to the database type.  This would allow for both easy addition and modification of core types by the framework authors, but also trivial support for arbitrary types as implemented by third-party developers.

Not one to shirk good advice when I hear it, I’ve decided to go with this approach to types in ActiveObjects.  Formerly, I must admit, I had gone with the multiple, giant switch/case statements.  This seemed to make sense when I first implemented the framework, but as it developed, it became apparent that this was inadequate, especially if third-party types are desired.  This realization led to the refactoring of the type system and the subsequent creation of the TypeManager class.

TypeManager is basically the singleton manager for the entire type system.  It maintains the list of available DatabaseType(s) and can resolve both Java classes and SQL types to the appropriate delegate.  A number of core types (VarcharType, IntegerType, etc) are added to the singleton instance of TypeManager, ensuring that basic functionality works without any extra effort on the part of the developer.  If a type other than the core types is needed, all that is necessary is to add the type delegate instance to the TypeManager prior to the type’s usage in either migrations or data access.  Thusly:

public interface Company extends Entity {
    public String getName();
    public void setName(String name);
 
    public Class<?> getJavaType();
    public void setJavaType(Class<?> type);
}
 
public class ClassType extends DatabaseType<Class<?>> {
 
    public ClassType() {
        super(Types.VARCHAR, 255, Class.class);
    }
 
    @Override
    public Class<?> convert(EntityManager manager, ResultSet res, 
                Class<? extends Class<?>> type, String field) throws SQLException {
        try {
            return Class.forName(res.getString(field));
        } catch (Throwable t) {
            return null;
        }
    }
 
    @Override
    public void putToDatabase(int index, PreparedStatement stmt, 
                Class<?> value) throws SQLException {
        stmt.setString(index, value.getName());
    }
 
    @Override
    public Object defaultParseValue(String value) {
        try {
            return Class.forName(value);
        } catch (Throwable t) {
            return null;
        }
    }
 
    @Override
    public String valueToString(Object value) {
        if (value instanceof Class) {
            return ((Class<?>) value).getName();
        }
 
        return super.valueToString(value);
    }
 
    @Override
    public String getDefaultName() {
        return "VARCHAR";
    }
}
 
// ...
TypeManager.getInstance().addType(new ClassType());
 
Company[] stringCompanies = manager.find(Company.class, "javaType = ?", String.class);
for (Company c : stringCompanies) {
    System.out.println(c.getName() + " formerly held type " + c.getJavaType().getName());
 
    c.setJavaType(Exception.class);
    c.save();
}

The most complicated bit of the example above is the database type itself.  Yet even this delegate isn’t too horrible.  The ClassType class first specifies in its constructor which types it corresponds to, both database and Java.  Multiple Java class types can be specified, allowing for cases like IntegerType which maps to both Integer.class and int.class.

The rest of the database type is fairly self-explanatory.  There are methods to read the Java value out of a JDBC ResultSet and put the Java value back into a JDBC PreparedStatement, as well as two methods to handle some of the type-sensitive operations which don’t involve the database, such as parsing a String value into a type-specific value and vice versa.  These database non-specific conversions are required for things like parsing the value of a @Default or an @OnUpdate annotation.  Finally, getDefaultName() allows the default DDL rendering of the type to be specified.  This can be overridden in the DatabaseProvider implementation for that particular database, but the use of getDefaultName() allows for third-party types that the database provider developers may not have foreseen.  Thus, it effectively opens the door to third-party types in migrations.
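For readers who want a feel for the registry idea behind TypeManager, here is a deliberately stripped-down sketch.  It is not the real TypeManager or DatabaseType; TypeRegistrySketch, SimpleType, forJavaType and forSqlType are hypothetical names, and the lookup rules (exact class match, most recently added type wins) are assumptions made for the sake of the example.

```java
import java.sql.Types;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TypeRegistrySketch {
    /** Hypothetical, pared-down stand-in for a type delegate. */
    static final class SimpleType {
        final int sqlType;
        final String ddlName;
        final List<Class<?>> javaTypes;

        SimpleType(int sqlType, String ddlName, Class<?>... javaTypes) {
            this.sqlType = sqlType;
            this.ddlName = ddlName;
            this.javaTypes = Arrays.asList(javaTypes);
        }
    }

    private final List<SimpleType> types = new ArrayList<>();

    /** Later registrations go to the front, so third-party types can shadow core ones. */
    void addType(SimpleType type) {
        types.add(0, type);
    }

    /** Resolves a Java class to the first delegate that lists exactly that class. */
    SimpleType forJavaType(Class<?> javaType) {
        for (SimpleType type : types) {
            if (type.javaTypes.contains(javaType)) {
                return type;
            }
        }
        throw new IllegalArgumentException("No type registered for " + javaType.getName());
    }

    /** Resolves a java.sql.Types constant to the first delegate registered for it. */
    SimpleType forSqlType(int sqlType) {
        for (SimpleType type : types) {
            if (type.sqlType == sqlType) {
                return type;
            }
        }
        throw new IllegalArgumentException("No type registered for SQL type " + sqlType);
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        // One delegate may cover several Java classes, as IntegerType does in the post.
        registry.addType(new SimpleType(Types.INTEGER, "INTEGER", Integer.class, int.class));
        registry.addType(new SimpleType(Types.VARCHAR, "VARCHAR", String.class));
        System.out.println(registry.forJavaType(int.class).ddlName);
    }
}
```

The point of the design is that adding a type becomes a single registration call rather than an edit to a switch/case statement buried in the framework.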

Of course, no example would be complete without another one to complement it!  Here’s how we could create a type delegate for the java.awt.Point class:

public class PointType extends DatabaseType<Point> {
    // the optional minus sign lets negative coordinates parse as well
    private static final Pattern PATTERN = Pattern.compile("x=(-?\\d+),y=(-?\\d+)");
 
    protected PointType() {
        super(Types.VARCHAR, 45, Point.class);
    }
 
    @Override
    public Point convert(EntityManager manager, ResultSet res, 
            Class<? extends Point> type, String field) throws SQLException {
        return (Point) defaultParseValue(res.getString(field));
    }
 
    @Override
    public void putToDatabase(int index, PreparedStatement stmt, Point value) 
                throws SQLException {
        stmt.setString(index, valueToString(value));
    }
 
    @Override
    public Object defaultParseValue(String value) {
        Point back = null;
        Matcher matcher = PATTERN.matcher(value);
 
        if (matcher.find()) {
            back = new Point();
            back.x = Integer.parseInt(matcher.group(1));
            back.y = Integer.parseInt(matcher.group(2));
        }
 
        return back;
    }
 
    @Override
    public String getDefaultName() {
        return "VARCHAR";
    }
}

One thing of note here which has changed from the previous example of ClassType is that the second parameter to the super constructor is now 45, instead of 255.  This parameter is actually the default precision of the SQL type when rendered into the database.  If the SQL type doesn’t have a precision or should just take the database default, a negative value should be specified for this parameter.  Another item of note is that we’re delegating work between methods in a way that I simply didn’t do for ClassType.  Because the rendering of the type in the database is in VARCHAR (String) form, we can rely upon our default String conversion methods to render into the database.  As an aside, the superclass implementation of valueToString(Object) uses the toString() method for that particular value.
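The round trip that PointType relies on can be tried in isolation.  The following is only a sketch using the JDK's java.awt.Point, whose toString() renders the coordinates as x=...,y=... inside brackets; PointRoundTripSketch, render and parse are hypothetical names, and the optional minus sign in the pattern is an addition made here so that negative coordinates survive the trip.

```java
import java.awt.Point;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PointRoundTripSketch {
    // Optional minus sign so that negative coordinates are matched too.
    private static final Pattern PATTERN = Pattern.compile("x=(-?\\d+),y=(-?\\d+)");

    /** Renders the point the way the default valueToString(Object) would: via toString(). */
    static String render(Point point) {
        return point.toString();
    }

    /** Parses a stored value back into a Point, or returns null if it does not match. */
    static Point parse(String stored) {
        Matcher matcher = PATTERN.matcher(stored);
        if (!matcher.find()) {
            return null;
        }
        return new Point(Integer.parseInt(matcher.group(1)), Integer.parseInt(matcher.group(2)));
    }

    public static void main(String[] args) {
        Point original = new Point(3, -4);
        String stored = render(original);
        System.out.println(stored + " -> " + parse(stored));
    }
}
```

Even the widest possible int coordinates render in fewer than 45 characters, which is why a precision of 45 is comfortable for this type.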

As you can see, the type system in ActiveObjects is incredibly powerful and capable of satisfying many use-cases that were impossible in previous versions or other ORMs.  Hopefully this brief glimpse into advanced uses of the type system will aid you in your databasing efforts.

Custom Primary Keys with ActiveObjects

15 Oct 2007

One of the main complaints I’ve heard leveled against ActiveObjects is that it’s just not suitable for mapping to legacy schemas.  More generically, concerns have been mooted that it enforces naming conventions and field conventions which aren’t suitable/preferable for some projects.  I suppose at first both of these were true.  After all, ActiveObjects’s entire premise was convention over configuration, and this requires some restrictions by default.  However, I don’t think it’s entirely accurate any longer.

Over the last few months, I’ve added several features which satisfy three primary goals:

  • Customize the table name convention
  • Customize the field name convention
  • Allow for primary key fields (and types) other than id INTEGER

The first two goals were easily met through the addition of TableNameConverter and FieldNameConverter.  These two classes are used by every feature within ActiveObjects, from migrations to simple data access, to determine the database table and field names from the class and method names respectively.  The canonical example of this is table name pluralization, which can be accomplished in the following way:

EntityManager manager = new EntityManager(
    "jdbc:mysql://localhost/test", "username", "secret");
manager.setTableNameConverter(new PluralizedNameConverter());

Not too horrible.  The second use-case is assigning a different field name convention than the default camelCase.  For example, some people really like the ActiveRecord (Rails) field naming convention (e.g. “first_name” as opposed to “firstName”).  This can easily be accomplished by specifying a field name converter:

EntityManager manager = new EntityManager(
    "jdbc:mysql://localhost/test", "username", "secret");
 
// lower_case convention
manager.setFieldNameConverter(new UnderscoreFieldNameConverter(false));

Custom table and field name converters are also possible, allowing for a great deal of flexibility in name conventions.  Additionally, it’s always possible to specify field and table names directly in the entities, using the @Accessor, @Mutator and @Table annotations respectively.
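As an illustration of what an underscore-style conversion boils down to, here is a small sketch.  It does not reproduce the real UnderscoreFieldNameConverter; UnderscoreNameSketch and toUnderscore are hypothetical names, and the rule used (insert an underscore wherever a lowercase letter or digit is followed by an uppercase letter, then lowercase everything) is simply one reasonable assumption.

```java
import java.util.Locale;

final class UnderscoreNameSketch {
    /** Converts a camelCase field name such as "firstName" into "first_name". */
    static String toUnderscore(String camelCase) {
        // Put an underscore at each lower-to-upper boundary, then lowercase the result.
        return camelCase
                .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
                .toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toUnderscore("firstName"));
        System.out.println(toUnderscore("companyID"));
    }
}
```

Note that a run of capitals such as the "ID" in "companyID" is kept together under this rule, which may or may not match what a given legacy schema expects.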

Custom Primary Keys

The most challenging goal (from a library standpoint) is to allow for primary key fields other than “id”.  This is partially such a challenge because it had been hard coded literally everywhere in ActiveObjects that the “id” field is the field to use in any sort of SELECT, JOIN, INSERT, UPDATE, etc.  In short, changing this required finding all of these instances and converting the code to query a centralized source for the data.  A few days of fiddling with Eclipse’s text search accomplished this without inordinate pain, but the hard part was coming.

The question remained: how to specify the primary key within the entity itself?  After all, it’s been hard coded and sort of magically “worked” based on the method definition in the Entity superinterface.  There had been a syntax to specify a second PRIMARY KEY for the schema migration, but ActiveObjects didn’t treat these fields any differently, and this sort of syntax wouldn’t really cut it if we were trying to completely override the existing getID() method in the superinterface.

The solution is to refactor all of the interesting functionality in Entity up into a super-superinterface, RawEntity.  Thus the only method defined within Entity would be getID(), annotated appropriately to be recognized as a PRIMARY KEY field.  This would do away with all the magic tricks under the surface which assumed the existence of the getID() method.  ActiveObjects can easily parse the class to find the PRIMARY KEY field amongst the methods, both defined and inherited.  The only compromise which must be made is only one PRIMARY KEY can now be allowed per table.  This isn’t such an issue, since 99% of the time, that’s all you need anyway.  Usually that remaining 1% can be more properly accomplished using UNIQUE and some sort of auto-generation of values.
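The "parse the class to find the PRIMARY KEY field" step might look roughly like the sketch below.  The annotation and interfaces here are hypothetical stand-ins declared locally (they only borrow the names used in this post), and findPrimaryKey is not a real ActiveObjects method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class PrimaryKeyLookupSketch {
    // Stand-in annotation; it must be retained at runtime to be visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface PrimaryKey {}

    interface RawEntity<K> {}

    interface Entity extends RawEntity<Integer> {
        @PrimaryKey
        int getID();
    }

    interface Person extends Entity {
        String getFirstName();
    }

    interface House extends RawEntity<Integer> {
        @PrimaryKey
        int getHouseID();
    }

    /** Finds the single method tagged as primary key, whether declared or inherited. */
    static Method findPrimaryKey(Class<?> entity) {
        Method found = null;
        for (Method method : entity.getMethods()) {
            if (method.isAnnotationPresent(PrimaryKey.class)) {
                if (found != null) {
                    throw new IllegalStateException("More than one primary key on " + entity.getName());
                }
                found = method;
            }
        }
        if (found == null) {
            throw new IllegalStateException("No primary key on " + entity.getName());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPrimaryKey(Person.class).getName());
        System.out.println(findPrimaryKey(House.class).getName());
    }
}
```

Person never mentions a key itself; it is picked up from the inherited getID(), which is why existing entities keep working without modification.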

Since we’ve refactored interesting functionality up into RawEntity and kept getID() within Entity, no legacy code needs to be changed.  Any entities previously written against ActiveObjects will run without modification or any behavior changes.  We are merely allowed the flexibility of specifying our own primary keys.  So, without further ado, the obligatory example:

public interface Person extends Entity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    public String getLastName();
    public void setLastName(String lastName);
 
    public Company getCompany();
    public void setCompany(Company company);
 
    public House getHome();
    public void setHome(House home);
}
 
public interface Company extends RawEntity<String> {
 
    @PrimaryKey
    @NotNull
    @Generator(UUIDValueGenerator.class)
    public String getCompanyKey();
 
    public String getName();
    public void setName(String name);
 
    @OneToMany
    public Person[] getEmployees();
}
 
public interface House extends RawEntity<Integer> {
 
    @PrimaryKey
    @NotNull
    @AutoIncrement
    public int getHouseID();
 
    // ...
 
    @OneToMany
    public Person[] getOccupants();
}
 
public class UUIDValueGenerator implements ValueGenerator<String> {
    public String generateValue(EntityManager em) {
        // generate a random UUID using the JDK's java.util.UUID
        return java.util.UUID.randomUUID().toString();
    }
}
 
// ...
Person p = manager.get(Person.class, 1);
Company c = manager.get(Company.class, "abff999dd99ddf0a225f");

Maybe a bit longer of an example than you were expecting, but it does cover the material well.  What’s happening here is the Person entity has a standard, “id” primary key.  This follows the same convention that ActiveObjects has been enforcing since the beginning of time (or at least since I started the project).  Company and House are the interesting entities here.

House defines a getHouseID() method of type int which is marked as a PRIMARY KEY as well as being auto-incremented by the database (SERIAL on PostgreSQL, AUTO_INCREMENT on MySQL, etc).  This is the same sort of declaration that you would find if you looked in the source for Entity.  The difference is that House will not contain the “id” field and its PRIMARY KEY will be “houseID”.  The really interesting entity here is Company.

Company defines a primary key that is not only a different field, but also an entirely different type.  Also, its value is generated automatically not by the database, but by the application itself.  This is a fairly common use-case in those crazy databases which use UUIDs as primary keys.  Not only does this field define “companyKey” as a different type than INTEGER, but it also ensures that the “companyID” FOREIGN KEY field in the “person” table is also of type VARCHAR.

Another item of note in this example is that the RawEntity interface is parameterized.  This is to allow the get(...) method in EntityManager to stay type-checked, ensuring that the values passed are actually valid primary key values for the entity in question.  Of course, there’s nothing that can be done to ensure that the actual method definition of the primary key is of the proper type.  However, at some point the developer must be trusted to make sure their entity model doesn’t violate the dictates of logic.
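To show why the type parameter matters, here is a sketch of a type-checked lookup.  The Store class is a hypothetical in-memory stand-in for EntityManager, and its put/get signatures are assumptions rather than the real API; the interesting part is only the generic bound tying the key type to the entity type.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedKeySketch {
    interface RawEntity<K> {}

    interface Company extends RawEntity<String> {}

    interface House extends RawEntity<Integer> {}

    /** Hypothetical stand-in for an entity manager, backed by an in-memory map. */
    static final class Store {
        private final Map<String, Object> rows = new HashMap<>();

        <T extends RawEntity<K>, K> void put(Class<T> type, K key, T entity) {
            rows.put(type.getName() + "#" + key, entity);
        }

        /** K is fixed by T, so the compiler rejects a key of the wrong type. */
        <T extends RawEntity<K>, K> T get(Class<T> type, K key) {
            return type.cast(rows.get(type.getName() + "#" + key));
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Company acme = new Company() {};
        store.put(Company.class, "abff999dd99ddf0a225f", acme);

        Company found = store.get(Company.class, "abff999dd99ddf0a225f");
        System.out.println(found == acme);

        // store.get(Company.class, 1);  // would not compile: Company is keyed by String
    }
}
```

As noted above, nothing here checks that the annotated key method actually returns K; that part is still left to the developer.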

Conclusion

With this latest addition to the ActiveObjects feature set, it should be possible to use the ORM with any schema whatsoever.  While AO may still be an implementation of the active record pattern, and thus less powerful than solutions such as Hibernate, there should be no problems applying AO to just about any sane use-case.

ActiveObjects: Indexing vs Searching

1 Oct 2007

So in the intervening time since I last updated you on ActiveObjects, I’ve been busy refactoring and repurposing some of the core.  I’ve added a full JUnit test suite, which definitely helps my own confidence about the stability of the source.  Also, there’s a whole bunch of new features that have come down the pipe which hopefully I’ll get to address in the next few posts.  So, without further ado…

The change which is probably going to cause you the most grief is the switch from @Index to @Searchable, and the addition of the @Indexed annotation.  Yes, you really did read that correctly; and no, @Indexed isn’t even related to the functionality provided by the old @Index annotation.

The old @Index annotation used to handle tagging of entity methods to mark them as to be added to the Lucene full-text search index (see this post for more details).  This was a little confusing for a number of reasons, not the least of which was my failure to remember the tense of the annotation name (see the comments on the indexing post).  By convention, most Java annotations are declarative in name.  Thus, the name should not be the present tense “index” but the past tense “indexed”.  So my first thought was to just refactor the annotation, but then I ran into a slightly hairier name-clash.

Database Field Indexing

One of the most common techniques for optimizing your database’s read (SELECT) performance is to create indexes on certain fields.  When a field is indexed, the database maintains a separate lookup structure (typically a B-tree, sometimes a hash table) to enable very fast selection of rows based on the field in question.  This is a really good thing for almost all foreign keys, for example:

SELECT ID FROM people WHERE companyID = ?

Here we’re SELECTing the id field from the people table where companyID matches a certain value.  The database can execute this query fairly quickly.  In fact, the only bottleneck is finding all of the rows which match the specified companyID value.  In a table containing hundreds of thousands of rows, one can see how this could be a problem.

The problem goes away (sort of) with the use of field indexing.  Instead of having to linearly search through the table for rows matching the companyID, the database can perform a quick index lookup and get a set of rowid(s) based on the companyID.  Simple, efficient, and incredibly scalable.  Practically, the DBMS takes barely longer to execute such a query against a table of 100,000,000 rows than it does against a table of 100 rows, since the cost of an index lookup grows at most logarithmically with the size of the table.  So, this is a really good thing, right?

Well, indexes have their drawbacks.  I won’t go into all of the reasons not to use indexes; the following two points will likely suffice:

  • Indexing slows down UPDATEs and INSERTs
  • Indexing adds to your table’s storage requirements (all those index entries have to be put somewhere)

So perfect database performance isn’t attainable just by indexing every field; one has to be quite judicious about it.  In fact, choosing the fields to be indexed is really as much of an art as it is a science.
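The trade-off can be seen in miniature with an in-memory analogy.  This is only a toy model, not how any particular DBMS stores its data: FieldIndexSketch is a hypothetical class in which a list plays the table and a sorted map stands in for the index on companyID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FieldIndexSketch {
    // The "table": position in the list is the row id, the value is that row's companyID.
    private final List<Integer> companyIds = new ArrayList<>();
    // The "index": companyID -> ids of the rows holding that value.
    private final Map<Integer, List<Integer>> index = new TreeMap<>();

    /** Inserting a row also has to update the index, which is the write-side cost. */
    int insert(int companyId) {
        int rowId = companyIds.size();
        companyIds.add(companyId);
        index.computeIfAbsent(companyId, key -> new ArrayList<>()).add(rowId);
        return rowId;
    }

    /** Without an index: every row is examined. */
    List<Integer> scan(int companyId) {
        List<Integer> matches = new ArrayList<>();
        for (int rowId = 0; rowId < companyIds.size(); rowId++) {
            if (companyIds.get(rowId) == companyId) {
                matches.add(rowId);
            }
        }
        return matches;
    }

    /** With an index: a single map lookup returns the matching row ids. */
    List<Integer> lookup(int companyId) {
        return index.getOrDefault(companyId, Collections.emptyList());
    }

    public static void main(String[] args) {
        FieldIndexSketch people = new FieldIndexSketch();
        people.insert(7);
        people.insert(3);
        people.insert(7);
        System.out.println(people.scan(7) + " " + people.lookup(7));
    }
}
```

Both drawbacks from the list are visible here: insert does extra work on every write, and the map holds a second copy of every companyID.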

This long and tangled introduction actually did have a point…  ActiveObjects didn’t have any support for field indexing.  I hadn’t really considered the possibility, so I didn’t factor it into my design.  In retrospect, this was probably a bad idea.  So making up for lost time, I’ve now introduced field indexing into the library!

public interface Person extends Entity {
 
    @Indexed
    public String getName();
    @Indexed
    public void setName(String name);
 
    public int getAge();
    public void setAge(int age);
 
    public Company getCompany();
    public void setCompany(Company company);
}

When ActiveObjects generates the migration DDL for this table (running against MySQL), it will look something like this:

CREATE TABLE people (
    ID INTEGER NOT NULL AUTO_INCREMENT,
    NAME VARCHAR(255),
    age INTEGER,
    companyID INTEGER,
    CONSTRAINT fk_people_companyID FOREIGN KEY (companyID) REFERENCES companies(ID),
    PRIMARY KEY(ID)
);
CREATE INDEX NAME ON people(NAME);
CREATE INDEX companyID ON people(companyID);

This is where the aforementioned @Indexed annotation comes in.  As you can see from the resultant DDL, adding a field index is as simple as tagging the corresponding method.  Also, foreign keys are automatically indexed, to ensure maximum performance in SELECTion of relations.
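For the curious, the step from tagged methods to CREATE INDEX statements could be sketched as follows.  This is not the code ActiveObjects uses; the @Indexed annotation is redeclared locally as a stand-in, renderIndexes is a hypothetical helper, and naming each index after its field simply follows the DDL shown above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class IndexDdlSketch {
    // Local stand-in for the annotation, retained at runtime so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Indexed {}

    interface Person {
        @Indexed
        String getName();
        @Indexed
        void setName(String name);

        int getAge();
        void setAge(int age);
    }

    /** Renders one CREATE INDEX statement per field that has a tagged accessor or mutator. */
    static List<String> renderIndexes(String table, Class<?> entity) {
        // A sorted set removes the duplicate produced by the getter/setter pair
        // and gives a stable order regardless of how reflection returns the methods.
        Set<String> fields = new TreeSet<>();
        for (Method method : entity.getMethods()) {
            String name = method.getName();
            boolean accessorOrMutator = name.startsWith("get") || name.startsWith("set");
            if (method.isAnnotationPresent(Indexed.class) && accessorOrMutator && name.length() > 3) {
                fields.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }

        List<String> ddl = new ArrayList<>();
        for (String field : fields) {
            ddl.add("CREATE INDEX " + field + " ON " + table + "(" + field + ")");
        }
        return ddl;
    }

    public static void main(String[] args) {
        for (String statement : renderIndexes("people", Person.class)) {
            System.out.println(statement + ";");
        }
    }
}
```

The automatic indexing of foreign keys is left out of this sketch; it would amount to treating every entity-typed accessor as if it were tagged.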

So that’s the good news.  The bad news is that index creation doesn’t play too nicely with migrations.  Everything works of course, and if you’re actually adding a table or field, the corresponding index(es) will also be created.  Likewise if you drop a table or field, the corresponding index(es) will also be dropped.  However, that’s about the limit of the migrations support for indexes at the moment.  JDBC has an unfortunate limitation which prevents developers from getting a list of indexes within a database.  Since I have no way (within JDBC) of finding pre-existing indexes, they must be excluded from the schema diff and thus are CREATEd or DROPped regardless of the existing schema.  I do have a plan to fix this (along with migrations on Oracle), but I seriously doubt that it’ll be included in the 1.0 release, owing to the changes required.

Refactoring End Result

The final result of the refactoring and the adding of field indexing is that the annotation formerly named @Index is now the @Searchable annotation.  Likewise (and keeping with convention), the former IndexingEntityManager is now named SearchableEntityManager.  Both of these types remain in the net.java.ao package.  To allow for field indexing, the @Indexed annotation was added to the net.java.ao.schema package, owing to the fact that it only affects schema generation and doesn’t change runtime behavior in the framework at all.

Hopefully these changes won’t be too confusing (now that you’re aware of them) and will be a welcome addition to the ActiveObjects functionality.  As always, I welcome comments, suggestions and criticisms!

An Easier Java ORM Part 4

30 Jul 2007

In keeping with my ActiveObjects series, here’s part 4. In this part, we’ll look at schema generation/migration and the ever-interesting topic of pluggable name converters and English pluralization.

One of ActiveObjects’s main concepts is that you (the developer) should never have to worry about the semantics of database design. You should simply be given the tools to design your models in a natural and object-oriented way, and the database just sort-of takes care of itself. To allow this sort of simplicity, the database schema has to be automatically generated, leaving nothing to the developer in this area. Fortunately, ActiveObjects does possess this capacity.

Schema Generation

In a nutshell, schema generation in ActiveObjects works by parsing the specified entity interfaces. First, a dependency tree is built, ensuring that the schema generation occurs in the proper order satisfying all dependent tables. ActiveObjects does generate all foreign keys for you, ensuring data integrity and maximum performance. This of course has the unfortunate side effect that everything must be inserted in the proper order; hence the tree.

Next, the dependency tree is passed through a loop which iterates through it and invokes DatabaseProvider#render(DDLTable), which generates the database-specific DDL statements necessary to create the entity-corresponding schema.
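The ordering step is essentially a topological sort.  Here is a minimal sketch of that idea using a depth-first walk; it is an illustration under my own assumptions rather than the actual ActiveObjects code, DependencyOrderSketch and creationOrder are hypothetical names, and circular references (including a table that references itself) are simply rejected instead of being handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DependencyOrderSketch {
    /** dependsOn maps each table to the tables its foreign keys reference. */
    static List<String> creationOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Iterate in sorted order so the result is deterministic.
        for (String table : new TreeSet<>(dependsOn.keySet())) {
            visit(table, dependsOn, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String table, Map<String, List<String>> dependsOn,
            Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(table)) {
            return;
        }
        if (!inProgress.add(table)) {
            throw new IllegalStateException("Circular dependency involving " + table);
        }
        // Every referenced table must be created before the table that references it.
        for (String dependency : dependsOn.getOrDefault(table, Collections.emptyList())) {
            visit(dependency, dependsOn, done, inProgress, order);
        }
        inProgress.remove(table);
        done.add(table);
        order.add(table);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
                "people", List.of("companies", "houses"),
                "companies", List.of(),
                "houses", List.of());
        // Each table in the resulting list appears after everything it references.
        System.out.println(creationOrder(dependsOn));
    }
}
```

Each table name in the resulting list would then be handed to the provider's rendering step in turn.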

This all sounds fine-and-dandy on paper (or in this case, screen), but when you actually try to implement it in a real-world scenario it gets a bit sticky. For one thing, Java’s types are nowhere near as robust as the ANSI SQL types. Additionally, almost every DDL allows developers to put certain restrictions on fields such as default values, auto incrementing or even forcing table-unique values. The solution, avoiding XML and other non-Java meta-programming, is to use annotations:

public interface Person extends SaveableEntity {
    public String getFirstName();
    public void setFirstName(String firstName);
 
    @Unique
    @SQLType(precision=128)
    public String getLastName();
 
    @Unique
    @SQLType(precision=128)
    public void setLastName(String lastName);
 
    @SQLType(Types.DATE)
    public Calendar getBirthday();
    @SQLType(Types.DATE)
    public void setBirthday(Calendar birthday);
 
    @Accessor("url")
    public URL getURL();
    @Mutator("url")
    public void setURL(URL url);
}
 
// ...
manager.migrate(Person.class);    // generates and executes the appropriate DDL

It might seem a bit ugly and annotation-ridden, but it does the job well and in pure-Java. There are more annotations which could be used (@OnUpdate, @Default, etc…), but these suffice to give a basic example.

You’ll notice right off the bat that each annotation has to be applied to both the accessors and the mutators. This is for two reasons. First, Java reflection doesn’t guarantee any particular ordering in which it will return the methods for a Class<?>. Thus, either the accessor or the mutator could be reached first and used to generate that field’s DDL, meaning the metadata needs to be applied to both or it could possibly be ignored by the schema generator. Second, ActiveObjects allows for read-only or write-only fields, meaning you don’t need the full accessor/mutator pair to access the database; one will suffice. This could be useful for application static data or for fields you just-plain don’t want the application to be able to mutate directly. Because ActiveObjects doesn’t assume an accessor/mutator pair, it can’t just wait for both the accessor and the mutator to be parsed. Thus, metadata is required on both.

The other point of interest in this example is the use of the @Accessor and @Mutator annotations. From very early on, ActiveObjects has allowed developers to specify non-conventional method-names as database fields. In this case, ActiveObjects would normally assume the getURL method corresponded to the “uRL” field (case intentional). This is because AO will assume the default Java get/set/is convention and recase the method accordingly. Since we obviously don’t want that for the “url” field, we use the @Accessor and @Mutator annotations to override ActiveObjects’s field name parser.
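The recasing behavior is easy to reproduce in a few lines.  The following is a sketch of the get/set/is convention as described here, not the actual field name parser; FieldNameSketch and fieldName are hypothetical names.

```java
final class FieldNameSketch {
    private static final String[] PREFIXES = {"get", "set", "is"};

    /** Strips a get/set/is prefix and lowercases only the first remaining character. */
    static String fieldName(String methodName) {
        for (String prefix : PREFIXES) {
            if (methodName.startsWith(prefix) && methodName.length() > prefix.length()) {
                String rest = methodName.substring(prefix.length());
                // Only treat it as a prefix when a capitalized name follows it.
                if (Character.isUpperCase(rest.charAt(0))) {
                    return Character.toLowerCase(rest.charAt(0)) + rest.substring(1);
                }
            }
        }
        return methodName;
    }

    public static void main(String[] args) {
        System.out.println(fieldName("getFirstName"));
        System.out.println(fieldName("getURL"));
    }
}
```

Since only the first character is lowercased, getURL comes out as "uRL", which is exactly the case where an explicit @Accessor("url") earns its keep.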

Pluggable Name Converters

By default, ActiveObjects uses a very simple set of heuristics to determine the table name from the entity class name. Essentially, these heuristics boil down to a simple camelCase implementation. For example, the “Person” entity would correspond to the “person” table. A “BillingAddress” entity would be mapped to “billingAddress” and so on. This is nicely conventional, but certainly not everyone would agree with this style of table naming. This is why ActiveObjects provides a mechanism to override the table name conversion and specify a custom algorithm.

The name converter interface is pretty simple. It has a single method (getName(Class<? extends Entity>):String) which is where the actual “meat” of the conversion happens. It also defines four algorithmically optional methods, which are intended to be used by developers as a quick-and-easy way to specify explicit mappings (for example, sometimes you want to override the algorithm for a specific class or name pattern without adding the @Table annotation to the entity). These methods are important, but could be left as stubs for a custom name-converter used “in house”.

To make it easier to write custom name converters, an abstract superclass has been created which handles most of the boilerplate functionality. I would recommend that this superclass (AbstractNameConverter) be used in lieu of a direct implementation of PluggableNameConverter. The only difference is that instead of overriding the getName(Class<? extends Entity>):String method from the superinterface, the method to override is getNameImpl(Class<? extends Entity>):String. This is the method called by AbstractNameConverter if the entity class in question fails to match an existing mapping.
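The division of labor between getName and getNameImpl is easier to see in code. What follows is a self-contained sketch of the pattern only, using stand-in types so it compiles without ActiveObjects on the classpath. Everything inside NameConverterSketch (the Converter and CamelCase classes, the addClassMapping method, the Person and BillingAddress interfaces) is hypothetical and should not be read as the library's real signatures.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-ins for illustration only; NOT the real ActiveObjects classes. */
final class NameConverterSketch {
    private NameConverterSketch() {}

    /** Plays the role of the abstract superclass: explicit mappings first, algorithm second. */
    abstract static class Converter {
        private final Map<Class<?>, String> explicit = new HashMap<>();

        /** Registers an explicit table name for one entity type (hypothetical method). */
        void addClassMapping(Class<?> type, String tableName) {
            explicit.put(type, tableName);
        }

        /** Consults the explicit mappings and falls back to getNameImpl on a miss. */
        final String getName(Class<?> type) {
            String mapped = explicit.get(type);
            return mapped != null ? mapped : getNameImpl(type);
        }

        /** The only method a custom converter has to supply. */
        protected abstract String getNameImpl(Class<?> type);
    }

    /** Mimics the default camelCase heuristic: lowercase the first letter of the class name. */
    static final class CamelCase extends Converter {
        @Override
        protected String getNameImpl(Class<?> type) {
            String name = type.getSimpleName();
            return Character.toLowerCase(name.charAt(0)) + name.substring(1);
        }
    }

    // Hypothetical entity stand-ins.
    interface Person {}
    interface BillingAddress {}

    public static void main(String[] args) {
        Converter converter = new CamelCase();
        converter.addClassMapping(Person.class, "staff");
        System.out.println(converter.getName(Person.class));         // staff
        System.out.println(converter.getName(BillingAddress.class)); // billingAddress
    }
}
```

Because the explicit mapping check lives in the superclass, a subclass only ever has to answer the question “what would the algorithm call this?” and never has to worry about overrides.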

Specifying the name converter to use is as simple as a single method call to EntityManager:

manager.setNameConverter(new MyNameConverter());

From that moment onward, all operations handled by that EntityManager instance (including schema generation) will use the specified pluggable name converter. This allows us to do some interesting things which would be otherwise out of reach…

English Pluralization

Table name mappings like {Person => person} and {BusinessAddress => businessAddress} may be conventional and nicely deterministic, but most people like database designs which reflect the underlying data a bit better. Since tables by definition contain multiple rows, it only makes sense that their names should be pluralized. Since we now have a mechanism to override the default name converter, we should be able to create an implementation which handles this pluralization for us for an arbitrary English word.

To accomplish this, ActiveObjects defines a class PluralizedNameConverter in the “net.java.ao.schema” package which extends the default CamelCaseNameConverter. The actual implementation within the name converter is fairly simple. All the algorithm needs to do is load an ordered .properties file (included with ActiveObjects) which contains all of the pluralization mappings as regular expressions (e.g. “(.+)={1}s”). These mappings are then added to the name mappings implemented in AbstractNameConverter, which handles the semantic details of mapping the singular camelCase names (generated in CamelCaseNameConverter) to their pluralized equivalents. Thus, PluralizedNameConverter doesn’t really do much work at all. The real action is in the abstract superclass, which applies the rules defined in englishPluralRules.properties, and those rules are where the real interest lies. However, detailing all of the ins and outs of English pluralization rules would extend this article even farther beyond the breaking point, so I will have to revisit the topic in a future post.
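For a rough idea of how ordered regex mappings of this shape can be applied, consider the sketch below. The four rules are made up for illustration and are not the contents of englishPluralRules.properties; the PluralizeSketch class is likewise hypothetical. The sketch assumes the first rule that matches the whole name wins and that {1} stands for the first capture group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of ordered regex pluralization; NOT the ActiveObjects code. */
final class PluralizeSketch {
    private PluralizeSketch() {}

    // LinkedHashMap keeps insertion order, so specific rules can precede the catch-all.
    // These rules are illustrative assumptions, not the shipped rule set.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("(.*)person", "{1}people");     // irregular noun
        RULES.put("(.+(?:s|x|ch|sh))", "{1}es");  // address -> addresses
        RULES.put("(.+[^aeiou])y", "{1}ies");     // company -> companies
        RULES.put("(.+)", "{1}s");                // catch-all, must come last
    }

    /** Applies the first rule whose pattern matches the entire name. */
    static String pluralize(String singular) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            Matcher m = Pattern.compile(rule.getKey()).matcher(singular);
            if (m.matches()) {
                return rule.getValue().replace("{1}", m.group(1));
            }
        }
        return singular; // only reached for the empty string
    }

    public static void main(String[] args) {
        System.out.println(pluralize("person"));          // people
        System.out.println(pluralize("businessAddress")); // businessAddresses
        System.out.println(pluralize("company"));         // companies
        System.out.println(pluralize("post"));            // posts
    }
}
```

Ordering is the whole trick here. The catch-all “(.+)” matches everything, so it has to sit at the bottom of the list, which is presumably why the properties file needs to be loaded in order (a plain java.util.Properties makes no ordering promises).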

To use the English pluralization for your own projects, simply initialize your EntityManager instance in the following way:

EntityManager em = new EntityManager(jdbcURI, username, password);
em.setNameConverter(new PluralizedNameConverter());
// ...

Now, the Person class will map to the “people” table. Likewise, the BusinessAddress entity will correspond to the “businessAddresses” table. Just like magic! :-)

It’s not a silver bullet, and I’m sure you’ll come across situations where the pluralization will be invalid. This is one of the reasons why manually adding mappings to name converters is possible. My only request is that you send your mappings my way so that I can include them in the properties file, allowing others to benefit from your wisdom.

However, despite all the rules (both manual and default) which can be specified, automatic English pluralization is still a very inexact science. Thus, the decision was made not to make pluralization the default name conversion method for ActiveObjects, in contrast to ActiveRecord, which pluralizes table names by default. Hopefully, by not forcing pluralization upon you, I’ve made database design with ActiveObjects a little less stressful than it otherwise could have been. :-)

Conclusion

In conclusion: I write blog posts that are way too long. I hope this taste of automated schema generation and name conversion algorithms was helpful and useful to you in your quest for that ever-elusive, intuitive ORM.

(P.S.)

One of the things I’m currently working on for the upcoming ActiveObjects 0.4 release is the concept of schema migrations. Migrations are a little different from straight schema generation in that they don’t eliminate pre-existing data, but merely convert the existing tables to match the current schema definition. This is a mind-bogglingly useful feature in database refactoring, a common activity early in the application design process. Unfortunately, it’s also rather hard to do correctly. If you’d like to check up on my progress, criticize my coding style, or just play with the latest-and-greatest version, feel free to check out ActiveObjects from SVN: svn co https://activeobjects.dev.java.net/svn/activeobjects/trunk/ActiveObjects