Skip to content

ActiveObjects: Indexing vs Searching


So in the intervening time since I last updated you on ActiveObjects, I’ve been busy refactoring and repurposing some of the core.  I’ve added a full JUnit test suite, which definitely helps my own confidence about the stability of the source.  Also, there’s a whole bunch of new features that have come down the pipe which hopefully I’ll get to address in the next few posts.  So, without further ado…

The change which is probably going to cause you the most grief is the switch from @Index to @Searchable, and the addition of the @Indexed annotation.  Yes, you really did read that correctly; and no @Indexed isn’t even related to the functionality provided by the old @Index annotation.

The old @Index annotation used to handle tagging of entity methods to mark them as to be added to the Lucene full-text search index (see this post for more details).  This was a little confusing for a number of reasons, not the least of which my failure to remember the tense of the annotation name (see the comments on the indexing post).  By convention, most Java annotations are declarative in name.  Thus, the name should not be the present tense “index” but the past tense “indexed”.  So my first thought was to just refactor the annotation, but then I came into a slightly hairier name-clash.

Database Field Indexing

One of the most common techniques for optimizing your database’s read (SELECT) performance is to create indexes on certain fields.  When a field is indexed, the database will maintain some separate hash tables to enable very fast selection of rows based on the field in question.  This is a really good thing for almost all foreign keys, for example:

SELECT ID FROM people WHERE companyID = ?

Here we’re SELECTing the id field from the people table where companyID matches a certain value.  The database can execute this query fairly quickly.  In fact, the only bottle-neck is finding all of the rows which match the specified companyID value.  In a table containing hundreds of thousands of rows, one can see how this could be a problem.

The problem goes away (sort of) with the use of field indexing.  Instead of having to linearly search through the table for rows matching the companyID, the database can perform a quick hash lookup in an index and get a set of rowid(s) based on the companyID.  Simple, efficient, and incredibly scalable.  Practically, the DBMS wouldn’t take any longer to execute such a query against a table of 100,000,000 rows than it does to execute the same query against a table of 100 rows.  So, this is a really good thing right?

Well, indexes have their drawbacks.  I won’t go into all of the reasons not to use indexes, the following two points will likely suffice:

  • Indexing slows down UPDATEs and INSERTs
  • Indexing adds to your table’s storage requirements (all those hashes have to be put somewhere)

So perfect database performance isn’t attainable just by indexing every field, one has to be quite judicious about it.  In fact, choosing the fields to be indexed is really as much of an art as it is a science.

This long and tangled introduction actually did have a point…  ActiveObjects didn’t have any support for field indexing.  I hadn’t really considered the possibility, so I didn’t factor it into my design.  In retrospect, this was probably a bad idea.  So making up for lost time, I’ve now introduced field indexing into the library!

public interface Person extends Entity {
    public String getName();
    public void setName(String name);
    public int getAge();
    public void setAge(int age);
    public Company getCompany();
    public void setCompany(Company company);

When ActiveObjects generates the migration DDL for this table (running against MySQL), it will look something like this:

    NAME VARCHAR(255),
    age INTEGER,
    companyID INTEGER,
    CONSTRAINT fk_people_companyID FOREIGN KEY (companyID) REFERENCES companies(ID),
CREATE INDEX companyID ON people(companyID);

This is where the aforementioned @Indexed annotation comes in.  As you can see from the resultant DDL, adding a field index is as simple as tagging the corresponding method.  Also, foreign keys are automatically indexed, to ensure maximum performance in SELECTion of relations.

So that’s the good news, the bad news is that index creation doesn’t play too nicely with migrations.  Everything works of course, and if you’re actually adding a table or field, the corresponding index(es) will also be created.  Likewise if you drop a table or field, the corresponding index(es) will also be dropped.  However, that’s about the limit of the migrations support for indexes at the moment.  JDBC has an unfortunately limitation which prevents developers from getting a list of indexes within a database.  Since I have no way (within JDBC) of finding pre-existing indexes, they must be excluded from the schema diff and thus are CREATEd or DROPped irregardless of the existing schema.  I do have a plan to fix this (along with migrations on Oracle), but I seriously doubt that it’ll be included in the 1.0 release, owing to the changes required.

Refactoring End Result

The final result of the refactoring and the adding of field indexing is that the annotation formerly named @Index is now the @Searchable annotation.  Likewise (and keeping with convention), the formerly IndexingEntityManager is now named SearchableEntityManager.  Both of these types remain in the package.  To allow for field indexing, the @Indexed annotation was added to the package, owing to the fact that it only effects schema generation and doesn’t change runtime behavior in the framework at all.

Hopefully these changes won’t be too confusing (now that you’re aware of them) and will be a welcome addition to the ActiveObjects functionality.  As always, I welcome comments, suggestions and criticisms!


  1. Is it a mistake that the @Indexed was declared for the setName method or is it required to put the annotation to both the getter and the setter of the field to be indexed?

    johan Monday, October 29, 2007 at 11:07 am

Post a Comment

Comments are automatically formatted. Markup are either stripped or will cause large blocks of text to be eaten, depending on the phase of the moon. Code snippets should be wrapped in <pre>...</pre> tags. Indentation within pre tags will be preserved, and most instances of "<" and ">" will work without a problem.

Please note that first-time commenters are moderated, so don't panic if your comment doesn't appear immediately.