An Easier Java ORM: Indexing

6 Aug 2007

Continuing my series on ActiveObjects, this post delves into the eternal mysteries of search indexing and Lucene integration. Most modern web applications store data not only in a database, but also in an index of some kind to allow fast and efficient searching. Java’s Lucene framework provides an excellent mechanism for this functionality; however, it can be somewhat cryptic and hard to use. To ease this pain, ActiveObjects provides auto-magical Lucene integration for specified fields, making it trivial to index and search for entities.

Unless there is great public outcry, I intend this to be the last of my “Easier Java ORM” series (with the exception of a roundup post for linking purposes). As fun as it is being self-promoting and pushing my favorite open source project, I feel a slight twinge of guilt every time I flood your feed aggregator with more information on a library in which you may or may not have interest. I’ll probably still post about ActiveObjects from time to time, but only on occasions when there is something of special note.

Indexing

Of course, we can’t even begin to talk about searching for entities unless there is some data from the entity added to the index. The actual creation and maintenance of the index is usually considered the hardest part of working with Lucene. In ActiveObjects, it requires two separate steps.

Firstly, you must decide which fields and which entities you wish to index. Let’s say that we have a simple blog schema as follows:

public interface UserModifiedEntity extends SaveableEntity {
    @Default("CURRENT_TIMESTAMP")
    public Calendar getDate();
    @Default("CURRENT_TIMESTAMP")
    public void setDate(Calendar calendar);
 
    @Default("false")
    public boolean isDeleted();
    @Default("false")
    public void setDeleted(boolean deleted);
}
 
public interface Post extends UserModifiedEntity {
    public String getTitle();
    public void setTitle(String title);
 
    @SQLType(Types.CLOB)
    public String getText();
    @SQLType(Types.CLOB)
    public void setText(String text);
 
    @OneToMany
    public Comment[] getComments();
}
 
public interface Comment extends UserModifiedEntity {
    public Post getPost();
    public void setPost(Post post);
 
    public String getCommenter();
    public void setCommenter(String name);
 
    @SQLType(Types.CLOB)
    public String getText();
    @SQLType(Types.CLOB)
    public void setText(String text);
}

In this schema, we have both Post and Comment entities. Both entity types extend UserModifiedEntity, which contains some fields which will be common to both resulting tables. Both Comment and Post also have “text” fields, containing the actual meat of each entity’s value.

Now, for our blog’s search engine, we’re going to want to do something a bit more precise than search for all values contained in any entities. Actually, at this point, ActiveObjects wouldn’t index any values whatsoever. We need to tag the fields we want to add to the index with the @Index annotation. Let’s assume that we don’t need to search on comments at all, just posts. The modified Post entity might look something like this:

public interface Post extends UserModifiedEntity {
    @Index
    public String getTitle();
    @Index
    public void setTitle(String title);
 
    @Index
    @SQLType(Types.CLOB)
    public String getText();
 
    @Index
    @SQLType(Types.CLOB)
    public void setText(String text);
 
    @OneToMany
    public Comment[] getComments();
}
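
To make the tagging step a bit more concrete, here is a rough sketch of how a library could discover tagged accessors using plain reflection. This is not ActiveObjects' actual code: the @Index annotation is re-declared locally so the example compiles on its own, and the class name IndexScanSketch and the helper indexedFields are made up for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class IndexScanSketch {
    // Local stand-in for the library's annotation (an assumption, not the real one).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Index {}

    public interface Post {
        @Index String getTitle();
        @Index void setTitle(String title);
        @Index String getText();
        @Index void setText(String text);
        boolean isDeleted();
    }

    public interface Comment {
        String getText();
        void setText(String text);
    }

    // Collects the field names of every getter tagged with @Index.
    public static Set<String> indexedFields(Class<?> type) {
        Set<String> fields = new TreeSet<>();
        for (Method method : type.getMethods()) {
            String name = method.getName();
            if (method.isAnnotationPresent(Index.class)
                    && name.startsWith("get") && name.length() > 3) {
                // "getTitle" becomes "title"
                fields.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(indexedFields(Post.class));     // [text, title]
        System.out.println(indexedFields(Comment.class));  // []
    }
}
```

An entity with no tagged methods simply yields an empty set, which is why an untagged type like Comment never contributes anything to the index.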

That takes care of step one in the indexing procedure. ActiveObjects now has everything it needs to know relating to what it should index. Now we need to inform it to actually perform the indexing, and where to store the result. This is all handled using a special EntityManager subclass: IndexingEntityManager.

// ...
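// Java does not expand "~", so build the index path from the user's home directory
IndexingEntityManager manager = new IndexingEntityManager(jdbcURI, username, password, 
        FSDirectory.getDirectory(System.getProperty("user.home") + "/lucene_index"));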
 
Post post = manager.create(Post.class);
post.setTitle("My Cool Post");
post.setText("Here's some test text that I'll use to test the search indexing.  "
        + "It's really amazing what you can do with so little code...");
post.save();

As you can see, we’re using an instance of IndexingEntityManager to access and create all of our entity instances (all one of them). This is all that is necessary to cause ActiveObjects to handle the indexing for these entities.

Oh, FSDirectory is actually a Lucene class (sub-classing Directory) which is used to tell the Lucene backend where to store the index. Since we’re actually using the Lucene Directory abstraction classes, the index could just as easily be stored in memory, or even in another database.

Searching

Obviously, an index isn’t all that useful if you can’t do anything with it. Since our goal from the start was to provide search capabilities to our rather limited blog, we need to have a way of accessing the Lucene index and performing a search. Again, ActiveObjects makes this incredibly easy:

// ...code from above
Post[] results = manager.search(Post.class, "test search terms");
 
System.out.println("Search results:");
for (Post post : results) {
    System.out.println("   " + post.getTitle());
}

The search method delegates its call down to the Lucene engine, which parses the search terms and runs through the index searching for any key-value sets (or Document(s), as Lucene refers to them) which match in the “title” or “text” fields. By default, ActiveObjects runs the search against all indexed fields in the specified entity type. Since this is usually the behavior people want when using Lucene, it is a sane default.
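
If "search across all indexed fields" sounds abstract, here is a toy illustration of the idea using nothing but the standard library. It is emphatically not Lucene (no analyzers, scoring, or query syntax), and ToyFieldIndex with its add and search methods is a made-up name for this sketch. The one thing it borrows is that any query term matching in any field counts as a hit, which is how Lucene's query parser treats space-separated terms by default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyFieldIndex {
    // field name -> term -> ids of the documents containing that term in that field
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Lower-cases the text and splits it on anything that is not an ASCII letter or digit.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Records every term of one field value under the given document id.
    public void add(int id, String field, String value) {
        Map<String, Set<Integer>> terms = postings.computeIfAbsent(field, k -> new HashMap<>());
        for (String term : tokenize(value)) {
            terms.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // A document matches if any query term occurs in any of its indexed fields.
    public Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : tokenize(query)) {
            for (Map<String, Set<Integer>> terms : postings.values()) {
                hits.addAll(terms.getOrDefault(term, Collections.<Integer>emptySet()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyFieldIndex index = new ToyFieldIndex();
        index.add(1, "title", "My Cool Post");
        index.add(1, "text", "Here's some test text that I'll use to test the search indexing.");
        index.add(2, "title", "Another Post");
        index.add(2, "text", "Nothing relevant in this one.");
        System.out.println(index.search("test search terms")); // [1]
        System.out.println(index.search("post"));              // [1, 2]
    }
}
```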

If the mindless defaults aren’t good enough for your application, you are quite free to use the Lucene index directly. IndexingEntityManager provides accessors for the Directory containing the index (getIndexDir()), as well as the Analyzer in use (getAnalyzer()). Of course, you can also extend IndexingEntityManager and provide your own search() implementation.

Removing from the Index

Almost as important as adding entities to an index is removing them. We don’t want our searches to pull back deleted posts. IndexingEntityManager can handle this task for us automatically, to a point. The problem is that in our case, we’re not actually deleting the posts as such. We’re simply setting a flag in the row which indicates the post is deleted. We’re supplying all of the logic (theoretically) to ignore deleted posts and comments.

If we were using the EntityManager#delete(Entity…) method, we would be DELETEing the rows properly and then IndexingEntityManager could automatically remove the relevant Document(s) from the index. However, since we’re not doing this, we need a bit more logic. For simplicity’s sake, we’re going to put this logic into a defined implementation for the UserModifiedEntity interface:

@Implementation(UserModifiedEntityImpl.class)
public interface UserModifiedEntity extends SaveableEntity {
    // ...
}
 
public class UserModifiedEntityImpl {
    private UserModifiedEntity entity;
 
    public UserModifiedEntityImpl(UserModifiedEntity entity) {
        this.entity = entity;
    }
 
    public void setDeleted(boolean deleted) {
        if (deleted && !entity.isDeleted()) {
            // deleting the entity, remove it from index
            ((IndexingEntityManager) entity.getEntityManager()).removeFromIndex(entity);
        } else if (!deleted && entity.isDeleted()) {
            // we're un-deleting the entity here
            ((IndexingEntityManager) entity.getEntityManager()).addToIndex(entity);
        }
 
        // store the new flag value on the entity itself
        entity.setDeleted(deleted);
    }
}

Now, whenever we call setDeleted(boolean) on a Post or Comment instance, it will be removed from the index (if we’re deleting the entity), or re-added to the index (if we’re un-deleting it). Comment has no @Index methods, so IndexingEntityManager will more or less ignore the call to addToIndex(Entity) (it will still iterate through all of the methods looking for any @Index annotations).
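
One thing that may look odd in the implementation class is that setDeleted ends by calling setDeleted on the entity again. Here is a minimal sketch of how a dynamic-proxy design can route a call to a hand-written implementation once and then fall through to default handling on the re-entrant call. I am assuming a java.lang.reflect.Proxy approach with a simple single-threaded flag purely for illustration; the real ActiveObjects mechanism may well differ, and every name here (DefinedImplSketch, Handler, newEntity) is hypothetical. Index operations are recorded in a list rather than sent to a real index.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DefinedImplSketch {
    public interface Entity {
        boolean isDeleted();
        void setDeleted(boolean deleted);
    }

    // Hand-written logic, analogous to the implementation class in the post.
    public static class EntityImpl {
        private final Entity entity;
        private final List<String> indexOps;

        public EntityImpl(Entity entity, List<String> indexOps) {
            this.entity = entity;
            this.indexOps = indexOps;
        }

        public void setDeleted(boolean deleted) {
            if (deleted && !entity.isDeleted()) {
                indexOps.add("removeFromIndex");
            } else if (!deleted && entity.isDeleted()) {
                indexOps.add("addToIndex");
            }
            entity.setDeleted(deleted); // re-enters the proxy, which skips the override
        }
    }

    static class Handler implements InvocationHandler {
        private final Map<String, Object> values = new HashMap<>();
        private EntityImpl impl;
        private boolean insideImpl; // not thread-safe; good enough for a sketch

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (!insideImpl) {
                Method override = findOverride(method);
                if (override != null) {
                    insideImpl = true;
                    try {
                        return override.invoke(impl, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();
                    } finally {
                        insideImpl = false;
                    }
                }
            }
            // Default behaviour: treat accessors as reads and writes on a value map.
            String name = method.getName();
            if (name.startsWith("set")) {
                values.put(name.substring(3), args[0]);
                return null;
            }
            if (name.startsWith("is")) {
                return Boolean.TRUE.equals(values.get(name.substring(2)));
            }
            throw new UnsupportedOperationException(name);
        }

        // Returns the implementation's method with the same signature, or null if none.
        private Method findOverride(Method method) {
            try {
                return EntityImpl.class.getMethod(method.getName(), method.getParameterTypes());
            } catch (NoSuchMethodException e) {
                return null;
            }
        }
    }

    public static Entity newEntity(List<String> indexOps) {
        Handler handler = new Handler();
        Entity entity = (Entity) Proxy.newProxyInstance(
                Entity.class.getClassLoader(), new Class<?>[] {Entity.class}, handler);
        handler.impl = new EntityImpl(entity, indexOps);
        return entity;
    }

    public static void main(String[] args) {
        List<String> ops = new ArrayList<>();
        Entity post = newEntity(ops);
        post.setDeleted(true);   // first deletion: removed from the index
        post.setDeleted(true);   // already deleted: no index operation
        post.setDeleted(false);  // un-delete: re-added to the index
        System.out.println(ops); // [removeFromIndex, addToIndex]
    }
}
```

Note that the index is only touched when the flag actually changes, so calling setDeleted(true) twice in a row is harmless.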

Related Content

Many sites have need of a “related content” algorithm. This is most often seen in blogs which show a list of “related posts”. Since ActiveObjects auto-magically handles indexing and searching, it only makes sense that it provide some mechanism for accessing related entities based on their indexed values. This is handled using the RelatedEntity super-interface.

Let’s assume that we want to be able to find related posts to a given Post instance. The only thing we need to do is make sure that the Post interface also extends RelatedEntity:

public interface Post extends UserModifiedEntity, RelatedEntity<Post> {
    // ...
}

Now we can call:

Post post = // ...
Post[] related = post.getRelated();
 
System.out.println("Posts related to " + post.getTitle() + ":");
for (Post relate : related) {
    System.out.println("   " + relate.getTitle());
}

Alright, caveat time… First off, this does depend on the Lucene Queries contrib library, specifically the MoreLikeThis class. Secondly, I’m not entirely sure that this is working right. :-) I’ve yet to actually get it to return any related values whatsoever in my test bed. This could be due to the way I’m indexing, or possibly the way I’m using MoreLikeThis; I’m not sure. If it works for you, let me know! Also, if you have any experience with the MoreLikeThis functionality, I’d appreciate any pointers you may have.
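
For anyone wondering what "related" could mean in the simplest possible terms, here is a toy ranking that scores posts by the share of terms they have in common (Jaccard similarity). To be clear, this is a swapped-in technique and not what MoreLikeThis does; as far as I understand it, MoreLikeThis picks out the more distinctive terms of a document and runs a query built from those. The class and method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RelatedSketch {
    // Distinct lower-cased terms of a text, split on non-alphanumeric ASCII characters.
    public static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        result.remove("");
        return result;
    }

    // Jaccard similarity: shared terms divided by all distinct terms of both texts.
    public static double similarity(String a, String b) {
        Set<String> shared = terms(a);
        shared.retainAll(terms(b));
        Set<String> all = terms(a);
        all.addAll(terms(b));
        return all.isEmpty() ? 0.0 : (double) shared.size() / all.size();
    }

    // Titles of the other posts that share at least one term, most similar first.
    public static List<String> related(String title, Map<String, String> posts) {
        final String text = posts.get(title);
        List<String> result = new ArrayList<>();
        for (String other : posts.keySet()) {
            if (!other.equals(title) && similarity(text, posts.get(other)) > 0) {
                result.add(other);
            }
        }
        result.sort(Comparator.comparingDouble(
                (String other) -> similarity(text, posts.get(other))).reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> posts = new LinkedHashMap<>();
        posts.put("Lucene basics", "indexing text with lucene");
        posts.put("Search tips", "searching an index with lucene queries");
        posts.put("Indexing deep dive", "indexing text fields with lucene analyzers");
        posts.put("Cooking", "pasta and tomato sauce");
        System.out.println(related("Lucene basics", posts)); // [Indexing deep dive, Search tips]
    }
}
```

Every term counts equally here, so a filler word like "with" contributes as much as "lucene". Weighting terms by how rare they are would be the obvious next step.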

Well, that about sums it up for indexing in ActiveObjects. Hopefully, this simplifies your data backend code still further and eases your pain in dealing with Lucene.

Comments

  1. oh, please, don’t stop posting about ActiveObject, that is even better than a “getting started”

    noname Tuesday, August 7, 2007 at 12:05 pm
  2. Hey you shouldnt stop posting about lucene, i am a big fan of it and i like reading new stuff about it. please post something else – i will bookmark you in the mean time and wait for it

    mk Monday, August 20, 2007 at 12:09 pm
  3. @mk

    Actually, this is the first time I’ve posted anything about Lucene. I haven’t used it much, so I’m not terribly well versed in it. It’s interesting though, so maybe there’ll be something down the road…

    daniel Monday, August 20, 2007 at 1:02 pm
  4. Is the annotation “Indexed” correct? There is the source ‘Index.hava’.

    mari Monday, September 3, 2007 at 6:02 am
  5. @mari

    Ah, it seems you’re right. Weird mistake to make… I’m correcting the article. Thanks!

    daniel Monday, September 3, 2007 at 12:25 pm
  6. Are there any way to build conventional database indexes using AO?

    Francis Tuesday, September 18, 2007 at 11:47 pm
  7. There is now. :-) I’ve refactored things a bit, and now the full-text search annotation is @Searchable, while the database-index annotation is @Indexed. (and yes, it is present-perfect tense this time)

    BTW, sry it took so long for your comment to appear. For some reason wordpress didn’t properly notify me that it was pending.

    daniel Thursday, September 27, 2007 at 7:29 am
  8. Great work! I was slightly confused when I firstly found the @Index is for full text search. @Searchable and @Indexed is definitely a good move.

    Don’t worry, my WordPress work the same way so I can understand. ;-)

    Francis Monday, October 8, 2007 at 11:24 pm
  9. Thanks for this useful post. I am a big fan of ActiveObjects. This piece of library helps me a lot. Very simple and straightforward.

    James Alvarez Saturday, February 20, 2010 at 3:41 pm
