
WTP’s Crazy (and undocumented) Setting Change

12 Oct 2007

I’ve been working recently on a Wicket-based project for a company called Teachscape.  Not only is it based on Wicket, but the project is also designed to be run within Jetty, as opposed to the traditional Tomcat or Glassfish deployment.  This makes local development a lot easier to handle, since you just fire a main-class (which instantiates and starts Jetty) and away you go.  Well, theoretically anyway…
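
For reference, such a launcher is only a few lines.  Here's a minimal sketch using the Jetty 6-era API (the class name, port and webapp path are my own assumptions, not the project's actual launcher):

import org.mortbay.jetty.Server;
import org.mortbay.jetty.webapp.WebAppContext;

public class StartServer {

    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);       // HTTP connector on port 8080

        WebAppContext webapp = new WebAppContext();
        webapp.setContextPath("/");
        webapp.setWar("src/main/webapp");       // serve the webapp straight out of the source tree
        server.setHandler(webapp);

        server.start();
        server.join();                          // block until the server is stopped
    }
}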

Since it's a Wicket-based project, all of the HTML files are thrown in with the Java sources and must be on the classpath along with the compiled classes.  The problem is, lately my copy of Eclipse hadn't been doing that properly.  Other Wicket projects on my system (using WTP) still seemed to work fine, but the Jetty-based project just didn't want to run.  It kept complaining about not being able to find the markup for a specific page.  This is Wicket's way of telling the developer that they probably forgot an HTML file somewhere.
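
To make the convention concrete: Wicket resolves a page's markup by looking for an HTML file with the same fully qualified name as the page class on the classpath.  A minimal sketch (Wicket 1.3-style package names assumed; the class and component id are hypothetical):

package com.example;

import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;

// Wicket will look for com/example/HomePage.html next to this class on the classpath
public class HomePage extends WebPage {

    public HomePage() {
        // "message" must correspond to a wicket:id="message" element in HomePage.html
        add(new Label("message", "Hello from Wicket"));
    }
}

If com/example/HomePage.html doesn't end up in the output folder next to HomePage.class, Wicket fails at runtime with exactly the kind of missing-markup complaint described above.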

Now, I was able to manually verify that the file did exist in the appropriate directory, named correctly.  It was in an Eclipse source directory with no inclusion or exclusion rule that would filter it out, either explicitly or implicitly.  In short, it should have been on the classpath.  Being savvy to Eclipse's occasional classpath oddities, I fired off a quick clean of the project and tried again.  No luck.  My next step was to do a clean checkout, re-gen the project meta files using Maven, and finally build and run.  Once again, no dice.  This was when I decided to spelunk a bit into the actual build output directory for the project.  When in doubt, check it out by hand, right?

It turns out that none of the HTML files for any of the pages were being copied to the target directory.  I found this more than a little odd, since I knew it had been working before and (as I said) none of the source filters should have excluded anything.  So I went in and changed the filters so that only **/*.html files would be included in the build.  The result?  An empty target directory.  Not encouraging.

I tried copying over configurations, editing the .classpath file directly, even looking at the help documentation for JDT, still no luck.  It was the first problem I've had with Eclipse in years which I haven't been able to solve reasonably quickly or work around.  In short, I was stuck.

Out of sheer desperation, I started browsing through some of the Eclipse JDT and WTP preferences.  I figured that WTP had to have something to do with all of this, since the builder was obviously treating HTML files in a special way and WTP is the only plugin I have installed which might do that.  I came up empty on the WTP front, but when looking through the JDT builder prefs, I found this nugget:

[Screenshot "wtp-screwup": the JDT builder's resource filter preference, showing the *.html and SVG exclusions]

What?!  I honestly cannot think of any valid reason why I would want those resources excluded.  WTP doesn’t require it.  After removing these two lines (SVG files probably should be included in the build too), all of my WTP projects still run fine.  In fact, since this preference was obviously added by WTP, it got me thinking: all of this worked fine not two days ago, what happened?

The answer is: I updated WTP.  It seems the latest update of Eclipse WTP adds this preference to your JDT settings whether you want it or not.  Not only that, but there seem to be no warnings and no documentation of any kind indicating its purpose, or even that the change was made!  I ask you: why was this necessary?

I lost literally hours of time trying to track this down.  Granted, if I had been more familiar with the "Output folder" preferences in the JDT compiler settings, maybe I would have figured it out sooner.  But the point is not my less-than-perfect familiarity; the point is that this change seems to require the developer to have advanced knowledge of the Eclipse preference system, just to track down an apparent bug in their own project config.  Bad move, WTP, very bad.

…now that the flames have died down: what is the purpose behind the exclusion of these resources anyway?  If they don’t hurt anything in a WTP project, and they certainly wouldn’t mess with anything in a Java project, why exclude them?  Also, in all fairness I really don’t know for certain that it was WTP that made the change.  I updated the entire Europa train at the same time.  Theoretically, any one of the projects could have made the undesirable modification.  WTP just seemed the most likely since one of the exclusions was *.html, though PDT would be an equally valid guess.

Is a Separate Text Search Engine a Bad Idea?

11 Oct 2007

I was reading this blog entry a few days ago, and it started me thinking about full-text searching.  That wasn't the main topic of the post, but I think the little side-trek into the field was interesting enough to merit some thought.  Right smack in the middle, Jamie goes on a bit of a rant about the pain of maintaining what are effectively two separate databases (for example, MySQL and Lucene):

A fellow Rails developer asked me in all seriousness why I wasn’t abandoning the full text search functionality of TSearch2 and just using a completely separate, redundant database product designed exclusively for full text search. Seriously, that is considered the “easy” approach: one database for full text search, and another for ACID/OLTP/CRUD. Honestly if I were going to go down that road I would try hard to just abandon the SQL RDMBS and put everything in the other database, since Lucene and its imitators are capable of far more than just find-text-in-document queries. The pain of duplicating everything, using two query languages, two document representations (in addition to the object representation in Ruby) and writing application-tier query correlation makes the double-DB approach seem very unwise.

There is some validity to this thought.  After all, duplication in software usually means you're doing something wrong – or at least, that there could be an easier way.  Even ignoring this precept, it's just common sense that keeping data synchronized concurrently between two data sources as complex as a relational database and a full-text index is not an easy task.  Granted, some ORMs can handle this task for you (actually, I can only think of Hibernate and ActiveObjects as having this feature), but the principle is the same.  And even if everything is neatly and auto-magically synced, there's always a danger of something getting out of place, and then you're stuck with a transient stale-data issue that's difficult to track down.
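
To see where the pain comes from, here's what a hand-rolled dual write looks like, sketched against the Lucene 2.x-era API (the class, table and field names are hypothetical):

import java.sql.Connection;
import java.sql.PreparedStatement;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ArticleStore {

    private final Connection conn;
    private final String indexDir;

    public ArticleStore(Connection conn, String indexDir) {
        this.conn = conn;
        this.indexDir = indexDir;
    }

    // Every save has to hit both stores; if either half fails, the other must be
    // rolled back or re-indexed later.  That is the synchronization burden in a nutshell.
    public void save(int id, String title, String body) throws Exception {
        // first write: the relational database
        PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO articles (ID, title, body) VALUES (?, ?, ?)");
        try {
            stmt.setInt(1, id);
            stmt.setString(2, title);
            stmt.setString(3, body);
            stmt.executeUpdate();
        } finally {
            stmt.close();
        }

        // second write: mirror the row into the Lucene index (false = append to an existing index)
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        try {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(id), Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        } finally {
            writer.close();
        }
    }
}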

The author of the post mentions that he favors the full-text search capabilities of PostgreSQL, the popular open-source database and competitor to MySQL.  This does have the advantage that you’re putting all the data in one place, handling everything with a single query language (SQL), and reducing the technologies your software depends upon.  This inarguably makes things a whole lot easier.

The main problem, as I see it, is that this puts a ton of unnecessary strain on the database.  In most modern server-side applications, the bottleneck is the database (usually caused by too much badly written SQL).  There are whole mountains of documentation offering suggestions on how to alleviate this problem.  Indexes, database clustering and a carefully chosen ORM can go a long way.  Unfortunately, tacking on full-text indexing seems like a step in the wrong direction.

Lucene is very good at what it does.  Its indexing and storage performance is second to none.  In fact, it's so fast that a lot of companies use it as a quick-and-dirty storage dumping ground for raw data, knowing that it will be much faster and more scalable than a relational database.  Why not take advantage of this incredible power and take one more item off of your database's back?  And that's not even to mention the fact that a Lucene index query is probably a lot faster than an SQL query grabbing data from a PostgreSQL full-text index.
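
For comparison, the Lucene query side is about this involved (again the 2.x-era API; the index path and field names are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class QuickSearch {

    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");

        // parse a user query against the "body" field
        QueryParser parser = new QueryParser("body", new StandardAnalyzer());
        Query query = parser.parse("full text search");

        Hits hits = searcher.search(query);
        for (int i = 0; i < hits.length(); i++) {
            System.out.println(hits.doc(i).get("title"));
        }
        searcher.close();
    }
}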

So what about the flip side of things?  Why not just put all the data into Lucene (or a clone) and eschew relational databases altogether?  Well, as I mentioned above, a lot of companies do this for simple things.  Lucene is fantastic at scaling, and at very fast indexing and querying of large blocks of text.  Where it begins to trip up is when you turn it loose on other data types.  Don't get me wrong, Lucene is an amazing piece of technology.  But just like PostgreSQL isn't a full-text search engine, Lucene isn't an RDBMS.  Each component of the infrastructure needs to handle what it's best at.  In fact, this is really a large aspect of scalability: ensuring that every technology is utilized to its fullest potential and no more is crucial to a high-volume application.

Final verdict?  I think I’m sticking with MySQL and Lucene working in tandem, each doing what they do best.  ActiveObjects makes the synchronization almost completely transparent, so it’s not like I’m loading myself down with unnecessary work from a code standpoint.  Seems like a good solution to me; and since most of the industry agrees, it’s probably a safe bet for you too.

Is Windows Really the Best Development OS?

8 Oct 2007

I realize that the stated topic is a classic example of flame-bait, but it’s still a question which deserves some serious consideration.  Is Windows really the best OS for a developer to use?  And when I say developer, I mean someone like me who does a nice assortment of Java, Ruby, C++, etc.  Obviously, someone who does .NET is probably using Windows (see my previous post), and someone writing Objective-C applications will nine-times-out-of-ten use MacOS X.

Java's a bit less OS-specific though.  By its very nature, it's cross-platform.  Not only the language itself (like C++ is cross-platform), but the compiled binaries.  I don't have to worry about having a Mac handy to compile a snapshot of an application for my boss; I can just send him a JAR I assembled on my laptop running Vista.  In this respect, I can consider myself completely liberated from OS-level concerns.

Tools are also not really an issue.  After all, the three best Java development tools (Eclipse, NetBeans and IntelliJ, in that order) are all based on Java.  I don't have to worry about learning a new application to do development, or concern myself with rebuilding all my settings in an unfamiliar environment.  I can set up a fresh machine running any OS with Eclipse and my favorite formatting and syntax highlighting configuration in under an hour (download time included).  In fact, I've availed myself of this fact many times.

So if tools and language aren't a concern, what is?  Well, it turns out that tools really are worth examining, and I mean deeper than the face-value IDE.  For example, MacOS X supports the fantastic editor TextMate.  jEdit is a worthy substitute which runs on any platform, but it's just not as polished as TextMate.  Also worthy of consideration, and indeed a far larger issue for me, is the non-existence of a decent shell on Windows.  PowerShell isn't bad, but I'm sorry, it's no bash.  I had to make some (reasonably) generic changes to a set of config files on my server the other day.  Since I was on Linux, all it took was a simple for loop, a couple of greps piped to sed and then back into the files, and I was home free.  If I had been on Windows, it would have taken me at least ten minutes of tedious open-file, copy/paste/reshuffle, save, close, open next file, etc.  In short, Linux saved me quite a bit of time out of my day, just by having a superior shell.  Mac offers the same advantage, though its version of the GNU utils isn't as up-to-date as my Gentoo Linux server's.

Another thing to consider is space-efficiency.  I have a reasonably high-resolution screen, but even with such a dazzlingly large workspace, I hate to waste even a single 1px line.  This includes things like fonts and how legibly they render at small sizes.  I save literally inches of space in Eclipse by setting the editor font size to 10pt, and that's just a single application.  Scaling the issue up to an entire workspace just compounds both the benefits and the consequences.  In both of these areas, Windows (especially Vista) seems to excel beyond the competition.  I hate to say it, but I think Windows got something right here that Mac and Gnome (Linux) are missing.  Just consider the following screenshots:

[Screenshot "vista": Eclipse running on Windows Vista]

[Screenshot "gnome": Eclipse running on Gnome]

Notice how even though the Vista window title bars are a bit larger, the fonts are a shade smaller.  In Gnome, I have to leave the font-size that high, because the window titles become unreadable and ugly at any lower level.  Also notice how the menu height on Gnome is significantly larger than Vista.  Even more importantly, the tab size within Eclipse is almost a third larger than the corresponding tab in Windows (due to the larger font size).  The toolbar is taller, and the fonts are just a shade larger for the same size and DPI (Consolas vs Monospace).  All in all, there are almost 20-30 pixels of height wasted in Gnome vs Windows.  Granted, Mac doesn’t waste quite as much space, but in my experience, most things are just a shade less compact than on Windows.

The larger issue I have with Mac is its extremely mouse-oriented nature.  This makes it great for beginners, but even with Quicksilver, I still find myself reaching for the mouse more often than I'd like.  Gnome isn't bad in this regard, having most of the same keyboard features as Windows (especially with Deskbar), but it's just not as slick as Vista with the QuickSearch.  And yes, I do know the shortcuts for both Mac and Linux as well as I do for Windows, as I've been using both platforms for years.  In fact, my first computer was a Mac, and that was all I used for a long time.

So all in all, the question is: how do the pros and cons match up?  I would love to have the real shell, real gcc and real permissions system that Linux offers, but I would hate to give up the font renderer and slick power-user features of Windows.  I could go with a Mac, but the keyboard is important.  And on top of that, so many cross-platform applications just aren’t up to snuff on Mac, failing to do things “the Mac way”.

So with great disappointment, I'm afraid I'll have to stick with Windows for the time being.  However, as my friend Lowell points out, I can just as easily build myself a second machine which runs Linux primarily.  This way, I can get all of the benefits of Linux as a primary machine, while still retaining the power and application support of Windows.  Hopefully this will work out as the best balance overall.

MonoDevelop: The .NET Developer’s Linux Outlet

4 Oct 2007

I’ve done my fair share of .NET development.  I’ve never actually enjoyed it, nor would I want to make a living out of it, but I have done some.  Every time, I’ve been forced to work on Windows to do any serious project.  Granted, jEdit can get you awfully far in terms of source editing, but unfortunately (?) it’s no IDE.  Really, the only way to do serious .NET development is to use VisualStudio.

Now, for a number of reasons (none of which are important right now), I’m already using Windows as my primary OS.  However, I don’t like being boxed into one OS or another.  I try to keep my options open.  If I ever could cut those final ties to Windows, I’d love to switch to Linux or Mac.  Also, I just don’t like feeling forced to do something in a certain way.  With Java, I can write the code in Eclipse, NetBeans, jEdit or Notepad for all my employer cares, just as long as it gets done.  With .NET, I really don’t have any choice but to use Windows.

Well, until nowish.  MonoDevelop recently announced the release of 1.0 beta 1.  From what I’ve read, things are still comparatively unstable, but the features are all there and bug fixing is proceeding apace.  Also, for the first time it seems that they’re offering some binary packages, allowing users to install easily rather than wrestling with the sources for hours and hours (which is what happened to me last time I tried MonoDevelop).

[Screenshot "monodevelop": the MonoDevelop IDE]

Actually, the bigger news for me is the addition of all of the "serious coding" features.  Things like content assist, searching, error-underlining, etc.  These are huge when working on a non-trivial project.  In fact, these are precisely the reason I tied myself to Windows and VisualStudio for .NET development rather than just using jEdit or VIM on Linux.  Last time I tried MonoDevelop (back in like, 0.2), it really wasn't more than a glorified text editor with syntax highlighting.  Now, it's a full-fledged IDE.

As far as I'm concerned, MonoDevelop has reached the point where it can be considered a serious VisualStudio alternative on Linux.  In fact, from what I've seen it's at a level where .NET developers need no longer consider themselves tied to Windows just for the tools.

Of course, the big problem is that MonoDevelop is a tool for writing code that runs on Mono (hence the name), not .NET proper.  Technically, the two platforms are very, very close, but .NET has some libraries and provides certain functionality that Mono just doesn't emulate yet (things like the win32 API).  Also, Mono is a black-box port, so there are bound to be some inconsistencies in behavior here and there.  As a result, you can probably write your .NET application on Linux using Mono, but you had better test it running on Windows and the CLR.  Otherwise you can never really be sure that your app is doing what you want it to on Windows.

But on the whole, I think this is great news!  MonoDevelop gives .NET developers a nice (and free) alternative to VisualStudio, not to mention the benefit of unfettering these developers from the Windows platform.  Just one more way to thumb your nose at the boys in Redmond and support FOSS.

Update: The How-To Geek has some excellent instructions on how to install the latest version of MonoDevelop on Ubuntu.

ActiveObjects: Indexing vs Searching

1 Oct 2007

So in the intervening time since I last updated you on ActiveObjects, I’ve been busy refactoring and repurposing some of the core.  I’ve added a full JUnit test suite, which definitely helps my own confidence about the stability of the source.  Also, there’s a whole bunch of new features that have come down the pipe which hopefully I’ll get to address in the next few posts.  So, without further ado…

The change which is probably going to cause you the most grief is the switch from @Index to @Searchable, and the addition of the @Indexed annotation.  Yes, you really did read that correctly; and no @Indexed isn’t even related to the functionality provided by the old @Index annotation.

The old @Index annotation handled the tagging of entity methods to mark them for addition to the Lucene full-text search index (see this post for more details).  This was a little confusing for a number of reasons, not the least of which was my own failure to remember the tense of the annotation name (see the comments on the indexing post).  By convention, most Java annotations are declarative in name.  Thus, the name should not be the present tense "index" but the past tense "indexed".  So my first thought was to just refactor the annotation, but then I ran into a slightly hairier name clash.

Database Field Indexing

One of the most common techniques for optimizing your database's read (SELECT) performance is to create indexes on certain fields.  When a field is indexed, the database maintains a separate index structure (typically a B-tree or hash table) to enable very fast selection of rows based on the field in question.  This is a really good thing for almost all foreign keys, for example:

SELECT ID FROM people WHERE companyID = ?

Here we're SELECTing the ID field from the people table where companyID matches a certain value.  The database can execute this query fairly quickly.  In fact, the only bottleneck is finding all of the rows which match the specified companyID value.  In a table containing hundreds of thousands of rows, one can see how this could be a problem.

The problem goes away (sort of) with the use of field indexing.  Instead of having to linearly search through the table for rows matching the companyID, the database can perform a quick lookup in the index and get back the matching rowid(s).  Simple, efficient, and incredibly scalable.  Practically speaking, the DBMS wouldn't take much longer to execute such a query against a table of 100,000,000 rows than it does to execute the same query against a table of 100 rows.  So, this is a really good thing, right?

Well, indexes have their drawbacks.  I won't go into all of the reasons not to use indexes; the following two points will likely suffice:

  • Indexing slows down UPDATEs and INSERTs
  • Indexing adds to your table’s storage requirements (all those hashes have to be put somewhere)

So perfect database performance isn't attainable just by indexing every field; one has to be quite judicious about it.  In fact, choosing the fields to be indexed is really as much of an art as it is a science.

This long and tangled introduction actually did have a point…  ActiveObjects didn’t have any support for field indexing.  I hadn’t really considered the possibility, so I didn’t factor it into my design.  In retrospect, this was probably a bad idea.  So making up for lost time, I’ve now introduced field indexing into the library!

public interface Person extends Entity {
 
    @Indexed
    public String getName();
    @Indexed
    public void setName(String name);
 
    public int getAge();
    public void setAge(int age);
 
    public Company getCompany();
    public void setCompany(Company company);
}

When ActiveObjects generates the migration DDL for this table (running against MySQL), it will look something like this:

CREATE TABLE people (
    ID INTEGER NOT NULL AUTO_INCREMENT,
    NAME VARCHAR(255),
    age INTEGER,
    companyID INTEGER,
    CONSTRAINT fk_people_companyID FOREIGN KEY (companyID) REFERENCES companies(ID),
    PRIMARY KEY(ID)
);
CREATE INDEX NAME ON people(NAME);
CREATE INDEX companyID ON people(companyID);

This is where the aforementioned @Indexed annotation comes in.  As you can see from the resultant DDL, adding a field index is as simple as tagging the corresponding method.  Also, foreign keys are automatically indexed, to ensure maximum performance in SELECTion of relations.

So that's the good news; the bad news is that index creation doesn't play too nicely with migrations.  Everything works of course, and if you're actually adding a table or field, the corresponding index(es) will also be created.  Likewise, if you drop a table or field, the corresponding index(es) will also be dropped.  However, that's about the limit of the migrations support for indexes at the moment.  JDBC has an unfortunate limitation which prevents developers from getting a list of indexes within a database.  Since I have no way (within JDBC) of finding pre-existing indexes, they must be excluded from the schema diff and thus are CREATEd or DROPped regardless of the existing schema.  I do have a plan to fix this (along with migrations on Oracle), but I seriously doubt that it'll be included in the 1.0 release, owing to the changes required.

Refactoring End Result

The final result of the refactoring and the addition of field indexing is that the annotation formerly named @Index is now the @Searchable annotation.  Likewise (and keeping with convention), the class formerly named IndexingEntityManager is now SearchableEntityManager.  Both of these types remain in the net.java.ao package.  To allow for field indexing, the @Indexed annotation was added to the net.java.ao.schema package, owing to the fact that it only affects schema generation and doesn't change runtime behavior in the framework at all.
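
To illustrate how the two annotations now divide the work, here's a hedged sketch (the entity and its methods are made up; only the annotation names and packages come from the changes described above, and I'm assuming Entity lives alongside them in net.java.ao):

import net.java.ao.Entity;
import net.java.ao.Searchable;
import net.java.ao.schema.Indexed;

public interface Article extends Entity {

    @Searchable     // pushed into the Lucene full-text index
    public String getTitle();
    @Searchable
    public void setTitle(String title);

    @Searchable
    public String getBody();
    @Searchable
    public void setBody(String body);

    @Indexed        // gets a CREATE INDEX in the migration DDL, nothing more
    public String getAuthor();
    @Indexed
    public void setAuthor(String author);
}

The @Searchable methods are what a SearchableEntityManager feeds into the Lucene index, while @Indexed only influences the generated DDL.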

Hopefully these changes won’t be too confusing (now that you’re aware of them) and will be a welcome addition to the ActiveObjects functionality.  As always, I welcome comments, suggestions and criticisms!