
JRuby: The Future of Scalable Rails?

14 Jun 2007

So I was talking earlier today with my good friend, Lowell Heddings, about certain annoyances we had with web frameworks. The conversation started with the difficulties of developing PHP applications due to the lack of a debugger, but (as conversations on web frameworks are wont to do) it eventually migrated to Rails.

I mentioned how I’d always been a bit distrustful of a web framework which ran on the one-request-per-persistent-process model. My reasons for distrusting this sort of framework were mainly related to performance and scalability, but Lowell brought up an interesting point that I hadn’t considered before: process-shared sessions.

See, because Rails runs as a set of parallel share-nothing processes (i.e. the mongrel instances don’t share memory state with one another), even trivial in-memory shared data can be a bit of a problem. Caching to disk can also be problematic, since there are multiple processes attempting to access the on-disk data simultaneously. I’m sure many clever solutions have been mooted to solve this problem, but (I think) a new one occurred to me as we were discussing it. (caveat: I haven’t fleshed out this solution at all with any code. I’m posting it because Lowell thought it was an idea worth sharing) :-)

My solution to the problem drops back into one of the hottest topics in Ruby today: JRuby on Rails. JRuby allows you to run Rails applications in a fully integrated Java environment, even to the extent of using existing Java tools, libraries and process containers. An ancillary project to JRuby even allows you to package up your Rails application within a WAR and host it directly within a Java application server like Tomcat or Glassfish.

Packaging Rails as a WAR obviously necessitates a bit of configuration that wouldn’t normally go into the deployment of a Rails application. For example, if you want to serve multiple requests with a Rails app concurrently, you would normally run multiple Mongrel instances and use an Apache mod_proxy configuration to proxy requests to available server processes. Java web applications, while they do run on the persistent-process model, are designed to be multi-threaded rather than multi-process. Thus, Java web applications automatically scale to handle concurrent requests, since all that is required is the spawning of a new thread within the application server.
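
For reference, the conventional multi-Mongrel setup described above usually looks something like the following Apache fragment. This is only an illustrative sketch: the balancer name and port numbers are hypothetical, not taken from any particular deployment.

```apache
# Illustrative mod_proxy_balancer config: one BalancerMember per Mongrel.
# The balancer name "mongrel_cluster" and the ports are hypothetical.
<Proxy balancer://mongrel_cluster>
    BalancerMember http://127.0.0.1:8000
    BalancerMember http://127.0.0.1:8001
    BalancerMember http://127.0.0.1:8002
</Proxy>
ProxyPass / balancer://mongrel_cluster/
ProxyPassReverse / balancer://mongrel_cluster/
```

Note that every additional Mongrel instance means another BalancerMember line and an Apache reload, which is precisely the hassle the WAR approach avoids.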

Rails, however, is designed to be hosted as a separate process and would have to be extensively modified to support this kind of scaling directly. The solution found by the JRuby-extras project is to allow multiple Rails instances to be controlled by a single Rails app WAR. The number of instances is controlled by a configuration option within the web.xml config file within the WAR. Thus, the JRuby WAR will spawn a new instance of the JRuby interpreter for each Rails process (as Rails expects), all hosted in separate threads within the same JVM instance, controlled by the Java app server. So instead of going to all the hassle of configuring a new Mongrel instance and adding a mod_proxy rule to scale your Rails app, all that is necessary is to change a value in an XML file and redeploy.
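
In Goldspike’s case, that value lives in the WAR’s web.xml as context parameters. The fragment below is only a sketch of what this might look like: the parameter names are assumptions based on the pool options (minIdle/maxActive) the Goldspike folks describe, and may well differ between versions.

```xml
<!-- Illustrative fragment only: the exact param-name values are
     assumptions and may differ by Goldspike version. -->
<context-param>
  <param-name>jruby.pool.minIdle</param-name>
  <param-value>2</param-value>
</context-param>
<context-param>
  <param-name>jruby.pool.maxActive</param-name>
  <param-value>8</param-value>
</context-param>
```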

Encapsulating Rails in a single process this way allows us to provide a solution for Lowell’s shared data problem. Instead of storing shared data (like application sessions or cached values) within the Rails process itself, the Rails application should use a Java class (hosted within the same WAR) to store the data. Thus, all shared application data would be stored at a lower level than Rails, within the Java process itself. Java has some very solid concurrency APIs which would allow this sort of shared state without data corruption.
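
To make the shape of the idea concrete, here is a minimal plain-Ruby sketch of what the interface to such a store might look like. To be clear, this is only an illustration: the actual proposal is a Java class living beneath the Rails runtimes (under JRuby you would likely back it with something like java.util.concurrent.ConcurrentHashMap), since a pure-Ruby singleton would not actually be shared across separate JRuby runtimes. A Mutex stands in here for Java’s concurrency controls, and the class name SharedCache is just the one coined in this article.

```ruby
require 'singleton'

# Sketch of the proposed SharedCache: a single thread-safe store that
# every Rails instance in the same JVM would share. A Ruby Mutex stands
# in for Java's concurrency APIs; in the real design this would be a
# Java class hosted in the same WAR.
class SharedCache
  include Singleton

  def initialize
    @store = {}
    @lock  = Mutex.new
  end

  # Read a value under the lock.
  def [](key)
    @lock.synchronize { @store[key] }
  end

  # Write a value under the lock.
  def []=(key, value)
    @lock.synchronize { @store[key] = value }
  end

  # Return the cached value, computing and storing it on a miss.
  def fetch(key)
    @lock.synchronize { @store.fetch(key) { @store[key] = yield } }
  end
end

SharedCache.instance[:active_sessions] = 42
```

Any Rails instance in the process could then read `SharedCache.instance[:active_sessions]` and see the same value, with the lock mediating concurrent access.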

JRuby-on-Rails Diagram

As the application scaled and the shared data requirements increased, the SharedCache class (as we’ll christen it) could be modified to cluster, using Terracotta. The Rails WAR itself is already transparently clusterable through Java application servers like JBoss. As Lowell put it, it’s like an infinitely scalable memcached, without all the fuss.

Well, it’s a thought anyway…

Comments

  1. Hey Daniel.

    Great post and interesting ideas.

    As you might already know, I blogged about clustering JRuby using Terracotta (http://jonasboner.com/2007/02/05/clustering-jruby-with-open-terracotta/). It is still very much in its early stages (just showing that it is possible), but I would be interested in taking it further, to, for example, cluster something like Rails. If you are interested in talking some more about this and/or contributing – please drop me an email (jonas AT [my domain] DOT com).

    Thanks, Jonas.

    Jonas Friday, June 15, 2007 at 1:26 am
  2. @Jonas
    I did read your post a while back actually with great interest. To be perfectly accurate, it was included as part of the conversation with Lowell. :-) I think that clustering the JRuby instance would certainly be a very important part of scaling a Rails app effectively. It just doesn’t solve the shared memory problem so it’s (unfortunately) not the only step.

    As for interest in your work with the JRuby clustering… I’m very interested. Emailing you PM. :-)

    Daniel Spiewak Friday, June 15, 2007 at 1:31 am
  3. You don’t even need to necessarily re-deploy to change the number of available Rails instances. The latest version of Goldspike allows for minIdle as well as maxActive to control the pool size, so you could set maxActive high, and the pool will grow as needed to serve requests.

    Nick Sieger Friday, June 15, 2007 at 8:11 am
  4. @Nick
    Oh now seriously, that’s about as cool as it gets. I didn’t know about that or I would have included it in the article. Way, way, way cool! :-)

    Daniel Spiewak Friday, June 15, 2007 at 9:50 am
    Now this is incredibly interesting, because in fact, if I am reading the implementation correctly, it’s not required to cluster JRuby at all, but just this class/implementation providing virtual storage.

    In other words, this implementation can go forward today, without having to worry about the complexities of clustering JRuby (which while cool to the nth degree, is still in the experimental phase, no offense to Jonas).

    Could we start this project as a labs at http://www.terracotta.org/confluence/display/labs/Home?

    If there is any interest, please email me at tgautier – at – terracotta.org.

    Taylor Gautier Friday, June 15, 2007 at 12:38 pm
  6. @Taylor
    Yeah, this probably would allow you to avoid clustering JRuby for a little while, but this only takes care of memory/storage requirements. If you’re starting to run out of processor space for the extra Rails instances, you still probably need to cluster the JRuby instance.

    Daniel Spiewak Friday, June 15, 2007 at 1:16 pm
    I wasn’t arguing that you don’t need more Rails instances, that’s exactly what this solution allows – to run each JRuby instance in its own interpreter space, roughly equivalent to threads / jvm. When you run out of processing power for this model, you need to add more nodes, and it’s at this point that the clustered version of the SharedCache via Terracotta kicks in (until that point, concurrent access to the SharedCache is mediated by Java’s built-in concurrency controls).

    Actually clustering the data / using clustered concepts inside of the JRuby instances is really interesting, but as far as I understand it, it’s not the model that Rails is built on so it’s not necessary to build scale-out for a Rails app.

    That’s what you were saying when you stated that Rails is a parallel shared-nothing model, right?

    Taylor Gautier Friday, June 15, 2007 at 6:45 pm
  8. @Parallel shared nothing: Yeah, pretty much. I just wasn’t able to put it so eloquently. :-)

    Hmm, I hadn’t actually thought too much about the possibilities of separate JRuby instances using the same SharedCache, but I suppose that would really be a useful thing to have. Easier too than clustering the JRuby instance since you’re actually doing it in raw Java. Seems there’s a lot more to this concept than I initially considered. :-)

    Maybe you’re right and this would be worth a full-blown project. When I was writing the article, I had in mind just a singleton class with thread safety built in. The Terracotta thing was mainly because it seemed to be a really neat way to scale the shared data-handler further. But you’re right that this would be far more useful as a more capable solution that was designed to cluster.

    Daniel Spiewak Friday, June 15, 2007 at 7:24 pm
  9. I’m confused again – I could swear that’s what you meant when you said:

    “As the application scaled and the shared data requirements increased, the SharedCache class (as we’ll christen it) could be modified to cluster, using Terracotta”

    Basically, to be clear, what I see is that per app-server you have a single WAR. Via the JRuby-extras project you can run some n Rails instances from that WAR (as depicted in your diagram). But a single app-server can only handle so many Rails instances, so then you need to deploy another app-server with another application WAR that can handle another n Rails instances. In the single app-server case the SharedCache data structure lived in the process/heap space of the JVM hosting the app-server (and therefore the Rails instances), thus data consistency wasn’t a problem, but now across app-servers it is.

    This is where Terracotta kicks in. Right? (And specifically, the class doesn’t have to be modified to be clustered by Terracotta, but that’s beside the point). I jumped on the thread because I agree it is very useful and would be happy to help out getting a prototype done.

    Taylor Gautier Friday, June 15, 2007 at 8:09 pm
    Hmm, I think that is what I meant when I said it, I just wasn’t able to go into as much depth as you did.

    The other thing I was thinking was that using this model, you could scale the shared storage beyond the app servers. Huge sites (like the Diggs, Slashdots and Facebooks of the net) have massive farms of memcached servers separate from their actual app servers. So the model I had in mind was you separate out the resources devoted to the clustered SharedCache and make all of that work (and storage) take place on a whole separate set of servers (thanks to the magic of Terracotta). This way, you’re free to cluster and load-balance and really do whatever you want with the app server without fear of interfering with your caching and storage.

    So pretty much, what you said. :-)

    Daniel Spiewak Saturday, June 16, 2007 at 10:56 am
  11. I realize I’m a little bit late to the party, but I thought I’d kick in my 2 cents from having used this approach (storing common resources in Java classes to be used across Rails instances). I have developed a JRuby application (deployed on Tomcat) that utilizes a large map of Java objects initialized from a database. This data structure is too large to keep a separate copy per instance – additionally it must be updated from the database when the underlying records change.

    Using a Java class (with Java thread synchronization) to hold this data in memory and manage updates outside of any particular Rails instance eliminates these two problems and scales well. Thanks for the tip. The only caveat I would add is that you can’t store instances of Ruby objects in the Java class (except simple things like strings or numbers that get converted to Java objects). The references to the objects don’t mean anything between sessions. I got burned on this at first, because I was developing using WEBrick, which isn’t multithreaded. When I would deploy to Tomcat things would break because of the invalid object references between Rails instances.

    James Norton Tuesday, July 24, 2007 at 2:01 pm
  12. @James

    Very interesting. Did you try storing references to RObject within the session? So, rather than converting the Ruby objects into Java objects, just store the JRuby internal representation? Honestly, I don’t know enough about JRuby internals to know if this avoids the problem or not. :-)

    Daniel Spiewak Tuesday, July 24, 2007 at 2:46 pm
  13. James,

    That sounds really great. Was it hard to setup? Is there any way you can make any part of what you did available to the community? And of course I have to ask, did you try to cluster that Map with Terracotta?

    Taylor Gautier Thursday, August 23, 2007 at 1:44 am
