<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Code Commit &#187; Java</title>
	<atom:link href="http://www.codecommit.com/blog/category/java/feed" rel="self" type="application/rss+xml" />
	<link>http://www.codecommit.com/blog</link>
	<description>(permanently in beta)</description>
	<lastBuildDate>Mon, 07 Jun 2010 07:00:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<atom:link rel='hub' href='http://www.codecommit.com/blog/?pushpress=hub'/>
		<item>
		<title>Understanding and Applying Operational Transformation</title>
		<link>http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation</link>
		<comments>http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation#comments</comments>
		<pubDate>Mon, 17 May 2010 07:00:00 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/?p=302</guid>
		<description><![CDATA[Almost exactly a year ago, Google made one of the most remarkable press releases in the Web 2.0 era.  Of course, by &#8220;press release&#8221;, I actually mean keynote at their own conference, and by &#8220;remarkable&#8221; I mean potentially-transformative and groundbreaking.  I am referring of course to the announcement of Google Wave, a real-time [...]]]></description>
			<content:encoded><![CDATA[<p>Almost exactly a year ago, Google made one of the most remarkable press releases in the Web 2.0 era.  Of course, by &#8220;press release&#8221;, I actually mean keynote at their own conference, and by &#8220;remarkable&#8221; I mean potentially-transformative and groundbreaking.  I am referring of course to the announcement of <a href="http://wave.google.com">Google Wave</a>, a real-time collaboration tool which has been in open beta for the last several months.</p>
<p>For those of you who don&#8217;t know, Google Wave is a collaboration tool based on real-time, simultaneous editing of documents via a mechanism known as &#8220;operational transformation&#8221;.  Entities which appear as messages in the Wave client are actually &#8220;waves&#8221;.  Within each &#8220;wave&#8221; is a set of &#8220;wavelets&#8221;, each of which contains a set of documents.  Individual documents can represent things like messages, conversation structure (which reply goes where, etc), spell check metadata and so on.  Documents are composed of well-formed XML with an implicit root node.  Additionally, they carry special metadata known as &#8220;annotations&#8221; which are (potentially-overlapping) key/value ranges which span across specific regions of the document.  In the Wave message schema, annotations are used to represent things like bold/italic/underline/strikethrough formatting, links, caret position, the conversation title and a host of other things.  An example document following the Wave message schema might look something like this:</p>
<pre>
&lt;body&gt;
  &lt;line/&gt;<span style="border-bottom-style: solid; border-bottom-width: 1px; border-bottom-color: green;">Test message</span>
  &lt;line/&gt;
  <span style="border-bottom-style: solid; border-bottom-width: 1px; border-bottom-color: red;">&lt;line/&gt;Lorem <span style="padding-bottom: 1px; border-bottom-style: solid; border-bottom-width: 1px; border-bottom-color: blue;">ipsum</span></span><span style="padding-bottom: 1px; border-bottom-style: solid; border-bottom-width: 1px; border-bottom-color: blue;"> dolor</span> sit amet.
&lt;/body&gt;
</pre>
<p>(assuming the following annotations):</p>
<ul>
<li style="color: green;">style/font-weight -> bold</li>
<li style="color: red;">style/font-style -> italic</li>
<li style="color: blue;">link/manual -> http://www.google.com</li>
</ul>
<p>You will notice that the annotations for <code>style/font-style</code> and <code>link/manual</code> actually overlap.  This is perfectly acceptable in Wave&#8217;s document schema.  The resulting rendering would be something like this:</p>
<div style="padding-left: 2em;">
<p><strong>Test message</strong></p>
<p></p>
<p><em>Lorem </em><a rel="nofollow" href="http://www.google.com"><em>ipsum</em> dolor</a> sit amet.</p>
</div>
<p>The point of all this explaining is to give you at least a passing familiarity with the Wave document schema so that I can safely use its terminology in the article to come.  See, Wave itself is not nearly so interesting as the idea upon which it is based.  As mentioned, every document in Wave is actually just raw XML with some ancillary annotations.  As far as the Wave server is concerned, you can stuff whatever data you want in there, just so long as it&#8217;s well-formed.  It just so happens that Google chose to implement a communications tool on top of this data backend, but they could have just as easily implemented something more esoteric, like a database or a windowing manager.</p>
<p>The key to Wave is the mechanism by which we interact with these documents: <a href="http://en.wikipedia.org/wiki/Operational_transformation">operational transformation</a>.  Wave actually doesn&#8217;t allow you to get access to a document as raw XML or anything even approaching it.  Instead, it demands that all of your access to the document be performed in terms of operations.  This has two consequences: first, it allows for some really incredible collaborative tools like the Wave client; second, it makes it <em>really</em> tricky to implement any sort of Wave-compatible service.  Given the fact that I&#8217;ve been working on Novell Pulse (which is exactly this sort of service), and in light of the fact that Google&#8217;s documentation on the subject is sparing at best, I thought I would take some time to clarify this critical piece of the puzzle.  Hopefully, the information I&#8217;m about to present will make it easier for others attempting to interoperate with Wave, Pulse and the (hopefully) many OT-based systems yet to come.</p>
<h3>Operations</h3>
<p>Intuitively enough, the fundamental building blocks of operational transforms are operations themselves.  An operation is exactly what it sounds like: an action which is to be performed on a document.  This action could be inserting or deleting characters, opening (and closing!) an XML element, fiddling with annotations, etc.  A single operation may actually perform many of these actions.  Thus, an operation is actually made up of a sequence of operation <em>components</em>, each of which performs a particular action with respect to the <em>cursor</em> (not to be confused with the <em>caret</em>, which is specific to the client editor and not at all interesting at the level of OT).</p>
<p>There are a number of possible component types.  For example:</p>
<ul>
<li>insertCharacters &mdash; Inserts the specified string at the current index</li>
<li>deleteCharacters &mdash; Deletes the specified string from the current index</li>
<li>openElement &mdash; Creates a new XML open-tag at the current index</li>
<li>deleteOpenElement &mdash; Deletes the specified XML open-tag from the current index</li>
<li>closeElement &mdash; Closes the first currently-open tag at the current index</li>
<li>deleteCloseElement &mdash; Deletes the XML close-tag at the current index</li>
<li>annotationBoundary &mdash; Defines the <em>changes</em> to any annotations (starting or ending) at the current index</li>
<li>retain &mdash; Advances the index a specified number of items</li>
</ul>
<p>Wave&#8217;s OT implementation actually has even more component types, but these are the important ones. You&#8217;ll notice that every component has something to do with the cursor index.  This concept is central to Wave&#8217;s OT implementation.  Operations are effectively a stream of components, each of which defines an action to be performed which affects the content, the cursor or both.  For example, we can encode the example document from earlier as follows:</p>
<ol>
<li>openElement(<code>'body'</code>)</li>
<li>openElement(<code>'line'</code>)</li>
<li>closeElement()</li>
<li>annotationBoundary(<code>startKeys: ['style/font-weight']</code>, <code>startValues: ['bold']</code>)</li>
<li>insertCharacters(<code>'Test message'</code>)</li>
<li>annotationBoundary(<code>endKeys: ['style/font-weight']</code>)</li>
<li>openElement(<code>'line'</code>)</li>
<li>closeElement()</li>
<li>annotationBoundary(<code>startKeys: ['style/font-style']</code>, <code>startValues: ['italic']</code>)</li>
<li>openElement(<code>'line'</code>)</li>
<li>closeElement()</li>
<li>insertCharacters(<code>'Lorem '</code>)</li>
<li>annotationBoundary(<code>startKeys: ['link/manual']</code>, <code>startValues: ['http://www.google.com']</code>)</li>
<li>insertCharacters(<code>'ipsum'</code>)</li>
<li>annotationBoundary(<code>endKeys: ['style/font-style']</code>)</li>
<li>insertCharacters(<code>' dolor'</code>)</li>
<li>annotationBoundary(<code>endKeys: ['link/manual']</code>)</li>
<li>insertCharacters(<code>' sit amet.'</code>)</li>
<li>closeElement()</li>
</ol>
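<p>To make the encoding concrete, here is a minimal Python sketch that replays the component stream above into a flat list of items.  The tuple representation and the <code>build</code> helper are assumptions made for illustration, not Wave&#8217;s actual API.  Annotations are simply skipped, on the assumption that they do not occupy positions in the item sequence, and <code>closeElement</code> is taken to close the most recently opened element.</p>

```python
# Hypothetical model: a document is a flat list of items, where every
# open tag, close tag and character is exactly one item.
def build(components):
    items, open_tags = [], []
    for kind, arg in components:
        if kind == "openElement":
            items.append(("open", arg))
            open_tags.append(arg)
        elif kind == "closeElement":
            # Assumption: closes the most recently opened element.
            items.append(("close", open_tags.pop()))
        elif kind == "insertCharacters":
            items.extend(("char", ch) for ch in arg)
        elif kind == "annotationBoundary":
            pass  # annotations are metadata; they add no items
        else:
            raise ValueError("unknown component: " + kind)
    if open_tags:
        raise ValueError("unclosed elements: " + ", ".join(open_tags))
    return items

document_op = [
    ("openElement", "body"),
    ("openElement", "line"), ("closeElement", None),
    ("annotationBoundary", {"start": {"style/font-weight": "bold"}}),
    ("insertCharacters", "Test message"),
    ("annotationBoundary", {"end": ["style/font-weight"]}),
    ("openElement", "line"), ("closeElement", None),
    ("annotationBoundary", {"start": {"style/font-style": "italic"}}),
    ("openElement", "line"), ("closeElement", None),
    ("insertCharacters", "Lorem "),
    ("annotationBoundary", {"start": {"link/manual": "http://www.google.com"}}),
    ("insertCharacters", "ipsum"),
    ("annotationBoundary", {"end": ["style/font-style"]}),
    ("insertCharacters", " dolor"),
    ("annotationBoundary", {"end": ["link/manual"]}),
    ("insertCharacters", " sit amet."),
    ("closeElement", None),
]

items = build(document_op)
print(len(items))  # 47
```

<p>Eight of those items are tags (four opens and four closes) and the remaining 39 are individual characters.</p>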
<p>Obviously, this isn&#8217;t the most streamlined way of referring to a document&#8217;s content for a human, but a stream of discrete components like this is <em>perfect</em> for automated processing.  The real utility of this encoding though doesn&#8217;t become apparent until we look at operations which only encode a partial document; effectively performing a particular mutation.  For example, let&#8217;s follow the advice of <em>Strunk and White</em> and capitalize the letter &#8216;m&#8217; in our title of &#8216;Test message&#8217;.  What we want to do (precisely-speaking) is delete the &#8216;m&#8217; and insert the string &#8216;M&#8217; at its previous location.  We can do that with the following operation:</p>
<ol>
<li>retain(<code>8</code>)</li>
<li>deleteCharacters(<code>'m'</code>)</li>
<li>insertCharacters(<code>'M'</code>)</li>
<li>retain(<code>38</code>)</li>
</ol>
<p>Instead of adding content to the document at every step, most of this operation actually leaves the underlying document untouched.  In practice, retain() tends to be the most commonly used component by a wide margin.  The trick is that every operation <em>must</em> span the full width of the document.  When evaluating this operation, the cursor will start at index 0 and walk forward through the existing document and the incoming operation one item at a time.  Each XML tag (open or close) counts as a single item, so an empty element such as <code>&lt;line/&gt;</code> contributes two.  Characters are also single items.  Thus, the entire document contains 47 items.</p>
<p>Our operation above cursors harmlessly over the first eight items (the <code>&lt;body&gt;</code> tag, the open and close tags of <code>&lt;line/&gt;</code>, and the string <code>'Test '</code>).  Once it reaches the <code>'m'</code> in <code>'message'</code>, we stop the cursor and perform a mutation.  Specifically, we&#8217;re using the deleteCharacters() component to remove the <code>'m'</code>.  This component doesn&#8217;t move the cursor, so we&#8217;re still sitting at index <code>8</code>.  We then use the insertCharacters() component to add the character <code>'M'</code> at precisely our current location.  This time, some new characters have been inserted, so the cursor advances to the end of the newly-inserted string (meaning that we are now at index <code>9</code>).  This is intuitive because we don&#8217;t want to have to retain() over the text we just inserted.  We do however want to retain() over the remainder of the document, seeing as we don&#8217;t need to do anything else.  The final rendered document looks like the following:</p>
<div style="padding-left: 2em;">
<p><strong>Test Message</strong></p>
<p></p>
<p><em>Lorem </em><a rel="nofollow" href="http://www.google.com"><em>ipsum</em> dolor</a> sit amet.</p>
</div>
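<p>The walk described above can be sketched in a few lines of Python.  This is only an illustration under assumed names: the <code>apply</code> function and the tuple-based item list are hypothetical, and <code>pos</code> tracks the position in the <em>original</em> document (so it steps past deleted items), which is a slightly different bookkeeping choice from the cursor index described in the text.</p>

```python
def apply(items, op):
    """Apply a full-span operation to a list of items (hypothetical model)."""
    out, pos = [], 0
    for kind, arg in op:
        if kind == "retain":
            out.extend(items[pos:pos + arg])
            pos += arg
        elif kind == "deleteCharacters":
            expected = [("char", ch) for ch in arg]
            if items[pos:pos + len(arg)] != expected:
                raise ValueError("deleted text does not match the document")
            pos += len(arg)  # skip the deleted items in the old document
        elif kind == "insertCharacters":
            out.extend(("char", ch) for ch in arg)
        else:
            raise ValueError("unsupported component: " + kind)
    if pos != len(items):
        raise ValueError("operation must span the whole document")
    return out

def chars(text):
    return [("char", ch) for ch in text]

# The example document: tags and characters are one item each.
line = [("open", "line"), ("close", "line")]
items = ([("open", "body")] + line + chars("Test message") + line + line
         + chars("Lorem ipsum dolor sit amet.") + [("close", "body")])

capitalize = [("retain", 8), ("deleteCharacters", "m"),
              ("insertCharacters", "M"), ("retain", 38)]
result = apply(items, capitalize)
title = "".join(value for kind, value in result[3:15])
print(len(items), title)  # 47 Test Message
```

<p>Shortening the final retain to anything other than 38 makes the sketch raise an error, which mirrors the rule that an operation must span the full width of the document.</p>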
<h3>Composition</h3>
<p>One of Google&#8217;s contributions to the (very old) theory behind operational transformation is the idea of operation composition.  Because Wave operations are these nice, full-span sequences of discrete components, it&#8217;s fairly easy to take two operations which span the same length and merge them together into a single operation.  The results of this action are really quite intuitive.  For example, if we were to compose our document operation (the first example above) with our <code>'m'</code>-changing operation (the second example), the resulting operation would be basically the same as the original document operation, except that instead of inserting the text <code>'Test message'</code>, we would insert <code>'Test Message'</code>.  In composing the two operations together, all of the retains have disappeared and any contradicting components (e.g. a delete and an insert) have been directly merged.</p>
<p>Composition is extremely important to Wave&#8217;s OT as we will see once we start looking at client/server asymmetry.  The important thing to notice now is the fact that composed operations <em>must</em> be fundamentally compatible.  Primarily, this means that the two operations must span the same number of indexes.  It also means that we cannot compose an operation which consists of only a text insert with an operation which attempts to delete an XML element.  Obviously, that&#8217;s not going to work.  Wave&#8217;s <code>Composer</code> utility takes care of validating both the left and the right operation to ensure that they are compatible as part of the composition process.</p>
<p>Please also note that composition is <em>not</em> commutative; ordering is significant.  This is also quite intuitive.  If you type the character <code>a</code> and <em>then</em> type the character <code>b</code>, the result is quite different than if you type the character <code>b</code> and <em>then</em> type the character <code>a</code>.</p>
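<p>As a rough illustration of composition, the following Python sketch composes two operations over plain text.  It is a simplification and not Wave&#8217;s <code>Composer</code>: it assumes only <code>retain</code>, <code>insert</code> and <code>delete</code> components, ignores elements and annotations, and splits every component into single-item units to keep the bookkeeping simple.  All function names are hypothetical.</p>

```python
def units(op):
    """Split an operation into single-item components."""
    for kind, arg in op:
        if kind == "retain":
            for _ in range(arg):
                yield ("retain", 1)
        else:
            for ch in arg:
                yield (kind, ch)

def merge(unit_ops):
    """Join adjacent components of the same kind back together."""
    out = []
    for kind, arg in unit_ops:
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + arg)
        else:
            out.append((kind, arg))
    return out

def compose(first, second):
    """Return one operation equivalent to applying first, then second."""
    a, b = list(units(first)), list(units(second))
    out, i, j = [], 0, 0
    while i != len(a) or j != len(b):
        if i != len(a) and a[i][0] == "delete":
            out.append(a[i])          # first already removed this item
            i += 1
        elif j != len(b) and b[j][0] == "insert":
            out.append(b[j])          # second adds a brand new item
            j += 1
        elif i == len(a) or j == len(b):
            raise ValueError("operations do not span compatible lengths")
        else:
            if a[i][0] == "retain":
                out.append(b[j])      # second retains or deletes an old item
            elif b[j][0] == "retain":
                out.append(a[i])      # second keeps what first inserted
            elif a[i][1] != b[j][1]:
                raise ValueError("delete does not match inserted text")
            # otherwise an insert followed by its own delete cancels out
            i += 1
            j += 1
    return merge(out)

document_op = [("insert", "Test message")]
capitalize = [("retain", 5), ("delete", "m"), ("insert", "M"), ("retain", 6)]
print(compose(document_op, capitalize))  # [('insert', 'Test Message')]
```

<p>Swapping the arguments raises an error in this sketch, because <code>capitalize</code> produces a 12-item document while <code>document_op</code> expects an empty one.  That is one concrete way in which ordering matters.</p>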
<h3>Transformation</h3>
<p>Here&#8217;s where we get to some of the really interesting stuff and the motivation behind all of this convoluted representational baggage.  Operational Transformation, at its core, is an <em>optimistic</em> concurrency control mechanism.  It allows two editors to modify the same section of a document at the same time without conflict.  Or rather, it provides a mechanism for sanely resolving those conflicts so that neither user intervention nor locking becomes necessary.</p>
<p>This is actually a harder problem than it sounds.  Imagine that we have the following document (represented as an operation):</p>
<ol>
<li>insertCharacters(<code>'go'</code>)</li>
</ol>
<p>Now imagine that we have two editors with their cursors positioned at the end of the document.  They <em>simultaneously</em> insert a <code>t</code> and <code>a</code> character (respectively).  Thus, we will have two operations sent to the server.  The first will retain 2 items and insert a <code>t</code>, the second will retain 2 items and insert <code>a</code>.  Naturally, the server needs to enforce atomicity of edits at some point (to avoid race conditions during I/O), so one of these operations will be applied first.  However, as soon as either one of these operations is applied, the retain for the other will become invalid.  Depending on the ordering, the text of the resulting document will either be <code>'goat'</code> or <code>'gota'</code>.</p>
<p>In and of itself, this isn&#8217;t really a problem.  After all, any asynchronous server needs to make decisions about ordering at some point.  However, issues start to crop up as soon as we consider relaying operations from one client to the other.  Client A has already applied its operation, so its document text will be <code>'got'</code>.  Meanwhile, client B has already applied <em>its</em> operation, and so its document text is <code>'goa'</code>.  Each client needs the operation from the other in order to have any chance of converging to the same document state.</p>
<p>Unfortunately, if we naïvely send A&#8217;s operation to B and B&#8217;s operation to A, the results will <em>not</em> converge:</p>
<ul>
<li><code>'got'</code> + (retain(<code>2</code>); insertCharacters(<code>'a'</code>)) = <code>'goat'</code></li>
<li><code>'goa'</code> + (retain(<code>2</code>); insertCharacters(<code>'t'</code>)) = <code>'gota'</code></li>
</ul>
<p>Even discounting the fact that we have a document size mismatch (our operations each span 2 indexes, while their target documents have width 3), this is obviously not the desired behavior.  Even though our server may have a sane concept of consistent ordering, our clients obviously need some extra hand-holding.  Enter OT.</p>
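<p>The divergence is easy to reproduce.  The Python sketch below uses a deliberately naive, hypothetical <code>naive_apply</code> that works on plain strings and ignores the span mismatch by copying any uncovered tail unchanged.</p>

```python
def naive_apply(text, op):
    """Apply retain/insert components without checking the span."""
    out, pos = [], 0
    for kind, arg in op:
        if kind == "retain":
            out.append(text[pos:pos + arg])
            pos += arg
        elif kind == "insert":
            out.append(arg)
    out.append(text[pos:])  # naively keep whatever the operation did not cover
    return "".join(out)

op_a = [("retain", 2), ("insert", "t")]  # client A types 't'
op_b = [("retain", 2), ("insert", "a")]  # client B types 'a'

# Each client applies its own operation first, then the other one untransformed.
client_a = naive_apply(naive_apply("go", op_a), op_b)
client_b = naive_apply(naive_apply("go", op_b), op_a)
print(client_a, client_b)  # goat gota
```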
<p>What we have here is a simple one-step diamond problem.  In the theoretical study of OT, we generally visualize this situation using diagrams like the following:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/One-StepOTDiamondunresolved.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/smiz45tgnzaoxq-9cnpzjiw.png" alt="smiz45tGNzAOXq-9cNpzjiw.png" title="smiz45tGNzAOXq-9cNpzjiw.png" border="0" width="150" height="150" /></a></center></p>
<p>The way you should read diagrams like this is as a graphical representation of operation application on two documents at the same time.  Client operations move the document to the left.  Server operations move the document to the right.  Both client and server operations move the document downward.  Thus, diagrams like these let us visualize the application of operations in a literal &#8220;state space&#8221;.  The dark blue line shows the client&#8217;s path through state space, while the gray line shows the server&#8217;s.  The vertices of these paths (not explicitly rendered) are points in state space, representing a particular state of the document.  When both the client and the server line pass through the same point, it means that the content of their respective documents was in sync, at least at that particular point in time.</p>
<p>So, in the diagram above, operation <em>a</em> could be client A&#8217;s operation (retain(<code>2</code>); insertCharacters(<code>'t'</code>)) and operation <em>b</em> could be client B&#8217;s operation.  This is of course assuming that the server chose B&#8217;s operation as the &#8220;winner&#8221; of the race condition.  As we showed earlier, we cannot simply naïvely apply operation <em>a</em> on the server and <em>b</em> on the client, otherwise we could derive differing document states (<code>'goat'</code> vs <code>'gota'</code>).  What we need to do is automatically adjust operation <em>a</em> with respect to <em>b</em> and operation <em>b</em> with respect to <em>a</em>.</p>
<p>We can do this using an operational transform.  Google&#8217;s OT is based on the following mathematical identity:</p>
<p><center><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/ot-identity1.png"  alt="xform(a, b) = (a', b'),\mbox{ where }b' \circ a \equiv a' \circ b" border="0" width="376" height="35" /></center></p>
<p>In plain English, this means that the <code>transform</code> function takes two operations, one server and one client, and produces a pair of operations.  These operations can be applied to their counterpart&#8217;s end state to produce exactly the same state when complete.  Graphically, we can represent this by the following:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/One-StepOTDiamond(2).svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sldaw1zxskorphbvnvwh8la.png" alt="sldAW1ZXskOrPHbVnvwh8lA.png" title="sldAW1ZXskOrPHbVnvwh8lA.png" border="0" width="150" height="150" /></a></center></p>
<p>Thus, on the client-side, we receive operation <em>b</em> from the server, pair it with <em>a</em> to produce <em>(a&#8217;, b&#8217;)</em>, and then compose <em>b&#8217;</em> with <em>a</em> to produce our final document state.  We perform an analogous process on the server-side.  The mathematical definition of the <code>transform</code> function guarantees that this process will produce the <em>exact</em> same document state on both server and client.</p>
<p>Coming back to our concrete example, we can finally solve the problem of <code>'goat'</code> vs <code>'gota'</code>.  We start out with the situation where client A has applied operation <em>a</em>, arriving at a document text of <code>'got'</code>.  It now receives operation <em>b</em> from the server, instructing it to retain over 2 items and insert character <code>'a'</code>.  However, before it applies this operation (which would obviously result in the wrong document state), it uses operational transformation to derive operation <em>b&#8217;</em>.  Google&#8217;s OT implementation will resolve the conflict between <code>'t'</code> and <code>'a'</code> in favor of the server.  Thus, <code>b'</code> will consist of the following components:</p>
<ol>
<li>retain(<code>2</code>)</li>
<li>insertCharacters(<code>'a'</code>)</li>
<li>retain(<code>1</code>)</li>
</ol>
<p>You will notice that we no longer have a document size mismatch, since that last retain() ensures that the cursor reaches the end of our length-3 document state (<code>'got'</code>). </p>
<p>Meanwhile, the server has received our operation <em>a</em> and it performs an analogous series of steps to derive operation <em>a&#8217;</em>.  Once again, Google&#8217;s OT must resolve the conflict between <code>'t'</code> and <code>'a'</code> in the <em>same</em> way as it resolved the conflict for client A.  We&#8217;re trying to apply operation <em>a</em> (which inserts the <code>'t'</code> character at position 2) to the server document state, which is currently <code>'goa'</code>.  When we&#8217;re done, we must have the exact same document content as client A following the application of <em>b&#8217;</em>.  Specifically, the server document state must be <code>'goat'</code>.  Thus, the OT process will produce the operation <em>a&#8217;</em> consisting of the following components:</p>
<ol>
<li>retain(<code>3</code>)</li>
<li>insertCharacters(<code>'t'</code>)</li>
</ol>
<p>Client A applies operation <em>b&#8217;</em> to its document state, the server applies operation <em>a&#8217;</em> to its document state, and they <em>both</em> arrive at a document consisting of the text <code>'goat'</code>.  Magic!</p>
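<p>For readers who prefer code, here is a minimal Python sketch of a <code>transform</code> function that reproduces this example.  It is an assumption-laden simplification and not Google&#8217;s implementation: it works on plain text with only <code>retain</code>, <code>insert</code> and <code>delete</code> components, splits components into single-item units, and hard-codes the rule that the server&#8217;s insert goes first when both sides insert at the same position.</p>

```python
def units(op):
    """Split an operation into single-item components."""
    for kind, arg in op:
        if kind == "retain":
            for _ in range(arg):
                yield ("retain", 1)
        else:
            for ch in arg:
                yield (kind, ch)

def merge(unit_ops):
    """Join adjacent components of the same kind back together."""
    out = []
    for kind, arg in unit_ops:
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + arg)
        else:
            out.append((kind, arg))
    return out

def transform(client, server):
    """Return (client', server'): client' applies after server, and
    server' applies after client. Server inserts win ties."""
    a, b = list(units(client)), list(units(server))
    a2, b2, i, j = [], [], 0, 0
    keep = ("retain", 1)
    while i != len(a) or j != len(b):
        if j != len(b) and b[j][0] == "insert":
            b2.append(b[j])       # server text goes first
            a2.append(keep)       # client' must step over it
            j += 1
        elif i != len(a) and a[i][0] == "insert":
            a2.append(a[i])
            b2.append(keep)       # server' must step over it
            i += 1
        elif i == len(a) or j == len(b):
            raise ValueError("operations span different document lengths")
        else:
            if a[i][0] == "retain" and b[j][0] == "retain":
                a2.append(keep)
                b2.append(keep)
            elif a[i][0] == "delete" and b[j][0] == "retain":
                a2.append(a[i])   # still needs deleting on the server side
            elif a[i][0] == "retain" and b[j][0] == "delete":
                b2.append(b[j])   # still needs deleting on the client side
            # both sides deleted the same item: nothing left to do
            i += 1
            j += 1
    return merge(a2), merge(b2)

def apply(text, op):
    """Apply a full-span operation to a string (deletes are not verified)."""
    out, pos = [], 0
    for kind, arg in op:
        if kind == "retain":
            out.append(text[pos:pos + arg])
            pos += arg
        elif kind == "insert":
            out.append(arg)
        elif kind == "delete":
            pos += len(arg)
    if pos != len(text):
        raise ValueError("operation must span the whole document")
    return "".join(out)

a = [("retain", 2), ("insert", "t")]  # client A
b = [("retain", 2), ("insert", "a")]  # chosen first by the server
a_prime, b_prime = transform(a, b)
print(b_prime)  # [('retain', 2), ('insert', 'a'), ('retain', 1)]
print(a_prime)  # [('retain', 3), ('insert', 't')]
print(apply(apply("go", a), b_prime), apply(apply("go", b), a_prime))  # goat goat
```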
<p>It is very important that you really understand this process.  OT is all about the <code>transform</code> function and how it behaves in this exact situation.  As it turns out, this is <em>all</em> that OT does for us in and of itself.  Operational transformation is really just a concurrency primitive.  It doesn&#8217;t solve every problem with collaborative editing of a shared document (as we will see in a moment), but it does solve this problem very well.</p>
<p>One way to think of this is to keep in mind the &#8220;diamond&#8221; shape shown in the above diagram.  OT solves a very simple problem: given the top two sides of the diamond, it can derive the bottom two sides.  In practice, we often only want one side of the diamond (e.g. client A only needs operation <em>b&#8217;</em>, it doesn&#8217;t need <em>a&#8217;</em>).  However, OT <em>always</em> gives us both pieces of the puzzle.  It &#8220;completes&#8221; the diamond, so to speak.</p>
<h3>Compound OT</h3>
<p>So far, everything I have presented has come pretty directly from the whitepapers on <a href="http://www.waveprotocol.org">waveprotocol.org</a>.  However, contrary to popular belief, this is <em>not</em> enough information to actually go out and implement your own collaborative editor or Wave-compatible service.</p>
<p>The problem is that OT doesn&#8217;t really do all that much in and of itself.  As mentioned above, OT solves for two sides of the diamond in state space.  It <em>only</em> solves for two sides of a simple, one-step diamond like the one shown above.  Let me say it a third time: the case shown above is the <em>only</em> case which OT handles.  As it turns out, there are other cases which arise in a client/server collaborative editor like Google Wave or Novell Pulse.  In fact, <em>most</em> cases in practice are much more complex than the one-step diamond.</p>
<p>For example, consider the situation where the client performs <em>two</em> operations (say, by typing two characters, one after the other) while at the same time the server performs one operation (originating from another client).  We can diagram this situation in the following way:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/TwoClientOneServerUnresolved.svg" alt="TwoClientOneServerUnresolved.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/svknxt1hbu9jmjrwngccqxa.png" alt="sVkNXT1Hbu9jmjrwnGCCqXA.png" title="sVkNXT1Hbu9jmjrwnGCCqXA.png" border="0" width="206" height="202" /></a></center></p>
<p>So we have two operations in the client history, <em>a</em> and <em>b</em>, and only one operation in the server history, <em>c</em>.  The client is going to send operations <em>a</em> and <em>b</em> to the server, presumably one after the other.  The first operation (<em>a</em>) is no problem at all.  Here we have the simple one-step diamond problem from above, and as we well know, OT has no trouble at all in resolving this issue.  The server transforms <em>a</em> and <em>c</em> to derive operation <em>a&#8217;</em>, which it applies to its current state.  The resulting situation looks like the following:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/TwoClientOneServerHalf-Resolved.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sjkwqr0htegxwppnzgrerlw.png" alt="sJkWQr0hTeGxwPpNZgRERLw.png" title="sJkWQr0hTeGxwPpNZgRERLw.png" border="0" width="206" height="202" /></a></center></p>
<p>Ok, so far so good.  The server has successfully transformed operation <em>a</em> against <em>c</em> and applied the resulting <em>a&#8217;</em> to its local state.  However, the moment we move on to operation <em>b</em>, disaster strikes.  The problem is that the server receives operation <em>b</em>, but it has nothing against which to transform it!</p>
<p>Remember, OT <em>only</em> solves for the bottom two sides of the diamond given the top two sides.  In the case of the first operation (<em>a</em>), the server had both top sides (<em>a</em> and <em>c</em>) and thus OT was able to derive the all-important <em>a&#8217;</em>.  However, in this case, we only have one of the sides of the diamond (<em>b</em>); we don&#8217;t have the server&#8217;s half of the equation because the server never performed such an operation!</p>
<p>In general, the problem we have here is caused by the client and server diverging by more than one step.  Whenever we get into this state, the OT becomes more complicated because we effectively need to transform incoming operations (e.g. <em>b</em>) against operations which <em>never happened!</em>  In this case, the phantom operation that we need for the purposes of OT would take us from the tail end of <em>a</em> to the tail end of <em>a&#8217;</em>.  Think of it like a &#8220;bridge&#8221; between client state space and server state space.  We need this bridge, this second half of the diamond, if we are to apply OT to solve the problem of transforming <em>b</em> into server state space.</p>
<h4>Operation Parentage</h4>
<p>In order to do this, we need to add some metadata to our operations.  Not only do our operations need to contain their components (retain, etc), they also must maintain some notion of parentage.  We need to be able to determine exactly what state an operation requires for successful application.  We will then use this information to detect the case where an incoming operation is parented on a state which is not in our history (e.g. <em>b</em> on receipt by the server).</p>
<p>For the record, Google Wave uses a monotonically-increasing scalar version number to label document states and thus, operation parents.  Novell Pulse does the exact same thing for compatibility reasons, and I recommend that anyone attempting to build a Wave-compatible service follow the same model.  However, I personally think that compound OT is a lot easier to understand if document states are labeled by a hash of their contents.</p>
<p>This scheme has some very nice advantages.  Given an operation (and its associated parent hash), we can determine instantly whether or not we have the appropriate document state to apply said operation.  Hashes also have the very convenient property of converging exactly when the document states converge.  Thus, in our one-step diamond case from earlier, operations <em>a</em> and <em>b</em> would be parented off of the same hash.  Operation <em>b&#8217;</em> would be parented off of the hash of the document resulting from applying <em>a</em> to the initial document state (and similarly for <em>a&#8217;</em>).  Finally, the point in state space where the client and server converge once again (after applying their respective operations) will have a single hash, as the document states will be synchronized.  Thus, any further operations applied on either side will be parented off of a correctly-shared hash.</p>
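<p>A small Python sketch shows how hash-labeled parentage might look for the one-step diamond from earlier.  The dictionary layout, the choice of SHA-256 and the <code>can_apply</code> helper are all assumptions made for illustration; as noted above, Wave itself uses version numbers rather than content hashes.</p>

```python
import hashlib

def state_hash(text):
    """Label a document state by a hash of its contents (sketch only)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# One-step diamond from earlier: both operations start from 'go'.
op_a = {"parent": state_hash("go"),
        "components": [("retain", 2), ("insert", "t")]}
op_b = {"parent": state_hash("go"),
        "components": [("retain", 2), ("insert", "a")]}

# b' applies on top of the state produced by a, and a' on top of b.
op_b_prime = {"parent": state_hash("got"),
              "components": [("retain", 2), ("insert", "a"), ("retain", 1)]}
op_a_prime = {"parent": state_hash("goa"),
              "components": [("retain", 3), ("insert", "t")]}

def can_apply(history, op):
    """True when the operation's parent state is one we have passed through."""
    return op["parent"] in history

server_history = {state_hash("go"), state_hash("goa")}  # server applied b
print(can_apply(server_history, op_a))        # True
print(can_apply(server_history, op_b_prime))  # False
```

<p>Operation <em>a</em> is parented on a state the server has seen, so it can be transformed and applied.  Operation <em>b&#8217;</em> is parented on <code>'got'</code>, a state that only exists on client A.</p>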
<p>Just a quick terminology note: when I say &#8220;parent hash&#8221;, I&#8217;m referring to the hash of the document state <em>prior</em> to applying a particular operation.  When I say &#8220;parent operation&#8221; (which I probably will from time to time), I&#8217;m referring to the hash of the document state which results from applying the &#8220;parent operation&#8221; to its parent document state.  Thus, operation <em>b</em> in the diagram above is parented off of operation <em>a</em> which is parented off of the same hash as operation <em>c</em>.</p>
<h4>Compound OT</h4>
<p>Now that our operations have parent information, our server is capable of detecting that operation <em>b</em> is not parented off of any state in its history.  What we need to do is derive an operation which will take us from the parent of <em>b</em> to some point in server state-space.  Graphically, this operation would look something like the following (rendered in dark green):</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/TwoClientOneServerInferredResolution.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sr3ykmn1qjtwyjnsrdu-qog.png" alt="sr3ykMn1qJTwYjnSRdu_QOg.png" title="sr3ykMn1qJTwYjnSRdu_QOg.png" border="0" width="206" height="202" /></a></center></p>
<p>Fortunately for us, this operation is fairly easy to derive.  In fact, we already derived and subsequently threw it away!  Remember, OT solves for <em>two</em> sides of the diamond.  Thus, when we transformed <em>a</em> against <em>c</em>, the resulting operation pair consisted of <em>a&#8217;</em> (which we applied to our local state) and another operation which we discarded.  That operation is precisely the operation shown in green above.  Thus, all we have to do is re-derive this operation and use it as the second top side of the one-step diamond.  At this point, we have all of the information we need to apply OT and derive <em>b&#8217;</em>, which we can apply to our local state:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/TwoClientOneServerAlmostResolved.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sygoinep-oz2132ns0tldqg.png" alt="syGoinEP_Oz2132nS0Tldqg.png" title="syGoinEP_Oz2132nS0Tldqg.png" border="0" width="206" height="202" /></a></center></p>
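<p>The server-side steps can be sketched in Python as follows.  To keep the example short it assumes a text-only model with just <code>retain</code> and <code>insert</code> components, and the concrete operations (typing <code>'t'</code> and <code>'s'</code> on the client, <code>'a'</code> on the server) are invented for illustration.  The function names are hypothetical, and the server&#8217;s insert is placed first on ties.</p>

```python
def units(op):
    """Split an operation into single-item components."""
    for kind, arg in op:
        if kind == "retain":
            for _ in range(arg):
                yield ("retain", 1)
        else:
            for ch in arg:
                yield ("insert", ch)

def merge(unit_ops):
    """Join adjacent components of the same kind back together."""
    out = []
    for kind, arg in unit_ops:
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + arg)
        else:
            out.append((kind, arg))
    return out

def transform(client, server):
    """Return (client', server') for retain/insert operations only."""
    a, b = list(units(client)), list(units(server))
    a2, b2, i, j = [], [], 0, 0
    while i != len(a) or j != len(b):
        if j != len(b) and b[j][0] == "insert":
            b2.append(b[j])              # server insert goes first
            a2.append(("retain", 1))
            j += 1
        elif i != len(a) and a[i][0] == "insert":
            a2.append(a[i])
            b2.append(("retain", 1))
            i += 1
        elif i == len(a) or j == len(b):
            raise ValueError("operations span different document lengths")
        else:                            # both sides retain this item
            a2.append(("retain", 1))
            b2.append(("retain", 1))
            i += 1
            j += 1
    return merge(a2), merge(b2)

def apply(text, op):
    out, pos = [], 0
    for kind, arg in op:
        if kind == "retain":
            out.append(text[pos:pos + arg])
            pos += arg
        else:
            out.append(arg)
    if pos != len(text):
        raise ValueError("operation must span the whole document")
    return "".join(out)

a = [("retain", 2), ("insert", "t")]  # client: 'go' becomes 'got'
b = [("retain", 3), ("insert", "s")]  # client: 'got' becomes 'gots'
c = [("retain", 2), ("insert", "a")]  # server: 'go' becomes 'goa'

a_prime, bridge = transform(a, c)     # keep the half we used to discard
server = apply(apply("go", c), a_prime)
b_prime, _ = transform(b, bridge)     # b and bridge share the parent 'got'
server = apply(server, b_prime)
print(server)  # goats
```

<p>Here <code>bridge</code> plays the role of the green operation: it takes the client state <code>'got'</code> to the server state <code>'goat'</code>, which gives <em>b</em> something to be transformed against.</p>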
<p>At this point, we&#8217;re <em>almost</em> done.  The only problem we have left to resolve is the application of operation <em>c</em> on the client.  Fortunately, this is a fairly easy thing to do; after all, <em>c</em> is parented off of a state which the client has in its history, so it should be able to directly apply OT.</p>
<p>The one tricky point here is the fact that the client must transform <em>c</em> against not one but <em>two</em> operations (<em>a</em> and <em>b</em>).  We could apply OT twice, deriving an intermediary operation in the first step (which happens to be exactly equivalent to the green intermediary operation we derived on the server) and then transforming that operation against <em>b</em>.  However, this is fairly inefficient.  OT is fast, but it&#8217;s still <em>O(n log n)</em>.  The better approach is to first compose <em>a</em> with <em>b</em> and then transform <em>c</em> against the composition of the two operations.  Thanks to Google&#8217;s careful definition of operation composition, this is guaranteed to produce the same operation as we would have received had we applied OT in two separate steps.</p>
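<p>For the curious, the &#8220;apply OT twice&#8221; route can be sketched as follows.  This again assumes my toy insertion-only operation rather than Wave&#8217;s real types (<code>Ins</code>, <code>TwoStepSketch</code> and <code>transformAgainst</code> are all hypothetical names), and since the toy model has no real composition it simply walks the path one diamond at a time; Wave&#8217;s <code>Composer</code> would merge the path into a single operation first:</p>

```java
// Toy model (an assumption for illustration, not Wave's DocOp):
// an operation inserts `text` at index `pos`; `site` breaks position ties.
final class Ins {
    final int pos;
    final String text;
    final int site;

    Ins(int pos, String text, int site) {
        this.pos = pos;
        this.text = text;
        this.site = site;
    }

    String apply(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }
}

final class TwoStepSketch {
    // One-step diamond: returns {a', b'} for a and b parented on the same state.
    static Ins[] transform(Ins a, Ins b) {
        boolean aFirst = a.pos < b.pos || (a.pos == b.pos && a.site < b.site);
        if (aFirst) {
            return new Ins[] { a, new Ins(b.pos + a.text.length(), b.text, b.site) };
        }
        return new Ins[] { new Ins(a.pos + b.text.length(), a.text, a.site), b };
    }

    // Slides `incoming` past each operation in `path`, one diamond per step.
    // path[0] must share a parent with `incoming`; path[i + 1] must be
    // parented on the state produced by path[i].
    static Ins transformAgainst(Ins incoming, Ins[] path) {
        Ins current = incoming;
        for (Ins step : path) {
            current = transform(current, step)[0];
        }
        return current;
    }
}
```

<p>Here <code>transformAgainst(c, new Ins[] { a, b })</code> returns the version of <em>c</em> that the client can apply after <em>a</em> and <em>b</em>.</p>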
<p>The final state diagram looks like the following:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/TwoClientOneServer.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/srzhxo2a6by5-umotoue5oq.png" alt="sRZHxo2A6by5-umoTOUe5oQ.png" title="sRZHxo2A6by5-umoTOUe5oQ.png" border="0" width="206" height="202" /></a></center></p>
<h4>Client/Server Asymmetry</h4>
<p>Technically, what we have here is enough to implement a fully-functional client/server collaborative editing system.  In fact, this is very close to what was presented in the 1995 paper on <a href="http://doi.acm.org/10.1145/215585.215706">the Jupiter collaboration system</a>.  However, while this approach is functionally correct, it isn&#8217;t going to scale in practice.</p>
<p>The reason for this is in that confusing middle part where the server had to derive an intermediary operation (the green one) in order to handle operation <em>b</em> from the client.  In order to do this, the server needed to hold on to operation <em>a</em> in order to use it a second time in deriving the intermediary operation.  Either that, or the server would have needed to speculatively retain the intermediary operation when it was derived for the first time during the transformation of <em>a</em> to <em>a&#8217;</em>.  Now, this may sound like a trivial point, but consider that the server must maintain this sort of information essentially indefinitely for <em>every</em> client which it handles.  You begin to see how this could become a serious scalability problem!</p>
<p>In order to solve this problem, Wave (and Pulse) imposes a very important constraint on the operations incoming to the server: any operation received by the server <em>must</em> be parented on some point in the server&#8217;s history.  Thus, the server would have rejected operation <em>b</em> in our example above since it did not branch from any point in server state space.  The parent of <em>b</em> was <em>a</em>, but the server didn&#8217;t have <em>a</em>, it only had <em>a&#8217;</em> (which is clearly a different point in state space).</p>
<p>Of course, simply rejecting any divergence which doesn&#8217;t fit into the narrow, one-step diamond pattern is a bit harsh.  Remember that practically, <em>almost all</em> situations arising in collaborative editing will be multi-step divergences like our above example.  Thus, if we naïvely rejected anything which didn&#8217;t fit into the one-step mold, we would render our collaborative editor all but useless.</p>
<p>The solution is to move all of the heavy lifting onto the client.  We don&#8217;t want the server to have to track every single client as it moves through state space since there could be thousands (or even millions) of clients.  But if you think about it, there&#8217;s really no problem with the client tracking the <em>server</em> as it moves through state space, since there&#8217;s never going to be any more than one (logical) server.  Thus, we can offload most of the compound OT work onto the client side.</p>
<p>Before it sends any operations to the server, the client will be responsible for ensuring those operations are parented off of some point in the server&#8217;s history.  Obviously, the server may have applied some operations that the client doesn&#8217;t know about yet, but that&#8217;s ok.  As long as any operations sent by the client are parented off of <em>some</em> point in the server&#8217;s history, the server will be able to transform that incoming operation against the composition of anything which has happened since that point without tracking any history other than its own.  Thus, the server never does anything more complicated than the simple one-step diamond divergence (modulo some operation composition).  In other words, the server can always directly apply OT to incoming operations, deriving the requisite operation extremely efficiently.</p>
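<p>To make the server&#8217;s side of this bargain concrete, here is a rough sketch.  It is not Wave&#8217;s or Pulse&#8217;s server code: it assumes my toy insertion-only operation, uses a plain integer version counter as a stand-in for the parent hash, and folds over history instead of composing it.  All of the names (<code>Ins</code>, <code>ServerSketch</code>, <code>receive</code>) are hypothetical:</p>

```java
// Toy model (an assumption for illustration, not Wave's DocOp):
// an operation inserts `text` at index `pos`; `site` breaks position ties.
final class Ins {
    final int pos;
    final String text;
    final int site;

    Ins(int pos, String text, int site) {
        this.pos = pos;
        this.text = text;
        this.site = site;
    }

    String apply(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }
}

final class ServerSketch {
    // The server tracks only its own history, never per-client state.
    private final java.util.List<Ins> history = new java.util.ArrayList<Ins>();
    private String doc;

    ServerSketch(String initial) {
        doc = initial;
    }

    // One-step diamond: returns {a', b'} for a and b parented on the same state.
    static Ins[] transform(Ins a, Ins b) {
        boolean aFirst = a.pos < b.pos || (a.pos == b.pos && a.site < b.site);
        if (aFirst) {
            return new Ins[] { a, new Ins(b.pos + a.text.length(), b.text, b.site) };
        }
        return new Ins[] { new Ins(a.pos + b.text.length(), a.text, a.site), b };
    }

    // `parentVersion` stands in for the parent hash: the number of server
    // operations the client had seen when it created `op`.
    Ins receive(Ins op, int parentVersion) {
        if (parentVersion < 0 || parentVersion > history.size()) {
            throw new IllegalArgumentException("operation is not parented on server history");
        }
        Ins current = op;
        // Transform against everything applied since the parent.
        for (int i = parentVersion; i < history.size(); i++) {
            current = transform(current, history.get(i))[0];
        }
        doc = current.apply(doc);
        history.add(current);
        return current; // this is what gets broadcast to every client
    }

    String document() {
        return doc;
    }
}
```

<p>Notice that <code>receive</code> never needs to know anything about the sending client beyond the parent version; an operation parented anywhere else is simply rejected.</p>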
<p>Unfortunately, not all is sunshine and roses.  Under this new regime, the client needs to work twice as hard, translating its operations into server state space and (correspondingly) server operations back into its state space.  We haven&#8217;t seen an example of this &#8220;reverse&#8221; translation (server to client) yet, but we will in a moment.</p>
<p>In order to maintain this guarantee that the client will never send an operation to the server which is not parented on a version in server state space, we need to impose a restriction on the client: we can never send more than one operation at a time to the server.  This means that as soon as the client sends an operation (e.g. <em>a</em> in the example above), it must wait on sending <em>b</em> until the server acknowledges <em>a</em>.  This is necessary because the client needs to somehow translate <em>b</em> into server state space, but it can&#8217;t just &#8220;undo&#8221; the fact that <em>b</em> is parented on <em>a</em>.  Thus, wherever <em>b</em> eventually ends up in server state space, it has to be a descendant of <em>a&#8217;</em>, which is the server-transformed version of <em>a</em>.  Literally, we don&#8217;t know <em>where</em> to translate <em>b</em> into until we know <em>exactly</em> where <em>a</em> fits in the server&#8217;s history.</p>
<p>To help shed some light into this rather confusing scheme, let&#8217;s look at an example:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExamplePre.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/s4uwf-7hjj47gvmzz9lze7q.png" alt="s4UWF_7Hjj47GVmZZ9LzE7Q.png" title="s4UWF_7Hjj47GVmZZ9LzE7Q.png" border="0" width="279" height="150" /></a></center></p>
<p>In this situation, the client has performed two operations, <em>a</em> and <em>b</em>.  The client immediately sends operation <em>a</em> to the server and buffers operation <em>b</em> for later transmission (the lighter blue line indicates the buffer boundary).  Note that this buffering in no way hinders the <em>application</em> of local operations.  When the user presses a key, we want the editor to reflect that change <em>immediately</em>, regardless of the buffer state.  Meanwhile, the server has applied two other operations, <em>c</em> and <em>d</em>, which presumably come from other clients.  The server still hasn&#8217;t received our operation <em>a</em>.</p>
<p>Note that we were able to send <em>a</em> immediately because we are preserving every bit of data the server sends us.  We still don&#8217;t know about <em>c</em> and <em>d</em>, but we do know that the last time we heard from the server, it was at the same point in state space as we were (the parent of <em>a</em> and <em>c</em>).  Thus, since <em>a</em> is already parented on a point in server state space, we can just send it off.</p>
<p>Now let&#8217;s fast-forward just a little bit.  The server receives operation <em>a</em>.  It looks into its history and retrieves whatever operations have been applied since the parent of <em>a</em>.  In this case, those operations are <em>c</em> and <em>d</em>.  The server then composes <em>c</em> and <em>d</em> together and transforms <em>a</em> against the result, producing <em>a&#8217;</em>.</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleAppliedClient.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sf4htljrmvstlgdlm42iyga.png" alt="sF4htlJRMvStlGdlM42IYGA.png" title="sF4htlJRMvStlGdlM42IYGA.png" border="0" width="280" height="202" /></a></center></p>
<p>After applying <em>a&#8217;</em>, the server broadcasts the operation to all clients, including the one which originated the operation.  This is a very important design feature: whenever the server applies a transformed operation, it sends that operation off to all of its clients without delay.  As long as we can guarantee strong ordering in the communication channels between the client and the server (and often we can), the clients will be able to count on the fact that they will receive operations from the server in <em>exactly</em> the order in which the server applied them.  Thus, they will be able to maintain a locally-inferred copy of the server&#8217;s history.</p>
<p>This also means that our client is going to receive <em>a&#8217;</em> from the server just like any other operation.  In order to avoid treating our own transformed operations as if they were new server operations, we need some way of identifying our own operations and treating them specially.  To do this, we add another bit of metadata to the operation: a locally-synthesized unique ID.  This unique ID will be attached to the operation when we send it to the server and <em>preserved</em> by the server through the application of OT.  Thus, operation <em>a&#8217;</em> will have the same ID as operation <em>a</em>, but a very different ID from operations <em>c</em> and <em>d</em>.</p>
<p>With this extra bit of metadata in place, clients are able to distinguish their own operations from others sent by the server.  Non-self-initiated operations (like <em>c</em> and <em>d</em>) must be translated into client state space and applied to the local document.  Self-initiated operations (like <em>a&#8217;</em>) are actually server acknowledgements of our currently-pending operation.  Once we receive this acknowledgement, we can flush the client buffer and send the pending operations up to the server.</p>
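<p>The ID bookkeeping might look something like the sketch below.  The ID format (client name plus a counter) and the names <code>TaggedOp</code> and <code>AckSketch</code> are my own assumptions for illustration; any scheme works as long as the ID is unique and the server preserves it through transformation:</p>

```java
// A stand-in for an operation carrying the extra metadata. The payload is
// just a string here because only the ID matters for this sketch.
final class TaggedOp {
    final String id;      // locally-synthesized unique ID, preserved by the server
    final String payload; // placeholder for the real operation body

    TaggedOp(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

final class AckSketch {
    private final String clientId;
    private int counter = 0;
    private String inFlightId = null; // ID of the operation awaiting acknowledgement

    AckSketch(String clientId) {
        this.clientId = clientId;
    }

    // Tags an outgoing operation and remembers it as the pending one.
    TaggedOp send(String payload) {
        inFlightId = clientId + ":" + counter;
        counter++;
        return new TaggedOp(inFlightId, payload);
    }

    // True when `incoming` is the server's echo of our own pending operation,
    // even though its payload may have been transformed along the way.
    boolean isAcknowledgement(TaggedOp incoming) {
        return inFlightId != null && inFlightId.equals(incoming.id);
    }

    // Called once the acknowledgement has been handled.
    void acknowledged() {
        inFlightId = null;
    }
}
```

<p>Anything for which <code>isAcknowledgement</code> returns <code>false</code> is somebody else&#8217;s operation and has to be translated into client state space instead.</p>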
<p>Moving forward with our example, let&#8217;s say that the client receives operation <em>c</em> from the server.  Since <em>c</em> is already parented on a version in our local history, we can apply simple OT to transform it against the composition of <em>a</em> and <em>b</em> and apply the resulting operation to our local document:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleAppliedClientAppliedServer1.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sjlugn387jtmeulo58tktcq.png" alt="sjLUgN387JTmEuLO58TktCQ.png" title="sjLUgN387JTmEuLO58TktCQ.png" border="0" width="280" height="203" /></a></center></p>
<p>Of course, as we always need to keep in mind, the client is a live editor which presumably has a real person typing madly away, changing the document state.  There&#8217;s nothing to prevent the client from creating <em>another</em> operation, parented off of <em>c&#8217;</em> which pushes it even further out of sync with the server:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleAppliedClientAppliedServer1WayOut.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/s7tvo-jtrw9ryxeemjpkbia.png" alt="s7TvO-Jtrw9RYxEEmjpKBIA.png" title="s7TvO-Jtrw9RYxEEmjpKBIA.png" border="0" width="280" height="262" /></a></center></p>
<p>This is really getting to be a bit of a mess!  We&#8217;ve only sent one of our operations to the server, we&#8217;re trying to buffer the rest, but the server is trickling in more operations to confuse things and we still haven&#8217;t received the acknowledgement for our very first operation!  As it turns out, this is the most complicated case which can ever arise in a Wave-style collaborative editor.  If we can nail this one, we&#8217;re good to go.</p>
<p>The first thing we need to do is figure out what to do with <em>d</em>.  We&#8217;re going to receive that operation before we receive <em>a&#8217;</em>, and so we really need to figure out how to apply it to our local document.  Once again, the problem is that the incoming operation (<em>d</em>) is not parented off of any point in our state space, so OT can&#8217;t help us directly.  Just as with <em>b</em> in our fundamental compound OT example from earlier, we need to infer a &#8220;bridge&#8221; between server state space and client state space.  We can then use this bridge to transform <em>d</em> and slide it all the way down into position at the end of our history.</p>
<p>To do this, we need to identify conceptually what operation(s) would take us from the parent of <em>d</em> to the most recent point in our history (after applying <em>e</em>).  Specifically, we need to infer the green dashed line in the diagram below.  Once we have this operation (whatever it is), we can compose it with <em>e</em> and get a single operation against which we can transform <em>d</em>.</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleServerTranslation.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sstrn9pivyemy1aq9ueqszg.png" alt="sSTrn9pivyEMy1aq9UeQszg.png" title="sSTrn9pivyEMy1aq9UeQszg.png" border="0" width="279" height="262" /></a></center></p>
<p>The first thing to recognize is that the inferred bridge (the green dashed line) is going to be composed exclusively of client operations.  This is logical as we are attempting to translate a server operation, so there&#8217;s no need to transform it against something which the server already has.  The second thing to realize is that this bridge is traversing a line parallel to the composition of <em>a</em> and <em>b</em>, just &#8220;shifted down&#8221; exactly one step.  To be precise, the bridge is what we would get if we composed <em>a</em> and <em>b</em> and then transformed the result against <em>c</em>.</p>
<p>Now, we could try to detect this case specifically and write some code which would fish out <em>a</em> and <em>b</em>, compose them together, transform the result against <em>c</em>, compose the result of <em>that</em> with <em>e</em> and finally transform <em>d</em> against the final product, but as you can imagine, it would be a mess.  More than that, it would be dreadfully inefficient.  No, what we want to do is proactively maintain a bridge which will always take us from the absolute latest point in server state space (that we know of) to the absolute latest point in client state space.  Thus, whenever we receive a new operation from the server, we can directly transform it against this bridge without any extra effort.</p>
<h4>Building the Bridge</h4>
<p>We can maintain this bridge by composing together all operations which have been synthesized locally since the point where we diverged from the server.  Thus, at first, the bridge consists only of <em>a</em>.  Soon afterward, the client applies its next operation, <em>b</em>, which we compose into the bridge.  Of course, we inevitably receive an operation from the server, in this case, <em>c</em>.  At this point, we use our bridge to transform <em>c</em> immediately to the correct point in client state space, resulting in <em>c&#8217;</em>.  Remember that OT derives <em>both</em> bottom sides of the diamond.  Thus, we not only receive <em>c&#8217;</em>, but we also receive a new bridge which has been transformed against <em>c</em>.  This new bridge is precisely the green dashed line in our diagram above.</p>
<p>Meanwhile, the client has performed another operation, <em>e</em>.  Just as before, we immediately compose this operation onto the bridge.  Thanks to our bit of trickery when transforming <em>c</em> into <em>c&#8217;</em>, we can rest assured that this composition will be successful.  In other words, we know that the result of applying the bridge to the document resulting from <em>c</em> will be precisely the document state before applying <em>e</em>, thus we can cleanly compose <em>e</em> with the bridge.</p>
<p>Finally, we receive <em>d</em> from the server.  Just as with <em>c</em>, we can immediately transform <em>d</em> against the bridge, deriving both <em>d&#8217;</em> (which we apply to our local document) as well as the new bridge, which we hold onto for future server translations.</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleAppliedServer.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/shay5yvvxthunzmgtzckqia.png" alt="shAY5YVvXThuNzmGtZcKQiA.png" title="shAY5YVvXThuNzmGtZcKQiA.png" border="0" width="280" height="324" /></a></center></p>
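<p>Here is a rough sketch of that bridge maintenance, walking through the same <em>a</em>, <em>b</em>, <em>c</em>, <em>e</em>, <em>d</em> sequence.  As before, it assumes my toy insertion-only operation, and it &#8220;composes&#8221; by keeping a list of steps rather than merging them into one operation the way Wave&#8217;s <code>Composer</code> would; <code>Ins</code> and <code>BridgeSketch</code> are hypothetical names:</p>

```java
// Toy model (an assumption for illustration, not Wave's DocOp):
// an operation inserts `text` at index `pos`; `site` breaks position ties.
final class Ins {
    final int pos;
    final String text;
    final int site;

    Ins(int pos, String text, int site) {
        this.pos = pos;
        this.text = text;
        this.site = site;
    }

    String apply(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }
}

final class BridgeSketch {
    // Steps leading from the latest known server state to the client state.
    private final java.util.List<Ins> bridge = new java.util.ArrayList<Ins>();
    private String doc;

    BridgeSketch(String initial) {
        doc = initial;
    }

    // One-step diamond: returns {a', b'} for a and b parented on the same state.
    static Ins[] transform(Ins a, Ins b) {
        boolean aFirst = a.pos < b.pos || (a.pos == b.pos && a.site < b.site);
        if (aFirst) {
            return new Ins[] { a, new Ins(b.pos + a.text.length(), b.text, b.site) };
        }
        return new Ins[] { new Ins(a.pos + b.text.length(), a.text, a.site), b };
    }

    // A local edit is applied immediately and composed onto the end of the bridge.
    void applyLocal(Ins op) {
        doc = op.apply(doc);
        bridge.add(op);
    }

    // A server operation is parented on the bridge's starting state. Sliding it
    // across the bridge yields its client-space form plus the re-parented bridge.
    Ins applyServer(Ins serverOp) {
        Ins current = serverOp;
        for (int i = 0; i < bridge.size(); i++) {
            Ins[] pair = transform(current, bridge.get(i));
            current = pair[0];      // server op moved one step toward the client
            bridge.set(i, pair[1]); // bridge step now sits after the server op
        }
        doc = current.apply(doc);
        return current;
    }

    String document() {
        return doc;
    }

    // Applies the bridge to a server-side document; useful only for checking
    // that the bridge really does lead from server state to client state.
    String replayBridge(String serverDoc) {
        String result = serverDoc;
        for (Ins step : bridge) {
            result = step.apply(result);
        }
        return result;
    }
}
```

<p>The invariant to watch is that <code>replayBridge</code>, given the server&#8217;s latest document, always reproduces the client&#8217;s current document.</p>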
<p>With <em>d&#8217;</em> now in hand, the next operation we will receive from the server will be <em>a&#8217;</em>, the transformed version of our <em>a</em> operation from earlier.  As soon as we receive this operation, we need to compose together any operations which have been held in the buffer and send them off to the server.  However, before we send this buffer, we need to make sure that it is parented off of some point in server state space.  And as you can see by the diagram above, we&#8217;re going to have trouble both in composing <em>b</em> and <em>e</em> (since <em>e</em> does not descend directly from <em>b</em>) and in guaranteeing server parentage (since <em>b</em> is parented off of a point in client state space not shared with the server).</p>
<p>To solve this problem, we need to play the same trick with our buffer as we previously played with the translation bridge: any time the client or the server does anything, we adjust the buffer accordingly.  With the bridge, our invariant was that the bridge would always be parented off of a point in server state space and would be the one operation needed to transform incoming server operations.  With the buffer, the invariant must be that the buffer is always parented off of a point in server state space and will be the one operation required to bring the server into perfect sync with the client (given the operations we have received from the server thus far).</p>
<p>The one wrinkle in this plan is the fact that the buffer <em>cannot</em> contain the operation which we have already sent to the server (in this case, <em>a</em>).  Thus, the buffer isn&#8217;t really going to be parented off of server state space until we receive <em>a&#8217;</em>, at which point we should have adjusted the buffer so that it is parented precisely on <em>a&#8217;</em>, which we now know to be in server state space.</p>
<p>Building the buffer is a fairly straightforward matter.  Once the client sends <em>a</em> to the server, it goes into a state where any further local operations will be composed into the buffer (which is initially empty).  After <em>a</em>, the next client operation which is performed is <em>b</em>, which becomes the first operation composed into the buffer.  The next operation is <em>c</em>, which comes from the server.  At this point, we must somehow transform the buffer with respect to the incoming server operation.  However, obviously the server operation (<em>c</em>) is not parented off of the same point as our buffer (currently <em>b</em>).  Thus, we must <em>first</em> transform <em>c</em> against <em>a</em> to derive an intermediary operation, <em>c&#8221;</em>, which is parented off of the parent of the buffer (<em>b</em>):</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExampleAppliedServerInferredPartialBuffer.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sf1hdlplyd4j0v7i2bstt1w2.png" alt="sF1HdlPLYD4J0v7i2BStT1w(2).png" title="sF1HdlPLYD4J0v7i2BStT1w(2).png" border="0" width="280" height="324" /></a></center></p>
<p>Once we have this inferred operation, <em>c&#8221;</em>, we can use it to transform the buffer (<em>b</em>) &#8220;down&#8221; one step.  When we derive <em>c&#8221;</em>, we also derive a transformed version of <em>a</em>, which is <em>a&#8221;</em>.  In essence, we are anticipating the operation which the server will derive when it transforms <em>a</em> against its local history.  The idea is that when we finally do receive the real <em>a&#8217;</em>, it should be exactly equivalent to our inferred <em>a&#8221;</em>.</p>
<p>At this point, the client performs another operation, <em>e</em>, which we immediately compose into the buffer (remember, we also composed it into the bridge, so we&#8217;ve got several things going on here).  This composition works because we already transformed the buffer (<em>b</em>) against the intervening server operation (<em>c</em>).  So <em>e</em> is parented off of <em>c&#8217;</em>, which is the same state we would get were we to apply <em>a&#8221;</em> and then the buffer to the server state resulting from <em>c</em>.  This should sound familiar.  By a strange coincidence, <em>a&#8221;</em> composed with the buffer is precisely equivalent to the bridge.  In practice, we use this fact to only maintain one set of data, but the process is a little easier to explain when we keep them separate.</p>
<p>Checkpoint time!  The client has performed operation <em>a</em>, which it sent to the server.  It then performed operation <em>b</em>, received operation <em>c</em> and finally performed operation <em>e</em>.  We have an operation, <em>a&#8221;</em> which will be equivalent to <em>a&#8217;</em> if the server has no other intervening operations.  We also have a buffer which is the composition of a transformed <em>b</em> and <em>e</em>.  This buffer, composed with <em>a&#8221;</em>, serves as a bridge from the very latest point in server state space (that we know of) to the very latest point in client state space.</p>
<p>Now is when we receive the next operation from the server, <em>d</em>.  Just as when we received <em>c</em>, we start by transforming it against <em>a&#8221;</em> (our &#8220;in flight&#8221; operation).  The resulting transformation of <em>a&#8221;</em> becomes our new in flight operation, while the resulting transformation of <em>d</em> is in turn used to transform our buffer down another step.  At this point, we have a new <em>a&#8221;</em> which is parented off of <em>d</em> and a newly-transformed buffer which is parented off of <em>a&#8221;</em>.</p>
<p><em>Finally,</em> we receive <em>a&#8217;</em> from the server.  We could do a bit of verification now to ensure that <em>a&#8221;</em> really is equivalent to <em>a&#8217;</em>, but it&#8217;s not necessary.  What we do need to do is take our buffer and send it up to the server.  Remember, the buffer is parented off of <em>a&#8221;</em>, which happens to be equivalent to <em>a&#8217;</em>.  Thus, when we send the buffer, we know that it is parented off of a point in server state space.  The server will eventually acknowledge the receipt of our buffer operation, and we will (finally) converge to a shared document state:</p>
<p><center><a href="/blog/misc/understanding-and-applying-operational-transformation/InFlightExample.svg"><img src="http://www.codecommit.com/blog/wp-content/uploads/2010/05/sgGTv_bxol7LtNWAnFsCCXg2.png" alt="sgGTv_bxol7LtNWAnFsCCXg(2).png" title="sgGTv_bxol7LtNWAnFsCCXg(2).png" border="0" width="280" height="324" /></a></center></p>
<p>The good news is that, as I mentioned before, this was the most complicated case that a collaborative editor client ever needs to handle.  It should be clear that no matter how many additional server operations we receive, or how many more client operations are performed, we can simply handle them within this general framework of buffering and bridging.  And, as when we sent the <em>a</em> operation, sending the buffer puts the client back into buffer mode with any new client operations being composed into this buffer.  In practice, an actively-editing client will spend most of its time in this state: very much out of sync with the server, but maintaining the inferred operations required to get things back together again.</p>
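<p>Putting the in-flight operation and the buffer together, the client&#8217;s bookkeeping can be sketched roughly like this.  This is not Pulse&#8217;s or Wave&#8217;s client code: it assumes my toy insertion-only operation, keeps the in-flight operation and the buffer as lists of steps instead of single composed operations, and leaves out the ID check that tells acknowledgements apart from other server operations.  The names <code>Ins</code> and <code>ClientSketch</code> are hypothetical:</p>

```java
// Toy model (an assumption for illustration, not Wave's DocOp):
// an operation inserts `text` at index `pos`; `site` breaks position ties.
final class Ins {
    final int pos;
    final String text;
    final int site;

    Ins(int pos, String text, int site) {
        this.pos = pos;
        this.text = text;
        this.site = site;
    }

    String apply(String doc) {
        return doc.substring(0, pos) + text + doc.substring(pos);
    }
}

final class ClientSketch {
    private java.util.List<Ins> inFlight = null; // sent, not yet acknowledged
    private java.util.List<Ins> buffer = new java.util.ArrayList<Ins>();
    private String doc;

    ClientSketch(String initial) {
        doc = initial;
    }

    // One-step diamond: returns {a', b'} for a and b parented on the same state.
    static Ins[] transform(Ins a, Ins b) {
        boolean aFirst = a.pos < b.pos || (a.pos == b.pos && a.site < b.site);
        if (aFirst) {
            return new Ins[] { a, new Ins(b.pos + a.text.length(), b.text, b.site) };
        }
        return new Ins[] { new Ins(a.pos + b.text.length(), a.text, a.site), b };
    }

    // Slides `op` across `path`, re-parenting every step of `path` in place.
    private static Ins slide(Ins op, java.util.List<Ins> path) {
        Ins current = op;
        for (int i = 0; i < path.size(); i++) {
            Ins[] pair = transform(current, path.get(i));
            current = pair[0];
            path.set(i, pair[1]);
        }
        return current;
    }

    // Local edit: always applied at once. Returns what to transmit now, or
    // null when the edit was composed into the buffer instead.
    java.util.List<Ins> applyLocal(Ins op) {
        doc = op.apply(doc);
        if (inFlight == null) {
            inFlight = new java.util.ArrayList<Ins>();
            inFlight.add(op);
            return new java.util.ArrayList<Ins>(inFlight);
        }
        buffer.add(op);
        return null;
    }

    // Operation from another client: transform it against the in-flight
    // operation first, then use the result to move the buffer down one step.
    void applyServer(Ins serverOp) {
        Ins current = serverOp;
        if (inFlight != null) {
            current = slide(current, inFlight);
        }
        current = slide(current, buffer);
        doc = current.apply(doc);
    }

    // Acknowledgement of the in-flight operation: the buffer is now parented
    // on server state, so it becomes the next in-flight operation.
    java.util.List<Ins> acknowledge() {
        if (buffer.isEmpty()) {
            inFlight = null;
            return null;
        }
        inFlight = buffer;
        buffer = new java.util.ArrayList<Ins>();
        return new java.util.ArrayList<Ins>(inFlight);
    }

    String document() {
        return doc;
    }
}
```

<p>If you replay the example (perform <em>a</em> and <em>b</em>, receive <em>c</em>, perform <em>e</em>, receive <em>d</em>, then call <code>acknowledge</code>), the list it returns is the adjusted buffer, and applying that list to the server&#8217;s document after <em>a&#8217;</em> should bring the server to the client&#8217;s document.</p>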
<h3>Conclusion</h3>
<p>The OT scheme presented in this article is precisely what we use on Novell Pulse.  And while I&#8217;ve never seen Wave&#8217;s client code, numerous little hints in the waveprotocol.org whitepapers as well as discussions with the Wave API team cause me to strongly suspect that this is how Google does it as well.  What&#8217;s more, Google Docs recently revamped their word processing application with a new editor based on operational transformation.  While there hasn&#8217;t been any word from Google on how exactly they handle &#8220;compound OT&#8221; cases within Docs, it looks like they followed the same route as Wave and Pulse (the tell-tale sign is a perceptible &#8220;chunking&#8221; of incoming remote operations during connection lag).</p>
<p>None of the information presented in this article on &#8220;compound OT&#8221; is available within Google&#8217;s documentation on waveprotocol.org (unfortunately).  Anyone attempting to implement a collaborative editor based on Wave&#8217;s OT would have to rediscover all of these steps on their own.  My hope is that this article rectifies that situation.  To the best of my knowledge, the information presented here should be everything you need to build your own client/server collaborative editor based on operational transformation.  So, no more excuses for second-rate collaboration!</p>
<h3>Resources</h3>
<ul>
<li>
<p>To obtain Google&#8217;s OT library, you must take a Mercurial clone of the <a href="http://code.google.com/p/wave-protocol/">wave-protocol</a> repository:</p>
<pre>$ hg clone https://wave-protocol.googlecode.com/hg/ wave-protocol</pre>
<p>Once you have the source, you should be able to build everything you need by simply running the Ant build script.  The main OT classes are <code>org.waveprotocol.wave.model.document.operation.algorithm.Composer</code> and <code>org.waveprotocol.wave.model.document.operation.algorithm.Transformer</code>.  Their use is exactly as described in this article.  Please note that <code>Transformer</code> does not handle compound OT; you will have to implement that yourself by using <code>Composer</code> and <code>Transformer</code>.  Operations are represented by the <code>org.waveprotocol.wave.model.document.operation.DocOp</code> interface, and can be converted into the more useful <code>org.waveprotocol.wave.model.document.operation.BufferedDocOp</code> implementation by using the <code>org.waveprotocol.wave.model.document.operation.impl.DocOpUtil.buffer</code> method.</p>
<p>All of these classes can be found in the <strong>fedone-api-0.2.jar</strong> file.</p>
</li>
<li><a href="http://www.waveprotocol.org/whitepapers/operational-transform">Google&#8217;s Own Whitepaper on OT</a></li>
<li><a href="http://doi.acm.org/10.1145/215585.215706">The original paper on the Jupiter system</a> (the primary theoretical basis for Google&#8217;s OT)</li>
<li><a href="http://en.wikipedia.org/wiki/Operational_transformation">Wikipedia&#8217;s article on operational transformation</a> (surprisingly informative)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation/feed</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Interop Between Java and Scala</title>
		<link>http://www.codecommit.com/blog/java/interop-between-java-and-scala</link>
		<comments>http://www.codecommit.com/blog/java/interop-between-java-and-scala#comments</comments>
		<pubDate>Mon, 09 Feb 2009 07:00:00 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/interop-between-java-and-scala</guid>
		<description><![CDATA[Sometimes, the simplest things are the most difficult to explain.&#160; Scala&#8217;s interoperability with Java is completely unparalleled, even including languages like Groovy which tout their tight integration with the JVM&#8217;s venerable standard-bearer.&#160; However, despite this fact, there is almost no documentation (aside from chapter 29 in Programming in Scala) which shows how this Scala/Java integration [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes, the simplest things are the most difficult to explain.&nbsp; Scala&#8217;s interoperability with Java is completely unparalleled, even including languages like Groovy which tout their tight integration with the JVM&#8217;s venerable standard-bearer.&nbsp; However, despite this fact, there is almost no documentation (aside from chapter 29 in <em>Programming in Scala</em>) which shows how this Scala/Java integration works and where it can be used.&nbsp; So while it may not be the most exciting or theoretically interesting topic, I have taken it upon myself to fill the gap.</p>
<h3>Classes are Classes</h3>
<p>The first piece of knowledge you need about Scala is that Scala classes are real JVM classes.&nbsp; Consider the following snippets, the first in Java:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Person <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> String <span style="color: #2e7c0f;">getName</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #cb0710;">&quot;Daniel Spiewak&quot;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>&#8230;and the second in Scala:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">class</span> Person <span class="br0">&#123;</span>
  <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">getName</span><span class="br0">&#40;</span><span class="br0">&#41;</span> = <span style="color: #cb0710;">&quot;Daniel Spiewak&quot;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Despite the very different syntax, both of these snippets will produce almost identical bytecode when compiled.&nbsp; Both will result in a single file, <code>Person.class</code>, which contains a default, no-args constructor and a public method, <code>getName()</code>, with return type <code>java.lang.String</code>.&nbsp; Both classes may be used from Scala:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">val</span> p = <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">Person</span><span class="br0">&#40;</span><span class="br0">&#41;</span>
p.<span style="color: #2e7c0f;">getName</span><span class="br0">&#40;</span><span class="br0">&#41;</span>       <span style="color: #999999;">// =&gt; &quot;Daniel Spiewak&quot;</span></pre></td></tr></table></div>

<p>&#8230;and from Java:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5">Person p = <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">Person</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
p.<span style="color: #2e7c0f;">getName</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;      <span style="color: #445fba;">// =&gt; &quot;Daniel Spiewak&quot;</span></pre></td></tr></table></div>

<p>In the case of either language, we can easily swap implementations of the <code>Person</code> class without making any changes to the call-site.&nbsp; In short, you can use Scala classes from Java (as well as Java classes from Scala) without ever even knowing that they were defined within another language.</p>
<p>This single property is the very cornerstone of Scala&#8217;s philosophy of bytecode translation.&nbsp; Wherever possible, which is more often than not, Scala elements are translated into bytecode which <em>directly</em> corresponds to the equivalent feature in Java.&nbsp; Scala classes equate to Java classes, and the methods and fields within those classes become Java methods and fields.</p>
<p>This allows some pretty amazing cross-language techniques.&nbsp; For example, I can extend a Java class within Scala, overriding some methods.&nbsp; I can in turn extend this Scala class from within Java once again with everything working exactly as anticipated:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">class</span> MyAbstractButton <span style="color: #140dcc;">extends</span> JComponent <span class="br0">&#123;</span>
  <span style="color: #140dcc;">private</span> <span style="color: #140dcc;">var</span> pushed = <span style="color: #a40d14;">false</span>
&nbsp;
  <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">setPushed</span><span class="br0">&#40;</span>p: <span style="color: #800080;">Boolean</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    pushed = p
  <span class="br0">&#125;</span>
&nbsp;
  <span style="color: #140dcc;">def</span> getPushed = pushed
&nbsp;
  <span style="color: #140dcc;">override</span> <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">paintComponent</span><span class="br0">&#40;</span>g: Graphics<span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span style="color: #a40d14;">super</span>.<span style="color: #2e7c0f;">paintComponent</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span>
&nbsp;
    <span style="color: #999999;">// draw a button</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>


<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> ProKitButton <span style="color: #140dcc;">extends</span> MyAbstractButton <span class="br0">&#123;</span>
    <span style="color: #445fba;">// do something uniquely Apple-esque</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<h3>Traits are Interfaces</h3>
<p>This is probably the one interoperability note which is the <em>least</em> well-known.&nbsp; Scala&#8217;s traits are vastly more powerful than Java&#8217;s interfaces, often leading developers to the erroneous conclusion that they are incompatible.&nbsp; Specifically, traits allow method definitions, while interfaces must be purely-abstract.&nbsp; Yet, despite this significant distinction, Scala is still able to compile traits into interfaces at the bytecode level&#8230;with some minor enhancements.</p>
<p>The simplest case is when the trait only contains abstract members.&nbsp; For example:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">trait</span> Model <span class="br0">&#123;</span>
  <span style="color: #140dcc;">def</span> value: <span style="color: #857d1f;">Any</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>If we look at the bytecode generated by compiling this trait, we will see that it is actually equivalent to the following Java definition:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">interface</span> Model <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> Object <span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Thus, we can declare traits in Scala and implement them as interfaces in Java classes:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> StringModel <span style="color: #140dcc;">implements</span> Model <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> Object <span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #cb0710;">&quot;Hello, World!&quot;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>This is precisely equivalent to a Scala class which mixes-in the <code>Model</code> trait:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">class</span> StringModel <span style="color: #140dcc;">extends</span> Model <span class="br0">&#123;</span>
  <span style="color: #140dcc;">def</span> value = <span style="color: #cb0710;">&quot;Hello, World!&quot;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Things start to get a little sticky when we have method definitions within our traits.&nbsp; For example, we could add a <code>printValue()</code> method to our <code>Model</code> trait:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">trait</span> Model <span class="br0">&#123;</span>
  <span style="color: #140dcc;">def</span> value: <span style="color: #857d1f;">Any</span>
&nbsp;
  <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span style="color: #2e7c0f;">println</span><span class="br0">&#40;</span>value<span class="br0">&#41;</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Obviously, we can&#8217;t directly translate this into <em>just</em> an interface; something else will be required.&nbsp; Scala solves this problem by introducing an ancillary class which contains all of the method definitions for a given trait.&nbsp; Thus, when we look at the translation for our modified <code>Model</code> trait, the result looks something like this:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">interface</span> Model <span style="color: #140dcc;">extends</span> ScalaObject <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> Object <span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
<span class="br0">&#125;</span>
&nbsp;
<span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Model$class <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> <span style="color: #140dcc;">static</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span>Model self<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        System.<span class="me1">out</span>.<span style="color: #2e7c0f;">println</span><span class="br0">&#40;</span>self.<span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Thus, we can get the <em>effect</em> of Scala&#8217;s powerful mixin inheritance within Java by implementing the <code>Model</code> trait and delegating from the <code>printValue()</code> method to the <code>Model$class</code> implementation:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> StringModel <span style="color: #140dcc;">implements</span> Model <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> Object <span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #cb0710;">&quot;Hello, World!&quot;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        Model$class.<span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span style="color: #a40d14;">this</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #445fba;">// method missing here (see below)</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>It&#8217;s not perfect, but it allows us to use some of Scala&#8217;s more advanced trait-based functionality from within Java.&nbsp; Incidentally, the above code <em>does</em> compile without a problem.&nbsp; I wasn&#8217;t actually aware of this fact, but &#8220;<code>$</code>&#8221; is a legal character in Java identifiers, allowing interaction with some of Scala&#8217;s more interesting features.</p>
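<p>The rule about <code>$</code> is easy to check in isolation.&nbsp; The following is a minimal plain-Java sketch with no Scala on the classpath; <code>Helper$class</code> and <code>DollarIdentifiers</code> are hypothetical names chosen only to mimic the shape of the generated <code>Model$class</code>:</p>

```java
// Plain-Java sketch: no Scala involved. Helper$class is a hypothetical name
// that mirrors the shape of a compiler-generated trait helper class.
class Helper$class {
    static String describe(Object self) {
        return "value = " + self;
    }
}

class DollarIdentifiers {
    // '$' may appear anywhere in an identifier, including the first character.
    static int $tag() {
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(Character.isJavaIdentifierStart('$')); // true
        System.out.println(Character.isJavaIdentifierPart('$'));  // true
        System.out.println(Helper$class.describe("Hello, World!"));
        System.out.println($tag());
    }
}
```

<p>Both checks on <code>Character</code> report that <code>$</code> is acceptable, which is why <code>javac</code> is happy with a call like <code>Model$class.printValue(this)</code>.</p>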
<p>There is, however, one little wrinkle that I&#8217;m conveniently side-stepping: the <code>$tag</code> method.&nbsp; This is a method defined within the <code>ScalaObject</code> trait designed to help optimize pattern matching.&nbsp; Unfortunately, it also means yet another abstract method which must be defined when implementing Scala traits which contain method definitions.&nbsp; The correct version of the <code>StringModel</code> class from above actually looks like the following:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> StringModel <span style="color: #140dcc;">implements</span> Model <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> Object <span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #cb0710;">&quot;Hello, World!&quot;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        Model$class.<span style="color: #2e7c0f;">printValue</span><span class="br0">&#40;</span><span style="color: #a40d14;">this</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">int</span> $tag<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #cb0710;">0</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>To be honest, I&#8217;m not sure what is the &#8220;correct&#8221; value to return from <code>$tag</code>.&nbsp; In this case, <code>0</code> is just a stub, and I&#8217;m guessing a safe one since <code>StringModel</code> is the only subtype of <code>Model</code>.&nbsp; Can anyone who knows more about the Scala compiler shed some light on this issue?</p>
<h3>Generics are, well&#8230;Generics</h3>
<p>Generics are (I think) probably the coolest and most well-done part of Scala&#8217;s Java interop.&nbsp; Anyone who has more than a passing familiarity with Scala will know that its type system is significantly more powerful than Java&#8217;s.&nbsp; Some of this power comes in the form of its type parameterization, which is vastly superior to Java&#8217;s generics.&nbsp; For example, type variance can be handled at the declaration-site, rather than only at the use-site (as in Java):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">abstract</span> <span style="color: #140dcc;">class</span> List<span style="color: #7f0055;"><span class="br0">&#91;</span>+A<span class="br0">&#93;</span></span> <span class="br0">&#123;</span>
  ...
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>The <code>+</code> notation prefixing the <code>A</code> type parameter on the <code>List</code> class means that <code>List</code> will vary covariantly with its parameter.&nbsp; In English, this means that <code>List[String]</code> is a subtype of <code>List[Any]</code> (because <code>String</code> is a subtype of <code>Any</code>).&nbsp; This is a very intuitive relationship, but one which Java is incapable of expressing when a class is declared.</p>
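<p>For comparison, the nearest thing Java offers is a wildcard written at each point of use.&nbsp; The sketch below uses <code>java.util.List</code> rather than Scala&#8217;s <code>List</code>, and the class and method names are made up for illustration:</p>

```java
import java.util.Arrays;
import java.util.List;

class UseSiteVariance {
    // The wildcard makes this one parameter covariant; List itself stays invariant.
    static int countAll(List<? extends Object> items) {
        return items.size();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Daniel", "Spiewak");
        // List<Object> objects = names;          // rejected by javac: List<String> is not a List<Object>
        List<? extends Object> readOnly = names;  // accepted: variance requested at this use-site only
        System.out.println(countAll(names));      // 2
        System.out.println(readOnly.get(0));      // Daniel
    }
}
```

<p>Every signature that wants the covariant behaviour has to repeat the wildcard, whereas the Scala declaration states it once for all users of the class.</p>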
<p>Fortunately, Scala is able to exploit one of the JVM&#8217;s most maligned features to support things like variance and higher-kinds without sacrificing perfect Java interop.&nbsp; Thanks to type erasure, Scala generics can be compiled to Java generics without any loss of functionality on the Scala side.&nbsp; Thus, the Java translation of the <code>List</code> definition above would be as follows:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #140dcc;">abstract</span> <span style="color: #857d1f;">class</span> List&lt;A&gt; <span class="br0">&#123;</span>
    ...
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>The variance annotation is gone, but Java wouldn&#8217;t be able to make anything of it anyway.&nbsp; The huge advantage to this translation scheme is it means that Java&#8217;s generics and Scala&#8217;s generics are one and the same at the bytecode level.&nbsp; Thus, Java can use generic Scala classes without a second thought:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #800080;">import</span> scala.<span class="me1">Tuple2</span>;
&nbsp;
...
<span class="me1">Tuple2</span>&lt;String, String&gt; me = <span style="color: #140dcc;">new</span> Tuple2&lt;String, String&gt;<span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Daniel&quot;</span>, <span style="color: #cb0710;">&quot;Spiewak&quot;</span><span class="br0">&#41;</span>;</pre></td></tr></table></div>

<p>Obviously, this is a lot more verbose than the Scala equivalent, &#8220;<code>("Daniel", "Spiewak")</code>&#8221;, but at least it works.</p>
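<p>The erasure which makes this translation possible can be observed from plain Java.&nbsp; A small sketch, assuming nothing beyond the JDK collections (<code>ErasureDemo</code> is a hypothetical name):</p>

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Type arguments exist only at compile time, so only the raw class can be compared here.
    static boolean sameRuntimeClass(List<?> a, List<?> b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        System.out.println(sameRuntimeClass(strings, numbers)); // true
    }
}
```

<p>Since the runtime never sees the type arguments, any extra information Scala attaches to them (such as variance) cannot disturb a Java caller.</p>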
<h3>Operators are Methods</h3>
<p>One of the most obvious differences between Java and Scala is that Scala supports operator overloading.&nbsp; In fact, Scala supports a variant of operator overloading which is far stronger than anything offered by C++, C# or even Ruby.&nbsp; With very few exceptions, <em>any</em> symbol may be used to define a custom operator.&nbsp; This provides tremendous flexibility in DSLs and even your average, every-day API (such as <code>List</code> and <code>Map</code>).</p>
<p>Obviously, this particular language feature is not going to translate into Java quite so nicely.&nbsp; Java doesn&#8217;t support operator overloading of <em>any</em> variety, much less the über-powerful form defined by Scala.&nbsp; Thus, Scala operators must be compiled into an entirely non-symbolic form at the bytecode level, otherwise Java interop would be irreparably broken, and the JVM itself would be unable to swallow the result.</p>
<p>A good starting place for deciding on this translation is the way in which operators are declared in Scala: as methods.&nbsp; Every Scala operator (including unary operators like <code>!</code>) is defined as a method within a class:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">abstract</span> <span style="color: #140dcc;">class</span> List<span style="color: #7f0055;"><span class="br0">&#91;</span>+A<span class="br0">&#93;</span></span> <span class="br0">&#123;</span>
  <span style="color: #140dcc;">def</span> ::<span style="color: #7f0055;"><span class="br0">&#91;</span>B &gt;: A<span class="br0">&#93;</span></span><span class="br0">&#40;</span>e: B<span class="br0">&#41;</span> = ...
&nbsp;
  <span style="color: #140dcc;">def</span> +<span style="color: #7f0055;"><span class="br0">&#91;</span>B &gt;: A<span class="br0">&#93;</span></span><span class="br0">&#40;</span>e: B<span class="br0">&#41;</span> = ...
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Since Scala classes become Java classes and Scala methods become Java methods, the most obvious translation would be to take each operator method and produce a corresponding Java method with a heavily-translated name.&nbsp; In fact, this is exactly what Scala does.&nbsp; The above class will compile into the equivalent of this Java code:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #140dcc;">abstract</span> <span style="color: #857d1f;">class</span> List&lt;A&gt; <span class="br0">&#123;</span>
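    <span style="color: #445fba;">// pseudo-Java: a lower bound such as &lt;B super A&gt; is not legal on a Java type parameter</span>
    <span style="color: #140dcc;">public</span> &lt;B <span style="color: #a40d14;">super</span> A&gt; List&lt;B&gt; $colon$colon<span class="br0">&#40;</span>B e<span class="br0">&#41;</span> <span class="br0">&#123;</span> ... <span class="br0">&#125;</span>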
&nbsp;
    <span style="color: #140dcc;">public</span> &lt;B <span style="color: #a40d14;">super</span> A&gt; List&lt;B&gt; $plus<span class="br0">&#40;</span>B e<span class="br0">&#41;</span> <span class="br0">&#123;</span> ... <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Every allowable symbol in Scala&#8217;s method syntax has a corresponding translation of the form &#8220;<code>$</code><em>trans</em>&#8221;.&nbsp; A list of supported translations is one of those pieces of documentation that you would expect to find on the Scala website.&nbsp; However, alas, it is absent.&nbsp; The following is a table of all of the translations of which I am aware:</p>
<div align="center">
<table border="1">
<tbody>
<tr>
<td><b>Scala Operator</b></td>
<td><b>Compiles To</b></td>
</tr>
<tr>
<td><code>=</code></td>
<td><code>$eq</code></td>
</tr>
<tr>
<td><code>&gt;</code></td>
<td><code>$greater</code></td>
</tr>
<tr>
<td><code>&lt;</code></td>
<td><code>$less</code></td>
</tr>
<tr>
<td><code>+</code></td>
<td><code>$plus</code></td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>$minus</code></td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>$times</code></td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>$div</code></td>
</tr>
<tr>
<td><code>!</code></td>
<td><code>$bang</code></td>
</tr>
<tr>
<td><code>@</code></td>
<td><code>$at</code></td>
</tr>
<tr>
<td><code>#</code></td>
<td><code>$hash</code></td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>$percent</code></td>
</tr>
<tr>
<td><code>^</code></td>
<td><code>$up</code></td>
</tr>
<tr>
<td><code>&amp;</code></td>
<td><code>$amp</code></td>
</tr>
<tr>
<td><code>~</code></td>
<td><code>$tilde</code></td>
</tr>
<tr>
<td><code>?</code></td>
<td><code>$qmark</code></td>
</tr>
<tr>
<td><code>|</code></td>
<td><code>$bar</code></td>
</tr>
<tr>
<td><code>\</code></td>
<td><code>$bslash</code></td>
</tr>
<tr>
<td><code>:</code></td>
<td><code>$colon</code></td>
</tr>
</tbody>
</table>
</div>
<p>Using this table, you should be able to derive the &#8220;real name&#8221; of any Scala operator, allowing its use from within Java.&nbsp; Of course, the ideal solution would be for Java to support operator overloading and use Scala&#8217;s operators directly, but somehow I doubt that will happen any time soon.</p>
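<p>Deriving a name by hand is just a character-by-character lookup.&nbsp; The following sketch encodes the translations listed in the table; <code>OperatorNames</code> and its <code>encode</code> method are hypothetical helpers written for this article, not part of Scala or the JDK:</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: applies the operator table one character at a time
// to recover the bytecode-level name of a Scala method.
class OperatorNames {
    private static final Map<Character, String> TABLE = new LinkedHashMap<Character, String>();

    static {
        TABLE.put('=', "$eq");
        TABLE.put('>', "$greater");
        TABLE.put('<', "$less");
        TABLE.put('+', "$plus");
        TABLE.put('-', "$minus");
        TABLE.put('*', "$times");
        TABLE.put('/', "$div");
        TABLE.put('!', "$bang");
        TABLE.put('@', "$at");
        TABLE.put('#', "$hash");
        TABLE.put('%', "$percent");
        TABLE.put('^', "$up");
        TABLE.put('&', "$amp");
        TABLE.put('~', "$tilde");
        TABLE.put('?', "$qmark");
        TABLE.put('|', "$bar");
        TABLE.put('\\', "$bslash");
        TABLE.put(':', "$colon");
    }

    static String encode(String scalaName) {
        StringBuilder out = new StringBuilder();
        for (char c : scalaName.toCharArray()) {
            String translated = TABLE.get(c);
            // Letters, digits and underscores have no table entry and pass through unchanged.
            out.append(translated != null ? translated : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("::"));     // $colon$colon
        System.out.println(encode("+"));      // $plus
        System.out.println(encode("name_=")); // name_$eq
    }
}
```

<p>From Java, the cons operator on a Scala list is therefore reached as <code>list.$colon$colon(element)</code>.</p>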
<h3>Odds and Ends</h3>
<p>One final tidbit which might be useful: <code>@BeanProperty</code>.&nbsp; This is a special annotation which is essentially read by the Scala compiler to mean &#8220;generate a getter and setter for this field&#8221;:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">import</span> scala.<span class="me1">reflect</span>.<span class="me1">BeanProperty</span>
&nbsp;
<span style="color: #140dcc;">class</span> Person <span class="br0">&#123;</span>
  <span style="color: #ca9925;">@BeanProperty</span>
  <span style="color: #140dcc;">var</span> name = <span style="color: #cb0710;">&quot;Daniel Spiewak&quot;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>The need for this annotation comes from the fact that Scala&#8217;s ever-convenient <code>var</code> and <code>val</code> declarations actually generate code which looks like the following (assuming no <code>@BeanProperty</code> annotation):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #445fba;">// *without* @BeanProperty</span>
<span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Person <span class="br0">&#123;</span>
    <span style="color: #140dcc;">private</span> String name = <span style="color: #cb0710;">&quot;Daniel Spiewak&quot;</span>;
&nbsp;
    <span style="color: #140dcc;">public</span> String <span style="color: #2e7c0f;">name</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> name;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> name_$eq<span class="br0">&#40;</span>String name<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #a40d14;">this</span>.<span class="me1">name</span> = name;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>This works well from Scala, but as you can see, Java-land is not quite paradise.&nbsp; While it is certainly feasible to use the <code>_$eq</code> syntax instead of the familiar <code>set</code>/<code>get</code>/<code>is</code> triumvirate, it is not an ideal situation.</p>
<p>Adding the <code>@BeanProperty</code> annotation (as we have done in the earlier Scala snippet) solves this problem by causing the Scala compiler to auto-generate more than one pair of methods for that particular field.&nbsp; Rather than generating just <code>name</code> and <code>name_$eq</code>, it will also generate the familiar <code>getName</code> and <code>setName</code> combination that all Java developers will know and love.&nbsp; Thus, the actual translation resulting from the <code>Person</code> class in Scala will be as follows:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Person <span class="br0">&#123;</span>
    <span style="color: #140dcc;">private</span> String name = <span style="color: #cb0710;">&quot;Daniel Spiewak&quot;</span>;
&nbsp;
    <span style="color: #140dcc;">public</span> String <span style="color: #2e7c0f;">name</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> name;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> String <span style="color: #2e7c0f;">getName</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">return</span> <span style="color: #2e7c0f;">name</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> name_$eq<span class="br0">&#40;</span>String name<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #a40d14;">this</span>.<span class="me1">name</span> = name;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">setName</span><span class="br0">&#40;</span>String name<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        name_$eq<span class="br0">&#40;</span>name<span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>This merely provides a pair of delegates, but it does suffice to smooth out the mismatch between Java Bean-based frameworks and Scala&#8217;s elegant instance fields.</p>
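<p>To see why the delegates matter, it helps to ask the standard <code>java.beans.Introspector</code> what it finds.&nbsp; The sketch below is plain Java: the nested <code>Person</code> class is a hand-written stand-in for the compiler output shown above, and <code>BeanPropertyDemo</code> and <code>beanProperties</code> are hypothetical names:</p>

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class BeanPropertyDemo {
    // Hand-written stand-in for the translated class, so the sketch runs without Scala.
    public static class Person {
        private String name = "Daniel Spiewak";

        public String name() { return name; }
        public void name_$eq(String name) { this.name = name; }

        public String getName() { return name(); }
        public void setName(String name) { name_$eq(name); }
    }

    // Lists the property names a JavaBeans-based framework would discover on the given type.
    static List<String> beanProperties(Class<?> type) {
        List<String> names = new ArrayList<String>();
        try {
            // Stopping at Object excludes the inherited "class" pseudo-property.
            for (PropertyDescriptor p : Introspector.getBeanInfo(type, Object.class).getPropertyDescriptors()) {
                names.add(p.getName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(beanProperties(Person.class)); // [name]
    }
}
```

<p>Only the <code>getName</code>/<code>setName</code> pair is recognised as a property; the <code>name</code> and <code>name_$eq</code> methods are invisible to the introspector, which is exactly the gap the annotation closes.</p>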
<h3>Conclusion</h3>
<p>This has been a whirlwind, disjointed tour covering a fairly large slice of information on how to use Scala code from within Java.&nbsp; For the most part, things are all roses and fairy tales.&nbsp; Scala classes map precisely onto Java classes, generics work perfectly, and pure-abstract traits correspond directly to Java interfaces.&nbsp; Other areas where Scala is decidedly more powerful than Java (like operators) do tend to be a bit sticky, but there is <em>always</em> a way to make things work.</p>
<p>If you&#8217;re considering mixing Scala and Java sources within your project, I hope that this article has smoothed over some of the doubts you may have had regarding its feasibility.&nbsp; As David Pollack says, Scala is really &#8220;just another Java library&#8221;.&nbsp; Just stick <code>scala-library.jar</code> on your classpath and all of your Scala classes should be readily available within your Java application.&nbsp; And given how well Scala integrates with Java at the language level, what could be simpler?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/interop-between-java-and-scala/feed</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Hacking Buildr: Interactive Shell Support</title>
		<link>http://www.codecommit.com/blog/java/hacking-buildr-interactive-shell-support</link>
		<comments>http://www.codecommit.com/blog/java/hacking-buildr-interactive-shell-support#comments</comments>
		<pubDate>Mon, 12 Jan 2009 07:52:51 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/hacking-buildr-interactive-shell-support</guid>
		<description><![CDATA[Last week, we looked at the unfortunately-unexplored topic of Scala/Java joint compilation.&#160; Specifically, we saw several different ways in which this functionality may be invoked covering a number of different tools.&#160; Among these tools was Buildr, a fast Ruby-based drop-in replacement for Maven with a penchant for simple configuration.&#160; In the article I mentioned that [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, we looked at the unfortunately-unexplored topic of <a href="http://www.codecommit.com/blog/scala/joint-compilation-of-scala-and-java-sources">Scala/Java joint compilation</a>.&nbsp; Specifically, we saw several different ways in which this functionality may be invoked covering a number of different tools.&nbsp; Among these tools was Buildr, a fast Ruby-based drop-in replacement for Maven with a penchant for simple configuration.&nbsp; In the article I mentioned that Buildr doesn&#8217;t actually have support for the Scala joint compiler out of the box.&nbsp; In fact, this feature actually requires the use of a Buildr fork I&#8217;ve been using to experiment with different extensions.&nbsp; Among these extensions is a feature I&#8217;ve been wanting from Buildr for a long time: the ability to launch a pre-configured interactive shell.</p>
<p>For those coming from a primarily-Java background, the concept of an interactive shell may seem a bit foreign.&nbsp; Basically, an interactive shell &mdash; or <a href="http://en.wikipedia.org/wiki/REPL"><em>REPL</em></a>, as it is often called &mdash; is a line-by-line language interpreter which allows you to execute snippets of code with immediate result.&nbsp; This has been a common tool in the hands of dynamic language enthusiasts since the days of LISP, but has only recently found its way into the world of mainstream static languages such as Scala.</p>
<div style="text-align:center;"><img src="http://www.codecommit.com/blog/wp-content/uploads/2009/01/interactive-shells.png" alt="interactive-shells.png" border="0"/></div>
<p>One of the most useful applications of these tools is the testing of code (particularly frameworks) before the implementations are fully completed.&nbsp; For example, when working on <a href="http://github.com/djspiewak/scala-collections/tree/master/src/main/scala/com/codecommit/collection/Vector.scala">my port of Clojure&#8217;s <code>PersistentVector</code></a>, I would often spin up a Scala shell to quickly test one aspect or another of the class.&nbsp; As a minor productivity plug, <a href="http://www.zeroturnaround.com/javarebel/">JavaRebel</a> is a truly <em>invaluable</em> tool for development of this variety.</p>
<p>The problem with this pattern of work is it requires the interactive shell in question to be pre-configured to include the project&#8217;s output directory on the CLASSPATH.&nbsp; While this isn&#8217;t usually so bad, things can get very sticky when you&#8217;re working with a project which includes a large number of dependencies.&nbsp; It isn&#8217;t too unreasonable to imagine shell invocations stretching into the tens of lines, just to spin up a &#8220;quick and dirty&#8221; test tool.</p>
<p>Further complicating affairs is the fact that many projects do away with the notion of fixed dependency paths and simply allow tools like Maven or Buildr to manage the CLASSPATH entirely out of sight.&nbsp; In order to fire up a Scala shell for a project with any external dependencies, I must first manually read my <code>buildfile</code>, parsing out all of the artifacts in use.&nbsp; Then I have to grope about in my <code>~/.m2/repository</code> directory until I find the JARs in question.&nbsp; Needless to say, the productivity benefits of this technique become extremely suspect after the first or second expedition.</p>
<p>For this reason, I strongly believe that the launch of an interactive shell should be the responsibility of the tool managing the dependencies, rather than that of the developer.&nbsp; Note that Maven already has some support for shells in conjunction with certain languages (Scala among them), but it is as crude and verbose as the tool itself.&nbsp; What I really want is to be able to invoke the following command and have the appropriate shell launched with a pre-configured CLASSPATH.&nbsp; I shouldn&#8217;t have to worry about the language of my project, the location of my repository or even if the shell requires extra configuration on my platform.&nbsp; The idea is that everything should all work auto-magically:</p>
<pre style="margin-left: 15px;">$ buildr shell</pre>
<p>This is exactly what the <a href="http://github.com/djspiewak/buildr/tree/interactive-shell"><code>interactive-shell</code> branch of my Buildr fork</a> is designed to accomplish.&nbsp; Whenever the <code>shell</code> task is invoked, Buildr looks through the current project and attempts to guess the language involved.&nbsp; This guesswork is required for a number of other features, so Buildr is actually pretty accurate in this area.&nbsp; If the language in question is Groovy or Scala, then the desired shell is obvious.&nbsp; Java does not have an integrated shell, which means that the default behavior on a Java project would be to raise an error.</p>
<p>However, the benefits of interactive shells are not limited to just the latest-and-greatest languages.&nbsp; I often use a Scala shell with Java projects, and for certain things a JRuby shell as well (<code>jirb</code>).&nbsp; Thus, my interactive shell extension also provides a mechanism to allow users to override the default shell on a per-project basis:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="ruby">define <span style="color: #cb0710;">'my-project'</span> <span style="color: #140dcc;">do</span>
  shell.<span class="me1">using</span> <span style="color: #ca9925;">:clj</span>
<span style="color: #140dcc;">end</span></pre></td></tr></table></div>

<p>With this configuration, regardless of the language used by the compiler for &#8220;my-project&#8221;, Buildr will launch the Clojure REPL whenever the <code>shell</code> task is invoked.&nbsp; The currently supported shells and their corresponding Buildr identifiers are:</p>
<ul>
<li>Clojure&#8217;s REPL &mdash; <code>:clj</code></li>
<li>Groovy&#8217;s Shell &mdash; <code>:groovysh</code></li>
<li>JRuby&#8217;s IRB &mdash; <code>:jirb</code></li>
<li>Scala&#8217;s Shell &mdash; <code>:scala</code></li>
</ul>
<p>It is also possible to explicitly launch a specific shell.&nbsp; This is useful for situations where you might want to use the Scala shell for testing some things and the JRuby IRB for quickly prototyping other ideas (I do this a lot).&nbsp; The command to launch the JIRB shell in the context of <code>my-project</code> would be as follows:</p>
<pre style="margin-left: 15px;">$ buildr my-project:shell:jirb</pre>
<p>As a special value-added feature, all of these shells (except for Groovy&#8217;s, which is weird) will be automatically configured to use JavaRebel for the project compilation target classes if it can be automatically detected.&nbsp; This detection is performed by examining the <code>REBEL_HOME</code>, <code>JAVA_REBEL</code>, <code>JAVAREBEL</code> and <code>JAVAREBEL_HOME</code> environment variables, in that order.&nbsp; If any one of these points to a directory which contains <code>javarebel.jar</code> or points directly to <code>javarebel.jar</code> itself, the configuration is assumed and the respective shell invocation is appropriately modified.</p>
<div style="text-align:center;"><img src="http://www.codecommit.com/blog/wp-content/uploads/2009/01/javarebel-integration.png" alt="javarebel-integration.png" border="0"/></div>
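<p>To make the lookup order concrete, here is a minimal sketch of the detection rule described above.  The real extension lives in Buildr and is written in Ruby; this Java rendition, including the <code>JavaRebelLocator</code> class and its <code>locate</code> method, is purely hypothetical and only mirrors the rule as stated: try each variable in order, and accept either a directory containing <code>javarebel.jar</code> or a path to the JAR itself.</p>

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mirrors the lookup order described in the post.
// This is not Buildr's actual code (which is Ruby).
final class JavaRebelLocator {
    // Environment variables, in the order they are examined.
    static final List<String> VARIABLES =
            Arrays.asList("REBEL_HOME", "JAVA_REBEL", "JAVAREBEL", "JAVAREBEL_HOME");

    /** Returns the first javarebel.jar reachable through the given environment, or null. */
    static File locate(Map<String, String> env) {
        for (String name : VARIABLES) {
            String value = env.get(name);
            if (value == null || value.isEmpty()) {
                continue;
            }

            File candidate = new File(value);
            if (candidate.isDirectory()) {
                // The variable names a directory: look for the JAR inside it.
                candidate = new File(candidate, "javarebel.jar");
            }
            if (candidate.isFile() && candidate.getName().equals("javarebel.jar")) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File jar = locate(System.getenv());
        System.out.println(jar == null ? "JavaRebel not detected" : "Using " + jar);
    }
}
```

<p>Passing the environment in as a <code>Map</code> (rather than calling <code>System.getenv()</code> inside the method) is just a convenience that makes the rule easy to exercise with made-up values.</p>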
<p>Best of all, this support is implemented using a highly-extensible framework similar to Buildr&#8217;s own <code>Compiler</code> API.&nbsp; It&#8217;s very easy for plugin implementors or even average developers to simply drop-in a new shell provider, perhaps for an internal language or even some unexpected application.&nbsp; The core functionality of shell detection is integrated into Buildr itself, but this in no way hampers extensibility.&nbsp; For example, I could easily create a third-party <code>.rake</code> plugin for Buildr which added support for a whole new language (e.g. Haskell).&nbsp; In this plugin, I could also define a new shell provider which would be the default for projects using that language (e.g. GHCi).</p>
<h3>Open Question</h3>
<p>The good news is that this feature <a href="http://www.nabble.com/Interactive-Shell-Support-td21273331.html">has been discussed extensively</a> on the <code>buildr-user</code> mailing-list and the prevailing opinion seems to be that it should be folded into the main Buildr distribution.&nbsp; Exactly what form this will take has yet to be decided.&nbsp; The bad news is that there is still some dispute about a fundamental aspect of this feature&#8217;s operation.</p>
<p>The question revolves around what the exact behavior should be when the <code>shell</code> task is invoked.&nbsp; Should Buildr detect the project (or sub-project) you are in and automatically configure the shell&#8217;s CLASSPATH accordingly?&nbsp; This would give the interactive shell access to different classes depending on the current working directory.&nbsp; Alternatively, should there be one all-powerful shell per-<code>buildfile</code> configured at the root level?&nbsp; This would allow your shell to remain consistent throughout the project, regardless of your current directory.&nbsp; However, it would also mean that some configuration would be required in order to enable the functionality.&nbsp; (More details of this debate can be found <a href="http://www.nabble.com/Interactive-Shell-Support-td21273331.html">on the mailing-list</a>.)</p>
<p>Additionally, what should the exact syntax be for invoking a specific shell?&nbsp; Rake 0.8 allows tasks to take parameters enclosed within square brackets.&nbsp; Thus, the syntax would be something more like the following:</p>
<pre style="margin-left: 15px">$ buildr collection:shell[jirb]</pre>
<p>In some sense, this is more logical since it reflects the fact that a single task, <code>shell</code>, is responsible for launching whichever shell is requested.&nbsp; On the other hand, it&#8217;s a little less consistent with the rest of Buildr&#8217;s tasks, particularly things like &#8220;<code>test:TestClass</code>&#8221; and so on.&nbsp; This too is a matter which has yet to be settled.</p>
<p>All in all, this is a pretty experimental branch which is very open to (and desirous of) outside input.&nbsp; How would you use a feature like this?&nbsp; Is there anything missing from what I have presented?&nbsp; What design path should we take with regard to project-local vs. global shell configurations?</p>
<p>If you feel like adding your voice to the chorus, feel free to leave a comment or (better yet) post a reply on the mailing-list thread.&nbsp; You&#8217;re also perfectly free to fork my remote branch at GitHub to better experiment with things yourself.&nbsp; The root of the whole plate of spaghetti is the <code>lib/buildr/shell.rb</code> file.&nbsp; <em>Bon appetit!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/hacking-buildr-interactive-shell-support/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Gun for Hire (off topic)</title>
		<link>http://www.codecommit.com/blog/java/gun-for-hire-off-topic</link>
		<comments>http://www.codecommit.com/blog/java/gun-for-hire-off-topic#comments</comments>
		<pubDate>Wed, 07 Jan 2009 08:24:36 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/gun-for-hire-off-topic</guid>
		<description><![CDATA[Just in case you thought Christmas was over, I have a late gift for the world: I&#8217;m available for hire!&#160; Ok, so maybe this wasn&#8217;t exactly the stocking-stuffer you were expecting, but it&#8217;s the thought that counts.
I&#8217;m announcing my availability for employment as a part-time developer.&#160; Those of you who follow this blog are probably [...]]]></description>
			<content:encoded><![CDATA[<p>Just in case you thought Christmas was over, I have a late gift for the world: I&#8217;m available for hire!&nbsp; Ok, so maybe this wasn&#8217;t exactly the stocking-stuffer you were expecting, but it&#8217;s the thought that counts.</p>
<p>I&#8217;m announcing my availability for employment as a part-time developer.&nbsp; Those of you who follow this blog are probably already familiar with my areas of expertise, so I don&#8217;t think there is a need to bore you with a rehash.&nbsp; Resume available on request!</p>
<p>Anyway, my preference would be a project where I get to use multiple different languages, particularly Scala and Clojure, but I&#8217;m flexible.&nbsp; If you think my skills would make a positive addition to your team, <a href="mailto:djspiewak@gmail.com">shoot me an email</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/gun-for-hire-off-topic/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>How Do You Apply Polyglotism?</title>
		<link>http://www.codecommit.com/blog/java/how-do-you-apply-polyglotism</link>
		<comments>http://www.codecommit.com/blog/java/how-do-you-apply-polyglotism#comments</comments>
		<pubDate>Mon, 18 Aug 2008 07:00:00 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/how-do-you-apply-polyglotism</guid>
		<description><![CDATA[For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.&#160; For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be [...]]]></description>
			<content:encoded><![CDATA[<p>For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.&#160; For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be most applicable to the problem in question.&#160; This makes for a great topic for arm-chair bloggers, leading to endless pontification and flame-wars on forum after forum, but it seems to be a bit more difficult to apply in the real world.</p>
<p>The fact is that very few companies are open to the idea of diversity in language selection.&#160; Just look at Google, one of the most open-minded and developer-friendly companies around.&#160; They employ some of the smartest people I know, programmers who have actually <em>invented</em> languages with wide-scale adoption.&#160; However, this same company mandates the use of a very small set of languages including Python, Java, C++ and JavaScript.&#160; If a company like Google can&#8217;t even bring itself to dabble in language diversity, what hope do we have for the Apples of the world?</p>
<p>A few months ago, I received an internal email from the startup company where I work.&#160; This email was putting forth a new policy which would restrict all future developments to one of two languages: PHP or Java.&#160; In fact, this policy went on to push for the eventual rewrite of all legacy projects which had been written in other languages including Objective-C, Ruby, Python and a fair number of shell scripts.&#160; I was utterly flabbergasted (to say the least).&#160; A few swift emails later, we were able to come to a more moderate position, but the prevailing attitude remains extremely focused on minimizing the choice of languages.</p>
<p>To my knowledge, this sort of policy is fairly common in the industry.&#160; Companies (particularly those employing consultants) seem to prefer to keep the technologies employed to a minimum, focusing on the lowest common denominator so as to reduce the requirements for incoming developer skill sets.&#160; This is rather distressing to me, because I get a great deal of pleasure out of solving problems differently using alternative languages.&#160; For example, I would have loved to build the clustering system at my company using the highly-scalable actor model with Scala, but the idea was shot down right out of the gate because it involved a non-mainstream language.&#160; To be fair to my colleagues, the overall design involved was given more serious consideration, but it was always within the confines of Java, rather than the original actor-driven concept.</p>
<p>There is actually another aspect to this question: assuming you are <em>allowed</em> to use a variety of languages to &quot;get the job done&quot;, how do you apply them?&#160; Ola Bini has talked about the various layers of a system, but this is harder to see in practice than it would seem.&#160; How do you define where to &quot;draw the line&quot; between using Java and Scala, or even the more dramatic differences between Java and JRuby or Groovy?&#160; Of course, we can base our decision strictly on lines of code, but in that case, Scala would trump Java every time.&#160; For that matter, Ruby would probably beat out the two of them, and I&#8217;m certainly not writing my next large-scale enterprise app exclusively in a dynamic language.</p>
<p>I realize this is somewhat of a cop-out post, just asking a question and never arriving at a satisfactory conclusion, but I would really like to know how other developers approach this issue.&#160; What criteria do you weigh in making the decision to go with a particular language?&#160; What sorts of languages work well for which tasks?&#160; And above all, how do you convince your boss that this is the right way to go?&#160; The floor is open, please enlighten me!&#160; <img src='http://www.codecommit.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/how-do-you-apply-polyglotism/feed</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Bencode Stream Parsing in Java</title>
		<link>http://www.codecommit.com/blog/java/bencode-stream-parsing-in-java</link>
		<comments>http://www.codecommit.com/blog/java/bencode-stream-parsing-in-java#comments</comments>
		<pubDate>Tue, 15 Jul 2008 07:00:00 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/bencode-stream-parsing-in-java</guid>
		<description><![CDATA[It’s surprising how universal XML has become.&#160; It doesn&#8217;t seem to matter what the problem, XML is the solution.&#160; For example, consider a simple client/server architecture where the communication protocol must transmit some sort of structured data.&#160; Nine developers out of ten will form the basis of the protocol around XML.&#160; If it’s a lot [...]]]></description>
			<content:encoded><![CDATA[<p>It’s surprising how universal XML has become.&#160; It doesn&#8217;t seem to matter what the problem, XML is the solution.&#160; For example, consider a simple client/server architecture where the communication protocol must transmit some sort of structured data.&#160; Nine developers out of ten will form the basis of the protocol around XML.&#160; If it’s a lot of data to be transferred, then they will compress the XML using Java’s stream compression libraries.&#160; If there’s binary data to be transmitted, it will either be stored as CDATA within the XML or as files within the same compressed archive.&#160; Very few developers will actually stop and consider alternative solutions.</p>
<p>One such &#8220;alternative solution&#8221; is <a href="http://en.wikipedia.org/wiki/Bencode">bencode</a> (pronounced &#8220;bee-encode&#8221;).&#160; Similar to formats like XML and JSON, bencode defines a series of constructs which may be used to encode arbitrarily complex data.&#160; However, unlike XML, the design focus of the format was not to produce verbose, human-readable documents, but rather to encode data in the most concise manner possible.&#160; To that end, the core bencode specification only includes four data types: two simple and two composite structures.&#160; These types are defined with an almost complete absence of metadata, requiring very little “structure” to clutter the data stream.</p>
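<p>For a quick feel of how little structure is involved, here is a toy encoder covering the four core types (integer, string, list and dictionary).  It is only a sketch of the format: the <code>BencodeFormatDemo</code> class is hypothetical, it assumes ASCII-only strings so that one character equals one byte, and it builds the entire result in memory, which is exactly what a library aimed at very large streams must avoid.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy encoder for illustration only: it builds the whole result in memory and
// assumes ASCII-only strings, so one char equals one byte.
final class BencodeFormatDemo {
    static String encode(Object value) {
        StringBuilder out = new StringBuilder();
        encode(value, out);
        return out.toString();
    }

    private static void encode(Object value, StringBuilder out) {
        if (value instanceof Integer || value instanceof Long) {
            // Integers are delimited: 'i', the decimal digits, then 'e'.
            out.append('i').append(((Number) value).longValue()).append('e');
        } else if (value instanceof String) {
            // Strings carry their length as a prefix and have no closing marker.
            String text = (String) value;
            out.append(text.length()).append(':').append(text);
        } else if (value instanceof List) {
            out.append('l');
            for (Object element : (List<?>) value) {
                encode(element, out);
            }
            out.append('e');
        } else if (value instanceof Map) {
            out.append('d');
            // Dictionary entries must appear sorted by key; TreeMap does the sorting.
            Map<Object, Object> sorted = new TreeMap<Object, Object>((Map<?, ?>) value);
            for (Map.Entry<Object, Object> entry : sorted.entrySet()) {
                if (!(entry.getKey() instanceof String)) {
                    throw new IllegalArgumentException("dictionary keys must be strings");
                }
                encode(entry.getKey(), out);
                encode(entry.getValue(), out);
            }
            out.append('e');
        } else {
            throw new IllegalArgumentException("unsupported type: " + value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> person = new LinkedHashMap<String, Object>();
        person.put("number", 42);
        person.put("name", "Arthur Dent");
        person.put("planets", Arrays.asList("Earth", "Old Earth"));
        System.out.println(encode(person));
    }
}
```

<p>Running <code>main</code> prints <code>d4:name11:Arthur Dent6:numberi42e7:planetsl5:Earth9:Old Earthee</code>.  Note that the keys come out sorted even though they were inserted in a different order.</p>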
<p>Unfortunately, outside of applications like BitTorrent, this elegant binary format has seen remarkably little adoption.&#160; Because of this state of affairs, it can be extremely difficult to find libraries to actually process bencode data.&#160; Not too long ago, I ran into a production use-case which required both parsing and generation of bencode-formatted files.&#160; I considered digging into the source code for <a href="http://www.vuze.com">Vuze</a> (nee &#8220;Azureus&#8221;), but a) it seemed like a lot of boring, nearly-wasted effort, and b) I strongly suspect that their bencode parser and generator are extremely <a href="http://en.wikipedia.org/wiki/Computational_complexity_theory#Time_and_space_complexity">space inefficient</a>, since the data sources which they deal with are remarkably small.</p>
<p>The second hang-up was really a more significant motivator than the first, due to the fact that I knew I would be dealing with bencode streams potentially gigabytes in size.&#160; So, rather than fruitlessly dig through someone else’s code, I decided to put all of this <a href="http://www.codecommit.com/blog/scala/naive-text-parsing-in-scala">formal parser theory</a> to work and roll my own library.&#160; Unless you&#8217;re already familiar with bencode, I suggest you read <a href="http://en.wikipedia.org/wiki/Bencode">the Wikipedia article</a> to get a feel for the format, otherwise some of what I will be talking about will make no sense at all.&#160; <img src='http://www.codecommit.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>The first thing I needed to do was build the generation half of the library.&#160; I decided that it would be easier if I avoided trying to use the same backend framework classes with both the generator and the parser.&#160; For example, there are actually two classes in the framework which contain the logic for handling an integer: <code>IntegerValue</code> and <code>IntegerType</code>.&#160; The former is for use in the parser, while the latter is for use in the generator.&#160; This separation of logic may seem a little strange, but it actually <em>simplifies</em> things tremendously.</p>
<p>Remember my primary requirement: extremely efficient implementation of both generator and parser, especially with respect to space.&#160; If I attempted to use the same classes to represent data for both the parser and the generator, then the parser would be forced to read the entire stream into some sort of in-memory representation (think about it; it&#8217;s actually true).&#160; Obviously, this is unacceptable for streams that are gigabytes in size, so the traditional &#8220;good design&#8221; from an object-oriented standpoint was out.</p>
<h3>Stream Generation</h3>
<p>Since I needed the functionality of bencode stream generation before I needed parsing, I started with that aspect of the framework.&#160; Here again, the most obvious &#8220;object-oriented&#8221; approach would have been the wrong one.&#160; When we think of generating output in a structured format programmatically, we naturally imagine a DOM-like tree representation (preferably framework-agnostic) which is then walked by the framework to produce the output.&#160; The major disadvantage to this approach is that it requires paging <em>everything</em> into memory.&#160; This works for smaller applications or situations where the data is already in memory, but for my particular use-case, it would have been disastrous.</p>
<p>The only way to avoid paging everything into memory for stream generation is to structure the API so that the data is &#8220;pulled&#8221; by the generator, rather than &#8220;pushed&#8221; to it in tree-form.&#160; In other words, the data itself has to be lazy-loaded, using callbacks to grab the data as-needed and hold it in memory only as long as is absolutely necessary.&#160; In a functional language, this would be done with closures (or even normal data types in a pure-functional language).&#160; However, as we all know, Java does not support such time-saving features.&#160; The only recourse is to use abstract classes and interfaces which can be overridden in anonymous inner-classes as well as top-level classes as necessary.</p>
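<p>As a rough sketch of what &#8220;pulled, not pushed&#8221; means in Java terms, consider the following.  The <code>PulledString</code> and <code>PullDemo</code> names are hypothetical stand-ins rather than the library&#8217;s real classes; the only thing borrowed from the actual design is the shape of the callbacks (a length query plus a method that writes the payload straight to the output).</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a lazily-supplied bencode string.
abstract class PulledString {
    /** Called by the generator when it needs the length prefix. */
    protected abstract long getLength();

    /** Called by the generator when it is ready for the payload itself. */
    protected abstract void writeValue(OutputStream os) throws IOException;

    public final void write(OutputStream os) throws IOException {
        os.write((getLength() + ":").getBytes(StandardCharsets.US_ASCII));
        writeValue(os);
    }
}

final class PullDemo {
    /** Wraps a stream of known length; the data is copied in small chunks on demand. */
    static PulledString streaming(final InputStream source, final long length) {
        return new PulledString() {
            @Override
            protected long getLength() {
                return length;
            }

            @Override
            protected void writeValue(OutputStream os) throws IOException {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = source.read(buffer)) != -1) {
                    os.write(buffer, 0, read);   // only one buffer's worth is ever held
                }
            }
        };
    }

    static String encode(PulledString value) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            value.write(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] payload = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(streaming(new ByteArrayInputStream(payload), payload.length)));
    }
}
```

<p>In this demo the source happens to be a byte array, but the same anonymous class would work with a file or socket stream, provided the length is known up front (the format demands it as a prefix).</p>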
<p align="center"><img title="image" height="283" alt="image" src="http://www.codecommit.com/blog/wp-content/uploads/2008/07/image2.png" width="656" border="0" /> </p>
<p>After a bit of experimentation, the finalized hierarchy looks something like this.&#160; Logically, every type must be able to query its abstract method for data of a certain Java type (<code>long</code> for <code>IntegerType</code>, <code>InputStream</code> for <code>StringType</code>, etc), convert this data into bencode with the appropriate metadata, and then write the result to a given <code>OutputStream</code>.&#160; Following our nose a little further, we see that the semantic differences between composite and primitive types are really quite limited, especially if we simplify everything to a black box &#8220;get data / write encoding&#8221; methodology.&#160; In fact, the only thing that <code>CompositeType</code> actually does is enforce the prefix/suffix encoding of every composite type.&#160; Since this is in compliance with the bencode specification, we are safe in extracting this functionality into a superclass.</p>
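<p>The prefix/suffix idea is essentially the template method pattern.  A minimal sketch, using hypothetical names (<code>Node</code>, <code>Composite</code>, <code>IntegerNode</code>) rather than the library&#8217;s real classes, might look something like this:</p>

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; only the prefix/suffix idea comes from the post.
abstract class Node {
    abstract void write(OutputStream os) throws IOException;
}

abstract class Composite extends Node {
    private final char prefix;

    Composite(char prefix) {
        this.prefix = prefix;
    }

    /** Subclasses emit their children here, one at a time. */
    protected abstract void writeChildren(OutputStream os) throws IOException;

    @Override
    final void write(OutputStream os) throws IOException {
        os.write(prefix);      // 'l' for lists, 'd' for dictionaries
        writeChildren(os);
        os.write('e');         // every composite closes with 'e'
    }
}

final class IntegerNode extends Node {
    private final long value;

    IntegerNode(long value) {
        this.value = value;
    }

    @Override
    void write(OutputStream os) throws IOException {
        os.write(("i" + value + "e").getBytes(StandardCharsets.US_ASCII));
    }
}

final class CompositeDemo {
    /** Encodes the list [1, 2, ..., limit] without ever materializing it. */
    static String countTo(final int limit) {
        Composite list = new Composite('l') {
            @Override
            protected void writeChildren(OutputStream os) throws IOException {
                for (int i = 1; i <= limit; i++) {
                    new IntegerNode(i).write(os);
                }
            }
        };

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try {
            list.write(os);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(os.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(countTo(3));
    }
}
```

<p>Because <code>write</code> is <code>final</code>, a subclass cannot forget the closing <code>e</code>; all it controls is what goes in between.</p>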
<p>The more interesting distinction is between so-called &#8220;variant&#8221; and &#8220;invariant&#8221; types.&#160; This is where you should begin to notice that I have over-engineered this library to some degree.&#160; If I were just trying to create a pure bencode generator, then I could have skipped <code>InvariantPrimitiveType</code> and <code>VariantPrimitiveType</code> and just let <code>IntegerType</code> and <code>StringType</code> extend <code>PrimitiveType</code> directly.&#160; This comes back to my initial requirements.</p>
<p>Priority one was to create a framework which was blazingly fast, but priority two was to ensure that it was extensible at the type level.&#160; For the particular application I was interested in, I required more than just the core bencode types.&#160; Also on the agenda were proper UTF-8 strings, dates, and support for <code>null</code>.&#160; To accommodate all of this without too much code duplication, I knew I would have to extract a lot of the functionality into generic superclasses.&#160; Hence my somewhat incorrect use of the terms &#8220;variant&#8221; and &#8220;invariant&#8221; to describe the difference between the integer type &#8211; which is prefix/suffix delimited &#8211; and the string type &#8211; which defines a length as its prefix and has no closing suffix.</p>
<p>Anyway, back to the problem at hand.&#160; In addition to the <code>CompositeType</code> and <code>PrimitiveType</code>, you should also notice <code>EntryType</code>.&#160; This &#8220;extra&#8221; type exists to handle the fact that bencode dictionaries are extremely weird and sit rather outside the &#8220;common functionality&#8221; umbrella of the format in general.&#160; For one thing, the specification requires that dictionary entries be sorted by key, obviously implying some sort of <code>Comparable</code> relation.&#160; Moreover, these keys must themselves be strings, but <code>StringType</code> isn&#8217;t comparable because its <code>writeValue(OutputStream)</code> method doesn&#8217;t return the data in question, but merely writes it to a given <code>OutputStream</code>.&#160; Are we starting to see the problems with space-efficient implementations?</p>
<p>Enough babble though, let&#8217;s see some code!&#160; Here&#8217;s how we might encode some very simple data using the generator framework:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> GeneratorTest <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> <span style="color: #140dcc;">static</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">main</span><span class="br0">&#40;</span>String<span class="br0">&#91;</span><span class="br0">&#93;</span> args<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        ByteArrayOutputStream os = <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">ByteArrayOutputStream</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
        <span style="color: #140dcc;">final</span> <span style="color: #857d1f;">byte</span><span class="br0">&#91;</span><span class="br0">&#93;</span> picture = <span style="color: #140dcc;">new</span> <span style="color: #857d1f;">byte</span><span class="br0">&#91;</span><span style="color: #cb0710;">0</span><span class="br0">&#93;</span>;        <span style="color: #445fba;">// presumably something interesting</span>
&nbsp;
        DictionaryType root = <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">DictionaryType</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #663f31;">@Override</span>
            <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">populate</span><span class="br0">&#40;</span>SortedSet&lt;EntryType&lt;?&gt;&gt; entries<span class="br0">&#41;</span> <span class="br0">&#123;</span>
                entries.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> EntryType&lt;LiteralStringType&gt;<span class="br0">&#40;</span>
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;name&quot;</span><span class="br0">&#41;</span>, 
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Arthur Dent&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
                entries.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> EntryType&lt;LiteralStringType&gt;<span class="br0">&#40;</span>
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;number&quot;</span><span class="br0">&#41;</span>, 
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IntegerType</span><span class="br0">&#40;</span><span style="color: #cb0710;">42</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
&nbsp;
                entries.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> EntryType&lt;LiteralStringType&gt;<span class="br0">&#40;</span>
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;picture&quot;</span><span class="br0">&#41;</span>, 
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">StringType</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
&nbsp;
                    <span style="color: #663f31;">@Override</span>
                    <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">long</span> <span style="color: #2e7c0f;">getLength</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
                        <span style="color: #140dcc;">return</span> picture.<span class="me1">length</span>;
                    <span class="br0">&#125;</span>
&nbsp;
                    <span style="color: #663f31;">@Override</span>
                    <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">writeValue</span><span class="br0">&#40;</span>OutputStream os<span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
                        os.<span style="color: #2e7c0f;">write</span><span class="br0">&#40;</span>picture<span class="br0">&#41;</span>;
                    <span class="br0">&#125;</span>
                <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
&nbsp;
                entries.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> EntryType&lt;LiteralStringType&gt;<span class="br0">&#40;</span>
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;planets&quot;</span><span class="br0">&#41;</span>, 
                        <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">ListType</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
&nbsp;
                    <span style="color: #663f31;">@Override</span>
                    <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">populate</span><span class="br0">&#40;</span>ListTypeStream list<span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
                        list.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Earth&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
                        list.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Somewhere else&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
                        list.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Old Earth&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
                    <span class="br0">&#125;</span>
                <span class="br0">&#125;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
            <span class="br0">&#125;</span>
        <span class="br0">&#125;</span>;
&nbsp;
        <span style="color: #140dcc;">try</span> <span class="br0">&#123;</span>
            root.<span style="color: #2e7c0f;">write</span><span class="br0">&#40;</span>os<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span> <span style="color: #140dcc;">catch</span> <span class="br0">&#40;</span>IOException e<span class="br0">&#41;</span> <span class="br0">&#123;</span>
            e.<span style="color: #2e7c0f;">printStackTrace</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
&nbsp;
        System.<span class="me1">out</span>.<span style="color: #2e7c0f;">println</span><span class="br0">&#40;</span><span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">String</span><span class="br0">&#40;</span>os.<span style="color: #2e7c0f;">toByteArray</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">private</span> <span style="color: #140dcc;">static</span> <span style="color: #857d1f;">class</span> LiteralStringType <span style="color: #140dcc;">extends</span> StringType 
            <span style="color: #140dcc;">implements</span> Comparable&lt;LiteralStringType&gt; <span class="br0">&#123;</span>
        <span style="color: #140dcc;">private</span> <span style="color: #140dcc;">final</span> String value;
&nbsp;
        <span style="color: #140dcc;">public</span> <span style="color: #2e7c0f;">LiteralStringType</span><span class="br0">&#40;</span>String value<span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #a40d14;">this</span>.<span class="me1">value</span> = value;
        <span class="br0">&#125;</span>
&nbsp;
        <span style="color: #663f31;">@Override</span>
        <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">long</span> <span style="color: #2e7c0f;">getLength</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> value.<span style="color: #2e7c0f;">length</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
&nbsp;
        <span style="color: #663f31;">@Override</span>
        <span style="color: #140dcc;">protected</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">writeValue</span><span class="br0">&#40;</span>OutputStream os<span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
            os.<span style="color: #2e7c0f;">write</span><span class="br0">&#40;</span>value.<span style="color: #2e7c0f;">getBytes</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;US-ASCII&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
&nbsp;
        <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">int</span> <span style="color: #2e7c0f;">compareTo</span><span class="br0">&#40;</span>LiteralStringType o<span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> o.<span class="me1">value</span>.<span style="color: #2e7c0f;">compareTo</span><span class="br0">&#40;</span>value<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>It&#8217;s hard to imagine why some people claim that Java is a verbose language&#8230;</p>
<p>The API may seem a little clumsy, but most of that is caused by the conniptions required to make the generator lazily pull the data, rather than paging it all into memory ahead of time.&#160; Setting that aside, the rest of the verbosity comes from the need for <code>LiteralStringType</code>, rather than just having a <code>StringType</code> which could handle this for us.&#160; The reason for this extra headache is shown in the population of the &#8220;picture&#8221; field, which presumably may contain several megabytes of data from some external source such as a file or database (in this case, of course, it doesn&#8217;t contain anything, but that&#8217;s beside the point).</p>
<p>The result of the above is as follows:</p>
<pre>d4:name11:Arthur Dent6:numberi42e7:picture0:7:planetsl5:Earth14:Somewhere else9:Old Earthee</pre>
<p>Or, with a little formatting to make it more palatable:</p>
<pre>d
  4:name
  11:Arthur Dent

  6:number
  i42e

  7:picture
  0:

  7:planets
  l
    5:Earth
    14:Somewhere else
    9:Old Earth
  e
e</pre>
<p>Technically, this is no longer valid bencode, but it is much easier to read this way.</p>
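<p>For readers who have not met the format before, the unformatted output above can be reproduced by a few lines of eager code.&#160; This is only a sketch and is independent of the framework described in this post: <code>BencodeSketch</code> and its methods are hypothetical names, it assumes ASCII strings, and it builds the whole result in memory, which is exactly what the real generator is designed to avoid.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical eager encoder, for illustration only; it is not part of jbencode.
final class BencodeSketch {
    static String encode(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value instanceof String) {
            String s = (String) value;
            // length prefix, colon, then the raw characters (assumes ASCII content)
            sb.append(s.length()).append(':').append(s);
        } else if (value instanceof Number) {
            // integers are wrapped in an 'i' prefix and an 'e' terminator
            sb.append('i').append(((Number) value).longValue()).append('e');
        } else if (value instanceof List) {
            sb.append('l');
            for (Object element : (List<?>) value) {
                write(element, sb);
            }
            sb.append('e');
        } else if (value instanceof Map) {
            sb.append('d');
            // TreeMap sorts the keys, as bencode dictionaries require
            Map<Object, Object> sorted = new TreeMap<Object, Object>((Map<?, ?>) value);
            for (Map.Entry<Object, Object> entry : sorted.entrySet()) {
                write(entry.getKey(), sb);
                write(entry.getValue(), sb);
            }
            sb.append('e');
        } else {
            throw new IllegalArgumentException("Cannot encode: " + value);
        }
    }
}
```

<p>Passing a map with the same four entries to <code>BencodeSketch.encode</code> yields the single-line form shown earlier, regardless of the order in which the entries were inserted.</p>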
<h3>The Parser</h3>
<p>With all this bustle surrounding the generator, it&#8217;s easy to forget about the inverse process: parsing.&#160; As it turns out, this is both easier and far less elegant than the solution for the generator (I know, it&#8217;s a sad state of affairs when the above is considered &#8220;elegant&#8221;).&#160; Here again, there was a need for the parser to be extremely efficient, especially in terms of memory.&#160; Thus, the logical approach of simply parsing the stream into an in-memory tree doesn&#8217;t really work.&#160; Instead, the parser must be a so-called &#8220;pull parser&#8221;, which only parses each token upon request.&#160; The parser only does exactly what work you ask of it, nothing more.</p>
<p>My initial designs for the parser attempted to follow the example set by the generator: each value type self-contained, responsible for parsing its own format.&#160; As it turns out, this can be difficult to accomplish.&#160; I could have expanded slightly on the parser combinator concept, but monads are very clumsy to achieve in Java, which led me to rule out that option.&#160; In the end, I took a middle ground.</p>
<p><center><a href="http://www.codecommit.com/blog/misc/parser-classes.png"><img title="Click for full size" height="188" alt="Click for full size" src="http://www.codecommit.com/blog/wp-content/uploads/2008/07/image3.png" width="467" border="0" /></a></center> </p>
<p>As before, a common superinterface sits above the entire representative hierarchy.&#160; To understand this hierarchy a little better, perhaps it would be helpful to look at the full source for <code>Value</code>:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">interface</span> Value&lt;T&gt; <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> T <span style="color: #2e7c0f;">resolve</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException;
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">boolean</span> <span style="color: #2e7c0f;">isResolved</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>The <code>resolve()</code> method is really the core of the entire parser.&#160; The concept is that each value will be able to consume the bytes necessary to determine its own value, which is converted and returned.&#160; This is extremely convenient because it enables <code>VariantValue</code>(s) (such as string) to carry the logic for parsing to a specific length, rather than the conventional <strong>e</strong> terminator.&#160; In order to avoid clogging up memory, the return value of <code>resolve()</code> should <em>not</em> be <a href="http://en.wikipedia.org/wiki/Memoization">memoized</a> (though, there is nothing in the framework to prevent it).&#160; Conventionally, values which are already resolved should throw an exception if they are resolved a second time.&#160; This prevents the framework from holding onto values which are no longer needed.</p>
<p>You will also notice that <code>CompositeValue</code> not only inherits from <code>Value</code>, but also from the JDK interface, <code>Iterable</code>.&#160; Logically, a composite value is a linear collection of values, consumed one at a time.&#160; To me, that sounds a lot like a unidirectional iterator.&#160; We can, of course, resolve the entire composite at once, mindlessly consuming all of its values, but since all of the values are lost once consumed, the only purpose for such an action would be if we <em>know</em> that we don&#8217;t care about a particular composite and we just want to rapidly skip to the next value in the stream.</p>
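<p>The idea of a composite as a one-shot, forward-only iterator can be sketched in isolation.&#160; The class below is not the framework&#8217;s <code>CompositeValue</code>; <code>IntListSketch</code> is a hypothetical name, and it only understands a list of integers whose <strong>l</strong> prefix has already been consumed.&#160; What it does show is the pattern: each element is parsed only when the iterator is advanced, and nothing is retained afterwards.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a bencoded list of integers exposed as a one-shot Iterable.
final class IntListSketch implements Iterable<Long> {
    private final InputStream is;    // positioned just after the 'l' prefix
    private boolean consumed = false;

    IntListSketch(InputStream is) {
        this.is = is;
    }

    public Iterator<Long> iterator() {
        if (consumed) {
            throw new IllegalStateException("List already consumed");
        }
        consumed = true;

        return new Iterator<Long>() {
            // the prefix byte of the upcoming element ('i'), or the list terminator ('e')
            private int prefix = readByte();

            public boolean hasNext() {
                return prefix == 'i';
            }

            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                boolean negative = false;
                long value = 0;
                int b;
                while ((b = readByte()) != 'e') {
                    if (b == '-') {
                        negative = true;
                    } else if (b >= '0' && b <= '9') {
                        value = (value * 10) + (b - '0');
                    } else {
                        // also reached on end of stream, where b is -1
                        throw new IllegalStateException("Unexpected byte: " + b);
                    }
                }
                prefix = readByte();    // look at what follows this element
                return negative ? -value : value;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    private int readByte() {
        try {
            return is.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>A caller can use it in an ordinary for-each loop, but only once: asking for a second iterator fails, because the bytes behind the first pass are already gone.</p>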
<p>Returning to primitive values, the <code>resolve()</code> method for <code>IntegerValue</code> is worthy of note, not so much for its uniqueness, but because it is very similar to the parsing technique used in all the other values:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> Long <span style="color: #2e7c0f;">resolve</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>resolved<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Value already resolved&quot;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
    resolved = <span style="color: #a40d14;">true</span>;
&nbsp;
    <span style="color: #857d1f;">boolean</span> negative = <span style="color: #a40d14;">false</span>;
    <span style="color: #857d1f;">long</span> value = <span style="color: #cb0710;">0</span>;
&nbsp;
    <span style="color: #857d1f;">int</span> b = <span style="color: #cb0710;">0</span>;
    <span style="color: #140dcc;">while</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>b = is.<span style="color: #2e7c0f;">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> &gt;= <span style="color: #cb0710;">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #857d1f;">int</span> digit = b - <span style="color: #cb0710;">'0'</span>;
&nbsp;
        <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>digit &lt; <span style="color: #cb0710;">0</span> || digit &gt; <span style="color: #cb0710;">9</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b == <span style="color: #cb0710;">'-'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
                negative = <span style="color: #a40d14;">true</span>;
            <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b == <span style="color: #cb0710;">'e'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
                <span style="color: #140dcc;">break</span>;
            <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
                <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Unexpected character in integer value: &quot;</span> 
                    + <span class="br0">&#40;</span><span style="color: #857d1f;">char</span><span class="br0">&#41;</span> b<span class="br0">&#41;</span>;
            <span class="br0">&#125;</span>
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
            value = <span class="br0">&#40;</span>value * <span style="color: #cb0710;">10</span><span class="br0">&#41;</span> + digit;
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>negative<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        value *= <span style="color: #cb0710;">-1</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">return</span> value;
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>The <strong>i</strong> prefix itself is consumed before control flow even enters this method.&#160; This is because the prefix is required to determine the appropriate value implementation to use.&#160; Specifically, the logic to perform this determination is contained within the <code>Parser</code> class, which maintains a map of <code>Value</code>(s) and their associated prefixes.&#160; String values have special logic associated with them, as they do not have a prefix.</p>
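<p>The dispatch step might look roughly like the following.&#160; This is a sketch under assumptions rather than the actual <code>Parser</code> code: <code>PrefixTableSketch</code> and its <code>Kind</code> enum are hypothetical stand-ins for the map of value classes, but the shape of the lookup (a table for prefixed types, plus a special case for digits) follows the description above.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of prefix dispatch; the real Parser maps prefixes to Value classes.
final class PrefixTableSketch {
    enum Kind { INTEGER, LIST, DICTIONARY, STRING }

    private static final Map<Byte, Kind> PREFIXES = new HashMap<Byte, Kind>();

    static {
        PREFIXES.put((byte) 'i', Kind.INTEGER);
        PREFIXES.put((byte) 'l', Kind.LIST);
        PREFIXES.put((byte) 'd', Kind.DICTIONARY);
    }

    static Kind kindFor(byte b) {
        Kind kind = PREFIXES.get(b);
        if (kind != null) {
            return kind;
        }
        // strings have no prefix of their own: any digit begins a length
        if (b >= '0' && b <= '9') {
            return Kind.STRING;
        }
        throw new IllegalArgumentException("Unexpected character: " + (char) b);
    }
}
```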
<p>As with most hand-coded parsers, this one operates on the principle of &#8220;eat until it hurts&#8221;.&#160; We start out by assuming that the integer value extends to the end of the stream, then we set about finding a premature end to the integer (the <strong>e</strong> terminator), at which point we break out and call it a day.&#160; Since we are moving from left to right through a base-10 integer, we must multiply the current accumulator by 10 prior to adding the new digit.</p>
<p>Actually, the real heart of the parser framework is <code>CompositeValue</code>.&#160; This class is inherited by <code>Parser</code> to define a special value encompassing the stream itself (which is viewed as a composite value with no delimiters and only a single child).&#160; This unification allows us to keep the code for parsing a composite stream in a single location.&#160; This implementation is a little less concise than the code for parsing an integer, but it follows the same pattern and is fairly instructive:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">protected</span> <span style="color: #140dcc;">final</span> Value&lt;?&gt; <span style="color: #2e7c0f;">parse</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>resolved<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Composite value already resolved&quot;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>previous != <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>!previous.<span style="color: #2e7c0f;">isResolved</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            previous.<span style="color: #2e7c0f;">resolve</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;        <span style="color: #445fba;">// ensure we're at the right spot in the stream</span>
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #857d1f;">byte</span> b = <span style="color: #cb0710;">-1</span>;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>readAhead <span style="color: #140dcc;">instanceof</span> Some<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        b = readAhead.<span style="color: #2e7c0f;">value</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
        readAhead = <span style="color: #140dcc;">new</span> None&lt;Byte&gt;<span class="br0">&#40;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
        b = <span style="color: #2e7c0f;">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b &gt;= <span style="color: #cb0710;">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        Class&lt;? <span style="color: #140dcc;">extends</span> Value&lt;?&gt;&gt; valueType = parser.<span style="color: #2e7c0f;">getValueType</span><span class="br0">&#40;</span>b<span class="br0">&#41;</span>;
&nbsp;
        <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>valueType != <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> previous = Parser.<span style="color: #2e7c0f;">createValue</span><span class="br0">&#40;</span>valueType, parser, is<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b &gt;= <span style="color: #cb0710;">'0'</span> &amp;&amp; b &lt;= <span style="color: #cb0710;">'9'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>        <span style="color: #445fba;">// '0' included: &quot;0:&quot; is the empty string</span>
            <span style="color: #140dcc;">return</span> previous = <span style="color: #2e7c0f;">readString</span><span class="br0">&#40;</span>b - <span style="color: #cb0710;">'0'</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b == <span style="color: #cb0710;">' '</span> || b == <span style="color: #cb0710;">'<span class="es0">\n</span>'</span> || b == <span style="color: #cb0710;">'<span class="es0">\r</span>'</span> || b == <span style="color: #cb0710;">'<span class="es0">\t</span>'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> <span style="color: #2e7c0f;">parse</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;        <span style="color: #445fba;">// loop state</span>
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Unexpected character in the parse stream: &quot;</span> 
                + <span class="br0">&#40;</span><span style="color: #857d1f;">char</span><span class="br0">&#41;</span> b<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Unexpected end of stream in composite value&quot;</span><span class="br0">&#41;</span>;
<span class="br0">&#125;</span>
&nbsp;
<span style="color: #140dcc;">private</span> <span style="color: #140dcc;">final</span> StringValue <span style="color: #2e7c0f;">readString</span><span class="br0">&#40;</span><span style="color: #857d1f;">long</span> length<span class="br0">&#41;</span> <span style="color: #140dcc;">throws</span> IOException <span class="br0">&#123;</span>
    <span style="color: #857d1f;">int</span> i = is.<span style="color: #2e7c0f;">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>;
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>i &gt;= <span style="color: #cb0710;">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #857d1f;">byte</span> b = <span class="br0">&#40;</span><span style="color: #857d1f;">byte</span><span class="br0">&#41;</span> i;
&nbsp;
        <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b == <span style="color: #cb0710;">':'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> Parser.<span style="color: #2e7c0f;">createValue</span><span class="br0">&#40;</span>StringValue.<span style="color: #857d1f;">class</span>, parser, 
                <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">SubStream</span><span class="br0">&#40;</span>is, length<span class="br0">&#41;</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>b &gt;= <span style="color: #cb0710;">'0'</span> &amp;&amp; b &lt;= <span style="color: #cb0710;">'9'</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">return</span> <span style="color: #2e7c0f;">readString</span><span class="br0">&#40;</span><span class="br0">&#40;</span>length * <span style="color: #cb0710;">10</span><span class="br0">&#41;</span> + b - <span style="color: #cb0710;">'0'</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
            <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Unexpected character in string value: &quot;</span> 
                + <span class="br0">&#40;</span><span style="color: #857d1f;">char</span><span class="br0">&#41;</span> i<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IOException</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;Unexpected end of stream in string value&quot;</span><span class="br0">&#41;</span>;
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>It seems a bit imposing, but really this code is more of the same logic we saw previously when dealing with integers.&#160; The only value type which really gives us trouble here is string.&#160; We can&#8217;t simply treat it like the others because it has no prefix.&#160; For this reason, we must assume that <em>any</em> unbound integer is an inclusive prefix for a string.&#160; In most parser implementations, this would require backtracking, but because we are doing this by hand, we can condense the backtrack into an inherited parameter (borrowing terminology from <a href="http://en.wikipedia.org/wiki/Attribute_grammar">attribute grammars</a>), avoiding the performance hit.</p>
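<p>The same accumulation can also be written as a loop, which may be easier to follow than the recursive form.&#160; The sketch below is an assumption-laden illustration, not the framework&#8217;s code: <code>LengthPrefixSketch.readLength</code> is a hypothetical helper which takes the digit that has already been consumed, keeps reading digits into an accumulator, and stops at the colon, leaving the stream positioned on the first byte of the string body.</p>

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical iterative equivalent of carrying the length as an inherited parameter.
final class LengthPrefixSketch {
    /**
     * @param firstDigit the digit byte the caller has already read from the stream
     * @param is         the stream, positioned just after that digit
     * @return the decoded length; the stream is left just after the ':' separator
     */
    static long readLength(int firstDigit, InputStream is) throws IOException {
        long length = firstDigit - '0';
        int b;
        while ((b = is.read()) >= 0) {
            if (b == ':') {
                return length;
            }
            if (b < '0' || b > '9') {
                throw new IOException("Unexpected character in string length: " + (char) b);
            }
            // shift the accumulator one decimal place and add the new digit
            length = (length * 10) + (b - '0');
        }
        throw new IOException("Unexpected end of stream in string length");
    }
}
```

<p>Either way, no byte is ever read twice, which is what makes backtracking unnecessary.</p>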
<p>There&#8217;s one final bit of weirdness which deserves attention before we bail on this small epic: dictionary values.&#160; Intuitively, a dictionary value should be parsed into a Java <code>Map</code>, or some sort of associative data structure.&#160; Unfortunately, a map is by definition a random access data structure.&#160; Since we are dealing with a sequential bencode <em>stream</em>, the only recourse to satisfy this property would be to page the entire dictionary into memory.&#160; This of course violates one of the primary requirements which is to avoid using more memory than necessary.</p>
<p>The solution I eventually chose to this problem was to limit dictionary access to sequential order, which is also alphabetical order by key, since bencode requires dictionary keys to be sorted.&#160; Thus, a dictionary can be parsed in the same way as a list, where each element is a sequential key and value, jointly represented by <code>EntryValue</code>.&#160; To make usage patterns slightly easier, <code>EntryValue</code> memoizes the key and value.&#160; Because both of these objects are themselves <code>Value</code>(s), this does not lead to inadvertent memory bloat.</p>
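<p>One consequence of sorted, sequential access is that a lookup can give up early.&#160; The following sketch uses plain <code>java.util</code> types rather than the framework&#8217;s <code>EntryValue</code>; <code>SortedScanSketch.find</code> is a hypothetical helper, and it assumes ASCII keys so that <code>String.compareTo</code> agrees with the ordering of the raw bytes.</p>

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: looking up a key in a forward-only, sorted stream of entries.
final class SortedScanSketch {
    /** Returns the value for key, or null if the scan passes the point where it would be. */
    static <V> V find(Iterator<Map.Entry<String, V>> entries, String key) {
        while (entries.hasNext()) {
            Map.Entry<String, V> entry = entries.next();
            int cmp = entry.getKey().compareTo(key);
            if (cmp == 0) {
                return entry.getValue();
            }
            if (cmp > 0) {
                // keys are sorted, so the one we want cannot appear later
                return null;
            }
        }
        return null;
    }
}
```

<p>Note that entries skipped along the way are gone for good, so a caller that needs several keys has to ask for them in alphabetical order.</p>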
<h3>Conclusion</h3>
<p>Hopefully the parser and generator presented here will be of some utility in situations where you have to parse large volumes of bencoded data.&#160; The API is (admittedly) bizarre and difficult to deal with, but the performance results are difficult to deny.&#160; This framework is currently deployed in production, where benchmarks have shown that it imposes little-to-no runtime overhead, and practically zero memory overhead (despite the sizeable amounts of data being processed).</p>
<p>For convenience, I actually created a <a href="http://code.google.com/p/jbencode/">Google Code project</a> for this framework so as to facilitate its development internally to the project I was working on.&#160; The end result is that, unlike most of my experiments, there is actually a proper SVN repository from which the source may be obtained!&#160; A packaged JAR may be obtained from the downloads section.</p>
<ul>
<li>Download <a href="http://jbencode.googlecode.com/files/jbencode.jar">jbencode.jar</a></li>
<li><a href="http://code.google.com/p/jbencode/source/checkout">Full sources</a> (SVN)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/bencode-stream-parsing-in-java/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>The Need for a Common Compiler Framework</title>
		<link>http://www.codecommit.com/blog/java/the-need-for-a-common-compiler-framework</link>
		<comments>http://www.codecommit.com/blog/java/the-need-for-a-common-compiler-framework#comments</comments>
		<pubDate>Mon, 23 Jun 2008 07:00:08 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/the-need-for-a-common-compiler-framework</guid>
		<description><![CDATA[In recent years, we have seen a dramatic rise in the number of languages used in mainstream projects.&#160; In particular, languages which run on the JVM or CLR have become quite popular (probably because sane people hate dealing with x86 assembly).&#160; Naturally, such languages prefer to interoperate with other languages built on these core platforms, [...]]]></description>
			<content:encoded><![CDATA[<p>In recent years, we have seen a dramatic rise in the number of languages used in mainstream projects.&nbsp; In particular, languages which run on the JVM or CLR have become quite popular (probably because sane people hate dealing with x86 assembly).&nbsp; Naturally, such languages prefer to interoperate with other languages built on these core platforms, particularly Java and C# (respectively).&nbsp; Collectively, years of effort have been put into devising and implementing better ways of working with libraries written in these &#8220;parent languages&#8221;.&nbsp; The problem is that such efforts are crippled by one fundamental limitation: circular dependencies.</p>
<p>Let&#8217;s take Scala as an example.&nbsp; Of all of the JVM languages, this one probably has the potential for the tightest integration with Java.&nbsp; Even Groovy, which is renowned for its integration, still falls short in many key areas.&nbsp; (generics, anyone?)&nbsp; With Scala, every class is a Java class, every method is a Java method, and there is no API which cannot be accessed from Java as natively as any other.&nbsp; For example, I can write a simple linked list implementation in Scala and then use it in Java without any fuss whatsoever (<b>warning:</b> untested sample):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">class</span> LinkedList<span style="color: #7f0055;"><span class="br0">&#91;</span>T<span class="br0">&#93;</span></span> <span class="br0">&#123;</span>
  <span style="color: #140dcc;">private</span> <span style="color: #140dcc;">var</span> root: Node = _
&nbsp;
  <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span>data: T<span class="br0">&#41;</span> = <span class="br0">&#123;</span>
    <span style="color: #140dcc;">val</span> insert = <span style="color: #2e7c0f;">Node</span><span class="br0">&#40;</span>data, <span style="color: #a40d14;">null</span><span class="br0">&#41;</span>
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>root == <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      root = insert
    <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
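      <span style="color: #140dcc;">var</span> last = root
      <span style="color: #140dcc;">while</span> <span class="br0">&#40;</span>last.<span class="me1">next</span> != <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> last = last.<span class="me1">next</span>
      last.<span class="me1">next</span> = insert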
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #a40d14;">this</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">get</span><span class="br0">&#40;</span>index: <span style="color: #800080;">Int</span><span class="br0">&#41;</span> = <span class="br0">&#123;</span>
    <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>node: Node, current: <span style="color: #800080;">Int</span><span class="br0">&#41;</span>: T = <span class="br0">&#123;</span>
      <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>node == <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IndexOutOfBoundsException</span><span class="br0">&#40;</span>index.<span class="me1">toString</span><span class="br0">&#41;</span>
      <span class="br0">&#125;</span>
&nbsp;
      <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>current &lt; index<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>node.<span class="me1">next</span>, current + <span style="color: #cb0710;">1</span><span class="br0">&#41;</span>
      <span class="br0">&#125;</span> <span style="color: #140dcc;">else</span> <span class="br0">&#123;</span>
        node.<span class="me1">data</span>
      <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>index &lt; <span style="color: #cb0710;">0</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
      <span style="color: #140dcc;">throw</span> <span style="color: #140dcc;">new</span> <span style="color: #2e7c0f;">IndexOutOfBoundsException</span><span class="br0">&#40;</span>index.<span class="me1">toString</span><span class="br0">&#41;</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>root, <span style="color: #cb0710;">0</span><span class="br0">&#41;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span style="color: #140dcc;">def</span> size = <span class="br0">&#123;</span>
    <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>node: Node<span class="br0">&#41;</span>: <span style="color: #800080;">Int</span> = <span style="color: #140dcc;">if</span> <span class="br0">&#40;</span>node == <span style="color: #a40d14;">null</span><span class="br0">&#41;</span> <span style="color: #cb0710;">0</span> <span style="color: #140dcc;">else</span> <span style="color: #cb0710;">1</span> + <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>node.<span class="me1">next</span><span class="br0">&#41;</span>
&nbsp;
    <span style="color: #2e7c0f;">walk</span><span class="br0">&#40;</span>root<span class="br0">&#41;</span>
  <span class="br0">&#125;</span>
&nbsp;
  <span style="color: #140dcc;">private</span> <span style="color: #140dcc;">case</span> <span style="color: #140dcc;">class</span> <span style="color: #2e7c0f;">Node</span><span class="br0">&#40;</span>data: T, <span style="color: #140dcc;">var</span> next: Node<span class="br0">&#41;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Once this class is compiled, we can use it in our Java code just as if it were written within the language itself:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Driver <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> <span style="color: #140dcc;">static</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">main</span><span class="br0">&#40;</span>String<span class="br0">&#91;</span><span class="br0">&#93;</span> args<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        LinkedList&lt;String&gt; list = <span style="color: #140dcc;">new</span> LinkedList&lt;String&gt;<span class="br0">&#40;</span><span class="br0">&#41;</span>;
&nbsp;
        <span style="color: #140dcc;">for</span> <span class="br0">&#40;</span>String arg : args<span class="br0">&#41;</span> <span class="br0">&#123;</span>
            list.<span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span>arg<span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
&nbsp;
        System.<span class="me1">out</span>.<span style="color: #2e7c0f;">println</span><span class="br0">&#40;</span><span style="color: #cb0710;">&quot;List has size: &quot;</span> + list.<span style="color: #2e7c0f;">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
&nbsp;
        <span style="color: #140dcc;">for</span> <span class="br0">&#40;</span><span style="color: #857d1f;">int</span> i = <span style="color: #cb0710;">0</span>; i &lt; list.<span style="color: #2e7c0f;">size</span><span class="br0">&#40;</span><span class="br0">&#41;</span>; i++<span class="br0">&#41;</span> <span class="br0">&#123;</span>
            System.<span class="me1">out</span>.<span style="color: #2e7c0f;">println</span><span class="br0">&#40;</span>list.<span style="color: #2e7c0f;">get</span><span class="br0">&#40;</span>i<span class="br0">&#41;</span>.<span style="color: #2e7c0f;">trim</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
        <span class="br0">&#125;</span>
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Impressively seamless interoperability!&nbsp; We actually could have gotten really fancy and thrown in some operator overloading.&nbsp; Obviously, Java wouldn&#8217;t have been able to use the operators themselves, but it still would have been able to call them just like normal Java instance methods.&nbsp; Using Scala in this way, we can get all the advantages of its concise syntax and slick design without really abandoning our Java code base.</p>
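<p>To make the operator point concrete: scalac encodes symbolic method names into plain identifiers, so a method named <code>+=</code> ends up in the bytecode as <code>$plus$eq</code>.&nbsp; The sketch below is plain Java; <code>OpList</code> is a hypothetical, hand-written stand-in for a compiled Scala class with such a method, not actual compiler output.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5">public class OperatorCallDemo {
    // Hypothetical stand-in for a compiled Scala class defining `def +=(item: String)`.
    // scalac encodes `+` as $plus and `=` as $eq, so Java sees a method named $plus$eq.
    static class OpList {
        private final StringBuilder items = new StringBuilder();
        private int count = 0;

        public void $plus$eq(String item) {
            if (count != 0) {
                items.append(", ");
            }
            items.append(item);
            count++;
        }

        public int size() {
            return count;
        }

        @Override
        public String toString() {
            return "[" + items + "]";
        }
    }

    public static void main(String[] args) {
        OpList list = new OpList();
        list.$plus$eq("a");   // what Scala code would write as: list += "a"
        list.$plus$eq("b");
        System.out.println(list + " has size " + list.size());
    }
}</pre></td></tr></table></div>

<p>It isn&#8217;t pretty, but as far as javac is concerned it is just another instance method call.</p>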
<p>The problem comes in when we try to satisfy more complex cases.&nbsp; Groovy proponents often trot out the example of a Java class inherited by a Groovy class which is in turn inherited by another Java class.&nbsp; In Scala, that would be doing something like this:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #140dcc;">abstract</span> <span style="color: #857d1f;">class</span> Shape <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> <span style="color: #140dcc;">abstract</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">draw</span><span class="br0">&#40;</span>Canvas c<span class="br0">&#41;</span>;
<span class="br0">&#125;</span></pre></td></tr></table></div>


<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">class</span> <span style="color: #2e7c0f;">Rectangle</span><span class="br0">&#40;</span><span style="color: #140dcc;">val</span> width: <span style="color: #800080;">Int</span>, <span style="color: #140dcc;">val</span> height: <span style="color: #800080;">Int</span><span class="br0">&#41;</span> <span style="color: #140dcc;">extends</span> Shape <span class="br0">&#123;</span>
  <span style="color: #140dcc;">override</span> <span style="color: #140dcc;">def</span> <span style="color: #2e7c0f;">draw</span><span class="br0">&#40;</span>c: Canvas<span class="br0">&#41;</span> <span class="br0">&#123;</span>
    <span style="color: #999999;">// ...</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>


<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> Square <span style="color: #140dcc;">extends</span> Rectangle <span class="br0">&#123;</span>
    <span style="color: #140dcc;">public</span> <span style="color: #2e7c0f;">Square</span><span class="br0">&#40;</span><span style="color: #857d1f;">int</span> size<span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #2e7c0f;">super</span><span class="br0">&#40;</span>size, size<span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>Unfortunately, this isn&#8217;t exactly possible in Scala.&nbsp; Well, I take that back.&nbsp; We can cheat a bit and first compile <code>Shape</code> using javac, then compile <code>Rectangle</code> using scalac and finally <code>Square</code> using javac, but that would be quite nasty indeed.&nbsp; What&#8217;s worse is such a technique would completely fall over if the <code>Canvas</code> class were to have a dependency on <code>Rectangle</code>, something which isn&#8217;t too hard to imagine.&nbsp; In short, Scala is bound by the limitations of a separate compiler, as are most languages on the JVM.</p>
<p>Groovy solves this problem by building its own Java compiler into groovyc, thus allowing the compilation of both Java and Groovy sources within the same process.&nbsp; This solves the problem of circular references because neither set of sources is completely compiled before the other.&nbsp; It&#8217;s a nice solution, and one which Scala will be adopting in an upcoming release of its compiler.&nbsp; However, it doesn&#8217;t really solve everything.</p>
<p>Consider a more complex scenario.&nbsp; Imagine we have Java class <code>Shape</code>, which is extended by Scala class <code>Rectangle</code> <em>and</em> Groovy class <code>Circle</code>.&nbsp; Imagine also that class <code>Canvas</code> has a dependency on both <code>Rectangle</code> and <code>Circle</code>, perhaps for some special graphics optimizations.&nbsp; Suddenly we have a three-way circular dependency and no way of resolving it without a compiler which can handle <em>all three</em> languages: Java, Groovy and Scala.&nbsp; This is starting to become a bit more interesting.</p>
<p>Of course, we can solve this problem in the same way we solved the Groovy-Java dependence problem: just add support to the compiler!&nbsp; Unfortunately, it may have been trivial to implement a Java compiler as part of groovyc, but Scala is a much more difficult language from a compiler&#8217;s point of view.&nbsp; But even supposing that we do create an integrated Scala compiler, we still haven&#8217;t solved the problem.&nbsp; It&#8217;s not difficult to imagine throwing another language into the mix; Clojure, for example.&nbsp; Do we keep going, tacking languages onto our once-Groovy compiler until we support everything usable on the JVM?&nbsp; It should be obvious why this is a bad plan.</p>
<p>A more viable solution would be to create a common compiler framework, one which would be used as the basis for all JVM languages.&nbsp; This framework would have common abstractions for things like name resolution and type checking.&nbsp; Instead of creating an entire compiler from scratch, every language would simply extend this core framework and implement its compiler as some sort of module.&nbsp; In this way, it would be easy to build up a custom set of modules which solve the needs of your project.&nbsp; Since the compilers are modular and based on the same core framework, they would be able to handle simultaneous compilation of all JVM languages involved, effectively solving the circular dependency problem in a generalized fashion.</p>
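<p>As a very rough sketch of the idea (every name here is made up for illustration, and a real framework would obviously need far more than this), imagine each language plugging in as a module that reports which types it declares and which it references.&nbsp; Resolving names against the declarations of <em>all</em> modules at once is what makes the circular <code>Shape</code>/<code>Rectangle</code>/<code>Circle</code>/<code>Canvas</code> case go through:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5">import java.util.Arrays;

public class JointCompileSketch {
    // Hypothetical contract that each language front-end would implement.
    interface LanguageModule {
        String language();
        String[] declaredTypes();    // type names defined by this module's sources
        String[] referencedTypes();  // type names those sources depend on
    }

    // Toy module backed by fixed arrays; a real one would parse source files.
    static class FixedModule implements LanguageModule {
        private final String language;
        private final String[] declared;
        private final String[] referenced;

        FixedModule(String language, String[] declared, String[] referenced) {
            this.language = language;
            this.declared = declared;
            this.referenced = referenced;
        }

        public String language() { return language; }
        public String[] declaredTypes() { return declared; }
        public String[] referencedTypes() { return referenced; }
    }

    // Name resolution against the declarations of every participating module.
    static boolean isDeclared(LanguageModule[] modules, String name) {
        for (LanguageModule m : modules) {
            if (Arrays.asList(m.declaredTypes()).contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Returns "ok" if every reference resolves, else a message for the first that does not.
    static String firstUnresolved(LanguageModule[] modules) {
        for (LanguageModule m : modules) {
            for (String ref : m.referencedTypes()) {
                if (!isDeclared(modules, ref)) {
                    return m.language() + ": cannot resolve " + ref;
                }
            }
        }
        return "ok";
    }

    public static void main(String[] args) {
        LanguageModule javaModule = new FixedModule("Java",
            new String[] {"Shape", "Canvas"}, new String[] {"Rectangle", "Circle"});
        LanguageModule scalaModule = new FixedModule("Scala",
            new String[] {"Rectangle"}, new String[] {"Shape", "Canvas"});
        LanguageModule groovyModule = new FixedModule("Groovy",
            new String[] {"Circle"}, new String[] {"Shape", "Canvas"});

        // All three languages resolved together: the cycle is not a problem.
        System.out.println(firstUnresolved(
            new LanguageModule[] {javaModule, scalaModule, groovyModule}));
        // Java on its own, as a separate compiler would see it.
        System.out.println(firstUnresolved(new LanguageModule[] {javaModule}));
    }
}</pre></td></tr></table></div>

<p>The first call prints <code>ok</code>, while the second reports <code>Java: cannot resolve Rectangle</code>, which is essentially the wall a standalone compiler hits.</p>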
<p>The framework could even make things easier on would-be compiler implementors by handling common operations like bytecode emission.&nbsp; Fundamentally, all of these tightly-integrated languages are just different front-ends to a common backend: the JVM.&nbsp; I haven&#8217;t looked at the sources, but I would imagine that there is a <em>lot</em> of work which had to be done in each compiler to solve problems which were already handled in another.</p>
<p>Of course, all this is purely speculative.&nbsp; Everyone builds their compiler in a slightly different way (slightly => radically in the case of languages like Scala) and I wouldn&#8217;t imagine that it would be easy to build this sort of common compiler backend.&nbsp; However, the technology is in place.&nbsp; We already have nice module systems like OSGi, and we&#8217;re certainly no strangers to the work involved in building up a proper CLASSPATH for a given project.&nbsp; Why should this be any different?</p>
<p>It&#8217;s not without precedent either.&nbsp; GCC defines a common backend for a number of compilers, such as G++, GCJ and even an Objective-C compiler.&nbsp; Granted, it&#8217;s neither as high-level nor as modular as we would need to solve circular dependencies, but it&#8217;s something to go on.</p>
<p>It will be interesting to see where the JVM language sphere is headed next.&nbsp; The rapid emergence of so many new languages is leading to problems which will have to be addressed before the polyglot methodology will be truly accepted by the industry.&nbsp; Some of the smartest people in the development community are working toward solutions; and whether they take my idea of a modular framework or not, somewhere along the line the problem of simultaneous compilation must be solved.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/the-need-for-a-common-compiler-framework/feed</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>The Brilliance of BDD</title>
		<link>http://www.codecommit.com/blog/java/the-brilliance-of-bdd</link>
		<comments>http://www.codecommit.com/blog/java/the-brilliance-of-bdd#comments</comments>
		<pubDate>Mon, 09 Jun 2008 07:00:59 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[Scala]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/the-brilliance-of-bdd</guid>
		<description><![CDATA[As I have previously written, I have recently been spending some time experimenting with various aspects of Scala, including some of the frameworks which have become available.&#160; One of the frameworks I have had the privilege of using is the somewhat unassumingly-titled Specs, an implementation of the behavior-driven development methodology in Scala.
Specs takes full advantage [...]]]></description>
			<content:encoded><![CDATA[<p>As I have <a href="http://www.codecommit.com/blog/scala/naive-text-parsing-in-scala">previously written</a>, I have recently been spending some time experimenting with various aspects of Scala, including some of the frameworks which have become available.&nbsp; One of the frameworks I have had the privilege of using is the somewhat unassumingly-titled <a href="http://code.google.com/p/specs/">Specs</a>, an implementation of the <a href="http://en.wikipedia.org/wiki/Behavior_driven_development">behavior-driven development</a> methodology in Scala.</p>
<p>Specs takes full advantage of Scala&#8217;s flexible syntax, offering a very natural format for structuring tests.&nbsp; For example, we could write a simple specification for a hypothetical <code>add</code> method in the following way:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="scala"><span style="color: #140dcc;">object</span> AddSpec <span style="color: #140dcc;">extends</span> Specification <span class="br0">&#123;</span>
  <span style="color: #cb0710;">&quot;add method&quot;</span> should <span class="br0">&#123;</span>
    <span style="color: #cb0710;">&quot;handle simple positives&quot;</span> in <span class="br0">&#123;</span>
      <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span> mustEqual <span style="color: #cb0710;">3</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #cb0710;">&quot;handle simple negatives&quot;</span> in <span class="br0">&#123;</span>
      <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span> mustEqual <span style="color: #cb0710;">-3</span>
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #cb0710;">&quot;handle mixed signs&quot;</span> in <span class="br0">&#123;</span>
      <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span> mustEqual <span style="color: #cb0710;">-1</span>
      <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span> mustEqual <span style="color: #cb0710;">1</span>
    <span class="br0">&#125;</span>
  <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>We could go on, of course, but you get the picture.&nbsp; This code will lead to the execution of four separate assertions in three tests (to put things into JUnit terminology).&nbsp; Fundamentally, this isn&#8217;t much different from a standard series of unit tests, just with a slightly nicer syntax.</p>
<p>Specs defines a domain-specific language for structuring test assertions in a simple and intuitive way.&nbsp; However, this is hardly the only framework for BDD.&nbsp; Perhaps the most well-known such framework is <a href="http://rspec.info/">RSpec</a>, which answers a similar use-case in the Ruby programming language.&nbsp; Our previous specification could be rewritten using RSpec as follows:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="ruby">describe AddLib <span style="color: #140dcc;">do</span>
  it <span style="color: #cb0710;">'should handle simple positives'</span> <span style="color: #140dcc;">do</span>
    add<span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span>.<span class="me1">should</span> == <span style="color: #cb0710;">3</span>
  <span style="color: #140dcc;">end</span>
&nbsp;
  it <span style="color: #cb0710;">'should handle simple negatives'</span> <span style="color: #140dcc;">do</span>
    add<span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span>.<span class="me1">should</span> == <span style="color: #cb0710;">-3</span>
  <span style="color: #140dcc;">end</span>
&nbsp;
  it <span style="color: #cb0710;">'should handle mixed signs'</span> <span style="color: #140dcc;">do</span>
    add<span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span>.<span class="me1">should</span> == <span style="color: #cb0710;">-1</span>
    add<span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span>.<span class="me1">should</span> == <span style="color: #cb0710;">1</span>
  <span style="color: #140dcc;">end</span>
<span style="color: #140dcc;">end</span></pre></td></tr></table></div>

<p>The end-result is basically the same: the <code>add</code> method will be tested against the given assertions (all four of them) and the results printed in some sort of report form.&nbsp; In this area, RSpec is significantly more mature than Specs, generating very slick HTML reports and nicely formatted console output.&nbsp; This isn&#8217;t really a fundamental weakness of the Specs framework however, just indicative of the fact that RSpec has been around for a <em>lot</em> longer.</p>
<p>These two frameworks are interesting of course, but they are merely implementations of a much larger concept: behavior-driven development.&nbsp; I&#8217;ve never been much of a fan of unit testing.&nbsp; It&#8217;s always seemed to be incredibly dull and a very nearly fruitless waste of effort.&nbsp; As much as I hate it though, I have to bow to the benefits of a self-contained test suite; and so I press on, cursing JUnit every step of the way.</p>
<p>BDD provides a nice alternative to unit testing.&nbsp; At its core, it is not much different in that test groupings and primitive assertions are used to check all aspects of a test unit against predefined data.&nbsp; However, there is something about the &#8220;flow&#8221; of a behavioral spec that is considerably easier to deal with.&nbsp; For some reason, it is far less painful to devise a comprehensive test suite using BDD principles than conventional unit testing.&nbsp; It seems a little far-fetched, but BDD actually makes it easier to write (and more importantly, formulate) exactly the same tests.</p>
<p>It&#8217;s an odd phenomenon, one which can only be caused by the storyboard flow of the code itself.&nbsp; It is very natural to think of distinct requirements for a test unit when each of these requirements is being labeled and entered in a logical sequence.&nbsp; Moreover, the syntax of both Specs and RSpec is such that there is very little boilerplate required to set up an additional test.&nbsp; Compare the previous BDD specs with the following JUnit4 example:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5"><span style="color: #140dcc;">public</span> <span style="color: #857d1f;">class</span> MathTest <span class="br0">&#123;</span>
    <span style="color: #663f31;">@Test</span>
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">testSimplePositives</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #2e7c0f;">assertEquals</span><span class="br0">&#40;</span><span style="color: #cb0710;">3</span>, <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #663f31;">@Test</span>
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">testSimpleNegatives</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #2e7c0f;">assertEquals</span><span class="br0">&#40;</span><span style="color: #cb0710;">-3</span>, <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
&nbsp;
    <span style="color: #663f31;">@Test</span>
    <span style="color: #140dcc;">public</span> <span style="color: #857d1f;">void</span> <span style="color: #2e7c0f;">testMixedSigns</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
        <span style="color: #2e7c0f;">assertEquals</span><span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #cb0710;">-2</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
        <span style="color: #2e7c0f;">assertEquals</span><span class="br0">&#40;</span><span style="color: #cb0710;">1</span>, <span style="color: #2e7c0f;">add</span><span class="br0">&#40;</span><span style="color: #cb0710;">-1</span>, <span style="color: #cb0710;">2</span><span class="br0">&#41;</span><span class="br0">&#41;</span>;
    <span class="br0">&#125;</span>
<span class="br0">&#125;</span></pre></td></tr></table></div>

<p>JUnit just requires that much more syntax.&nbsp; It breaks up the logical flow of the tests and (more importantly) the developer train of thought.&nbsp; What can be worse is this syntax bloat makes it very tempting to just group all of the assertions into a single test &#8211; to save typing if nothing else.&nbsp; This is problematic because one assertion may shadow all the others in the case of a failure, preventing them from ever being executed.&nbsp; This can make certain problems much more difficult to isolate.</p>
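<p>The shadowing problem is easy to reproduce without any framework at all.&nbsp; The following is a minimal sketch in plain Java: <code>check</code> and <code>run</code> are hypothetical stand-ins for <code>assertEquals</code> and a test runner, and <code>add</code> is deliberately broken for negative operands.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"></td><td class="code"><pre class="java5">public class ShadowedAssertionDemo {
    // Hypothetical method under test, deliberately broken for negative operands.
    static int add(int a, int b) {
        return Math.abs(a) + Math.abs(b);
    }

    // Minimal stand-in for JUnit's assertEquals.
    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    static void simplePositives() {
        check(3, add(1, 2));
    }

    static void simpleNegatives() {
        check(-3, add(-1, -2));
    }

    static void mixedSigns() {
        check(-1, add(1, -2));
        check(1, add(-1, 2));
    }

    // The tempting shortcut: every assertion grouped into a single test.
    static void everything() {
        check(3, add(1, 2));
        check(-3, add(-1, -2));  // throws here...
        check(-1, add(1, -2));   // ...so these two are never evaluated
        check(1, add(-1, 2));
    }

    // Runs one test and reports its first failure, much as a test runner would.
    static String run(Runnable test) {
        try {
            test.run();
            return "passed";
        } catch (AssertionError e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("everything: " + run(ShadowedAssertionDemo::everything));
        System.out.println("simplePositives: " + run(ShadowedAssertionDemo::simplePositives));
        System.out.println("simpleNegatives: " + run(ShadowedAssertionDemo::simpleNegatives));
        System.out.println("mixedSigns: " + run(ShadowedAssertionDemo::mixedSigns));
    }
}</pre></td></tr></table></div>

<p>Grouped into one test, only the simple-negatives failure is ever reported; split apart, the mixed-signs failure shows up as well.</p>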
<p>Logical flow is extremely important to test structure.&nbsp; BDD frameworks provide a very nice syntax for painlessly defining comprehensive test suites.&nbsp; The really wonderful thing about all of this is that BDD is available on the JVM, right now.&nbsp; There&#8217;s nothing stopping you from writing your code in Java as you normally would, then creating your test suite in Scala using Specs rather than JUnit.&nbsp; Alternatively, you could use RSpec on top of JRuby, or <a href="http://groovy.codehaus.org/Using+GSpec+with+Groovy">GSpec</a> with Groovy.&nbsp; All of these are seamless replacements for a test framework like JUnit, and all require far less syntactic overhead.</p>
<p>The growing move toward polyglot programming encourages the use of a separate language when it is best suited to a particular task.&nbsp; In this case, several languages are available which offer far more powerful test frameworks than those which can be found in Java.&nbsp; Why not take advantage of them?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/the-brilliance-of-bdd/feed</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Dramatically Improved UI in jEdit</title>
		<link>http://www.codecommit.com/blog/java/dramatically-improved-ui-in-jedit</link>
		<comments>http://www.codecommit.com/blog/java/dramatically-improved-ui-in-jedit#comments</comments>
		<pubDate>Thu, 22 May 2008 07:00:42 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/dramatically-improved-ui-in-jedit</guid>
		<description><![CDATA[This is definitely old news by now (in fact, almost a month old), but I&#8217;m just now discovering it myself so I decided to share.&#160; The jEdit project is renowned for two things:

Marvelous support for every language under the sun
Eye-bleedingly bad UI design

It&#8217;s always been possible to hack yourself an improved version without too much [...]]]></description>
			<content:encoded><![CDATA[<p>This is definitely old news by now (in fact, almost a month old), but I&#8217;m just now discovering it myself so I decided to share.&nbsp; The jEdit project is renowned for two things:<br/><br/></p>
<ul>
<li>Marvelous support for every language under the sun</li>
<li>Eye-bleedingly bad UI design</li>
</ul>
<p>It&#8217;s always been possible to hack yourself an improved version without too much trouble; but by default, jEdit has always looked terrible.&nbsp; This one factor, more than anything else, has contributed to jEdit&#8217;s reputation as the supercharged editor which everyone refuses to try.&nbsp; Fortunately, this influence has been seriously reduced in the <a href="https://sourceforge.net/project/showfiles.php?group_id=588&amp;package_id=3753&amp;release_id=595374">4.3pre14</a> release:</p>
<p align="center"><img height="295" alt="image" src="http://www.codecommit.com/blog/wp-content/uploads/2008/05/image2.png" width="428" border="0"> </p>
<p>Compare that to <a href="http://jedit.org/index.php?page=screenshot&amp;image=34">the old look</a>.&nbsp; Even with Java 6 subpixel rendering, the interface remained a mess.&nbsp; What&#8217;s more, many of the interface elements were custom renderings, preventing the platform-native LAF from appropriately styling them (the toolbar controls are a prime example).&nbsp; All of this is fixed in 4.3pre14.</p>
<p>jEdit is rapidly approaching &#8220;usable editor&#8221; status out of the box, something that even the mighty TextMate hasn&#8217;t quite achieved.&nbsp; Granted, it&#8217;s still Swing-based, which means the fonts render horribly on Vista without Java 6uN, but it&#8217;s a step in the right direction.&nbsp; Now, if only they would do something about <a href="http://www.jedit.org">their website</a>&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/dramatically-improved-ui-in-jedit/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Weekend Fun: ActiveObjects Testability</title>
		<link>http://www.codecommit.com/blog/java/weekend-fun-activeobjects-testability</link>
		<comments>http://www.codecommit.com/blog/java/weekend-fun-activeobjects-testability#comments</comments>
		<pubDate>Sun, 18 May 2008 19:43:21 +0000</pubDate>
		<dc:creator>Daniel Spiewak</dc:creator>
				<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://www.codecommit.com/blog/java/weekend-fun-activeobjects-testability</guid>
		<description><![CDATA[I&#8217;m not entirely sure what these metrics mean, but they give me a warm feeling inside.&#160;  
      Analyzed classes:   136
 Excellent classes (.):   121  89.0%
      Good classes (=):     9   6.6%
Needs work classes (@): [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m not entirely sure what <a href="http://code.google.com/p/testability-explorer">these metrics</a> mean, but they give me a warm feeling inside.&nbsp; <img src='http://www.codecommit.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<pre>      Analyzed classes:   136
 Excellent classes (.):   121  89.0%
      Good classes (=):     9   6.6%
Needs work classes (@):     6   4.4%
             Breakdown: [.............................................===@@]
       0                                                                    118
    10 |......................................................................:   118
    31 |..                                                                    :     3
    52 |                                                                      :     0
    73 |===                                                                   :     5
    94 |===                                                                   :     4
   115 |                                                                      :     0
   136 |@@                                                                    :     3
   157 |                                                                      :     0
   178 |                                                                      :     0
   199 |                                                                      :     0
   220 |                                                                      :     0
   241 |                                                                      :     0
   262 |                                                                      :     0
   283 |                                                                      :     0
   304 |@                                                                     :     1
   325 |                                                                      :     0
   346 |@                                                                     :     1
   367 |                                                                      :     0
   388 |                                                                      :     0
   409 |                                                                      :     0
   430 |                                                                      :     0
   451 |                                                                      :     0
   472 |                                                                      :     0
   493 |@                                                                     :     1
   514 |                                                                      :     0

Highest Cost
============
net.java.ao.EntityManager 501
net.java.ao.schema.SchemaGenerator 353
net.java.ao.EntityProxy 296
net.java.ao.schema.ddl.SchemaReader 141
net.java.ao.RelatedEntityImpl 127
net.java.ao.SearchableEntityManager 127
net.java.ao.Query 99
net.java.ao.types.EntityType 87
net.java.ao.db.HSQLDatabaseProvider 85
net.java.ao.db.OracleDatabaseProvider 85
net.java.ao.DatabaseProvider 83
net.java.ao.Common 82
net.java.ao.EntityManager$1 82
net.java.ao.db.PostgreSQLDatabaseProvider 82
net.java.ao.types.TypeManager 80
net.java.ao.schema.ddl.DDLAction 30
net.java.ao.SoftHashMap 28
net.java.ao.schema.AbstractFieldNameConverter 28
net.java.ao.SoftHashMap$HashIterator 20</pre>
<p>Most of the badness seems to stem from <code>EntityManager</code>, which makes a lot of sense given the way it is designed.&nbsp; <code>EntityProxy</code> also poses issues, but in practice this isn&#8217;t a real problem because of how extensive the JUnit tests are for just this class.&nbsp; Overall, ActiveObjects testability isn&#8217;t anywhere near the <a href="http://www.testabilityexplorer.org/report/guice/guice/1.0">Guice score</a>, but it&#8217;s not as horrible as <a href="http://www.testabilityexplorer.org/report/jruby/jruby/0.9.8">JRuby</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.codecommit.com/blog/java/weekend-fun-activeobjects-testability/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
