<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Implementing Persistent Vectors in Scala</title>
	<atom:link href="http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/feed" rel="self" type="application/rss+xml" />
	<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala</link>
	<description>(permanently in beta)</description>
	<lastBuildDate>Sun, 29 Aug 2010 20:01:44 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4176</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Mon, 20 Oct 2008 22:42:20 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4176</guid>
		<description>Recommended reading: Chris Okasaki&#039;s doctoral thesis was on persistent data structures and ways of making them efficient.  It&#039;s very readable and was actually republished as a proper book.</description>
		<content:encoded><![CDATA[<p>Recommended reading: Chris Okasaki&#8217;s doctoral thesis was on persistent data structures and ways of making them efficient.  It&#8217;s very readable and was actually republished as a proper book.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sciss</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4172</link>
		<dc:creator>Sciss</dc:creator>
		<pubDate>Mon, 20 Oct 2008 21:58:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4172</guid>
		<description>well, i&#039;m reading through two papers now (quite an effort for a not-computer-science person) 

Neil Sarnak, Robert Tarjan, &quot;Planar Point Location using persistent search trees&quot;

and

James Driscoll, Daniel Sleator, Robert Tarjan &quot;Making data structures persistent&quot;

... hopefully with some insight.</description>
		<content:encoded><![CDATA[<p>well, i&#8217;m reading through two papers now (quite an effort for a not-computer-science person) </p>
<p>Neil Sarnak, Robert Tarjan, &#8220;Planar Point Location using persistent search trees&#8221;</p>
<p>and</p>
<p>James Driscoll, Daniel Sleator, Robert Tarjan &#8220;Making data structures persistent&#8221;</p>
<p>&#8230; hopefully with some insight.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4170</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Mon, 20 Oct 2008 19:29:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4170</guid>
		<description>Ah, I didn&#039;t realize that you needed range matching (I suppose I should have read your original post...).  This is starting to get a bit out of my depth, but it would seem to me that you could match key spans in O(log(n)) time using a balanced binary search tree.  This is because you can just treat ranges as trimmed sub-trees.  Doing it immutably might be a bit tough, but still possible.  It would seem to me that you could take advantage of Scala&#039;s SortedMap#rangeImpl method for this, but I&#039;m not entirely sure.

A higher branching factor (such as is used in Vector) might be interesting to explore, but I don&#039;t think it would help too much.  In fact, I think it would hinder performance.  For one thing, you would *have* to use a tree structure rather than a trie (otherwise range matches are still linear time).  Also, bumping up the branching factor makes it much more complicated to just select out your sub-tree.  You would have to trim nodes as well as trees, and that gets to be quite inefficient.</description>
		<content:encoded><![CDATA[<p>Ah, I didn&#8217;t realize that you needed range matching (I suppose I should have read your original post&#8230;).  This is starting to get a bit out of my depth, but it would seem to me that you could match key spans in O(log(n)) time using a balanced binary search tree.  This is because you can just treat ranges as trimmed sub-trees.  Doing it immutably might be a bit tough, but still possible.  It would seem to me that you could take advantage of Scala&#8217;s SortedMap#rangeImpl method for this, but I&#8217;m not entirely sure.</p>
<p>A higher branching factor (such as is used in Vector) might be interesting to explore, but I don&#8217;t think it would help too much.  In fact, I think it would hinder performance.  For one thing, you would *have* to use a tree structure rather than a trie (otherwise range matches are still linear time).  Also, bumping up the branching factor makes it much more complicated to just select out your sub-tree.  You would have to trim nodes as well as trees, and that gets to be quite inefficient.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sciss</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4168</link>
		<dc:creator>Sciss</dc:creator>
		<pubDate>Mon, 20 Oct 2008 17:22:33 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4168</guid>
		<description>here are some measurements with the above code. t2-t1 is the loop adding elements, t5-t3 is the from and until operations, t8-t6 is creating the intersection between the from and until collections. interestingly, all are O(N), where i had guessed that t5-t3 should be O(logN) ? or do i have to use a special kind of iterator generation to get that performance?

// for N=50000
t2-t1 = 1226 ms
t5-t3 = 6 ms
t8-t6 = 159 ms

// for N=100000
t2-t1 = 2688 ms
t5-t3 = 12 ms
t8-t6 = 317 ms

// for N=200000
t2-t1 = 6155 ms
t5-t3 = 20 ms
t8-t6 = 603 ms</description>
		<content:encoded><![CDATA[<p>here are some measurements with the above code. t2-t1 is the loop adding elements, t5-t3 is the from and until operations, t8-t6 is creating the intersection between the from and until collections. interestingly, all are O(N), where i had guessed that t5-t3 should be O(logN) ? or do i have to use a special kind of iterator generation to get that performance?</p>
<p>// for N=50000<br />
t2-t1 = 1226 ms<br />
t5-t3 = 6 ms<br />
t8-t6 = 159 ms</p>
<p>// for N=100000<br />
t2-t1 = 2688 ms<br />
t5-t3 = 12 ms<br />
t8-t6 = 317 ms</p>
<p>// for N=200000<br />
t2-t1 = 6155 ms<br />
t5-t3 = 20 ms<br />
t8-t6 = 603 ms</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sciss</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4167</link>
		<dc:creator>Sciss</dc:creator>
		<pubDate>Mon, 20 Oct 2008 16:50:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4167</guid>
		<description>p.s. here is what i mean. i&#039;m splitting my range object (Span) into two pairs of keys, by start and stop (Long). but when querying a range, the problematic lines are the last four, which destroy the speed...


case class Span( start: Long, stop: Long );
case class Lala( name: String, span: Span );

var t_start = new scala.collection.immutable.TreeMap[ Long, Lala ]()
var t_stop = new scala.collection.immutable.TreeMap[ Long, Lala ]()

val rand = new java.util.Random

(1 to 1000).foreach { i =&gt;
	val start = rand.nextInt( 400000 )
	val stop  = start + rand.nextInt( 400000 )
	val span = Span( start, stop )
	val lala = Lala( String.valueOf( i ), span )
	t_start = t_start.updated( span.start, lala )
	t_stop  = t_stop.updated( span.stop, lala  )
}


val search = Span( 100000, 200000 )

val coll1 = t_start from search.start
val coll2 = t_stop until search.stop

val c1vals = coll1.values
val c1valSet = new scala.collection.mutable.HashSet[ Lala ]
c1valSet ++= c1vals

val result = coll2.filter { e =&gt; c1valSet.contains( e._2 )}</description>
		<content:encoded><![CDATA[<p>p.s. here is what i mean. i&#8217;m splitting my range object (Span) into two pairs of keys, by start and stop (Long). but when querying a range, the problematic lines are the last four, which destroy the speed&#8230;</p>
<p>case class Span( start: Long, stop: Long );<br />
case class Lala( name: String, span: Span );</p>
<p>var t_start = new scala.collection.immutable.TreeMap[ Long, Lala ]()<br />
var t_stop = new scala.collection.immutable.TreeMap[ Long, Lala ]()</p>
<p>val rand = new java.util.Random</p>
<p>(1 to 1000).foreach { i =&gt;<br />
	val start = rand.nextInt( 400000 )<br />
	val stop  = start + rand.nextInt( 400000 )<br />
	val span = Span( start, stop )<br />
	val lala = Lala( String.valueOf( i ), span )<br />
	t_start = t_start.updated( span.start, lala )<br />
	t_stop  = t_stop.updated( span.stop, lala  )<br />
}</p>
<p>val search = Span( 100000, 200000 )</p>
<p>val coll1 = t_start from search.start<br />
val coll2 = t_stop until search.stop</p>
<p>val c1vals = coll1.values<br />
val c1valSet = new scala.collection.mutable.HashSet[ Lala ]<br />
c1valSet ++= c1vals</p>
<p>val result = coll2.filter { e =&gt; c1valSet.contains( e._2 )}</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sciss</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4164</link>
		<dc:creator>Sciss</dc:creator>
		<pubDate>Mon, 20 Oct 2008 15:22:57 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4164</guid>
		<description>@Daniel

i understand i can use tuples as keys. but they won&#039;t solve the two problems
- duplicate keys
- range searches for boundaries

while i can imagine solving the first problem by using a Map[ Span, Seq[ Object ]], the second problem seems impossible to do with the normal tree / vector: if any object is defined by its Span( ,  ), how would i query for example

&quot;give me all objects that intersect with Span( 10000, 40000 )&quot; ?

if i&#039;m not wrong this requires a two-dimensional tree... or do you see another solution? thanks again!</description>
		<content:encoded><![CDATA[<p>@Daniel</p>
<p>i understand i can use tuples as keys. but they won&#8217;t solve the two problems<br />
- duplicate keys<br />
- range searches for boundaries</p>
<p>while i can imagine solving the first problem by using a Map[ Span, Seq[ Object ]], the second problem seems impossible to do with the normal tree / vector: if any object is defined by its Span( ,  ), how would i query for example</p>
<p>&#8220;give me all objects that intersect with Span( 10000, 40000 )&#8221; ?</p>
<p>if i&#8217;m not wrong this requires a two-dimensional tree&#8230; or do you see another solution? thanks again!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4138</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Mon, 13 Oct 2008 22:03:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4138</guid>
		<description>Actually, you don&#039;t have to create a new proxy object; Scala already gives you one: Tuple2.  I think you can accomplish your goals by using tuples as keys.  Thus, for some types A, B and C, your Map type signature will be something like the following:

Map[(A, B), C]</description>
		<content:encoded><![CDATA[<p>Actually, you don&#8217;t have to create a new proxy object; Scala already gives you one: Tuple2.  I think you can accomplish your goals by using tuples as keys.  Thus, for some types A, B and C, your Map type signature will be something like the following:</p>
<p>Map[(A, B), C]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sciss</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4137</link>
		<dc:creator>Sciss</dc:creator>
		<pubDate>Mon, 13 Oct 2008 21:58:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4137</guid>
		<description>@Daniel

thanks for the fast reply, and thanks for the hint to TreeMap. in fact, i only understood the concept of scala.collection.immutable after reading your article - coming from the java world, i had always thought that that package was just creating read-only objects like java.util.Collections.unmodifiable(List&#124;Map&#124;...) !

TreeMap however doesn&#039;t work in my case due to two things:
- all range operations are 1-dimensional, where i need two-dimensional keys
  (like a (start, stop) tuple)
- i cannot insert two elements at the same key

i just realized the second issue is also true for the Vector class ;-(  and i guess it&#039;s a lot of work to modify it so that it accepts duplicate keys. (there is a trick however i read in a paper - you can store a proxy value for a key that points to more than one value, the proxy value contains a list of all objects stored under that key). however that introduces a new problem for the generics. i guess i will need to have a wrapper class to handle that.</description>
		<content:encoded><![CDATA[<p>@Daniel</p>
<p>thanks for the fast reply, and thanks for the hint to TreeMap. in fact, i only understood the concept of scala.collection.immutable after reading your article &#8211; coming from the java world, i had always thought that that package was just creating read-only objects like java.util.Collections.unmodifiable(List|Map|&#8230;) !</p>
<p>TreeMap however doesn&#8217;t work in my case due to two things:<br />
- all range operations are 1-dimensional, where i need two-dimensional keys<br />
  (like a (start, stop) tuple)<br />
- i cannot insert two elements at the same key</p>
<p>i just realized the second issue is also true for the Vector class ;-(  and i guess it&#8217;s a lot of work to modify it so that it accepts duplicate keys. (there is a trick however i read in a paper &#8211; you can store a proxy value for a key that points to more than one value, the proxy value contains a list of all objects stored under that key). however that introduces a new problem for the generics. i guess i will need to have a wrapper class to handle that.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4136</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Mon, 13 Oct 2008 20:37:40 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4136</guid>
		<description>Correction, the Tree class should be linked as follows: http://www.scala-lang.org/docu/files/api/scala/collection/immutable/TreeMap.html

TreeMap uses Tree to implement what is essentially a Red-Black tree structure.</description>
		<content:encoded><![CDATA[<p>Correction, the Tree class should be linked as follows: <a href="http://www.scala-lang.org/docu/files/api/scala/collection/immutable/TreeMap.html" rel="nofollow">http://www.scala-lang.org/docu/files/api/scala/collection/immutable/TreeMap.html</a></p>
<p>TreeMap uses Tree to implement what is essentially a Red-Black tree structure.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala/comment-page-1#comment-4135</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Mon, 13 Oct 2008 20:34:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/scala/implementing-persistent-vectors-in-scala#comment-4135</guid>
		<description>@Sciss

Hmm, I&#039;m not sure I understand completely what you&#039;re trying to do, but I can definitely help you with a log(n) efficient Map!  :-)  What you want is a functional, balanced binary search tree.  Traditionally, most languages use an immutable Red-Black tree for this case, but Scala actually has an even faster implementation: &lt;a href=&quot;http://www.scala-lang.org/docu/files/api/scala/collection/immutable/Tree.html&quot; rel=&quot;nofollow&quot;&gt;scala.collection.immutable.Tree&lt;/a&gt;.  This structure guarantees O(log(n)) for all operations.

It&#039;s actually possible to do a little better though.  Clojure has a persistent hash map based on Bagwell&#039;s ideal hash tries which is able to claim O(log32(n)) for all operations.  This is actually even a little better than conventional mutable bucket hashing which has an absolute worst-case O(log(n)) (when everything hashes to the same bucket).  I haven&#039;t had the chance to port Clojure&#039;s hash map to Scala yet, but I&#039;m working on it!  :-)  This persistent hash map is very similar in design to the persistent vector (they&#039;re both based on the same structural concept).</description>
		<content:encoded><![CDATA[<p>@Sciss</p>
<p>Hmm, I&#8217;m not sure I understand completely what you&#8217;re trying to do, but I can definitely help you with a log(n) efficient Map!  <img src='http://www.codecommit.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   What you want is a functional, balanced binary search tree.  Traditionally, most languages use an immutable Red-Black tree for this case, but Scala actually has an even faster implementation: <a href="http://www.scala-lang.org/docu/files/api/scala/collection/immutable/Tree.html" rel="nofollow">scala.collection.immutable.Tree</a>.  This structure guarantees O(log(n)) for all operations.</p>
<p>It&#8217;s actually possible to do a little better though.  Clojure has a persistent hash map based on Bagwell&#8217;s ideal hash tries which is able to claim O(log32(n)) for all operations.  This is actually even a little better than conventional mutable bucket hashing which has an absolute worst-case O(log(n)) (when everything hashes to the same bucket).  I haven&#8217;t had the chance to port Clojure&#8217;s hash map to Scala yet, but I&#8217;m working on it!  <img src='http://www.codecommit.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   This persistent hash map is very similar in design to the persistent vector (they&#8217;re both based on the same structural concept).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
