<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Is a Separate Text Search Engine a Bad Idea?</title>
	<atom:link href="http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/feed" rel="self" type="application/rss+xml" />
	<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea</link>
	<description>(permanently in beta)</description>
	<lastBuildDate>Sun, 29 Aug 2010 20:01:44 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2585</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Tue, 18 Dec 2007 16:26:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2585</guid>
		<description>That&#039;s a good point.  The problem I see is the same as I illustrated in the article: service overload.  PostgreSQL does support full-text indexing, but if you try to put both full-text indexing and databasing into a single point of failure, you&#039;re going to have to deal with some serious scalability issues.  I won&#039;t deny that it&#039;s possible to do things that way, but I don&#039;t think it&#039;s a terribly good idea for a high-volume application.

What would be really nice is some sort of integration between the search engine (Lucene) and the database (MySQL, PostgreSQL, it shouldn&#039;t matter).  This sort of integration could allow things like your use case, without overloading the database.</description>
		<content:encoded><![CDATA[<p>That&#8217;s a good point.  The problem I see is the same as I illustrated in the article: service overload.  PostgreSQL does support full-text indexing, but if you try to put both full-text indexing and databasing into a single point of failure, you&#8217;re going to have to deal with some serious scalability issues.  I won&#8217;t deny that it&#8217;s possible to do things that way, but I don&#8217;t think it&#8217;s a terribly good idea for a high-volume application.</p>
<p>What would be really nice is some sort of integration between the search engine (Lucene) and the database (MySQL, PostgreSQL, it shouldn&#8217;t matter).  This sort of integration could allow things like your use case, without overloading the database.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prathapan Sethu</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2584</link>
		<dc:creator>Prathapan Sethu</dc:creator>
		<pubDate>Tue, 18 Dec 2007 11:45:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2584</guid>
		<description>There are cases where you would want to further filter or sort the result set from a full-text query by another parameter stored in a database. For instance, I search for articles stored in Lucene, but want to sort/rank them by user-given ratings, which are stored in a database. I don&#039;t want to store the user ratings inside Lucene because that data changes a lot, and keeping the Lucene index updated is not easy. Multiple users cannot write to a Lucene index concurrently. Another example: what if I want to sort the shopping cart items by price? So I think we need a single product which can handle both full-text and regular SQL queries and sort the results based on full-text relevance or some other db field. If only the database vendors would take this seriously. I can see a huge demand for such an IR-DB product.</description>
		<content:encoded><![CDATA[<p>There are cases where you would want to further filter or sort the result set from a full-text query by another parameter stored in a database. For instance, I search for articles stored in Lucene, but want to sort/rank them by user-given ratings, which are stored in a database. I don&#8217;t want to store the user ratings inside Lucene because that data changes a lot, and keeping the Lucene index updated is not easy. Multiple users cannot write to a Lucene index concurrently. Another example: what if I want to sort the shopping cart items by price? So I think we need a single product which can handle both full-text and regular SQL queries and sort the results based on full-text relevance or some other db field. If only the database vendors would take this seriously. I can see a huge demand for such an IR-DB product.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Spiewak</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2405</link>
		<dc:creator>Daniel Spiewak</dc:creator>
		<pubDate>Fri, 12 Oct 2007 18:37:16 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2405</guid>
		<description>Well, I&#039;ll admit my experience with that kind of system is limited, but I know of other people who do have the experience and they say that it&#039;s often *easier* to keep the Lucene bits in sync than the DBMS.

If you&#039;re having difficulties keeping Lucene in sync with your database (or vice versa), you may want to take a hard look at how you&#039;re doing things.  Not just the underlying search-vs-database, but how you&#039;re implementing them.  As I mentioned, a few ORMs can keep search indexes and databases in sync automatically.  Have you tried this approach?  Even if you haven&#039;t, a reasonable DAO should be able to take care of most of the pain by centralizing the Lucene update code.

Lucene scales incredibly simply (since it&#039;s usually just a bunch of files sitting on a hard drive).  I honestly can&#039;t see how a huge deployment would make it difficult to keep everything in sync.  Maybe on the database side of life, but not with Lucene itself.</description>
		<content:encoded><![CDATA[<p>Well, I&#8217;ll admit my experience with that kind of system is limited, but I know of other people who do have the experience and they say that it&#8217;s often *easier* to keep the Lucene bits in sync than the DBMS.</p>
<p>If you&#8217;re having difficulties keeping Lucene in sync with your database (or vice versa), you may want to take a hard look at how you&#8217;re doing things.  Not just the underlying search-vs-database, but how you&#8217;re implementing them.  As I mentioned, a few ORMs can keep search indexes and databases in sync automatically.  Have you tried this approach?  Even if you haven&#8217;t, a reasonable DAO should be able to take care of most of the pain by centralizing the Lucene update code.</p>
<p>Lucene scales incredibly simply (since it&#8217;s usually just a bunch of files sitting on a hard drive).  I honestly can&#8217;t see how a huge deployment would make it difficult to keep everything in sync.  Maybe on the database side of life, but not with Lucene itself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: You're wrong</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2402</link>
		<dc:creator>You're wrong</dc:creator>
		<pubDate>Fri, 12 Oct 2007 08:19:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2402</guid>
		<description>You are obviously using a relatively simple database + index configuration. One DB and one index?

Well, the world is bigger than that. When you have data *that changes frequently* spread across several hundred servers, some of it replicated in different clusters, keeping the Lucene index(es) *up to date and consistent* becomes a huge task. I know because I&#039;m doing it right now and it sucks.

Keeping the full text search engine outside of the database is the wrong way and market requirements will force RDBMS manufacturers to do a better job at integrating them while preserving performance and scalability.</description>
		<content:encoded><![CDATA[<p>You are obviously using a relatively simple database + index configuration. One DB and one index?</p>
<p>Well, the world is bigger than that. When you have data *that changes frequently* spread across several hundred servers, some of it replicated in different clusters, keeping the Lucene index(es) *up to date and consistent* becomes a huge task. I know because I&#8217;m doing it right now and it sucks.</p>
<p>Keeping the full text search engine outside of the database is the wrong way and market requirements will force RDBMS manufacturers to do a better job at integrating them while preserving performance and scalability.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: owen</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2401</link>
		<dc:creator>owen</dc:creator>
		<pubDate>Thu, 11 Oct 2007 19:33:15 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2401</guid>
		<description>use the best tool for the job.  and sometimes this means more than one tool.  unless the day comes when I can write an entire program in my RDBMS, lol THEN I&#039;ll be happy and singletonitized!</description>
		<content:encoded><![CDATA[<p>use the best tool for the job.  and sometimes this means more than one tool.  unless the day comes when I can write an entire program in my RDBMS, lol THEN I&#8217;ll be happy and singletonitized!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: The How-To Geek</title>
		<link>http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea/comment-page-1#comment-2400</link>
		<dc:creator>The How-To Geek</dc:creator>
		<pubDate>Thu, 11 Oct 2007 12:42:14 +0000</pubDate>
		<guid isPermaLink="false">http://www.codecommit.com/blog/java/is-a-separate-text-search-engine-a-bad-idea#comment-2400</guid>
		<description>Using a Lucene backend is great for a search engine, or even a site where the navigation involves browsing through mostly static content. It&#039;s not strictly limited to searching; I know of a lot of shopping sites that are built on Lucene.

It&#039;s pretty much useless for a site or section that needs transactional support; for instance, you wouldn&#039;t use it for a shopping cart page because it&#039;s just not made for that type of thing. The index building functions are far less performant than you&#039;d think, and you really require transactions for that type of data.

I think that you have to design an application so that you can &quot;plug in&quot; a new search engine when MySQL fulltext starts to break down. For small applications (like this blog) MySQL will work perfectly, but once you start to have a large volume of searching going on, it simply can&#039;t handle the load.

That&#039;s where it would be great to just &quot;plug in&quot; a new search engine that could utilize Lucene.</description>
		<content:encoded><![CDATA[<p>Using a Lucene backend is great for a search engine, or even a site where the navigation involves browsing through mostly static content. It&#8217;s not strictly limited to searching; I know of a lot of shopping sites that are built on Lucene.</p>
<p>It&#8217;s pretty much useless for a site or section that needs transactional support; for instance, you wouldn&#8217;t use it for a shopping cart page because it&#8217;s just not made for that type of thing. The index building functions are far less performant than you&#8217;d think, and you really require transactions for that type of data.</p>
<p>I think that you have to design an application so that you can &#8220;plug in&#8221; a new search engine when MySQL fulltext starts to break down. For small applications (like this blog) MySQL will work perfectly, but once you start to have a large volume of searching going on, it simply can&#8217;t handle the load.</p>
<p>That&#8217;s where it would be great to just &#8220;plug in&#8221; a new search engine that could utilize Lucene.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
