will's blog

Can Lucene Be Used as a Substitute for a Real Database?

Submitted by will on Tue, 2005-10-25 13:09.
 > Can Lucene to be used in place of mysql so that
 > website visitors can input data that will in turn
 > inserting row into Lucene just like mysql db?
 
 That's a bad idea. Lucene lacks a real update (you need to delete and
 re-add) and also sees everything as a string, even numbers. So although
 it's technically possible you don't want to do it.
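
 The delete-and-re-add pattern mentioned in the reply can be sketched with a toy in-memory inverted index. This is only an illustration, assuming a made-up `ToyIndex` class with hypothetical `add`, `delete`, `update`, and `search` methods; it is not Lucene's API and says nothing about how Lucene stores data internally.

```python
class ToyIndex:
    """A tiny in-memory inverted index; a stand-in, not Lucene's API."""

    def __init__(self):
        self.docs = {}      # doc_id -> original text
        self.postings = {}  # term -> set of doc_ids containing it

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        text = self.docs.pop(doc_id, None)
        if text is None:
            return
        for term in text.lower().split():
            ids = self.postings.get(term)
            if ids is not None:
                ids.discard(doc_id)
                if not ids:
                    del self.postings[term]

    def update(self, doc_id, text):
        # No in-place edit: drop the old document, then index the new one.
        self.delete(doc_id)
        self.add(doc_id, text)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


index = ToyIndex()
index.add("post-1", "Lucene is a search library")
index.update("post-1", "Lucene is not a database")
```

 After the update, the old terms of `post-1` are gone from the index and only the new text is searchable, which is roughly what "delete and re-add" buys you compared with a row-level UPDATE in a database.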

First of all, using Lucene in place of an RDBMS is quite possible in some specific cases.

In addition to the update and string/number issues, Lucene also lacks many RDBMS features. One of them is aggregation: functions like SUM(), or "GROUP BY".
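
Without SUM() or "GROUP BY" in the search layer, the aggregation ends up in application code. Here is a minimal sketch of what that might look like, assuming search hits come back as plain dictionaries of string fields; the `hits` data, the field names, and the `sum_by` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical search hits. Every stored value is a string,
# matching the "sees everything as a string" point above.
hits = [
    {"category": "books", "price": "12.50"},
    {"category": "music", "price": "9.99"},
    {"category": "books", "price": "7.50"},
]


def sum_by(hits, group_field, value_field):
    """Emulate SELECT group_field, SUM(value_field) ... GROUP BY group_field."""
    totals = defaultdict(float)
    for hit in hits:
        # Values are strings, so they must be parsed before adding.
        totals[hit[group_field]] += float(hit[value_field])
    return dict(totals)


totals = sum_by(hits, "category", "price")
```

Note that this loop has to visit every matching hit, so it gets expensive on large result sets, where a database would do the same work inside the query.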

Bad Search Is One of the Top 10 Web Design Mistakes

Submitted by will on Wed, 2005-10-05 10:37.

One of the top 10 web design mistakes: http://www.useit.com/alertbox/designmistakes.html

5. Bad Search

Everything else on this list is pretty easy to get right, but unfortunately fixing search requires considerable work and an investment in better software. It's worth doing, though, because search is a fundamental component of the Web user experience and is getting more important every year.

Here is a very good article on site search design: http://www.useit.com/alertbox/20010513.html

Good to know DBSight is designed to create a site search in hours!

DBSight vs common Lucene search implementation

Submitted by will on Tue, 2005-08-02 06:57.

The common approach to implementing Lucene search is to create the Lucene index right after the content is submitted. It works fine on a single server with limited volume.

But what if your site has several servers, all generating new content for search? In this case, it's difficult to implement Lucene search. Well, you can create a Lucene index on each server, merge them together (?), and distribute the merged index back to each server...

And if the volume is high, merging Lucene indexes becomes painfully slow. Merging indexes is intrinsically a slow process: it has to go through all the documents you have.
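
To see why a merge has to touch everything, here is a toy sketch that merges per-server inverted indexes and counts how many postings it visits. It assumes each index is a plain dictionary mapping a term to a list of document ids; the `merge_indexes` function and the sample data are hypothetical, and this is not Lucene's actual merge algorithm.

```python
def merge_indexes(indexes):
    """Merge per-server indexes (term -> list of doc ids) into one.

    Returns the merged index and the number of postings visited.
    """
    merged = {}
    touched = 0
    for index in indexes:
        for term, doc_ids in index.items():
            # Every posting from every server is copied into the result.
            merged.setdefault(term, []).extend(doc_ids)
            touched += len(doc_ids)
    for doc_ids in merged.values():
        doc_ids.sort()
    return merged, touched


# Hypothetical indexes built independently on two servers.
server_a = {"lucene": ["a1", "a2"], "search": ["a1"]}
server_b = {"lucene": ["b1"], "index": ["b1"]}

merged, touched = merge_indexes([server_a, server_b])
```

The `touched` count equals the total number of postings across all servers, so the work grows with the whole collection and not only with the newly added documents.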

This common approach doesn't sound scalable, in either the number of servers or the number of documents.
