Monthly Archives: October 2012

Building your own Lucene Scorer

This post is about Apache Lucene, which is a “high-performance, full-featured text search engine library written entirely in Java”. If you have no idea on what I am talking about, this tutorial is not for you :). Be advised that this is my first month using Lucene, so there is still a chance that everything I say here is just plain wrong :P. Also, I am currently using version 3.6.1.

Doing an assignment from my Information Retrieval class I was faced with the problem of creating my own Scorer class on Lucene. When you create a new IndexSearcher, by default Lucene uses DefaultSimilarity, which is actually cosine similarity (in a Vector Space Model) with different weights such as boosts given when indexing, boosts given in the query, tf*idf and document length norm. A description on how it works exactly can be found on Similarity class documentation and on Lucene Score documentation.

Continue reading

Posted in Development | Tagged | 2 Comments