October, 2012 | Fernando Brito

Monthly Archives: October 2012

Building your own Lucene Scorer

Posted on October 26, 2012 by Fernando Brito

This post is about Apache Lucene, which is a “high-performance, full-featured text search engine library written entirely in Java”. If you have no idea what I am talking about, this tutorial is not for you :). Be advised that this is my first month using Lucene, so there is still a chance that everything I say here is just plain wrong :P. Also, I am currently using version 3.6.1.

While doing an assignment for my Information Retrieval class, I had to create my own Scorer class in Lucene. When you create a new IndexSearcher, Lucene uses DefaultSimilarity by default, which is essentially cosine similarity (in a Vector Space Model) combined with several weights: boosts given when indexing, boosts given in the query, tf*idf, and a document length norm. A description of exactly how it works can be found in the Similarity class documentation and in the Lucene scoring documentation.
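To make those weights a bit more concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the per-term factors, as I understand them from the 3.x DefaultSimilarity documentation. The class and method names (`DefaultSimilaritySketch`, `termScore`, and so on) are made up for illustration and are not Lucene's API. The sketch also leaves out the `coord` and `queryNorm` factors and the lossy one-byte encoding Lucene applies to norms, so treat the numbers as approximate.

```java
// Hypothetical, simplified model of the per-term weights in Lucene 3.x DefaultSimilarity.
// Not Lucene code: coord(), queryNorm() and the one-byte norm encoding are omitted.
final class DefaultSimilaritySketch {

    // Term frequency factor: square root of how often the term occurs in the document.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Inverse document frequency: rarer terms get a higher weight.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Length norm: shorter fields count more than longer ones.
    static float lengthNorm(int numTermsInField) {
        return (float) (1.0 / Math.sqrt(numTermsInField));
    }

    // Contribution of a single query term to a document's score:
    // tf * idf^2 * boost * lengthNorm (boost is assumed to fold query and index boosts together).
    static float termScore(float freq, int docFreq, int numDocs,
                           float boost, int numTermsInField) {
        float idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * lengthNorm(numTermsInField);
    }
}
```

For example, calling `DefaultSimilaritySketch.termScore(4f, 9, 10, 1.0f, 16)` combines a term that occurs 4 times in a 16-term field and appears in 9 of 10 documents, with no extra boost.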


Posted in Development | Tagged information retrieval
