Monday, October 20, 2008

Lucene Free-Text Search Engine

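While working with the Lucene search engine, I ran into "Too many clauses" errors caused by my search criteria. Lucene raises this error when a prefix, wildcard, or range query expands into more clauses than BooleanQuery allows (1024 by default). To avoid the problem, use a PrefixFilter: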

Term term = new Term(_facetName, axisValue.FromValue);
PrefixFilter filter = new PrefixFilter(term);

instead of

Term term = new Term(_facetName, axisValue.FromValue);
// A prefix query is expanded into one clause per matching term, which can hit the clause limit.
Filter p = new QueryWrapperFilter(new PrefixQuery(term));
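
Why the filter helps: a prefix query is rewritten into a BooleanQuery with one clause per matching indexed term, and BooleanQuery refuses to grow past its maximum clause count. A filter only marks matching documents in a bit set, so the number of matching terms does not matter. Below is a simplified, Lucene-free model of that difference. The class ClauseLimitSketch, the methods ExpandToClauses and FilterToBits, and the in-memory term list are hypothetical names for illustration only. They are not part of the Lucene API and not how Lucene is implemented internally.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ClauseLimitSketch
{
    // Mirrors Lucene's default BooleanQuery limit of 1024 clauses.
    public const int MaxClauseCount = 1024;

    // Query-style expansion (hypothetical helper): one clause per indexed
    // term that starts with the prefix; fails once the limit is exceeded.
    public static List<string> ExpandToClauses(IEnumerable<string> indexedTerms, string prefix)
    {
        var clauses = new List<string>();
        foreach (string term in indexedTerms)
        {
            if (!term.StartsWith(prefix, StringComparison.Ordinal)) continue;
            if (clauses.Count >= MaxClauseCount)
                throw new InvalidOperationException("Too many clauses");
            clauses.Add(term);
        }
        return clauses;
    }

    // Filter-style matching (hypothetical helper): one bit per document,
    // no clause list, so there is no limit to exceed.
    public static BitArray FilterToBits(IList<string> termPerDocument, string prefix)
    {
        var bits = new BitArray(termPerDocument.Count);
        for (int doc = 0; doc < termPerDocument.Count; doc++)
        {
            if (termPerDocument[doc].StartsWith(prefix, StringComparison.Ordinal))
                bits[doc] = true;
        }
        return bits;
    }
}
```

In this model, 2000 terms sharing a prefix make ExpandToClauses throw, while FilterToBits simply returns 2000 set bits.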


You can also skip very short words at indexing time. This keeps the index smaller and speeds up searching. The following sample implements a custom analyzer that uses a token filter to drop them.


using System.Collections;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Member names below follow Lucene.NET 2.x conventions (PascalCase methods).
public class CustomAnalyzer : Analyzer
{
    // Stop words to drop; MakeStopSet returns a Hashtable in Lucene.NET 2.x.
    private Hashtable stopSet;
    // Tokens shorter than this are dropped.
    private int _minTokenLength = 3;

    public static string[] STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;

    /** Builds an analyzer with the default English stop words. */
    public CustomAnalyzer(int minTokenLength)
        : this(STOP_WORDS, minTokenLength)
    {
    }

    /** Builds an analyzer with the given stop words. */
    public CustomAnalyzer(string[] stopWords, int minTokenLength)
    {
        stopSet = StopFilter.MakeStopSet(stopWords);
        _minTokenLength = minTokenLength;
    }

    /** Constructs a StandardTokenizer filtered by a StandardFilter, a
    LengthTokenFilter, a LowerCaseFilter and a StopFilter. */
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LengthTokenFilter(result, _minTokenLength);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopSet);
        return result;
    }
}


TokenFilter that removes tokens shorter than a minimum length:

public class LengthTokenFilter : TokenFilter
{
    // Minimum number of characters a token needs in order to be kept.
    private int minLength;

    public int MinLength
    {
        get { return minLength; }
        set { minLength = value; }
    }

    internal LengthTokenFilter(TokenStream input, int minLength)
        : base(input)
    {
        this.minLength = minLength;
    }

    // Lucene.NET 2.x naming: Next(Token) and TermLength().
    public override Token Next(Token result)
    {
        // Pull tokens from the wrapped stream until one is long enough.
        while ((result = input.Next(result)) != null)
        {
            if (result.TermLength() >= minLength)
            {
                return result;
            }
        }

        // End of stream.
        return null;
    }
}
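
To see what such a chain does to a piece of text, here is a small Lucene-free sketch that follows the same order of steps: split, drop short tokens, lowercase, drop stop words. AnalyzerChainSketch, AnalyzeSketch, and the simple separator-based splitting are assumptions made for illustration. StandardTokenizer's real tokenization rules are more involved.

```csharp
using System;
using System.Collections.Generic;

public static class AnalyzerChainSketch
{
    // Hypothetical helper that imitates the filter order of the analyzer above.
    public static List<string> AnalyzeSketch(string text, int minTokenLength, ICollection<string> stopWords)
    {
        var result = new List<string>();
        char[] separators = { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };

        foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (raw.Length < minTokenLength) continue;   // length filter step
            string lowered = raw.ToLowerInvariant();     // lowercase step
            if (stopWords.Contains(lowered)) continue;   // stop word step
            result.Add(lowered);
        }
        return result;
    }
}
```

For example, with a minimum length of 3 and the stop words "the" and "and", the text "The quick fox is at an old Barn" comes out as quick, fox, old, barn. Note that short words such as "is", "at" and "an" are already removed by the length step, before the stop word step sees them.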






