Monday, October 20, 2008

Lucene Free-Text Search Engine

While working with the Lucene search engine, I ran into "Too many clauses" errors in the search criteria. To avoid this problem, use a PrefixFilter:

Term term = new Term(_facetName, axisValue.FromValue);
PrefixFilter filter = new PrefixFilter(term);

instead of

Term term = new Term(_facetName, axisValue.FromValue);
Filter p = new QueryWrapperFilter(new TermQuery(term));
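
As far as I understand it, the error appears when a query is expanded into one clause per matching indexed term and the total passes the BooleanQuery clause limit (1024 by default). A filter walks the matching terms and marks the documents directly, so it builds no clauses at all. The sketch below only illustrates the counting. It is plain C# with no Lucene types, and the class name, method names and sample terms are made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: plain C#, no Lucene types involved.
public static class PrefixExpansionSketch
{
    // Default maximum clause count of a Lucene BooleanQuery.
    public const int MaxClauseCount = 1024;

    // Counts the indexed terms that start with the prefix, i.e. the number
    // of clauses an expanded prefix query would contain.
    public static int CountClauses(IEnumerable<string> indexedTerms, string prefix)
    {
        int count = 0;
        foreach (string term in indexedTerms)
        {
            if (term.StartsWith(prefix, StringComparison.Ordinal))
            {
                count++;
            }
        }
        return count;
    }

    // True when the expansion would need more clauses than the limit allows.
    public static bool WouldOverflow(IEnumerable<string> indexedTerms, string prefix)
    {
        return CountClauses(indexedTerms, prefix) > MaxClauseCount;
    }

    public static void Main()
    {
        // Hypothetical index: 2000 distinct terms starting with "price".
        List<string> terms = new List<string>();
        for (int i = 0; i < 2000; i++)
        {
            terms.Add("price" + i);
        }
        terms.Add("colour");

        Console.WriteLine(CountClauses(terms, "price"));   // 2000
        Console.WriteLine(WouldOverflow(terms, "price"));  // True
        Console.WriteLine(WouldOverflow(terms, "col"));    // False
    }
}
```

Raising the clause limit is another way around the error, but it costs memory and only moves the threshold, which is why I prefer the filter.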

You can also skip very short words when indexing, which improves search performance. The following sample implements a custom analyzer that uses a token filter to drop them.

   public class CustomAnalyzer : Analyzer {

      private Set stopSet;
      private int _minTokenLength = 3;

      public static string[] STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;

      /** Builds an analyzer with the default English stop words. */
      public CustomAnalyzer(int minTokenLength)
          : this(STOP_WORDS, minTokenLength) {
      }

      /** Builds an analyzer with the given stop words. */
      public CustomAnalyzer(String[] stopWords, int minTokenLength) {
        stopSet = StopFilter.makeStopSet(stopWords);
        _minTokenLength = minTokenLength;
      }

      /** Constructs a {@link StandardTokenizer} filtered by a {@link
      StandardFilter}, a LengthTokenFilter, a {@link LowerCaseFilter}
      and a {@link StopFilter}. */
      public override TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        // Drop tokens shorter than the configured minimum length.
        result = new LengthTokenFilter(result, _minTokenLength);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopSet);
        return result;
      }
   }

TokenFilter to remove short tokens:

public class LengthTokenFilter: TokenFilter {
    private int minLength;

    public int MinLength {
        get { return minLength; }
        set { minLength = value; }
    }

    internal LengthTokenFilter(TokenStream input, int minLength)
        : base(input) {
        this.minLength = minLength;
    }

    public override Token next(Token result) {
        // Pull tokens from the wrapped stream until one is long enough.
        while ((result = input.next(result)) != null) {
            if (result.termLength() >= minLength) {
                return result;
            }
        }
        // The wrapped stream is exhausted.
        return null;
    }
}
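
To see what this chain does to a piece of text, here is a rough stand-alone sketch of the same order of steps: drop short tokens, lower-case, then drop stop words. It is plain C# rather than Lucene code. It splits on spaces instead of using StandardTokenizer, and the class name, method name and stop-word list are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: mimics the filter order of the analyzer without Lucene.
public static class TokenChainSketch
{
    public static List<string> Analyze(string text, int minTokenLength,
                                       ICollection<string> stopWords)
    {
        List<string> result = new List<string>();
        // Naive whitespace tokenizer (an assumption; the real analyzer
        // uses StandardTokenizer).
        string[] parts = text.Split(new char[] { ' ' },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string raw in parts)
        {
            // Step 1: length filter, drop tokens that are too short.
            if (raw.Length < minTokenLength)
            {
                continue;
            }
            // Step 2: lower-case the token.
            string token = raw.ToLowerInvariant();
            // Step 3: stop filter, drop common words.
            if (stopWords.Contains(token))
            {
                continue;
            }
            result.Add(token);
        }
        return result;
    }

    public static void Main()
    {
        HashSet<string> stopWords = new HashSet<string> { "the", "and", "for" };
        List<string> tokens =
            Analyze("The Lucene API is a search engine", 3, stopWords);
        // "is" and "a" fail the length check, "the" is a stop word.
        Console.WriteLine(string.Join(" ", tokens)); // lucene api search engine
    }
}
```

With a minimum length of 3, "is" and "a" never reach the stop filter, so fewer terms end up in the index.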
