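While working with the Lucene search engine, I ran into "Too many clauses" errors in my search criteria. Lucene raises this error when a query expands into more Boolean clauses than the configured maximum (1024 by default). To avoid the problem, always use a PrefixFilter: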
Term term = new Term(_facetName, axisValue.FromValue);
PrefixFilter filter = new PrefixFilter(term);
instead of
Term term = new Term(_facetName, axisValue.FromValue);
Filter p = new QueryWrapperFilter(new TermQuery(term));
You can also skip very short words when indexing. Fewer terms end up in the index, which improves search performance. The following sample implements a custom analyzer that uses a token filter for this.
using System.Collections;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;

// Member names below assume the Lucene.Net 2.x API (PascalCase methods,
// Hashtable stop sets); adjust them if your version differs.
public class CustomAnalyzer : Analyzer
{
    public static string[] STOP_WORDS = StopAnalyzer.ENGLISH_STOP_WORDS;

    private Hashtable stopSet;
    private int _minTokenLength = 3;

    // Builds an analyzer with the default English stop words.
    public CustomAnalyzer(int minTokenLength)
        : this(STOP_WORDS, minTokenLength)
    {
    }

    // Builds an analyzer with the given stop words.
    public CustomAnalyzer(string[] stopWords, int minTokenLength)
    {
        stopSet = StopFilter.MakeStopSet(stopWords);
        _minTokenLength = minTokenLength;
    }

    // Constructs a StandardTokenizer filtered by a StandardFilter, a
    // LengthTokenFilter, a LowerCaseFilter and a StopFilter.
    public override TokenStream TokenStream(string fieldName, TextReader reader)
    {
        TokenStream result = new StandardTokenizer(reader);
        result = new StandardFilter(result);
        result = new LengthTokenFilter(result, _minTokenLength);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result, stopSet);
        return result;
    }
}
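TokenFilter to remove short tokens:
// Method names assume the Lucene.Net 2.x API (Next, TermLength).
public class LengthTokenFilter : TokenFilter
{
    private int minLength;

    // Minimum number of characters a token needs in order to be kept.
    public int MinLength
    {
        get { return minLength; }
        set { minLength = value; }
    }

    internal LengthTokenFilter(TokenStream input, int minLength)
        : base(input)
    {
        this.minLength = minLength;
    }

    // Returns the next token that is at least minLength characters long,
    // or null once the underlying stream is exhausted.
    public override Token Next(Token result)
    {
        while ((result = input.Next(result)) != null)
        {
            if (result.TermLength() >= minLength)
            {
                return result;
            }
        }
        return null;
    }
}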
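To see what this filter chain does to a piece of text without setting up an index, here is a rough sketch that applies the same steps to plain strings. It does not use Lucene at all: TokenPipelineSketch, Analyze and the four-word stop list are names and data I made up for illustration, and the simple split on spaces and punctuation is only a stand-in for what StandardTokenizer really does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Lucene.Net: mimics the order of the
// filters in CustomAnalyzer on plain strings.
public static class TokenPipelineSketch
{
    // Small hand-picked stop list for illustration only (not Lucene's list).
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "the", "and", "for", "with" };

    public static List<string> Analyze(string text, int minTokenLength)
    {
        var kept = new List<string>();
        // Crude stand-in for StandardTokenizer: split on spaces and punctuation.
        char[] separators = { ' ', ',', '.', ';', ':', '!', '?' };
        foreach (string raw in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (raw.Length < minTokenLength) continue;   // length filter step
            string lower = raw.ToLowerInvariant();       // lower-case step
            if (StopWords.Contains(lower)) continue;     // stop-word step
            kept.Add(lower);
        }
        return kept;
    }
}
```

Calling TokenPipelineSketch.Analyze("The API of Lucene is easy to use", 3) keeps api, lucene, easy and use. The words "of", "is" and "to" are dropped for being shorter than three characters, and "The" is dropped as a stop word after lower-casing.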