Facets in Lucene.net – the theory

OK, so let’s talk a little about the actual faceting theory. Or, to be more exact, let’s talk about how it’s implemented in Lucene.NET. Thanks to Lucene, facet calculation boils down to pretty simple bitmap arithmetic.

What we do is this: the first time facet calculation is requested for a specific field, we read (and cache) bitmaps from the Lucene index. Each possible value gets a bitmap of its own. So for the value “BLUE” the bitmap could be 01001. Reading from left to right, that means that for this facet field, documents 2 and 5 contain the value “BLUE”. As a side note, fetching all term values is supported by core Lucene.NET, and it even offers a bitmap abstraction through the OpenBitSetDISI class.

        // Builds one bitmap per distinct value of the given facet field.
        private FacetValues ReadBitSetsForValues(string facetAttributeFieldName)
        {
            var facetValues = new FacetValues();
            facetValues.Term = facetAttributeFieldName;

            facetValues.FacetValueBitSetList.AddRange(
                GetFacetValueTerms(facetAttributeFieldName).Select(fvt => new FacetValues.FacetValueBitSet
                {
                    Value = fvt.Term,
                    Filter = fvt.Filter,
                    // Materialize the documents matched by the filter into a bitmap
                    // that is large enough to cover every document id in the index.
                    OpenBitSetDISI =
                        new OpenBitSetDISI(fvt.Filter.GetDocIdSet(IndexReader).Iterator(), IndexReader.MaxDoc)
                }));

            return facetValues;
        }

        // Enumerates every term of the facet field and wraps each one in a cached filter.
        private IEnumerable<FacetValueTermFilter> GetFacetValueTerms(string facetAttributeFieldName)
        {
            // Positions the enumerator at the first term at or after (field, "").
            var termReader = IndexReader.Terms(new Term(facetAttributeFieldName, String.Empty));
            do
            {
                // Stop when no term is left or when the terms of the next field begin.
                if (termReader.Term == null || termReader.Term.Field != facetAttributeFieldName)
                    yield break;

                var facetQuery = new TermQuery(termReader.Term.CreateTerm(termReader.Term.Text));
                var facetQueryFilter = new CachingWrapperFilter(new QueryWrapperFilter(facetQuery));
                yield return new FacetValueTermFilter {Term = termReader.Term.Text, Filter = facetQueryFilter};
            } while (termReader.Next());
        }


Not too many lines of code to do that, right?

So the FacetSearcher class in our library offers a SearchWithFacets function. Apart from running the regular query to fetch the actual documents (that query could be “searchword:hello”), we should also fetch all facet values matching that original query. This is where the bitmap arithmetic comes into play.

If we generate a new bitmap for the ‘searchword:hello’ filter we might get 11000 as the result: only documents 1 and 2 match that search filter.

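Now, simply ANDing 01001 with 11000 gives 01000, which tells us that only document 2 matches both BLUE and searchword:hello. Counting the set bits in that result gives the facet count for BLUE, which is 1 here.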
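The arithmetic above can be tried out without an index at all. The following is a minimal sketch, not code from the library: it assumes the standard System.Collections.BitArray as a stand-in for OpenBitSetDISI, and the names FacetBitmapDemo, Parse and CountMatches are made up for this illustration.

```csharp
using System;
using System.Collections;

public static class FacetBitmapDemo
{
    // Builds a bitmap from a string such as "01001".
    // The leftmost character stands for document 1, as in the text above.
    public static BitArray Parse(string bits)
    {
        var result = new BitArray(bits.Length);
        for (var i = 0; i < bits.Length; i++)
            result[i] = bits[i] == '1';
        return result;
    }

    // Counts the documents that are set in both bitmaps.
    // Both bitmaps must have the same length.
    public static int CountMatches(BitArray facetValue, BitArray query)
    {
        // BitArray.And modifies the instance it is called on, so AND into a copy
        // and leave the (cached) facet value bitmap untouched.
        var intersection = new BitArray(facetValue).And(query);

        var count = 0;
        foreach (bool bit in intersection)
            if (bit) count++;
        return count;
    }
}
```

Calling FacetBitmapDemo.CountMatches(FacetBitmapDemo.Parse("01001"), FacetBitmapDemo.Parse("11000")) returns 1, the single document that is both BLUE and a hit for searchword:hello. The copy before the AND matters in a setup like the one described here, because the per-value bitmaps are cached and reused between searches.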

That’s the basics of Lucene faceting! However, the beauty of my library is how it takes other facet selections into consideration. The statement “if we generate a new bitmap for the ‘searchword:hello’ filter” is not entirely true, and that’s my point.

Let’s say we have selected COLOR:BLUE and SIZE:MEDIUM. Other Lucene.NET facet implementations I’ve seen use both of those filters when calculating facets, which completely rules out multi-value selection, i.e. COLOR:RED OR COLOR:BLUE, because with COLOR:BLUE applied the other COLOR values are left with little or nothing to count. Anyway, I’ve talked about that way too much already in my earlier posts, so what I want to show you is a demo site I whipped together using the MultiFacetLucene library.
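To make the difference concrete, here is a rough sketch of the idea as I have described it: when counting the values of one facet field, apply the selections made in all the other fields but skip the selection in the field itself. It is an illustration only and not the MultiFacetLucene source. System.Collections.BitArray again stands in for OpenBitSetDISI, and MultiSelectFacetSketch, CountValues and the dictionary layout are assumptions made up for this example.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class MultiSelectFacetSketch
{
    // field:         the facet field whose values should be counted, e.g. "COLOR"
    // query:         bitmap of the documents matching the original search
    // valuesByField: facet field -> (facet value -> bitmap of documents with that value)
    // selected:      facet field -> values the user has selected (must exist in valuesByField)
    public static Dictionary<string, int> CountValues(
        string field,
        BitArray query,
        Dictionary<string, Dictionary<string, BitArray>> valuesByField,
        Dictionary<string, List<string>> selected)
    {
        // Start from the plain query result (copied, since And/Or modify in place).
        var baseSet = new BitArray(query);

        // Narrow it by the selections in every OTHER field, never by the field being counted.
        foreach (var selection in selected.Where(s => s.Key != field && s.Value.Count > 0))
        {
            // Several selected values within one field are ORed together.
            var anyOf = new BitArray(query.Length);
            foreach (var value in selection.Value)
                anyOf.Or(valuesByField[selection.Key][value]);

            baseSet.And(anyOf);
        }

        // AND each value bitmap of the counted field with the base set and count the hits.
        return valuesByField[field].ToDictionary(
            v => v.Key,
            v => Count(new BitArray(v.Value).And(baseSet)));
    }

    private static int Count(BitArray bits)
    {
        var count = 0;
        foreach (bool bit in bits)
            if (bit) count++;
        return count;
    }
}
```

Take five documents where BLUE is 01001, RED is 10100, MEDIUM is 11001 and LARGE is 00110, a query that matches everything, and the selection COLOR:BLUE plus SIZE:MEDIUM. Counting COLOR then gives BLUE 2 and RED 1, so RED stays available for a COLOR:RED OR COLOR:BLUE selection. If the COLOR:BLUE filter had been applied as well, RED would have dropped to 0.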

Let’s go to the search page of Aviation questions. Searching for the word aviation gave us 69 results.

To the right you see the facet values. Click one of them and the search result is filtered. The facet results are also updated, but selecting one facet value does not rule out the other options in that facet field.
