Lucene.Net facets – optimizing memory utilization

As I stated in my previous post – memory consumption is inevitable if you want to do facet calculation against a larger dataset AND still meet somewhat acceptable performance requirements.

After experiencing that myself, seeing memory usage really spike on one of my bigger sites using the MultiFacetLucene library, I spent some days thinking and experimenting on how it could be optimized without sacrificing too much on the performance side.

Let me start by saying a little about my reference index. It has 240,000 documents, and two columns are meant to be faceted: one containing around 20,000 unique values, the other only a couple of hundred. On my desktop machine (where I don't have room for the index on the SSD, so it sits on my regular mechanical disk) I have a test suite running 100 consecutive queries (with facets), and I get an average response time of around 42 ms. In fact, on this particular box I think 50 ms is acceptable.

Now, when it comes to memory – the actual structure storing the facet information and bitsets weighs in at around 111 MB. That is NOT good…

So – I started by playing with an LRU cache, thinking I should only store X percent of the bitsets in memory and calculate the rest when asked for. But I found that in such a performance-critical library, managing the cache could very well turn into a bottleneck and a resource hog itself.

My second attempt was to simplify the actual caching logic and turn it into a “most likely to be used” cache. Because, remember, I already do some optimization when calculating the facet values – shortcutting the loop as early as possible. That means I have some clues about which facet values will probably be calculated most often: the most frequent ones! Based on pure logic, those are the values most webpages will ask for.

So a really simple implementation, where I just store 50 percent of the bitsets in memory, gave 50 percent less memory utilization – and performance only took a 4-5 ms hit. I think this might be the way to go.
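To make the idea concrete, here is a minimal sketch of that strategy: keep the bitsets only for the most frequent facet values and drop the rest, so they get recalculated on demand. The FacetValueEntry type and its members are illustrative placeholders for this sketch, not the actual MultiFacetLucene classes.

    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    // Illustrative placeholder for a facet value entry; not the actual
    // MultiFacetLucene FacetValueBitSet class.
    public class FacetValueEntry
    {
        public string Value { get; set; }
        public int Count { get; set; }          // how many documents match this value
        public BitArray Bitset { get; set; }    // null means "recalculate on demand"
    }

    public static class MostLikelyToBeUsedCache
    {
        // Keep the bitsets for the top keepFraction of values (ordered by Count)
        // and drop the rest, so they are lazily recalculated when a query needs them.
        public static void Trim(IList<FacetValueEntry> values, double keepFraction = 0.5)
        {
            var keepCount = (int)(values.Count * keepFraction);
            foreach (var entry in values.OrderByDescending(v => v.Count).Skip(keepCount))
                entry.Bitset = null;
        }
    }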

I still have some code refactoring to do before I release it, but I hope to be able to try it on my own sites any day now. My idea is to make the optimization extensible – like being able to feed the FacetSearcher an IMemoryOptimizer instance – so it would be easy enough to override or provide your own strategy:

    bool ShouldLazyLoad(string fieldAttributeName,
        FacetSearcher.FacetValues.FacetValueBitSet bitSet,
        int index,
        IEnumerable<FacetSearcher.FacetValues.FacetValueBitSet> allFacetValueBitSets,
        int totalCount);

I mean, some attributes you might really want to keep fully in memory. Or you might want a strategy that only lazy-loads attributes with more than 15000 unique values, or something like that.
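As a hedged example of what such a strategy could look like, here is a possible implementation built against the ShouldLazyLoad signature above. I'm assuming IMemoryOptimizer declares just that method, that returning true means "don't keep this bitset in memory", and that the bitsets are passed ordered by frequency; the exact contract may of course differ in the released code, and the 15000 threshold is just the example from the paragraph above.

    using System.Collections.Generic;

    // Assumed: IMemoryOptimizer declares the ShouldLazyLoad method shown above,
    // and returning true means the bitset should NOT be kept in memory.
    public class ThresholdMemoryOptimizer : IMemoryOptimizer
    {
        private readonly int _lazyLoadThreshold;
        private readonly double _keepFraction;

        public ThresholdMemoryOptimizer(int lazyLoadThreshold = 15000, double keepFraction = 0.5)
        {
            _lazyLoadThreshold = lazyLoadThreshold;
            _keepFraction = keepFraction;
        }

        public bool ShouldLazyLoad(string fieldAttributeName,
            FacetSearcher.FacetValues.FacetValueBitSet bitSet,
            int index,
            IEnumerable<FacetSearcher.FacetValues.FacetValueBitSet> allFacetValueBitSets,
            int totalCount)
        {
            // Small attributes: keep every bitset in memory.
            if (totalCount < _lazyLoadThreshold)
                return false;

            // Large attributes: keep only the first (most frequent) part,
            // assuming the bitsets are ordered by frequency.
            return index >= totalCount * _keepFraction;
        }
    }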
