New version of the MultiLiceneNet library, on both GitHub and NuGet

Just a heads-up: a new release is available.

NUGET

GITHUB

– The NuGet package now supports .NET Framework 4.5 as well

– A nasty memory leak is fixed, cutting memory usage in half

There are two things worth noting about the memory leak.

a) Facet calculation IS memory intensive. It has to be for performance to be acceptable. Basically, for each possible facet value we need to keep in memory a bitset covering the full dataset.

b) I noticed the leak because I am using my library for real. I'm using it on sites with millions of documents, and it's really cool to see acceptable search and faceting performance despite the large datasets.
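To make the memory cost concrete, here is a minimal sketch of the bitset idea in Python. It is not the library's code: the function names and the sample data are made up for illustration, and a plain Python integer stands in for a real bitset. One bitset is kept per facet value, and counting a facet is just an AND with the query's hits.

```python
from collections import defaultdict


def build_facet_bitsets(docs, field):
    """Map each value of `field` to a bitset (a Python int) of matching doc ids.

    Hypothetical helper for illustration; bit N is set when document N
    carries the facet value.
    """
    bitsets = defaultdict(int)
    for doc_id, doc in enumerate(docs):
        for value in doc.get(field, []):
            bitsets[value] |= 1 << doc_id
    return dict(bitsets)


def facet_counts(query_bits, facet_bitsets):
    """Count, per facet value, how many of the query's hits carry that value."""
    counts = {}
    for value, bits in facet_bitsets.items():
        hits = bin(query_bits & bits).count("1")  # popcount of the intersection
        if hits:
            counts[value] = hits
    return counts


# Made-up sample data: four documents with a multi-valued "color" field.
docs = [
    {"color": ["red"]},
    {"color": ["blue"]},
    {"color": ["red", "green"]},
    {"color": ["blue"]},
]
color_bits = build_facet_bitsets(docs, "color")
query_bits = 0b0111  # pretend documents 0, 1 and 2 matched the search
result = facet_counts(query_bits, color_bits)
print(result)
```

Every facet value owns a bitset as wide as the whole index, which is why memory grows with both the document count and the number of distinct facet values.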

I will, however, think a bit about optimizing it even further. I have some ideas about a more LRU-like approach where not all bitsets are kept in memory. I would still need the term and count for every facet value, even though the actual bitsets for less-used facet values might be flushed from the cache and read again on demand. My second idea is simply to write the whole structure to disk and use a memory-mapped file to access it. In a sense I would then leave it to the operating system to optimize memory usage. Since I already sort the list by most popular values, it would give me a sort of clustered access, thereby minimizing paging.
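The first idea could look roughly like the sketch below. It is only an illustration in Python, not what the library does today: the class name `FacetBitsetCache`, the `load_bitset` callback and the sample data are all assumptions. Terms and counts stay resident for every facet value, while at most `max_cached` bitsets are held and the least recently used one is evicted.

```python
from collections import OrderedDict


class FacetBitsetCache:
    """Hypothetical LRU cache: counts always in memory, bitsets on demand."""

    def __init__(self, counts, load_bitset, max_cached):
        self.counts = dict(counts)   # term -> count, never evicted
        self._load = load_bitset     # assumed callback that re-reads a bitset
        self._max = max_cached
        self._cache = OrderedDict()  # term -> bitset, oldest entry first
        self.loads = 0               # how many times a bitset had to be read

    def bitset(self, term):
        if term in self._cache:
            self._cache.move_to_end(term)    # mark as most recently used
            return self._cache[term]
        bits = self._load(term)
        self.loads += 1
        self._cache[term] = bits
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict least recently used
        return bits


# Made-up backing store standing in for the index on disk.
store = {"red": 0b0101, "blue": 0b1010, "green": 0b0100}
counts = {term: bin(bits).count("1") for term, bits in store.items()}

cache = FacetBitsetCache(counts, store.__getitem__, max_cached=2)
cache.bitset("red")
cache.bitset("blue")
cache.bitset("red")    # cache hit, no reload
cache.bitset("green")  # cache is full, so "blue" is evicted
cache.bitset("blue")   # has to be read again
print(cache.loads, cache.counts["red"])
```

The trade-off is the one described above: rarely used facet values cost a reload when they are needed, but their term and count remain available without touching the bitset.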

I'll get back to this one.