Never-ending struggle trying to develop Windows Phone apps

So, I got some spare time during the summer and thought I should take a quick look at Windows Phone Development.

I actually have a few app ideas, and I was – a few years ago – able to create an Android app (from absolute scratch!). Installing IntelliJ, getting it to work with the emulator, understanding the app lifecycle etc – coding Java! Yes, a lot of new things, but within a few weeks I did have a pretty polished LOB app up and running.

Now how hard could it be to get the same thing working for Windows Phone? I mean, C#/Visual Studio – I live and breathe it. However, it turned out to be a never-ending nightmare.

As I have decided to use Azure VMs for development a lot more, I went the route of creating an instance of Windows Server 2012 with VS 2015 Community preinstalled.

Everything seemed to be installed, hooray – all the phone tools and SDKs. So let's get started – I decided to just create a new app from the templates (Hub app, or whatever it was called).

First error: I needed an app developer license. And to be able to request that, I was told I needed to install the Desktop Experience feature…

Finally found it. Install. Reboot box. Run again. Fine.

Now it all compiled fine, but when I tried to debug:

the emulator can't start, because it depends on Hyper-V – and since we are already running inside a Hyper-V box, that's not possible. Or so it seems, after hours of googling…

Ok, so I should run it locally on my own box instead. After struggling for a while reclaiming disk space on my old Windows 7 box, I finally got VS 2015 installed.

But now it doesn't even look the same as it did on the Azure VM. Not the same project templates. The only Windows Phone template I have now is "TestApplication", and when trying to create a project from it I get "This template attempted to load component assembly 'Microsoft.VisualStudio.SmartDevice.ProjectSystem.Base'".

I am pretty sure something about running Windows 7 is the problem…

Ok, how about just upgrading my old W7 box then… Yes, I'm waiting for the free W10 upgrade. But that won't arrive for yet another 12 days.

Maybe I could install the W10 preview? Been there – tried that. The preview program is not open anymore… since it's so close to release.

Ok, so instead I decided to buy a new MSDN subscription (I actually haven't had a sub for almost 2 years, I just found out). I could then quickly download and do a clean install of 8.1, or even the W10 preview, I thought.

And so I did. A few hours ago I ordered the Professional subscription. However – my order has so far only passed Pending and entered Confirmation status. I will not be able to download from MSDN until it is "activated". And when checking emails from last time, I now realised that took 5 days!!!

Oh well. I think I will upgrade my IntelliJ IDE and focus on some Android development instead…

Lucene.net facets – optimising memory utilization

As I stated in my previous post – memory consumption is inevitable if you want to do facet calculation against a larger dataset AND still meet somewhat acceptable performance requirements.

After experiencing that myself – seeing memory usage really spike on one of my bigger sites using the MultiFacetLucene library – I spent some days thinking and experimenting on how it could be optimized without sacrificing too much on the performance side.

Let me start by saying a little about my reference index. I have 240000 documents, and two columns are meant to be faceted: one containing around 20000 unique values, the other only a couple of hundred. On my desktop machine (where I don't have room for the index on the SSD, so it lives on my regular mechanical disk) I have a test suite running 100 consecutive queries (with facets), and I get somewhere around 42 ms response time on average. On this particular box I think 50 ms is acceptable.

Now, when it comes to memory – the actual structure to store the facet information and bitsets weighs in at around 111 MB… That is NOT good…

So – I've been playing with an LRU cache for starters, thinking I should store only X percent of the bitsets in memory and calculate the rest on demand. But I found that in such a performance-critical library, handling the cache could very well turn into a bottleneck/become a resource hog itself.

My second shot was to simplify the actual caching logic. I turned it into a "most likely to be used" cache. Because, remember, I already do some optimization when calculating the facet values – shortcutting the loop as early as possible. Meaning I do have some clues about which facet values will probably be calculated the most – the most frequent ones, that is! They will be used on most webpages, based on pure logic.

So a really simple implementation, where I just store 50 percent of the bitsets in memory, gave 50 percent less memory utilization – and still, performance was only hit by 4–5 ms. I think this might be the way to go.

I still have some code refactoring to do before I release it, but I hope to be able to try it on my own sites any day now. My idea is to make the optimization extendable – like being able to feed the FacetSearcher an IMemoryOptimizer instance. So it would be easy enough to override/provide your own strategy.

bool ShouldLazyLoad(string fieldAttributeName, FacetSearcher.FacetValues.FacetValueBitSet bitSet, int index, IEnumerable<FacetSearcher.FacetValues.FacetValueBitSet> allFacetValueBitSets, int totalCount);

I mean, some attributes you might really want to keep in memory. Or you might want a strategy that only lazy-loads attributes where the total number of values is more than 15000, or something like that.
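To make that concrete, here is a minimal sketch of what such a strategy could look like. The types below are simplified stand-ins for the library's own (the real signature is the one above), and ThresholdMemoryOptimizer is a made-up name:

```csharp
using System.Collections.Generic;

// Simplified stand-in for the library's FacetValueBitSet (illustration only).
public class FacetValueBitSet
{
    public string Value { get; set; }
    public long Count { get; set; }
}

public interface IMemoryOptimizer
{
    bool ShouldLazyLoad(string fieldAttributeName, FacetValueBitSet bitSet, int index,
        IEnumerable<FacetValueBitSet> allFacetValueBitSets, int totalCount);
}

// Keep all bitsets in memory for fields with fewer than 15000 distinct values;
// for larger fields, keep only the 50 percent most frequent values in memory.
public class ThresholdMemoryOptimizer : IMemoryOptimizer
{
    private readonly int _maxValuesPerField;

    public ThresholdMemoryOptimizer(int maxValuesPerField = 15000)
    {
        _maxValuesPerField = maxValuesPerField;
    }

    public bool ShouldLazyLoad(string fieldAttributeName, FacetValueBitSet bitSet, int index,
        IEnumerable<FacetValueBitSet> allFacetValueBitSets, int totalCount)
    {
        if (totalCount < _maxValuesPerField)
            return false; // small field: keep everything in memory
        // Large field: the list is sorted by frequency, so lazy-load the bottom half.
        return index >= totalCount / 2;
    }
}
```

Since the library already sorts values by frequency, the index parameter alone is enough to decide whether a value is in the "hot" half.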

New version of the MultiFacetLuceneNet library, both at GitHub and NuGet

Just a heads-up – a new release is available.

NUGET

GITHUB

– NuGet now supports Framework 4.5 as well

– nasty memory leak fixed, cutting used memory in half

There are two notable things to comment on when it comes to the memory leak.

a) facet calculation IS memory-intensive. It has to be for performance to be acceptable. Basically we need the full-dataset bitsets kept in memory for each possible facet value.

b) I noticed the leak because I am using my library for real. I'm using it on sites with millions of documents, and it's really cool to see acceptable search and faceting performance despite the large datasets.

I will however think a bit about optimizing it even further. I have some ideas about a more LRU-like approach where not all bitsets are stored in memory. However, I still need the term and count, even though the actual bitsets for less-used facet values might be flushed out from the cache and read again on demand. My second idea is simply writing the whole structure to disk and using a memory-mapped file to access it. In a sense I would then leave the optimization of memory usage to the operating system. Since I already sort the list by most popular values, it would give me a sort of clustered access – thereby minimizing paging.
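A rough sketch of the memory-mapped idea, using .NET's MemoryMappedFile – names and layout are my own illustration, not the library's actual code:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Persist the bitset words to disk once, then let the OS page them in on
// demand instead of keeping everything on the managed heap.
public static class BitSetFile
{
    public static void Write(string path, long[][] bitSetWords)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            foreach (var words in bitSetWords)
                foreach (var word in words)
                    writer.Write(word);
        }
    }

    // Read the words of bitset number 'index' (all bitsets assumed equal length).
    public static long[] ReadBitSet(string path, int index, int wordsPerBitSet)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            var result = new long[wordsPerBitSet];
            accessor.ReadArray(index * (long)wordsPerBitSet * sizeof(long), result, 0, wordsPerBitSet);
            return result;
        }
    }
}
```

Because the values are sorted by popularity before writing, lookups cluster at the start of the file – which is exactly the access pattern that keeps paging to a minimum.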

I’ll get back on this one.

Lucene Facet calculation performance optimizations

Ok, so I did put my MultiFacetLucene library to the test. I downloaded a pretty large database containing 140000 posts, with two columns I wanted to do faceting on: one containing around 20000 distinct values, the other just about 100.

So this means we are going to do 20000 facet bitset queries and calculations – i.e. with the current codebase.

I did try it out and, well, no – 600 ms (against a physical disk directory, not RAM) is nothing to be too proud of. Searching without facets takes 0 ms…

So what to do to speed things up? First of all – let me tell you I did speed it up: 600 ms down to 35 ms. Now, since I also changed the API a bit, I will release a version 2 containing the optimizations in a few days' time.

Basically it's all about not calculating everything. For starters, it doesn't make sense to calculate 20000 facet values and counts for a specific search. In reality – how are you going to show that on a webpage to a user? 20000 checkboxes?

What you typically want to do is offer checkboxes on the most popular facets. So I modified the API to let you specify (for each facet) a MaxToFetchExcludingSelections parameter. Note that facet values the user has selected will always be calculated.

Anyway; now with that parameter in place, how can we use it to exit our loop as early as possible? Well, remember we have two bitsets in play for each facet value. The first covers ALL documents – it might say GREEN exists in 14 documents overall, while BLUE exists in 22.

Then we generate a new bitset based on the current user search and facets, excluding the current one. That one might say GREEN exists in 12 documents and BLUE in 5. The two are ANDed to get the final result.

So the optimization is: sort the first (ALL-documents) bitsets by count, descending. This means we start calculating facet counts for the overall most frequent values. We add all calculated facet counts to a list. When the list is full (we asked for, say, 8 facet values), we from then on compare each value's ALL-documents count with the smallest count in our list of 8.

E.g. if the 8th facet count says BLUE: 34, and we know that YELLOW exists in only 33 documents overall, we can bail out – since a filtered count can never exceed the overall count.
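To make the idea concrete, here is a sketch of the loop using plain 64-bit bitmaps instead of Lucene's OpenBitSetDISI (all names here are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class FacetCounter
{
    // Return the top facet counts for the documents matching queryBits,
    // bailing out early thanks to the descending overall-count sort.
    public static List<KeyValuePair<string, int>> TopFacets(
        List<(string Value, ulong Bits)> facetBitSets, ulong queryBits, int maxToFetch)
    {
        var results = new List<KeyValuePair<string, int>>();
        // Most frequent values overall come first.
        foreach (var facet in facetBitSets.OrderByDescending(f => BitOperations.PopCount(f.Bits)))
        {
            var overallCount = BitOperations.PopCount(facet.Bits);
            if (results.Count >= maxToFetch)
            {
                // A filtered count can never exceed the overall count, so once the
                // overall count drops below our maxToFetch-th best, we are done.
                var threshold = results.OrderByDescending(r => r.Value).ElementAt(maxToFetch - 1).Value;
                if (overallCount < threshold)
                    break;
            }
            var count = BitOperations.PopCount(facet.Bits & queryBits); // AND the bitmaps
            if (count > 0)
                results.Add(new KeyValuePair<string, int>(facet.Value, count));
        }
        return results.OrderByDescending(r => r.Value).Take(maxToFetch).ToList();
    }
}
```

If the 8th best count is 34 and the next value only occurs 33 times overall, the loop breaks without computing a single further AND.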

I will post some code, describing it better, when I have published the new version of the library.

Facets in Lucene.net – the theory

Ok, so let's talk a little about the actual faceting theory. Or, to be more exact, let's talk about how it's implemented in Lucene.NET. Because thanks to Lucene, facet calculation boils down to pretty simple bitmap arithmetic.

What we do is, the first time facet calculation for a specific field is requested, we read (and cache) bitmaps from the Lucene index. Each possible value is given a bitmap of its own. So for the value "BLUE" a bitmap could be 01001. That would mean that for the specific facet field, documents 2 and 5 contain the value "BLUE". As a side note, fetching all term values is supported by core Lucene.net, and it even offers a bitmap abstraction through the OpenBitSetDISI class.

private FacetValues ReadBitSetsForValues(string facetAttributeFieldName)
{
    var facetValues = new FacetValues();
    facetValues.Term = facetAttributeFieldName;

    // One bitset per distinct term value: bit n set = document n has this value.
    facetValues.FacetValueBitSetList.AddRange(
        GetFacetValueTerms(facetAttributeFieldName).Select(fvt => new FacetValues.FacetValueBitSet
        {
            Value = fvt.Term,
            Filter = fvt.Filter,
            OpenBitSetDISI =
                new OpenBitSetDISI(fvt.Filter.GetDocIdSet(IndexReader).Iterator(), IndexReader.MaxDoc)
        }));

    return facetValues;
}

private IEnumerable<FacetValueTermFilter> GetFacetValueTerms(string facetAttributeFieldName)
{
    using (var termReader = IndexReader.Terms(new Term(facetAttributeFieldName, String.Empty)))
    {
        do
        {
            // Terms are ordered by field; stop when we leave our field (or run out of terms).
            if (termReader.Term == null || termReader.Term.Field != facetAttributeFieldName)
                yield break;

            // Wrap each term in a cached filter so its doc id set is only computed once.
            var facetQuery = new TermQuery(termReader.Term.CreateTerm(termReader.Term.Text));
            var facetQueryFilter = new CachingWrapperFilter(new QueryWrapperFilter(facetQuery));
            yield return new FacetValueTermFilter { Term = termReader.Term.Text, Filter = facetQueryFilter };
        } while (termReader.Next());
    }
}


Not too many lines of code to do that, right?

So the FacetSearcher class in our library offers a SearchWithFacets function. Apart from running the regular query to fetch the actual documents (that query could be "searchword:'hello'"), we should also fetch all facet values matching that original query. Now the bitmap arithmetic comes into play.

If we generate a new bitmap for the 'searchword:hello' filter, we might get 11000 as the result. Only documents 1 and 2 match that search filter.

Now, simply ANDing 01001 with 11000 will tell us that only document 2 matches both BLUE and searchword:hello.
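In code, with small integers standing in for the bitmaps (the library uses OpenBitSetDISI, but the arithmetic is the same):

```csharp
using System;
using System.Numerics;

// Bit n set means document n matches.
ulong blue  = (1UL << 2) | (1UL << 5); // "BLUE" occurs in documents 2 and 5
ulong hello = (1UL << 1) | (1UL << 2); // searchword:hello matches documents 1 and 2

ulong both = blue & hello;             // AND the bitmaps together
// Only bit 2 remains set: document 2 matches both BLUE and searchword:hello.
Console.WriteLine(BitOperations.PopCount(both)); // prints 1 – the facet count for BLUE
```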

That's the basics of Lucene faceting! However, the beauty of my library is how it takes other facet selections into consideration. The "if we generate a new bitmap for the 'searchword:hello' filter" part is not entirely true – that's my point.

Let's say we have selected COLOR:BLUE and SIZE:MEDIUM. Other Lucene.net facet implementations I've seen use both those filters when calculating facets, which completely rules out the possibility of multi-value selection, i.e. COLOR:RED OR COLOR:BLUE. Anyway, I've talked about that way too much already in my earlier posts, so what I want to show you is a demo site I whipped together using the MultiFacetLucene library.

Let's go to the search page of Aviation questions – searching for the word aviation gave us 69 results.

To the right you see facet values. Click one of them and the search result is filtered. Facet results are also updated, but one facet value does not rule out the other options in that facet field.

Multi facets and lucene – behind the scenes

So how does the code from the previous article – and the MultiFacetLuceneNet library I have published – actually work?

Well, the secret lies in how the facet queries are run. Basically, when calculating facets for field "X" we need to dynamically create a query – the original MINUS the X term selection. Likewise, when calculating facets for field "Y" we need to remove the Y selection from the original query.

Have a look at the CreateFacetedQuery function.

protected Query CreateFacetedQuery(Query baseQueryWithoutFacetDrilldown, IEnumerable<SelectedFacet> selectedFacets, string facetAttributeFieldName)
{
    // Apply every selected facet EXCEPT the one we are currently calculating counts for.
    var facetsToAdd = selectedFacets.Where(x => x.FieldName != facetAttributeFieldName).ToList();
    if (!facetsToAdd.Any()) return baseQueryWithoutFacetDrilldown;

    var booleanQuery = new BooleanQuery {{baseQueryWithoutFacetDrilldown, Occur.MUST}};
    foreach (var selectedFacet in facetsToAdd)
    {
        if (selectedFacet.SelectedValues.Count == 1)
            booleanQuery.Add(new TermQuery(new Term(selectedFacet.FieldName, selectedFacet.SelectedValues[0])), Occur.MUST);
        else
        {
            // Multiple values within one facet are OR:ed together...
            var valuesQuery = new BooleanQuery();
            foreach (var value in selectedFacet.SelectedValues)
            {
                valuesQuery.Add(new TermQuery(new Term(selectedFacet.FieldName, value)), Occur.SHOULD);
            }
            // ...and the group as a whole is AND:ed with the rest of the query.
            booleanQuery.Add(valuesQuery, Occur.MUST);
        }
    }
    return booleanQuery;
}

As you can see, we feed it the selected facets (which could be "COLOR:blue+green", "SIZE:Medium") but exclude the facet matching the one we are currently calculating (facetAttributeFieldName).

This means that when we calculate the COLOR facets, we will generate a query saying {original query} AND SIZE:'Medium'.

Likewise, when we calculate the CATEGORY facets, the query will be {original query} AND (COLOR:'blue' OR COLOR:'green').

The next post will describe the actual facet calculation algorithm, along with the stupid mistake I made which (once found) gave me 100x better performance… (the "100x times faster" check-in at https://github.com/aspcodenet/MultiFacetLuceneNet )

Also: don't miss the NUGET package for MultiFacet Lucene.NET

Lucene.net and multiple facets

Consider a site with a set of "checkbox navigation" groups on a search result page, letting the user drill down the search result.

Say we were searching for the phrase Programming.

Checkbox navigation would then be, for example:

CATEGORIES

====================

Blog (12) []

MVC (4) []

Messaging (3) []

WRITTEN

====================

April 2014(2) []

May 2014(9) []

Jun 2014 (14) []

The whole idea is, as I said, to be able to drill down the search result to say articles matching “Programming”, tagged in category MVC – AND written in April 2014 or May 2014.

The checkboxes inside each group are OR:ed together and AND:ed with other groups. Combined with original search phrase it limits the actual documents returned.

However – when the user clicks on, say, MVC – then of course the DATE options – and numbers – update to match. That is, the number is to be thought of as an expectation: if I tick this checkbox as well, I will be given X more documents (hmm, not entirely true, as one document could have multiple categories etc). But my main point is that the calculation of facets in each respective group should not include that group itself in the filter.
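To illustrate that rule with plain strings (made-up names; the real library builds Lucene BooleanQueries rather than query strings):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FacetFilterDemo
{
    // Build the filter used when counting facets for 'group':
    // the base query AND every selected group EXCEPT 'group' itself.
    public static string FilterForGroup(
        string baseQuery, Dictionary<string, List<string>> selections, string group)
    {
        var clauses = selections
            .Where(s => s.Key != group && s.Value.Any())
            .Select(s => "(" + string.Join(" OR ", s.Value.Select(v => s.Key + ":" + v)) + ")");
        return string.Join(" AND ", new[] { baseQuery }.Concat(clauses));
    }
}
```

So with Programming searched, MVC ticked under CATEGORIES, and April/May ticked under WRITTEN, the CATEGORIES counts are computed against {base} AND (WRITTEN:…), while the WRITTEN counts are computed against {base} AND (CATEGORIES:MVC).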

This is (better) described at the SOLR site: http://wiki.apache.org/solr/SimpleFacetParameters – scroll down to Tagging and excluding Filters

Anyway, since I have yet to find an extension library for Lucene.net offering this kind of facet behaviour, I decided to implement it myself.

So here it goes: MultiFacetLuceneNet on GitHub