Archive for .NET

When profiling a loop, look at the entire execution time of that loop

Today’s piece of advice will not be backed by concrete code examples; it is a set of loose observations from a profiling session and the conclusions that followed. Let’s assume there is C# code doing some intense manipulation of in-memory data. The operations are done in a loop: the more data there is, the more iterations the loop performs. The aim of the operations is to generate a kind of summary of data previously retrieved from a database. The code had been proved to work well in test environments. Suddenly, in production, my team noticed that this piece of code took about 20 minutes to execute. It turned out there were about 16 thousand iterations of the loop. It was impossible to exercise such a load with manual testing; the testing only confirmed the correctness of the logic itself.

After an investigation and some experiments, it turned out that a poorly chosen data structure was to blame. The loop did some lookup-heavy operations over a List<T>. These are obviously O(n), as the list is implemented over an array. Replacing List<T> with HashSet<T> dramatically reduced the execution time, to a dozen or so seconds. This is not as surprising as it may seem, because HashSet<T> is implemented over a hash table and has O(1) lookup time. These are data structure fundamentals taught in every decent computer science lecture. The operation in the loop looked innocent and the author did not bother to anticipate the future load on the logic. A programmer should always keep in mind an estimate of the amount of data to be processed by their implementation and think of an appropriate data structure. By an appropriate data structure I mean a choice between fundamental structures like an array (vector), a linked list, a hash table, or a binary tree. This could be the first important conclusion, but I encouraged my colleagues to perform profiling, and the results were not so obvious.
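
The original code was never published, so purely as an illustration, here is a self-contained sketch of the kind of change involved; the data, sizes, and names are all hypothetical, and note that the Stopwatch wraps the entire loop, which matters below.

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    class ListVsHashSet
    {
        static void Main()
        {
            // Hypothetical stand-in for data previously retrieved from a database.
            int[] ids = Enumerable.Range(0, 16000).ToArray();
            var list = new List<int>(ids);
            var set = new HashSet<int>(ids);

            // List<T>.Contains scans the underlying array: O(n) per lookup,
            // O(n^2) for the whole loop.
            var sw = Stopwatch.StartNew();
            int hits = 0;
            foreach (int id in ids)
                if (list.Contains(id)) hits++;
            Console.WriteLine("List<T>:    {0} ms, {1} hits", sw.ElapsedMilliseconds, hits);

            // HashSet<T>.Contains hashes into a bucket: O(1) expected per lookup,
            // O(n) for the whole loop.
            sw.Restart();
            hits = 0;
            foreach (int id in ids)
                if (set.Contains(id)) hits++;
            Console.WriteLine("HashSet<T>: {0} ms, {1} hits", sw.ElapsedMilliseconds, hits);
        }
    }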

Although measuring the timing of the entire loop with 16 thousand iterations showed clearly that the hash table based implementation performs orders of magnitude better, when we stepped over individual loop instructions there was almost no difference in their timings. The loop consisted of several .Where calls over the entire collection. These calls took around 10 milliseconds in both the List<T> and the HashSet<T> implementations. If we had not bothered to measure the entire loop execution, it would have been pretty easy to draw the conclusion that there is no difference! Even performing such step-by-step measurement during development can be misleading, because at first sight, does 10 milliseconds look suspicious? Of course not. Not only does it look unsuspicious, but it also works well. At least in a test environment with test data.

As I understand it, the timings might have been similar because we measured only some of the initial iterations of the loop. If the data we are searching for is at the beginning of the array, the lookup can obviously be fast; in some edge cases even faster than hashing and looking up the corresponding list of slots in a hash table.

For me there are two important lessons learned here:

  • Always think of data structures and the amount of data to be processed.
  • When doing performance profiling, measure everything. Concentrate on amortized timings of the entire scope of the code in question. Do not try to reason from individual subtotals.

Entity object in EF is partially silently read-only

What this post is all about is the following program, written in C# using Entity Framework 6.1.3, throwing at line 25 and not at line 23.
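
The original listing has not survived in this archive. Below is a speculative reconstruction consistent with the description that follows; the referenced line numbers are marked approximately in comments, and the behavior noted there is the one the post reports (a connection string for the TestContext is assumed):

    using System;
    using System.Data.Entity;
    using System.Linq;

    public class Test
    {
        public int Id { get; set; }
    }

    public class TestChild
    {
        public int Id { get; set; }
        public string Value { get; set; }
        public int TestId { get; set; }
        public virtual Test Parent { get; set; } // virtual => lazy loading proxy
    }

    public class TestContext : DbContext
    {
        public DbSet<Test> Tests { get; set; }
        public DbSet<TestChild> TestChildren { get; set; }
    }

    class Program
    {
        static void Main()
        {
            using (var ctx = new TestContext())
            {
                ctx.TestChildren.Add(new TestChild { Parent = new Test() });
                ctx.SaveChanges();
            }

            using (var ctx = new TestContext())
            {
                var child = ctx.TestChildren.First();  // ~line 21: returns a dynamic proxy
                var newParent = new Test();

                child.Value = "modified";
                if (child.Value != "modified")
                    throw new Exception();             // ~line 23: never throws

                child.Parent = newParent;              // ~line 24: silently ignored
                if (!ReferenceEquals(child.Parent, newParent))
                    throw new Exception();             // ~line 25: throws, per the post
            }
        }
    }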

We can see the simplest usage of Entity Framework here. There is a Test class and a TestChild class which contains a reference to an instance of Test named Parent. This reference is marked as virtual, so Entity Framework is instructed to load the instance in a lazy manner, i.e. upon first usage of that reference. In the DDL model, the TestId column is obviously a foreign key to the Test table.

I create an entity object, save it to the database, and then retrieve it at line 21. Because the class uses virtual properties, Entity Framework dynamically creates a custom type in order to implement the lazy behavior underneath.

Now let’s suppose I need to modify something in an object retrieved from the database. It turns out I can modify the Value property, which is a plain string. What is more, I can also modify the Parent property, but… the modification is not preserved! The program throws at line 25 because the assignment at line 24 is silently ignored by the framework.

I was actually trapped by this when I needed to modify a collection in a complicated object graph. I am deeply disappointed that Entity Framework on the one hand allows modification of non-virtual properties, but on the other hand silently ignores virtual ones. This can get a developer into trouble.

Of course I am aware it is not good practice to work directly on objects of data model classes. I recovered from this situation with AutoMapper. But this is a kind of quirk, and even a skilled developer has to hesitate before modifying anything returned by Entity Framework.

The bookmarks problem

I have been using Mozilla based web browsers since 2003. Back in the day, the application was called Mozilla Suite; then, in 2004, Firefox showed up using the same engine but with a completely new front end. I migrated my profile many times over the years, but I always kept my bookmarks. Some of my bookmarks surely remember those early days before Firefox (yet the majority of the oldest are no longer valid, because the sites were shut down). The total number of browser bookmarks I have gathered over that time is over 1k. And this is the problem.

I have made several attempts to clean up and organise this huge collection. I have tried to remove dead bookmarks and to group them in folders. I have tried using keywords and descriptions to be able to search more effectively. But with no success. Now I have about a dozen folders, but I still find myself in trouble when I need to search for a particular piece of information. The problem boils down to this: I absolutely remember what the site is about, I am absolutely sure I have it in my collection, but I cannot find it, because either it has some strange title or the words in its URL are meaningless (Firefox searches only within titles and URLs, because obviously that is all it can do).

I realized I need a tool which is much more powerful when it comes to searching bookmarks. I could not find anything satisfying my requirements, so I implemented it myself. Today I am introducing BookmarksBase, an open source tool written in C#, to solve this issue.

BookmarksBase.Search

BookmarksBase embraces a concept that may seem ridiculous: why don’t we pull all the textual content from all the sites in our bookmarks? Do you think that is a lot of data? How much would it be? Even if you were to sacrifice a few hundred megabytes in order to be able to search really effectively, isn’t it worth the space?

Well, it turns out it takes much less space than I originally expected, and the tool works surprisingly fast, although it is implemented in managed code without any distinguished optimizations. First we have to run a separate tool to collect the data (BookmarksBase Importer). Downloading and parsing takes maybe a minute or two. The produced index file, which I call bookmarksbase.xml, contains all the text from all the sites in my bookmarks and is only 12 MiB (for over 1000 bookmarks). Then we can run BookmarksBase Search, which performs the actual searching within contents/addresses/titles. Surely, once you have bookmarksbase.xml created, you can run whatever serves the purpose for you, e.g. grep, findstr (on Windows), or any decent text editor that can handle large amounts of text. I crafted the XML so that it is easily readable by a human: there are new lines, and the text is preserved in a nice column of fixed width (thanks to Lynx — see the source for details).
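
Since bookmarksbase.xml is plain XML, even a few lines of C# suffice to search it. The following is only an illustrative sketch; the element and attribute names are assumptions, not the real schema (which lives in the repository):

    using System;
    using System.Xml.Linq;

    class BookmarksGrep
    {
        static void Main(string[] args)
        {
            string query = args.Length > 0 ? args[0] : "example";
            var doc = XDocument.Load("bookmarksbase.xml");

            // Assumed shape: <Bookmark Url="..." Title="...">page text</Bookmark>
            foreach (var b in doc.Descendants("Bookmark"))
            {
                string title = (string)b.Attribute("Title") ?? "";
                string url   = (string)b.Attribute("Url") ?? "";

                bool hit =
                    b.Value.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    title.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    url.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0;

                if (hit)
                    Console.WriteLine(title + " -> " + url);
            }
        }
    }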

More details and a download link are available on GitHub.

My GitHub + first simple project published

I have finally set up my GitHub account and published some of my code. The URL of the account is:

https://github.com/przemsen.

And the first and very basic project is:

https://github.com/przemsen/WebThermometer.

WebThermometer


WebThermometer is a WPF application to be used as a desktop gadget. It repeatedly downloads the current temperature from an arbitrary web site (at a 5-minute interval by default) and displays it. I personally find it useful, as I like to observe current weather conditions right from my computer. I tried to write it in a way that makes it easy to modify for use with other data sources. You can also download an already compiled, ready-to-run version from my Polish blog.
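
The full source is in the repository; purely as an illustration of the polling approach described above, a sketch could look like the following, where the URL and the parsing are hypothetical placeholders:

    using System;
    using System.Net;
    using System.Windows.Threading;

    // Illustrative sketch only: polls a web page on a timer and raises an event
    // with the extracted temperature, which a WPF view can display.
    public class TemperaturePoller
    {
        private const string SourceUrl = "http://example.com/weather"; // hypothetical
        private readonly DispatcherTimer _timer =
            new DispatcherTimer { Interval = TimeSpan.FromMinutes(5) };

        public event Action<string> TemperatureUpdated;

        public void Start()
        {
            _timer.Tick += (s, e) => Poll();
            _timer.Start();
            Poll(); // refresh immediately at startup
        }

        private void Poll()
        {
            var client = new WebClient();
            client.DownloadStringCompleted += (s, e) =>
            {
                if (e.Error == null && TemperatureUpdated != null)
                    TemperatureUpdated(ExtractTemperature(e.Result));
            };
            client.DownloadStringAsync(new Uri(SourceUrl));
        }

        // Site-specific scraping goes here; the real application ships its own parser.
        private static string ExtractTemperature(string html)
        {
            return html.Trim();
        }
    }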

My plan is to successively publish selected projects and code snippets of mine which, in my opinion, are or will somehow be valuable to show and demonstrate. You can freely modify and recompile all of the published code, provided that you state it was originally authored by me.

PS. Today the auto-update mechanism of my WordPress installation failed (apparently this sometimes happens) and I ended up with the entire installation damaged. I restored it from a backup, and I apologize for losing a few comments from the last two months.

The basics do matter

Recently I spotted the following method in a large C# code base:
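
The listing itself is missing from this archive; here is a reconstruction consistent with the description below (the name ConvToInt and the caller-provided default come from the post, the exact body is assumed):

    // Converts a string to an int, falling back to a caller-provided default.
    // System.Convert throws on invalid input, hence the catch.
    public static int ConvToInt(string s, int defaultValue)
    {
        try
        {
            return Convert.ToInt32(s);
        }
        catch
        {
            return defaultValue;
        }
    }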

This code works and does what it is supposed to do. However, I had a slight inconvenience while debugging it. I tend to frequently use the Visual Studio DEBUG->Exceptions->CLR Exceptions Thrown functionality (check this out!), which to me is an invaluable tool for diagnosing the actual source of an exception. The code base relied heavily on this very ConvToInt method; thus it generated lots of exceptions and caused Visual Studio to break into the debugger over and over again. I then had to disable CLR Exceptions Thrown to protect myself from being hit by flying exceptions all the time. Having switched this off, I ended up with somewhat incomplete diagnostic capabilities. It is bad either way. So, what I did was a basically simple refactoring:
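
Again the refactored listing is missing; judging from the discussion that follows, it presumably looked something like this:

    public static int ConvToInt(string s, int defaultValue)
    {
        int result = defaultValue;   // intended fallback...
        int.TryParse(s, out result); // ...but TryParse overwrites it with 0
                                     // whenever the conversion fails
        return result;
    }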

This method also works. One can even argue this code performs better, because throwing exceptions is considered to be slow. And this is also correct. Although performance was not the key factor here (for line-of-business applications it rarely is), I measured it anyway. I ran both methods in a for loop 5 million times, in release mode, wrapped with appropriate calls to the methods of the Stopwatch class. The results are surely not surprising. For valid string representations of a number, the former method (i.e. the one using System.Convert) gave the average result of

663 milliseconds

and the latter (i.e. one using TryParse) gave the average result of

642 milliseconds

We can safely assume both methods have the same performance in this case. Then I ran the test with an invalid string representation of a number (i.e. passing “x” as the argument). Now the TryParse version gave the average result of:

546 milliseconds

And the System.Convert version, which indeed repeatedly threw exceptions, gave the result (not an average; I ran this only once) of

233739 milliseconds

That is a huge difference: three orders of magnitude. At that point I was fairly convinced my small and undoubtedly unimpressive refactoring was right and justified. Except that it was not correct. It had worked well and had been successfully tested. But after a few weeks, when a different set of use cases was being tested, the application called ConvToInt with -1 as the second argument. It turned out that the method returned 0, not -1, for invalid string representations of a number. What I want to convey here is:

TryParse sets its out argument to 0 even if it returns false and did not successfully convert the string to a number.

I scanned the code base and found this pattern a few times. Apparently I was not the only programmer unaware of this little fact about the TryParse method. Of course, it is well documented (http://msdn.microsoft.com/en-us/library/f02979c7.aspx). But the problem with this particular API seems to me even more serious. Admittedly, 0 is probably the most frequently wanted fallback value when a string conversion fails in general. However, in a construct like the one above, the 0 comes from TryParse itself, even though a default is provided by the caller and, more importantly, is primarily expected to be used as the fallback value in case of failure. One can easily get into trouble when one expects (and passes as an argument) a different default value, e.g. -1, and still receives 0, because TryParse works this way by definition. Obviously the solution here is to add an if statement:
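
The fixed listing is likewise missing from the archive, but its obvious shape is:

    public static int ConvToInt(string s, int defaultValue)
    {
        int result;
        if (int.TryParse(s, out result))
            return result;
        return defaultValue; // honor the caller-provided fallback on failure
    }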

The performance does not get significantly worse because of this one conditional statement; I measured it, and it is roughly the same.
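
The measurement harness was not published either; based on the description above, it was presumably something along these lines (5 million iterations per variant, release mode, Stopwatch around the whole loop):

    using System;
    using System.Diagnostics;

    class ConvToIntBenchmark
    {
        const int Iterations = 5000000;

        static void Main()
        {
            Measure("valid input  ", "123");
            Measure("invalid input", "x");
        }

        static void Measure(string label, string input)
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                ConvToInt(input, -1);
            }
            sw.Stop();
            Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
        }

        // Swap in each of the variants above to compare their timings.
        static int ConvToInt(string s, int defaultValue)
        {
            int result;
            if (int.TryParse(s, out result))
                return result;
            return defaultValue;
        }
    }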

The lessons learned here:

  • Exceptions actually ARE EXPENSIVE. This is NOT a myth.
  • Do not rely on the value passed as the out argument to the TryParse method in case of a failure. Always back yourself up with an if statement and check for the failure.
  • A more general one: learn the APIs, go to the documentation, and do not simply assume you know what a method does, even when it comes to the basics. A descriptive and somewhat verbose method name can still turn evil when run under edge cases. Always be 100% sure about what the contract between the API authors and you is, i.e. what the API actually does.

Enabling the net.tcp protocol in WCF running on top of IIS — the checklist

Windows Communication Foundation is becoming somewhat obsolete in favor of ASP.NET Web API, which has been advertised as the primary technology for building web services. However, the latter obviously cannot serve as a full equivalent of the former. WCF is still a powerful technology for enterprise class service oriented architecture. For such purposes, the decision to switch the transport protocol from HTTP to net.tcp sooner or later must be made, clearly for performance reasons. From my experience I can tell that getting a 100% working configuration of a service hosted inside IIS is surprisingly hard, and a developer has to face a series of quirks to bring the services back to life. Let’s sum up all the activities that can help make a service work with net.tcp.

  1. The WCF Non-HTTP Activation feature must be installed via the Add/Remove Programs applet of the Control Panel. It is not obvious that it is a component of the operating system itself, not of IIS.
  2. The TCP listener processes must be running. Check netstat -a to see if there is a process listening on the port of your choice (the default is 808), then check the following system services and start them if need be: Net.Tcp Listener Adapter and Net.Tcp Port Sharing Service. I have observed cases where those services were unexpectedly shut down, e.g. after a restart of the operating system.
  3. IIS management: the application must have the net.tcp protocol enabled in its properties, and the site must have bindings for that protocol configured. If you have a large number of services, you can use my simplistic C# program which parses the IIS global configuration file, applicationHost.config. Link to my OneDrive
  4. If this is the first attempt at enabling the net.tcp protocol, run the following tools to ensure the components of the .NET Framework are correctly set up: c:\Windows\Microsoft.NET\Framework\v4.0.30319\aspnet_regiis.exe -i and c:\Windows\Microsoft.NET\Framework\v4.0.30319\servicemodelreg.exe -ia. Use Framework64 on a 64-bit system.
  5. Make sure that you are not running on the default endpoint configuration. The default configuration can be recognized in the WSDL of the service: it contains the <msf:protectionlevel>EncryptAndSign</msf:protectionlevel> element, which is responsible for the default authorization settings. These defaults manifest themselves in a strange symptom: the service works in the WCF Test Client but not in the target client application. It is caused by the Test Client having successfully recognized the default binding configuration from the WSDL, whereas the target application uses your custom configuration, and it is very likely that these two do not match.
  6. Check that the service names in the .svc file and in the Web.config file are equal (assuming that declarative binding is used instead of a programmatically created one) in the <system.serviceModel> -> <services> -> <service> section; see the configuration sketch after this list.
  7. Make sure that IIS has created a virtual directory for the application. You can create it from Visual Studio by pressing the appropriate button in the application’s properties.
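
As a reference for point 6, here is a minimal sketch of a matching Web.config fragment; the service and contract names are hypothetical placeholders:

    <system.serviceModel>
      <services>
        <!-- The name below must match the Service attribute in the .svc file -->
        <service name="MyNamespace.MyService">
          <endpoint address=""
                    binding="netTcpBinding"
                    contract="MyNamespace.IMyService" />
        </service>
      </services>
    </system.serviceModel>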