Archive for .NET

In C# interface implementations are not inherited

It may be obvious to some readers, however I was a little bit surprised when I discovered that. Actually, I realized this by looking at a non-trivial class hierarchy in real world application. One can easily think that discussion about inheritance is kind of theoretical area and it primarily appears during job interviews, but it is not true. I will explain real use case and real reasoning behind this hierarchy later in this post, now please take a look at the following program. Generally, the point is that 1) we have to use reference of an interface type and we want more than one specialized implementations of the interface 2) we need to have class B inherit from class A. Without the second requirement it would be obvious: it would be sufficient just to write two separate implementations of IActivity and we are done.

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A
    {
        public new void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints A does activity despite ia variable storing reference to an object of type B and also despite explicitly declaring hiding of base method. It was not clear to me why is it so. It is obvious that the type B has its own implementation, so why is it not run here? To overcome this I initially created base class declared as abstract:

using static System.Console;

namespace ConsoleApplication2
{
    public abstract class Base
    {
        public abstract void DoActivity();
    }

    public interface IActivity
    {
        void DoActivity();
    }

    public class A : Base, IActivity
    {
        public override void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A
    {
        public override void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints B does activity, but it also is overcomplicated. Then I came up with simpler solution — it turns out we have to explicitly mark class B as implementing IActivity.

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A, IActivity
    {
        public new void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints B does activity, but it is not perfect. Method hiding is not a good practice. Finally I ended up with more elegant (and the simplest, I guess) solution:

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public virtual void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A, IActivity
    {
        public override void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

In here we are using virtual and override modifiers to clearly specify an intention of specializing the method. It is better than previous one, because just by looking at class A we are already informed that further specialization is going to occur.

The real usage scenario is that we have an interface representing Entity Framework Core context. We have two distinct implementations: one uses real database and the other uses in-memory database for tests. The latter inherits from the former because what inheritance is all about is copying code. We just want to have the same methods for in-memory implementation like for regular one, but with some slight modifications e.g. in methods executing raw SQL. We also have to use an interface to refer to these objects, because this is how dependency injection containers work.

As you can see, what might seem purely theoretical, actually is used in real world line of business application. Although I have been pretty confident I understand the principles of inheritance in object oriented programming since my undergrad computer science lectures, but as I mentioned, I was surprised discovering that we have to explicitly put : IActivity on class B. The implementation has already been there. Anyway, this example encourages me to be always prepared to verify assumptions I make.

Beware LINQ Group By custom class

Actually this one is pretty obvious. But if you are focused on implementing some complex logic, it is easy to forget about this requirement. Let’s assume there is following piece of code:

var processedData = 
    from q in rawDataFromDb
    group q by new ProductCodeCentreNumPair 
        { ProductCode = q.Code, CentreNumber = q.CentreNumberIdentifier } into g
    select g
    ;

public class ProductCodeCentreNumPair
{
    public string ProductCode;
    public string CentreNumber;
}

This will compile and run, however it will not distinguish ProductCodeCentreNumPair instances. I.e. it will not do the grouping, but produce a sequence of IGroupings, each for corresponding source item. The reason is self-evident, if we try to think for a while. This custom class does not have custom equality comparison logic implemented. Default logic is based on ReferenceEquals so, as each separate object resides at different memory address, they all will be recognized as not equal to each other. Even if they contain the same values in their fields (strings behave like value types, although they are reference types). I used the following set of overridden methods to provide my custom equality comparison logic to solve the problem. It is important to note, that GetHashCode is also needed in order for the grouping to work.

public override bool Equals(object other)
{
    if (ReferenceEquals(null, other)) return false;
    if (ReferenceEquals(this, other)) return true;
    if (this.GetType() != other.GetType())
    	return false;
    return Equals((ProductCodeCentreNumPair)other);
}

public override int GetHashCode()
{
    return ProductCode.GetHashCode() + CentreNumber.GetHashCode();
}

public bool Equals(ProductCodeCentreNumPair other)
{
    var result =
    other.CentreNumber == CentreNumber &&
    other.ProductCode == ProductCode
    ;
    return result;
}

Alternatively you can use anonymous types, I mean:

var processedData = 
    from q in rawDataFromDb
    group q by new { ProductCode = q.Code, CentreNumber = q.CentreNumberIdentifier } into g
    select g
    ;

will just work. This is because instances of anonymous classes have automatically generated equality comparison logic based on values of their fields. Contrary to ReferenceEquals based implementation generated for typical named classes. They are most frequently used in the context of comparisons, so it seems reasonable.

One more alternative is to use a structure instead of a class. But structures should only be used if their fields are value types, because only then you can benefit from binary comparison of their value. And even having structs instead of classes requires implementing custom GetHashCode. By not implementing it, there is a risk that 1) the auto-generated implementation will use reflection or 2) will not be well distributed across int leading to performance problems when adding to HashSet.

When profiling a loop, look at entire execution time of that loop

Today’s piece of advice will not be backed by concrete examples of code. It will be just loose observations of some profiling session and related conclusions. Let’s assume there is a C# code doing some intense manipulations over in-memory stored data. The operations are done in a loop. If there is more data, there are more iterations of the loop. The aim of the operations is to generate kind of summary of data previously retrieved from a database. It has been proved to work well in test environments. Suddenly, in production, my team noticed this piece of code takes about 20 minutes to execute. It turned out there were about 16 thousand iterations of the loop. It was impossible to test such high load with manual testing. The testing only confirmed correctness of the logic itself.

After an investigation and some experiments it turned out that bad data structure was to blame. The loop did some lookup-heavy operations over List<T>. Which are obviously O(n), as list is implemented over array. Substitution of List<T> for Hashset<T> caused dramatic reduction of execution time to a dozen or so of seconds. This is not as surprising as it may seem because Hashset<T> is implemented over hashtable and has O(1) lookup time. These are some data structures fundamentals taught in every decent computer science lecture. The operation in a loop looked innocent and the author did not bother to try to anticipate future load of the logic. A programmer should always keep in mind the estimation of the amount of data to be processed by their implementation and think of appropriate data structure. By appropriate data structure I mean choice between fundamental structures like array (vector), linked list, hashtable, binary tree. This can be the first important conclusion, but I encouraged my colleagues to perform profiling. The results were not so obvious.

Although the measurement of timing of the entire loop with 16 thousand iterations showed clearly that hashtable based implementation performs orders of magnitude better, but when we stepped over individual loop instructions there was almost no difference in their timings. The loop consisted of several .Where calls over entire collection. These calls took something around 10 milliseconds in both List<T> and Hashset<T> implementations. If we had not bothered to measure entire loop execution, it would have been pretty easy to draw conclusion there is no difference! Even performing such step measurement during the development can me misleading, because at first sight, does 10 milliseconds look suspicious? Of course not. Not only does it look unsuspicious but also it works well. At least in test environment with test data.

As I understand it, the timings might have been similar, because we measured only some beginning iterations of the loop. If the data we are searching for are at the beginning of the array, the lookup can obviously be fast. For some edge cases even faster than doing hashing and looking up corresponding list of slots in a hashtable.

For me there are two important lessons learned here:

  • Always think of data structures and amount of data to be processed.
  • When doing performance profiling, measure everything. Concentrate on amortized timings of entire scope of code in question. Do not try to reason about individual subtotals.

Entity object in EF is partially silently read-only

What this post is all about is the following program written in C# using Entity Framework 6.1.3 throwing at line 25 and not at line 23.

We can see the simplest usage of Entity framework here. There is a Test class and a TestChild class which contains a reference to an instance of Test named Parent. This reference is marked as virtual so that Entity Framework in instructed to load an instance in a lazy manner, i.e. upon first usage of that reference. In DDL model obviously TestId column is a foreign key to Test table.

I create an entity object, save it into database and then I retrieve it at line 21. Because the class uses virtual properties, Entity Framework dynamically creates some custom type in order to be able to implement lazy behavior underneath.

Now let’s suppose I have a need to modify something in an object retrieved from a database. It turns out I can modify Value property, which is of pure string type. What is more, I can also modify Parent property, but… the modification is not preserved!. This program throws at line 25 because an assignment from line 24 is silently ignored by the framework.

I actually have been trapped by this when I was in need of modifying some collection in complicated object graph. I am deeply disappointed the Entity Framework on one hand allows modification of non-virtual properties, but on the other hand it ignores virtual ones. This can make a developer run into a trouble.

Of course I am aware it is not good practice to work on objects of data model classes. I recovered myself from this situation with AutoMapper. But this is kind of a quirk, and a skilled developer has to hesitate to even try to modify something returned by Entity Framework.

using System;
using System.Data.Entity;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            using (var db = new TestDBContext())
            {
                var t = new Test { Value = "Hello" };
                var c = new TestChild { Value = "Hello from child", Parent = t };
                db.TestChildren.Add(c);
                db.SaveChanges();
            }
            using (var db = new TestDBContext())
            {
                // Type of c is System.Data.Entity.DynamicProxies.TestChild_47042601AE8E209C11CC25521C2746A2B9D93EC625A6F20BA3D60926278A3D21}
                var c = db.TestChildren.First();
                c.Value = string.Empty;
                if (c.Value != string.Empty) throw new Exception();
                c.Parent = null;
                if (c.Parent != null) throw new Exception();
            }
        }
    }

    public class TestDBContext : DbContext
    {
        public TestDBContext() : base("Name=default")
        {
            Database.SetInitializer<TestDBContext>(new CreateDatabaseIfNotExists<TestDBContext>());
        }
        public DbSet<Test> Tests { get; set; }
        public DbSet<TestChild> TestChildren { get; set; }
    }

    public class Test
    {
        public int Id { get; set; }
        public string Value { get; set; }
    }

    public class TestChild
    {
        public int Id { get; set; }
        public int TestId { get; set; }
        public string Value { get; set; }
        public virtual Test Parent { get; set; }
    }

}

The bookmarks problem

I have been using Mozilla based web browsers since 2003. Back in the days, the application was called Mozilla Suite, then in 2004 the Firefox showed up using the same engine, but with completely new front end. I migrated my profile over the years many times, but I always kept bookmarks. Some of my bookmarks surely remember those early days before Firefox (yet, majority of the oldest are no longer valid, because sites were shut down). The total number of my browser bookmarks gathered over that time is over 1k. And this is `the problem`.

I had several attempts to clean up and organise this huge collection. I have tried to remove dead ones and to group them in folders. I have tried using keywords and descriptions to be able to search more effectively. But with no success. Now I have something about dozen of folders, but I still find myself in trouble when I need to search for particular piece of information. The problem boils down to that: I absolutely remember what the site is about, I am absolutely sure I have it in my collection but I cannot find it because either it has some strange title or words in URL are meaningless (Firefox searches only within titles and urls, because obviously that is all it can do).

I realized I need a tool which is much more powerful when it comes to bookmarks searching. I could not find anything to satisfy my requirements so I implemented it myself. Today I am introducing BookmarksBase which is an open source tool written in C# to solve this issue.

BookmarksBase.Search

BookmarksBase embraces a concept that may seem ridiculous: why don’t we pull all textual contents from all sites in bookmarks. Do you think it is lots of data? How much it would be? Even if you were to sacrifice a few hundreds of megs in order to be able to search really effectively, isn’t it worth that space?

Well, it turns out it takes much less space than I originally expected and the tool works surprisingly fast, although it is implemented in managed code without any distinguished optimizations. First we have to run separate tool to collect data (BookmarksBase Importer). Downloading + parsing takes maybe a minute or two. Produced index file containing all text from all sites in bookmarks, which I call bookmarksbase.xml in my case is only 12 MiB (over 1000 bookmarks). Then we can run BookmarksBase Search that allows us to perform actual searching within contents/addresses/titles. Surely, when you have bookmarksbase.xml created you can run whatever serves the purpose for you e.g. grep, findstr (in Windows) or any kind of decent text editor that can handle big amounts of text. I crafted XML so that it can be easily readable by human: there is new lines, and the text is preserved in nice column of fixed width (thanks to Lynx — see source for details).

More details and download link are available on GitHub

My GitHub + first simple project published

I have eventually set up my GitHub account and published some of my code. The URL of the account is:

https://github.com/przemsen.

And the first and very basic project is:

https://github.com/przemsen/WebThermometer.

WebThermometer

WebThermometer

WebThermometer is a WPF application to be used as a desktop gadget. It repeatedly downloads (default is 5 min. interval) current temperature from arbitrary web site and displays it. I personally find it useful as I like to observe current weather conditions right from my computer. I tried to write in a way so that it can easily be modified for use with other data sources. You can also download already compiled and ready to run version from my Polish blog.

My plan is to successively select some of my entire projects and some code snippets which in my opinion are and/or will somehow be valuable to show and demonstrate. You can freely modify and recompile all of the published code providing that you specify it has originally been authored by me.

PS. Today auto updating mechanism of my WordPress failed (apparently this sometimes happens) and I ended up with damaged entire installation. I restored from backup and I apologize for deleting few comments since last 2 months.