Archive for .NET

Things I’ve learned about SQL Server the hard way

In this post I am presenting a couple of things I’ve learned from the analysis of a problem, that manifested itself in an occasional HTTP 500 errors in production instance of an ASP.NET application. This time I don’t aim at exhaustively explaining every single point, because each of them could be a subject of a dedicated blog post.

The story begins with SQL error: SQLEXCEPTION: Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.

  1. In any reasonably modern version of SQL Server Management Studio there is an XEvent session system_health under ManagementExtended Events. It allows for viewing some important server logs, among which xml_deadlock_report in particularly interesting. It is very important to have an access to the production instance of database server in order to be able to watch the logs.
  2. System health XEvent session
  3. In this particular case, these xml_deadlock_reports contained one suspicious attribute: isolationlevel = Serializable (4) and the SQL code was a SELECT. I would not expect my SELECTs running with Serializable isolation level.
  4. Details of a deadlock
  5. The isolation level is an attribute of a connection between a client and the database server. A connection is called session in SQL Server terminology. An explicit BEGIN TRAN is not necessary for the isolation level to be applied. Every SQL statement runs in its own statement-wide transaction. However, for such narrow-scoped transactions, in practice it may not make any difference whether you raise the isolation level or not. The difference can be observed when a transaction is explicit and spans multiple SQL statements.
  6. The cause of setting the serialization level to Serializable was the behaviour of the TransactionScope [1]. If you use it, it raises the isolation level. It is just a peculiarity of this very API of the .NET framework. It is good to know this.
  7. SQL Server, at last in 2012 and some (I am not sure exactly which ones) later versions, does not reset the isolation level when ADO.NET disposes of a connection. A connection returns back to the connection pool [2] and is reused by subsequent SqlConnection objects unless they have different connection string.
  8. The connection pool size, if the connection pooling is active, poses the limit of how many concurrent connections to a database server a .NET application can make. If there are no free connections in the pool, an exception is thrown [3].
  9. Eliminating the usage of TransactionScope did not solve the issue. Even if you run SELECTs under the default Read Committed isolation level, these still issues Shared locks which may deadlock with Exclusive locks of UPDATEs. In any reasonably high production data traffic, where SELECTs span multiple tables, which are also very frequently updated, it is highly probable, that a deadlock will occur.
  10. The difference between running SELECT under Serializable isolation level and Read Committed level is that in the former, the locks are kept from the moment of executing the SELECT until the transaction ends. You can observe it by manually beginning a Serializable transaction, running any SELECT and observing dm_tran_locks DMV and only then committing (or rolling back, whatever) the transaction. With Read Committed level locks are not kept until an explicit transaction ends, they are released immediately after execution of a SELECT finishes. These are the same kind of locks, Shared locks. This implies one cannot observe the difference between executing a SELECT under Serializable and Read Committed, when there is no explicit transaction and thus, there is only a statement-wide transaction which releases locks immediately after the results are returned.
  11. Setting isolation level of Read Uncommitted is practically equivalent to running a SELECT WITH(NOLOCK) hint, even if you don’t explicitly open a transaction.
  12. In Entity Framework a SqlConnection is opened for every materialization of the query, the results are returned, and the connection is immediately closed and returned back to the pool [5]. The connection lifetime is by no means related to the scope of DbContext object. I can see a kind of similarity between how Entity Framework uses SqlConnections and how ASP.NET makes use of threads when executing async methods. A thread is released on every await and can be used for doing something more valuable than waiting. Similarly, a SqlConnection is released right after materialization and can be used for executing different command, in different request (in case of ASP.NET) even before DbContext is disposed of.
  13. It is not that obvious how to reset the isolation level of the connection. You see, every time your C# code using Entity Framework results in sending a SQL to the SQL Server, it can take different connection from the pool (if anyone knows if there is any ordering applied when retrieving connections from the pool, please feel free to comment). It may or may not be the same connection you used previously. Consequently, it is not easy to ‘catch’ the underlying connection using Entity Framework. You can call BeginTransaction every time you use DbContext, and then you are guaranteed to own the connection for all your SQL commands. But that way you are forcing opening transaction when you don’t really need one. What I recommend is to handle StateChange event of DbConnection object as described in [4]. You can do it either on opening the connection or on closing it.
  14. In SQL Server you can monitor open sessions with the following query:
    select
        DB_NAME(s.database_id) as DbName,
        case s.transaction_isolation_level
                WHEN 0 THEN 'Unspecified'
                WHEN 1 THEN 'ReadUncommitted'
                WHEN 2 THEN 'ReadCommitted'
                WHEN 3 THEN 'RepeatableRead'
                WHEN 4 THEN 'Serializable'
                WHEN 5 THEN 'Snapshot'
        end as Level,
        s.session_id as Session
    from sys.dm_exec_sessions s
    where DB_NAME(s.database_id) <> 'master'
    order by s.last_request_end_time desc;  
    

References:

[1]    https://stackoverflow.com/questions/11292763/why-is-system-transactions-transactionscope-default-isolationlevel-serializable
[2]    https://stackoverflow.com/questions/9851415/sql-server-isolation-level-leaks-across-pooled-connections
[3]    https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
[4]    https://stackoverflow.com/questions/28442558/entity-framework-and-transactionscope-doesnt-revert-the-isolation-level-after-d
[5]    https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/bb896325(v=vs.100)#connections-and-the-entity-framework

The worst Entity Framework pitfall

I work with a quite big enterprise system in my job. Not surprisingly, it uses Entity Framework (Core, but it does not matter) and SQL Server. The system consists of multiple reusable components also in the data access layer. I had to modify DbContext and write some flexible and reusable method accepting a predicate as an argument and apply the predicate on a DbContext. Let’s assume we are using the table A from the previous post. I happily coded the signature of the method to use Func. Let’s simulate this in the LINQPad and run our Func against a DbContext.

It did not work. Or… did it? The picture above shows only generated SQL, but I promise the results show correctly the one record. The problem is, the predicate has been applied in memory after having pulled all the records from table A into memory as well. I am not going to explain what it means for any reasonably sized system. The correct way of doing this is to use Expression<Func<A, bool>>.

The explanation is in fact really obvious for anyone deeply understanding how ORMs work. The data structure which allows for inspecting a predicate on the fly and building final SQL query is Expression. There is already an infrastructure for so-called expression visitors. Please also note, that you can always get your Func from Expression<Func> by calling Compile method on it.

A few random ASP.NET Core and .NET Core tips

I’ve been working with .NET core recently and I’d like to post some random observations on this subject for the future reference.

  1. It is possible to create Nuget package upon build. This option is actually available also from the VS2017 Project properties GUI. Add this code to csproj.

     
      netstandard2.0
      true
      $(UserProfile)
      0.0.1
    
    
  2. It is possible to add local folder as Nuget feed. The folder can also be current user’s profile. This one is actually not Core specific. Nuget.config should look like this:

    < ?xml version="1.0" encoding="utf-8"?>
    
      
        
        
      
      
          
          
        
      
        
          
          
        
      
      
        
      
      
        
      
    
    
  3. You can compile for multiple targets in .NET Core compatible csproj. Please note the trailing s in the tag name. You can also conditionally include items in csproj. Use the following snippets:

      netstandard2.0;net45;net40
    

    and:

        
      
    

    There is a reference documentation for the available targets: here.

  4. The listening port in Kestrel can be configured in multiple ways. It can be read from environment variable or can be passed as command line argument. An asterisk is required to bind to physical interfaces. It is needed e.g. when trying to display the application from mobile phone when being served from development machine. The following are equivalent:

    set ASPNETCORE_URLS=http://*:11399
    --urls http://*:11399
    
  5. The preferred way to pass hosting parameters to Kestrel is launchSettings.json file located in Properties of the solution root. You can select a profile defined there when running:

    dotnet run --launch-profile "Dev"
    

    dotnet run is used to build and run from the directory where csproj resides. It is not a good idea to run the app’s dll directly. Settings file can be missing from bin folder and/or launch profile may not be present there.

In C# interface implementations are not inherited

It may be obvious to some readers, however I was a little bit surprised when I discovered that. Actually, I realized this by looking at a non-trivial class hierarchy in real world application. One can easily think that discussion about inheritance is kind of theoretical area and it primarily appears during job interviews, but it is not true. I will explain real use case and real reasoning behind this hierarchy later in this post, now please take a look at the following program. Generally, the point is that 1) we have to use reference of an interface type and we want more than one specialized implementations of the interface 2) we need to have class B inherit from class A. Without the second requirement it would be obvious: it would be sufficient just to write two separate implementations of IActivity and we are done.

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A
    {
        public new void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints A does activity despite ia variable storing reference to an object of type B and also despite explicitly declaring hiding of base method. It was not clear to me why is it so. It is obvious that the type B has its own implementation, so why is it not run here? To overcome this I initially created base class declared as abstract:

using static System.Console;

namespace ConsoleApplication2
{
    public abstract class Base
    {
        public abstract void DoActivity();
    }

    public interface IActivity
    {
        void DoActivity();
    }

    public class A : Base, IActivity
    {
        public override void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A
    {
        public override void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints B does activity, but it also is overcomplicated. Then I came up with simpler solution — it turns out we have to explicitly mark class B as implementing IActivity.

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A, IActivity
    {
        public new void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

It prints B does activity, but it is not perfect. Method hiding is not a good practice. Finally I ended up with more elegant (and the simplest, I guess) solution:

using static System.Console;

namespace ConsoleApplication2
{
    public interface IActivity
    {
        void DoActivity();
    }

    public class A : IActivity
    {
        public virtual void DoActivity()
        {
            WriteLine("A does activity");
        }
    }

    public class B : A, IActivity
    {
        public override void DoActivity()
        {
            WriteLine("B does activity");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            IActivity ia = new B();
            ia.DoActivity();
            ReadKey();
        }
    }
}

In here we are using virtual and override modifiers to clearly specify an intention of specializing the method. It is better than previous one, because just by looking at class A we are already informed that further specialization is going to occur.

The real usage scenario is that we have an interface representing Entity Framework Core context. We have two distinct implementations: one uses real database and the other uses in-memory database for tests. The latter inherits from the former because what inheritance is all about is copying code. We just want to have the same methods for in-memory implementation like for regular one, but with some slight modifications e.g. in methods executing raw SQL. We also have to use an interface to refer to these objects, because this is how dependency injection containers work.

As you can see, what might seem purely theoretical, actually is used in real world line of business application. Although I have been pretty confident I understand the principles of inheritance in object oriented programming since my undergrad computer science lectures, but as I mentioned, I was surprised discovering that we have to explicitly put : IActivity on class B. The implementation has already been there. Anyway, this example encourages me to be always prepared to verify assumptions I make.

Beware LINQ Group By custom class

Actually this one is pretty obvious. But if you are focused on implementing some complex logic, it is easy to forget about this requirement. Let’s assume there is following piece of code:

var processedData = 
    from q in rawDataFromDb
    group q by new ProductCodeCentreNumPair 
        { ProductCode = q.Code, CentreNumber = q.CentreNumberIdentifier } into g
    select g
    ;

public class ProductCodeCentreNumPair
{
    public string ProductCode;
    public string CentreNumber;
}

This will compile and run, however it will not distinguish ProductCodeCentreNumPair instances. I.e. it will not do the grouping, but produce a sequence of IGroupings, each for corresponding source item. The reason is self-evident, if we try to think for a while. This custom class does not have custom equality comparison logic implemented. Default logic is based on ReferenceEquals so, as each separate object resides at different memory address, they all will be recognized as not equal to each other. Even if they contain the same values in their fields (strings behave like value types, although they are reference types). I used the following set of overridden methods to provide my custom equality comparison logic to solve the problem. It is important to note, that GetHashCode is also needed in order for the grouping to work.

public override bool Equals(object other)
{
    if (ReferenceEquals(null, other)) return false;
    if (ReferenceEquals(this, other)) return true;
    if (this.GetType() != other.GetType())
    	return false;
    return Equals((ProductCodeCentreNumPair)other);
}

public override int GetHashCode()
{
    return ProductCode.GetHashCode() + CentreNumber.GetHashCode();
}

public bool Equals(ProductCodeCentreNumPair other)
{
    var result =
    other.CentreNumber == CentreNumber &&
    other.ProductCode == ProductCode
    ;
    return result;
}

Alternatively you can use anonymous types, I mean:

var processedData = 
    from q in rawDataFromDb
    group q by new { ProductCode = q.Code, CentreNumber = q.CentreNumberIdentifier } into g
    select g
    ;

will just work. This is because instances of anonymous classes have automatically generated equality comparison logic based on values of their fields. Contrary to ReferenceEquals based implementation generated for typical named classes. They are most frequently used in the context of comparisons, so it seems reasonable.

One more alternative is to use a structure instead of a class. But structures should only be used if their fields are value types, because only then you can benefit from binary comparison of their value. And even having structs instead of classes requires implementing custom GetHashCode. By not implementing it, there is a risk that 1) the auto-generated implementation will use reflection or 2) will not be well distributed across int leading to performance problems when adding to HashSet.

When profiling a loop, look at entire execution time of that loop

Today’s piece of advice will not be backed by concrete examples of code. It will be just loose observations of some profiling session and related conclusions. Let’s assume there is a C# code doing some intense manipulations over in-memory stored data. The operations are done in a loop. If there is more data, there are more iterations of the loop. The aim of the operations is to generate kind of summary of data previously retrieved from a database. It has been proved to work well in test environments. Suddenly, in production, my team noticed this piece of code takes about 20 minutes to execute. It turned out there were about 16 thousand iterations of the loop. It was impossible to test such high load with manual testing. The testing only confirmed correctness of the logic itself.

After an investigation and some experiments it turned out that bad data structure was to blame. The loop did some lookup-heavy operations over List<T>. Which are obviously O(n), as list is implemented over array. Substitution of List<T> for Hashset<T> caused dramatic reduction of execution time to a dozen or so of seconds. This is not as surprising as it may seem because Hashset<T> is implemented over hashtable and has O(1) lookup time. These are some data structures fundamentals taught in every decent computer science lecture. The operation in a loop looked innocent and the author did not bother to try to anticipate future load of the logic. A programmer should always keep in mind the estimation of the amount of data to be processed by their implementation and think of appropriate data structure. By appropriate data structure I mean choice between fundamental structures like array (vector), linked list, hashtable, binary tree. This can be the first important conclusion, but I encouraged my colleagues to perform profiling. The results were not so obvious.

Although the measurement of timing of the entire loop with 16 thousand iterations showed clearly that hashtable based implementation performs orders of magnitude better, but when we stepped over individual loop instructions there was almost no difference in their timings. The loop consisted of several .Where calls over entire collection. These calls took something around 10 milliseconds in both List<T> and Hashset<T> implementations. If we had not bothered to measure entire loop execution, it would have been pretty easy to draw conclusion there is no difference! Even performing such step measurement during the development can me misleading, because at first sight, does 10 milliseconds look suspicious? Of course not. Not only does it look unsuspicious but also it works well. At least in test environment with test data.

As I understand it, the timings might have been similar, because we measured only some beginning iterations of the loop. If the data we are searching for are at the beginning of the array, the lookup can obviously be fast. For some edge cases even faster than doing hashing and looking up corresponding list of slots in a hashtable.

For me there are two important lessons learned here:

  • Always think of data structures and amount of data to be processed.
  • When doing performance profiling, measure everything. Concentrate on amortized timings of entire scope of code in question. Do not try to reason about individual subtotals.