Archive for pjsen

Things I’ve learned about SQL Server the hard way

In this post I present a couple of things I’ve learned while analysing a problem that manifested itself as occasional HTTP 500 errors in the production instance of an ASP.NET application. This time I don’t aim to explain every point exhaustively, because each of them could be the subject of a dedicated blog post.

The story begins with an SQL error: SQLEXCEPTION: Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.

  1. In any reasonably modern version of SQL Server Management Studio there is an Extended Events (XEvent) session called system_health under Management > Extended Events. It lets you view some important server events, among which xml_deadlock_report is particularly interesting. It is very important to have access to the production database server in order to be able to watch these events.
     [Figure: System health XEvent session]
  2. In this particular case, the xml_deadlock_report entries contained one suspicious attribute, isolationlevel = Serializable (4), and the SQL code was a SELECT. I would not expect my SELECTs to run under the Serializable isolation level.
     [Figure: Details of a deadlock]
  3. The isolation level is an attribute of the connection between a client and the database server. In SQL Server terminology a connection is called a session. An explicit BEGIN TRAN is not necessary for the isolation level to apply, because every SQL statement runs in its own statement-wide transaction. However, for such narrow-scoped transactions it may make no practical difference whether you raise the isolation level or not. The difference becomes observable when a transaction is explicit and spans multiple SQL statements.
  4. The isolation level was set to Serializable because of the behaviour of TransactionScope [1]: by default it uses the Serializable isolation level. This is just a peculiarity of this particular .NET Framework API, but it is good to know.
  5. SQL Server, at least in version 2012 and some later versions (I am not sure exactly which ones), does not reset the isolation level when ADO.NET disposes of a connection. The connection returns to the connection pool [2] and is reused by subsequent SqlConnection objects, unless they use a different connection string.
  6. If connection pooling is enabled, the pool size limits how many concurrent connections to the database server a .NET application can open. When no free connection is available in the pool, the request waits for one to be released, and if none becomes available before the connection timeout expires, an exception is thrown [3].
  7. Eliminating TransactionScope did not solve the issue. Even under the default Read Committed isolation level, SELECTs still acquire Shared locks, which may deadlock with the Exclusive locks taken by UPDATEs. Under reasonably high production traffic, where SELECTs span multiple tables that are also updated very frequently, a deadlock is highly probable.
  8. The difference between running a SELECT under the Serializable and under the Read Committed isolation level is that under the former, the locks are held from the moment the SELECT executes until the transaction ends. You can observe this by manually beginning a Serializable transaction, running any SELECT, inspecting the sys.dm_tran_locks DMV, and only then committing (or rolling back) the transaction. Under Read Committed, the locks are not held until an explicit transaction ends; they are released as soon as the data has been read. In both cases these are Shared locks, although under Serializable they can also be key-range locks protecting the range that was read. This implies that you cannot observe a difference between executing a SELECT under Serializable and under Read Committed when there is no explicit transaction, because then there is only a statement-wide transaction, which releases its locks as soon as the results are returned.
  9. Setting the isolation level to Read Uncommitted is practically equivalent to running SELECTs with the WITH (NOLOCK) hint, even if you don’t explicitly open a transaction.
  10. In Entity Framework, a SqlConnection is opened for every materialization of a query; the results are returned, and the connection is immediately closed and returned to the pool [5]. The connection lifetime is in no way tied to the scope of the DbContext object. I see a similarity between how Entity Framework uses SqlConnections and how ASP.NET uses threads when executing async methods. A thread is released on every await and can be used for something more valuable than waiting. Similarly, a SqlConnection is released right after materialization and can be used to execute a different command, even in a different request (in the case of ASP.NET), before the DbContext is disposed of.
  11. It is not obvious how to reset the isolation level of the connection. Every time your C# code using Entity Framework sends SQL to SQL Server, it may take a different connection from the pool (if anyone knows whether any ordering is applied when retrieving connections from the pool, please feel free to comment). It may or may not be the connection you used previously. Consequently, it is not easy to ‘catch’ the underlying connection using Entity Framework. You can call BeginTransaction every time you use the DbContext, which guarantees that you own the connection for all your SQL commands, but then you are forcing a transaction when you don’t really need one. What I recommend is handling the StateChange event of the DbConnection object, as described in [4]. You can reset the isolation level either when the connection is opened or when it is closed.
  12. In SQL Server you can monitor open sessions and their isolation levels with the following query:
    select
        DB_NAME(s.database_id) as DbName,
        case s.transaction_isolation_level
                WHEN 0 THEN 'Unspecified'
                WHEN 1 THEN 'ReadUncommitted'
                WHEN 2 THEN 'ReadCommitted'
                WHEN 3 THEN 'RepeatableRead'
                WHEN 4 THEN 'Serializable'
                WHEN 5 THEN 'Snapshot'
        end as Level,
        s.session_id as Session
    from sys.dm_exec_sessions s
    where DB_NAME(s.database_id) <> 'master'
    order by s.last_request_end_time desc;  
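The pooling behaviour described above (session state surviving inside a pooled connection, the pool size limit, and resetting the isolation level when a connection is closed) can be illustrated with a toy model. This is a Python sketch, not ADO.NET: ToyPool, Session, reset_isolation_level and the "most recently released first" reuse order are hypothetical simplifications chosen for the illustration, and the on_release hook only loosely plays the role of a StateChange handler.

```python
from collections import deque

DEFAULT_LEVEL = "ReadCommitted"  # SQL Server's default isolation level


class Session:
    """Hypothetical stand-in for a server-side session that keeps state between uses."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.isolation_level = DEFAULT_LEVEL


class ToyPool:
    """Toy connection pool (not ADO.NET's implementation).

    on_release is an optional hook, loosely analogous to handling
    DbConnection.StateChange and resetting the isolation level on close.
    """

    def __init__(self, max_size, on_release=None):
        self.max_size = max_size
        self.on_release = on_release
        self.idle = deque()
        self.created = 0

    def acquire(self):
        if self.idle:
            return self.idle.pop()  # this toy reuses the most recently released session
        if self.created >= self.max_size:
            # ADO.NET waits up to the connection timeout first; this toy raises immediately
            raise RuntimeError("pool exhausted")
        self.created += 1
        return Session(self.created)

    def release(self, session):
        if self.on_release is not None:
            self.on_release(session)
        self.idle.append(session)


def reset_isolation_level(session):
    session.isolation_level = DEFAULT_LEVEL


def level_seen_by_next_caller(pool):
    # First caller behaves like code inside a TransactionScope (Serializable by default)
    s = pool.acquire()
    s.isolation_level = "Serializable"
    pool.release(s)
    # A later, unrelated caller gets the same pooled session back
    s = pool.acquire()
    level = s.isolation_level
    pool.release(s)
    return level


print(level_seen_by_next_caller(ToyPool(max_size=2)))  # Serializable (leaked)
print(level_seen_by_next_caller(ToyPool(max_size=2, on_release=reset_isolation_level)))  # ReadCommitted
```

Without the hook, the second caller silently inherits Serializable; with it, the session is handed out at Read Committed again. In a real application the handler would have to send something like SET TRANSACTION ISOLATION LEVEL READ COMMITTED over the connection, as in [4].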
    

References:

[1]    https://stackoverflow.com/questions/11292763/why-is-system-transactions-transactionscope-default-isolationlevel-serializable
[2]    https://stackoverflow.com/questions/9851415/sql-server-isolation-level-leaks-across-pooled-connections
[3]    https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-connection-pooling
[4]    https://stackoverflow.com/questions/28442558/entity-framework-and-transactionscope-doesnt-revert-the-isolation-level-after-d
[5]    https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/bb896325(v=vs.100)#connections-and-the-entity-framework

The worst Entity Framework pitfall

At work I deal with a fairly big enterprise system. Not surprisingly, it uses Entity Framework (Core, but that does not matter) and SQL Server. The system consists of multiple reusable components, also in the data access layer. I had to modify a DbContext and write a flexible, reusable method that accepts a predicate as an argument and applies it to a DbContext. Let’s assume we are using table A from the previous post. I happily coded the method signature to use Func<A, bool>. Let’s simulate this in LINQPad and run our Func against a DbContext.

It did not work. Or… did it? The generated SQL contains no WHERE clause at all, but I promise the results correctly show the one record. The problem is that the predicate was applied in memory, after all the records from table A had been pulled into memory as well. I don’t need to explain what this means for any reasonably sized system. The correct way of doing this is to use Expression<Func<A, bool>>.

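The explanation is actually obvious to anyone who understands how ORMs work. The data structure that allows a predicate to be inspected at run time and translated into the final SQL query is Expression, and there is already infrastructure for this in the form of so-called expression visitors. A Func, by contrast, is compiled code that cannot be inspected: passing it to Where binds to Enumerable.Where instead of Queryable.Where, so the DbSet is enumerated and the filtering happens in memory. Please also note that you can always get your Func from an Expression<Func<A, bool>> by calling its Compile method.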
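The difference between an opaque delegate and an inspectable expression can be shown outside .NET as well. Below is a minimal Python analogy using the standard sqlite3 module, not Entity Framework: Eq is a hypothetical one-node "expression tree" that can either be translated into a WHERE clause or compiled into a plain callable (mirroring Compile), while a lambda can only be run after every row has been fetched. query_a is likewise a hypothetical helper that logs the SQL it sends.

```python
import sqlite3
from typing import Callable


class Eq:
    """Hypothetical one-node expression tree: <column> = <value>."""

    def __init__(self, column: str, value):
        self.column = column  # assumed to be a trusted column name
        self.value = value

    def to_sql(self):
        # The structure can be inspected, so it can be translated into a parameterised WHERE clause
        return f"{self.column} = ?", [self.value]

    def compile(self) -> Callable[[dict], bool]:
        # Analogous to Expression.Compile(): turn the tree back into an ordinary callable
        return lambda row: row[self.column] == self.value


def query_a(conn, predicate, sql_log):
    """Return rows of table A matching predicate, recording every SQL statement sent."""
    if isinstance(predicate, Eq):
        clause, params = predicate.to_sql()
        sql = f"SELECT Id FROM A WHERE {clause}"
        sql_log.append(sql)
        return [{"Id": r[0]} for r in conn.execute(sql, params)]
    # An opaque callable cannot be translated: fetch every row, then filter in memory
    sql = "SELECT Id FROM A"
    sql_log.append(sql)
    rows = [{"Id": r[0]} for r in conn.execute(sql)]
    return [row for row in rows if predicate(row)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO A (Id) VALUES (?)", [(i,) for i in range(1, 6)])

sql_log = []
print(query_a(conn, lambda row: row["Id"] == 1, sql_log))  # [{'Id': 1}]
print(query_a(conn, Eq("Id", 1), sql_log))  # [{'Id': 1}]
print(sql_log)  # ['SELECT Id FROM A', 'SELECT Id FROM A WHERE Id = ?']
```

Both calls return the correct single record, just like in LINQPad; only the SQL that reaches the database differs.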

Where to put condition in SQL?

Let’s suppose I am modeling a business domain with entities A, B and C. These entities have the following properties:

  • An entity A can have both an entity B and an entity C
  • An entity A can have only an entity B
  • An entity A can exist without B and C
  • An entity B has a non-nullable property Active

I am implementing the domain with the following SQL. I omit foreign key constraints for brevity.

CREATE TABLE A
(
	[Id] INT NOT NULL,
	CONSTRAINT [AId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

CREATE TABLE B
(
	[Id] INT NOT NULL,
	[AId] BIGINT NOT NULL,
	[Active] BIT NOT NULL,
	CONSTRAINT [BId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

CREATE TABLE C
(
	[Id] INT NOT NULL,
	[BId] BIGINT NOT NULL,
	CONSTRAINT [CId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

INSERT INTO [A] ([Id]) VALUES (1);
INSERT INTO [A] ([Id]) VALUES (2);
INSERT INTO [A] ([Id]) VALUES (3);
INSERT INTO [A] ([Id]) VALUES (4);
INSERT INTO [A] ([Id]) VALUES (5);

INSERT INTO [B] ([Id], [AId], [Active]) VALUES (10, 1, 1);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (20, 2, 0);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (30, 3, 1);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (40, 4, 1);

INSERT INTO [C] ([Id], [BId]) VALUES (100, 10);
INSERT INTO [C] ([Id], [BId]) VALUES (200, 20);
INSERT INTO [C] ([Id], [BId]) VALUES (300, 30);

Now let’s suppose my task is to perform a validity check according to special rules. I am given Ids of entities A as input, and for each of them I have to check:

  1. If the entity exists and
  2. If it is valid

Existence will be checked simply by looking at whether the corresponding row is present in the result set, and for the validity check I will write a simple CASE expression. These are the rules applied to my example data:

  • A.1 exists and has active B.10 and has C.100 => exists, correct
  • A.2 exists and has inactive B.20 and has C.200 => exists, incorrect
  • A.3 exists and has active B.30 and has C.300 => exists, correct
  • A.4 exists and has active B.40 and DOES NOT HAVE C => exists, incorrect
  • A.5 exists and HAS NEITHER B NOR C => exists, incorrect
  • A.6 does not exist, incorrect

I write the following query to do the task:

SELECT 
  a.Id AS AId,
  ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct
FROM A a
LEFT JOIN B b ON a.Id = b.AId 
LEFT JOIN C c ON b.Id = c.BId
WHERE
    a.Id IN (1, 2, 3, 4, 5, 6) 
    AND b.Active = 1

My rules include checking whether B.Active is true, so I just put this condition into the WHERE clause. The result is:

AId  Correct 
---- --------
1    1       
3    1       
4    0       

The problem is that I have been given the exact set of A Ids to check: 1, 2, 3, 4, 5, 6, but my result does not include 2, 5 and 6. My application logic fails here, because it treats those A records as missing. For 6 this is fine, because it is absent from table A, but 2 and 5 must be present in the results for my validity check. The fix is extremely easy:

SELECT 
  a.Id AS AId,
  ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct
FROM A a
LEFT JOIN B b ON a.Id = b.AId AND b.Active = 1
LEFT JOIN C c ON b.Id = c.BId
WHERE
    a.Id IN (1, 2, 3, 4, 5, 6) 

Now the result is:

AId  Correct 
---- --------
1    1       
2    0       
3    1       
4    0       
5    0       

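It is easy to understand why: WHERE filters the whole result set, no matter what my intention for the JOIN was. When there is no matching B row (A.5), the LEFT JOIN fills the B columns with NULL, so b.Active = 1 is not true and the row is filtered out; A.2 is removed simply because its B row is inactive. But I still need the A record in my results. Thus, what I have to do is move the condition into the JOIN, where it only decides whether a B row is joined, not whether the A row appears.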

It is also very easy to fall into this trap of thoughtlessly writing all intended conditions in the WHERE clause.
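Both queries can be reproduced with Python’s built-in sqlite3 module. This sketch swaps SQL Server for SQLite purely for convenience (an assumption of the example, not of the post); the LEFT JOIN and WHERE semantics involved are standard SQL, so the results match the tables above. ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Same schema and data as above (foreign keys omitted, as in the post)
SETUP = """
CREATE TABLE A (Id INT NOT NULL PRIMARY KEY);
CREATE TABLE B (Id INT NOT NULL PRIMARY KEY, AId BIGINT NOT NULL, Active BIT NOT NULL);
CREATE TABLE C (Id INT NOT NULL PRIMARY KEY, BId BIGINT NOT NULL);
INSERT INTO A (Id) VALUES (1), (2), (3), (4), (5);
INSERT INTO B (Id, AId, Active) VALUES (10, 1, 1), (20, 2, 0), (30, 3, 1), (40, 4, 1);
INSERT INTO C (Id, BId) VALUES (100, 10), (200, 20), (300, 30);
"""

SELECT_PART = """
SELECT a.Id AS AId,
       CASE WHEN b.Id IS NOT NULL AND c.Id IS NOT NULL AND b.Active = 1 THEN 1 ELSE 0 END AS Correct
FROM A a
"""

# Condition in WHERE: it filters the joined result, discarding A rows whose B is missing or inactive
IN_WHERE = SELECT_PART + """
LEFT JOIN B b ON a.Id = b.AId
LEFT JOIN C c ON b.Id = c.BId
WHERE a.Id IN (1, 2, 3, 4, 5, 6) AND b.Active = 1
ORDER BY a.Id
"""

# Condition in JOIN: it only decides which B rows are joined; every existing requested A row survives
IN_JOIN = SELECT_PART + """
LEFT JOIN B b ON a.Id = b.AId AND b.Active = 1
LEFT JOIN C c ON b.Id = c.BId
WHERE a.Id IN (1, 2, 3, 4, 5, 6)
ORDER BY a.Id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(conn.execute(IN_WHERE).fetchall())  # [(1, 1), (3, 1), (4, 0)]
print(conn.execute(IN_JOIN).fetchall())  # [(1, 1), (2, 0), (3, 1), (4, 0), (5, 0)]
```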

A few random ASP.NET Core and .NET Core tips

I’ve been working with .NET Core recently and I’d like to post some random observations on this subject for future reference.

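  1. It is possible to create a NuGet package on build. This option is also available in the VS2017 project properties GUI. Add this code to the csproj:

    <PropertyGroup>
      <TargetFramework>netstandard2.0</TargetFramework>
      <GeneratePackageOnBuild>true</GeneratePackageOnBuild>
      <PackageOutputPath>$(UserProfile)</PackageOutputPath>
      <Version>0.0.1</Version>
    </PropertyGroup>
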
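  2. It is possible to add a local folder as a NuGet feed. The folder can also be the current user’s profile folder. This one is not actually Core-specific. NuGet.config should look like this:

    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
      <packageSources>
        <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
        <!-- the key name is arbitrary; NuGet expands %USERPROFILE% to the current user's profile folder -->
        <add key="Local" value="%USERPROFILE%" />
      </packageSources>
    </configuration>
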
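  3. You can compile for multiple targets in a .NET Core-compatible csproj. Please note the trailing s in the tag name (TargetFrameworks rather than TargetFramework). You can also conditionally include items in the csproj. Use the following snippets:

    <PropertyGroup>
      <TargetFrameworks>netstandard2.0;net45;net40</TargetFrameworks>
    </PropertyGroup>

    and:

    <ItemGroup Condition="'$(TargetFramework)' == 'net40'">
      <!-- illustrative condition: items placed here are included only when building for net40 -->
    </ItemGroup>

    There is reference documentation listing the available targets (target framework monikers).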

  4. The listening port in Kestrel can be configured in multiple ways. It can be read from an environment variable or passed as a command-line argument. An asterisk is required to bind to all network interfaces rather than only to localhost. This is needed, for example, when you want to open the application on a mobile phone while it is served from your development machine. The following are equivalent:

    set ASPNETCORE_URLS=http://*:11399
    --urls http://*:11399
    
  5. The preferred way to pass hosting parameters to Kestrel during development is the launchSettings.json file located in the Properties folder of the project. You can select a profile defined there when running:

    dotnet run --launch-profile "Dev"
    

    dotnet run builds and runs the project from the directory where the csproj resides. It is not a good idea to run the app’s dll directly: the settings file may be missing from the bin folder, and launch profiles from launchSettings.json are not applied there.

How to run Tmux in GIT Bash on Windows

[Figure: Tmux running under the Git Bash default terminal with two shell processes]

I know everyone uses Cmder, but it didn’t work for me. It hung a few times, it has way too many options, and it had issues sending a signal to kill a process. I gave up on using it. I work with a carefully configured default Windows console and, believe it or not, it serves the purpose. I also know you can use the Windows Subsystem for Linux on Windows 10, which is truly amazing, but here I am talking about cases where you need a standard Git for Windows installation.

When I worked with Unix I liked GNU Screen, which is a terminal multiplexer. It gives you a bunch of keyboard shortcuts to create separate shell processes within the same terminal window. The problem is that it is not available under Git Bash. But it turns out that its alternative, Tmux, is.

I did a little research and found that Git Bash uses a MinGW build of the GNU tools, but ships only selected ones. You can install the whole distribution of the tools from https://www.msys2.org/, run a command to install Tmux, and then copy some files to the Git installation folder. This is what you do:

  1. Install the aforementioned MSYS2 package and run its bash shell
  2. Install tmux using the following command: pacman -S tmux
  3. Go to the MSYS2 binaries directory; in my case it is C:\msys64\usr\bin
  4. Copy tmux.exe and msys-event-2-1-4.dll to your Git for Windows directory; mine is C:\Program Files\Git\usr\bin. Please note that in the future you may see this file with a version number higher than 2-1-4
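Steps 3 and 4 can be scripted. Below is a minimal Python sketch; copy_tmux is a hypothetical helper that copies exactly the files named in the steps above, picking up whatever msys-event-*.dll version is present. It does not check whether tmux needs any further DLLs on your machine.

```python
import glob
import os
import shutil


def copy_tmux(msys_bin: str, git_bin: str) -> list:
    """Copy tmux.exe and the libevent DLL(s) named in this post from MSYS2 to Git for Windows."""
    tmux = os.path.join(msys_bin, "tmux.exe")
    # The version suffix (2-1-4 at the time of writing) may be higher in newer MSYS2 builds
    dlls = sorted(glob.glob(os.path.join(msys_bin, "msys-event-*.dll")))
    if not os.path.isfile(tmux) or not dlls:
        raise FileNotFoundError(f"tmux.exe or msys-event-*.dll not found in {msys_bin}")
    copied = []
    for src in [tmux] + dlls:
        shutil.copy2(src, git_bin)  # copies into the git_bin directory, keeping the file name
        copied.append(os.path.basename(src))
    return copied


# Usage with the paths from this post (writing to Program Files needs an elevated prompt):
# copy_tmux(r"C:\msys64\usr\bin", r"C:\Program Files\Git\usr\bin")
```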

And you are ready to go. Please note that I do this on 64-bit installations of Git and MSYS2. Now run Git Bash and enter tmux. My most frequently used commands are:

  • CTRL+B, (release and then) c: create a new shell (a tmux window) within the existing terminal window
  • CTRL+B, n: switch to the next shell
  • CTRL+B, a digit: switch to the shell with the corresponding number
  • CTRL+B, " (double quote): split the current window into two panes, one above the other (panes live inside windows)
  • CTRL+B, o: switch between panes in the current window
  • CTRL+B, x: close the current pane (tmux asks for confirmation)

This is everything you need to know to start using it. Simple. There are many other options which you can explore yourself, for example here http://hyperpolyglot.org/multiplexers.

Update 1: Users in the comments report that the method does not always work. If you have any experience with this method, please feel free to comment, so that we can figure out the circumstances under which it works.

Update 2: I managed to run this on Windows 7, Windows Server 2012 R2 and Windows 10. My Git installation is set up to use the mintty console, and tmux works only when run from this console, not from the default Windows command-line console. I still haven’t figured out the precise requirements for this trick.

UPDATE with JOIN subtle bug

I have been diagnosing a very subtle bug in SQL code which led to unexpected results. It happens under rare circumstances: when you do an UPDATE with a JOIN and want to increase some number by one for every joined row. You just write value = value + 1. What you intend is to increase the value by the number of joined rows, and the SQL code kind of expresses that intent. However, what actually happens is that the existing value is read only once and each target row receives a single new value, so the value ends up incremented by one, no matter how many rows were joined.

declare @UpdateTarget table(Id bigint, Capacity int); 
insert into @UpdateTarget (Id, Capacity) values (10, 0); -- Let's assume this is our counter of available resources
 
declare @HowManyTimesUpdate table(TargetId bigint);
insert into @HowManyTimesUpdate (TargetId) values (10); -- Let's assume this is a reservation of a resource
insert into @HowManyTimesUpdate (TargetId) values (10); -- Let's assume this is a reservation of a resource
insert into @HowManyTimesUpdate (TargetId) values (10); -- Let's assume this is a reservation of a resource
 
-- Now, we are releasing resources. Increase the counter for those 3 reservations
update ut
set ut.Capacity = ut.Capacity + 1
from @UpdateTarget ut join @HowManyTimesUpdate hmt on ut.Id = hmt.TargetId;
 
-- We expect result 3, because 3 resources should be released, but there is only 1
select Capacity from @UpdateTarget;
 
-- Reset and start again
update @UpdateTarget set Capacity = 0;
 
-- Do it right
update ut
set ut.Capacity = ut.Capacity + (select count(1) from @HowManyTimesUpdate h where h.TargetId = ut.Id) -- add the number of reservations once
from @UpdateTarget ut join (select distinct TargetId from @HowManyTimesUpdate) hmt on hmt.TargetId = ut.Id; -- join at most one row per target
 
-- We expect 3, there is 3
select Capacity from @UpdateTarget;
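The same pitfall can be reproduced outside SQL Server. The sketch below uses Python’s sqlite3 module as a stand-in engine; SQLite documents that when an UPDATE ... FROM join yields several rows for one target row, only one of them is used, which produces the same effect. Table names mirror the post, the version check is there because UPDATE ... FROM is only available from SQLite 3.33.0, and the fix uses a correlated COUNT subquery, the same idea as the corrected statement above.

```python
import sqlite3


def release_counts():
    """Return (naive, correct) counter values after releasing 3 reservations of target 10."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UpdateTarget (Id INTEGER PRIMARY KEY, Capacity INT);
        CREATE TABLE HowManyTimesUpdate (TargetId INTEGER);
        INSERT INTO UpdateTarget (Id, Capacity) VALUES (10, 0);
        INSERT INTO HowManyTimesUpdate (TargetId) VALUES (10), (10), (10);
    """)

    naive = None
    # UPDATE ... FROM is only available in SQLite 3.33.0 and later
    if sqlite3.sqlite_version_info >= (3, 33, 0):
        conn.execute("""
            UPDATE UpdateTarget SET Capacity = Capacity + 1
            FROM HowManyTimesUpdate AS hmt
            WHERE UpdateTarget.Id = hmt.TargetId
        """)
        # Three rows join, but the target row receives a single new value
        naive = conn.execute("SELECT Capacity FROM UpdateTarget").fetchone()[0]
        conn.execute("UPDATE UpdateTarget SET Capacity = 0")  # reset

    # Count the joined rows explicitly and add the count once per target row
    conn.execute("""
        UPDATE UpdateTarget
        SET Capacity = Capacity + (SELECT COUNT(*) FROM HowManyTimesUpdate AS hmt
                                   WHERE hmt.TargetId = UpdateTarget.Id)
        WHERE Id IN (SELECT TargetId FROM HowManyTimesUpdate)
    """)
    correct = conn.execute("SELECT Capacity FROM UpdateTarget").fetchone()[0]
    conn.close()
    return naive, correct


print(release_counts())  # (1, 3) on SQLite >= 3.33.0, otherwise (None, 3)
```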