Let’s suppose I am modeling a business domain with entities A, B and C. These entities have the following properties:
- An entity A can have an entity B and C
- An entity A can have only entity B
- An entity A can exist without B and C
- An entity B has not null property Active
I am implementing the domain with the following SQL. I omit foreign key constraints for brevity.
CREATE TABLE A ( [Id] INT NOT NULL, CONSTRAINT [AId] PRIMARY KEY CLUSTERED ([Id] ASC) ); CREATE TABLE B ( [Id] INT NOT NULL, [AId] BIGINT NOT NULL, [Active] BIT NOT NULL, CONSTRAINT [BId] PRIMARY KEY CLUSTERED ([Id] ASC) ); CREATE TABLE C ( [Id] INT NOT NULL, [BId] BIGINT NOT NULL, CONSTRAINT [CId] PRIMARY KEY CLUSTERED ([Id] ASC) ); INSERT INTO [A] ([Id]) VALUES (1); INSERT INTO [A] ([Id]) VALUES (2); INSERT INTO [A] ([Id]) VALUES (3); INSERT INTO [A] ([Id]) VALUES (4); INSERT INTO [A] ([Id]) VALUES (5); INSERT INTO [B] ([Id], [AId], [Active]) VALUES (10, 1, 1); INSERT INTO [B] ([Id], [AId], [Active]) VALUES (20, 2, 0); INSERT INTO [B] ([Id], [AId], [Active]) VALUES (30, 3, 1); INSERT INTO [B] ([Id], [AId], [Active]) VALUES (40, 4, 1); INSERT INTO [C] ([Id], [BId]) VALUES (100, 10); INSERT INTO [C] ([Id], [BId]) VALUES (200, 20); INSERT INTO [C] ([Id], [BId]) VALUES (300, 30);
Now let’s suppose my task is to perform validity check according to special rules. I am given an Id of an entity
- If the entity exists and
- If it is valid
The existence will be checked by simply looking if corresponding row is present in the result set, and for validity check I will write simple CASE
statement. These are my rules for my example data:
- A.1 exists and has active B.10 and has C.100 => exists, correct
- A.2 exists and has inactive B.20 and has C.200 => exists, incorrect
- A.3 exists and has active B.30 and has C.300 => exists, correct
- A.4 exists and has active B.40 and DOES NOT HAVE C => exists, incorrect
- A.5 exists and DOES NOT HAVE NEITHER B NOR C => exists, incorrect
- A.6 does not exist, incorrect
I write the following query to do the task:
SELECT a.Id AS AId, ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct FROM A a LEFT JOIN B b ON a.Id = b.AId LEFT JOIN C c ON b.Id = c.BId WHERE a.Id IN (1, 2, 3, 4, 5, 6) AND b.Active = 1
My rules include checking if B.Active is true, so I just put this into WHERE
. The result is:
AId Correct ---- -------- 1 1 3 1 4 0
The problem is, I have been given the exact set of Ids of A to check: 1, 2, 3, 4, 5, 6
. But my result does not include 2, 5, 6
. My application logic fails here, because it considers those A records as missing. For 6
this is fine, because it is absent in table A, but 2
and 5
must be present in the results for my validity check. The fix is extremely easy:
SELECT a.Id AS AId, ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct FROM A a LEFT JOIN B b ON a.Id = b.AId AND b.Active = 1 LEFT JOIN C c ON b.Id = c.BId WHERE a.Id IN (1, 2, 3, 4, 5, 6)
Now the result is:
AId Correct ---- -------- 1 1 2 0 3 1 4 0 5 0
It is very easy to understand, that WHERE
is applied to filter all the results, no matter what my intention for JOIN
was. When a record is LEFT JOIN
ed, the condition is not met, because values from B are null. But I still need to have A record in my results. Thus, what I have to do is to include my condition in JOIN
.
It is also very easy to fall into this trap of thoughtlessly writing all intended conditions in the WHERE
clause.