Where to put condition in SQL?

Let’s suppose I am modeling a business domain with entities A, B and C. These entities have the following properties:

  • An entity A can have an entity B and C
  • An entity A can have only entity B
  • An entity A can exist without B and C
  • An entity B has not null property Active

I am implementing the domain with the following SQL. I omit foreign key constraints for brevity.

CREATE TABLE A
(
	[Id] INT NOT NULL,
	CONSTRAINT [AId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

CREATE TABLE B
(
	[Id] INT NOT NULL,
	[AId] BIGINT NOT NULL,
	[Active] BIT NOT NULL,
	CONSTRAINT [BId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

CREATE TABLE C
(
	[Id] INT NOT NULL,
	[BId] BIGINT NOT NULL,
	CONSTRAINT [CId] PRIMARY KEY CLUSTERED ([Id] ASC)
);

INSERT INTO [A] ([Id]) VALUES (1);
INSERT INTO [A] ([Id]) VALUES (2);
INSERT INTO [A] ([Id]) VALUES (3);
INSERT INTO [A] ([Id]) VALUES (4);
INSERT INTO [A] ([Id]) VALUES (5);

INSERT INTO [B] ([Id], [AId], [Active]) VALUES (10, 1, 1);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (20, 2, 0);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (30, 3, 1);
INSERT INTO [B] ([Id], [AId], [Active]) VALUES (40, 4, 1);

INSERT INTO [C] ([Id], [BId]) VALUES (100, 10);
INSERT INTO [C] ([Id], [BId]) VALUES (200, 20);
INSERT INTO [C] ([Id], [BId]) VALUES (300, 30);

Now let’s suppose my task is to perform validity check according to special rules. I am given an Id of an entity A as an input and I have to check:

  1. If the entity exists and
  2. If it is valid

The existence will be checked by simply looking if corresponding row is present in the result set, and for validity check I will write simple CASE statement. These are my rules for my example data:

  • A.1 exists and has active B.10 and has C.100 => exists, correct
  • A.2 exists and has inactive B.20 and has C.200 => exists, incorrect
  • A.3 exists and has active B.30 and has C.300 => exists, correct
  • A.4 exists and has active B.40 and DOES NOT HAVE C => exists, incorrect
  • A.5 exists and DOES NOT HAVE NEITHER B NOR C => exists, incorrect
  • A.6 does not exist, incorrect

I write the following query to do the task:

SELECT 
  a.Id AS AId,
  ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct
FROM A a
LEFT JOIN B b ON a.Id = b.AId 
LEFT JOIN C c ON b.Id = c.BId
WHERE
    a.Id IN (1, 2, 3, 4, 5, 6) 
    AND b.Active = 1

My rules include checking if B.Active is true, so I just put this into WHERE. The result is:

AId  Correct 
---- --------
1    1       
3    1       
4    0       

The problem is, I have been given the exact set of Ids of A to check: 1, 2, 3, 4, 5, 6. But my result does not include 2, 5, 6. My application logic fails here, because it considers those A records as missing. For 6 this is fine, because it is absent in table A, but 2 and 5 must be present in the results for my validity check. The fix is extremely easy:

SELECT 
  a.Id AS AId,
  ( CASE WHEN ((b.Id IS NOT NULL) AND (c.Id IS NOT NULL) AND (b.Active = 1)) THEN 1 ELSE 0 END ) AS Correct
FROM A a
LEFT JOIN B b ON a.Id = b.AId AND b.Active = 1
LEFT JOIN C c ON b.Id = c.BId
WHERE
    a.Id IN (1, 2, 3, 4, 5, 6) 

Now the result is:

AId  Correct 
---- --------
1    1       
2    0       
3    1       
4    0       
5    0       

It is very easy to understand, that WHERE is applied to filter all the results, no matter what my intention for JOIN was. When a record is LEFT JOINed, the condition is not met, because values from B are null. But I still need to have A record in my results. Thus, what I have to do is to include my condition in JOIN.

It is also very easy to fall into this trap of thoughtlessly writing all intended conditions in the WHERE clause.

Leave a Reply

Your email address will not be published. Required fields are marked *

Protection against spam * Time limit is exhausted. Please reload CAPTCHA.