SUMMARY: NOT IN issue
От | Marc SCHAEFER |
---|---|
Тема | SUMMARY: NOT IN issue |
Дата | |
Msg-id | Pine.LNX.3.96.1010705094527.1060A-100000@defian.alphanet.ch обсуждение исходный текст |
Список | pgsql-general |
The issue was that NOT IN doesn't work as intuitively expected (or at least to me) if there are NULLs in the set, because IN returns NULL instead of FALSE in that case. Simplified setup to see the issue: CREATE TABLE personne (id SERIAL, PRIMARY KEY(id), UNIQUE(id)); CREATE TABLE utilisateur_news (id INT4 REFERENCES personne); INSERT INTO personne VALUES(1); INSERT INTO personne VALUES(2); INSERT INTO personne VALUES(3); INSERT INTO utilisateur_news VALUES(2); INSERT INTO utilisateur_news VALUES(NULL); Goal: determine what are the id in relation personne that are not referenced by any utilisateur_news. The wrong NOT IN variant (will fail as soon are there is any NULL in the id field of utilisateur_news): SELECT id FROM personne p WHERE p.id NOT IN (SELECT DISTINCT un.id FROM utilisateur_news un); The correct EXISTS variant (which is anyway recommended because it can be implemented as an index search, see EXPLAIN on big tables). SELECT p.id FROM personne p WHERE NOT EXISTS (SELECT un.id FROM utilisateur_news un WHERE (un.id = p.id)); The correct NOT IN version: SELECT id FROM personne p WHERE p.id NOT IN (SELECT DISTINCT un.id FROM utilisateur_news un WHERE (un.id IS NOT NULL)); Note that in the general case, a NOT NULL in the REFERENCES id of the utilisateur_news will avoid this degenerated case. The final db I am using has this constraints. However, during the migration from the previous db (using utilisateur_news only) to the new, I lifted that restriction, to allow utilisateur_news without personne linked to fill it later. When all old will have filled, I will add a NOT NULL constraint, and create both in a transaction. Thank you for help. References: http://fts.postgresql.org/db/mw/msg.html?mid=122788
В списке pgsql-general по дате отправления: