Re: [SQL] "SELECT IN" Still Broken in 7.4b
От | Tom Lane |
---|---|
Тема | Re: [SQL] "SELECT IN" Still Broken in 7.4b |
Дата | |
Msg-id | 25062.1061498540@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: [SQL] "SELECT IN" Still Broken in 7.4b
Re: [SQL] "SELECT IN" Still Broken in 7.4b Re: [SQL] "SELECT IN" Still Broken in 7.4b Re: [SQL] "SELECT IN" Still Broken in 7.4b Re: [SQL] "SELECT IN" Still Broken in 7.4b |
Список | pgsql-hackers |
Stephan Szabo <sszabo@megazone.bigpanda.com> writes: > Within the scope of the new hashed IN stuff I believe so in at least some > cases. I have a few million row table of integers where searching for > values IN (~10000 values) takes longer than creating the temp table, > copying into it and doing the in subquery. I did some profiling and soon realized that the main problem is the executor's trick for not returning the same row multiple times in a multiple-indexscan plan node. The point is that givenWHERE a = 1 OR b = 1 you could create a plan that first indexscans on a, then indexscans on b --- but you mustn't return any tuples in the second scan that you already returned in the first. IndexNext solves this by evaluating the prior-scan index conditions to see if they are true. Which is okay if there aren't too many of them. But when you have an N-element IN list this means you are going to do O(N^2) index expression evaluations. In the 10000-element IN-list test case, ExecQual gets called almost 50 million times :-( I'm toying with the notion of ripping out that logic and instead building an in-memory hashtable of already-returned TIDs. This could theoretically run out of memory if the multiple indexscan returns enough tuples, but I think in practice that wouldn't happen because the planner wouldn't choose an indexscan when very large numbers of tuples are being selected. Comments? regards, tom lane
В списке pgsql-hackers по дате отправления: