Re: UPDATE with subquery too slow

Поиск
Список
Период
Сортировка
От Eric Jain
Тема Re: UPDATE with subquery too slow
Дата
Msg-id 013401c3f653$14f95970$c300000a@caliente
обсуждение исходный текст
Ответ на UPDATE with subquery too slow  ("Eric Jain" <Eric.Jain@isb-sib.ch>)
Список pgsql-performance
> I can't get the following statement to complete with reasonable time.

Upgraded to 7.4.1, and realized that NOT IN is far more efficient than
IN, EXISTS or NOT EXISTS, at least for the amount and distribution of
data that I have. Here are some numbers from before and after performing
the problematic clean up operation:

                        | Before    | After
------------------------+-----------+-----------
COUNT(*)                | 6'104'075 | 6'104'075
COUNT(session)          | 5'945'272 | 3'640'659
COUNT(DISTINCT session) | 2'865'570 |   560'957

The following query completes within less than three hours on a machine
with a high load, versa many many hours for any of the alternatives:

UPDATE requests
SET session = NULL
WHERE session NOT IN
(
  SELECT r.session
  FROM requests r
  WHERE r.session IS NOT NULL
  GROUP BY r.session
  HAVING COUNT(*) > 1
);

Note that in order to correctly reverse an IN subquery, IS NOT NULL
needs to be added.

Interestingly, the query planner believes that using EXISTS would be
more efficient than NOT IN, and IN only slightly less efficient; I
assume the query planner is not able to accurately estimate the number
of rows returned by the subquery.

EXISTS            351'511
NOT IN            376'577
IN                386'780
LEFT JOIN      18'263'826
NOT EXISTS  7'241'815'330


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Optimizer difference using function index between 7.3 and 7.4
Следующее
От: Josh Berkus
Дата:
Сообщение: Forcing filter/join order?