Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Дата	21 декабря 2008 г. 23:26:04
Msg-id	603c8f070812211925hdb4db79o3aeed87c601199a3@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets ("Bryce Cutt" <pandasuit@gmail.com>)
Ответы	Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets
Список	pgsql-hackers

Дерево обсуждения

[Some performance testing.]

I ran this query 10x with this patch applied, and then 10x again with
enable_hashjoin_usestatmvcs set to false to disable the optimization:

select sum(1) from (select * from part, lineitem where p_partkey = l_partkey) x;

With the optimization enabled, the query took between 26.6 and 38.3
seconds with an average of 31.6.  With the optimization disabled, the
query took between 48.3 and 69.0 seconds with an average of 60.0
seconds.

It appears that the 100 entries in pg_statistic cover about 32% of l_partkey:

tpch=# WITH x AS (       SELECT stanumbers1, array_length(stanumbers1, 1) AS len           FROM pg_statistic WHERE
starelid='lineitem'::regclass              AND staattnum = (SELECT attnum FROM pg_attribute                       WHERE
attrelid='lineitem'::regclassAND
 
attname='l_partkey')
)       SELECT sum(x.stanumbers1[y.g]) FROM x,               (select generate_series(1, x.len) g from x) y; sum
--------0.3276
(1 row)

(there's probably a better way to write that query...)

stadistinct for l_partkey is 23,050; the actual number of distinct
values is 199,919.  IOW, 0.0005% of the distinct values account for
32.76% of the table.  That's a lot of skew, but not unrealistic - I've
seen tables where more than half of the rows were covered by a single
value.

...Robert

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Proposed Patch to Improve Performance of Multi-Batch Hash Join for Skewed Data Sets