Re: [GENERAL] query not scaling
From | Rob Sargent |
---|---|
Subject | Re: [GENERAL] query not scaling |
Date | |
Msg-id | b605e78f-477e-879b-5a49-182d74503f2c@gmail.com |
In reply to | [GENERAL] query not scaling (Rob Sargent <robjsargent@gmail.com>) |
List | pgsql-general |
On 10/31/2017 03:12 AM, Laurenz Albe wrote:
> Rob Sargent wrote:
>>> I think your biggest problem is the join condition
>>> on m.basepos between s.startbase and s.endbase
>>>
>>> That forces a nested loop join, which cannot be performed efficiently.
>>
>> Agree! 800,000 * 4,000 = 3,200,000,000. It's just that I thought I had
>> corralled that problem with indexing, but apparently not. I was hoping
>> some kind soul might point out a missing index or similar correction. I
>> have completely reworked the process, but I'm not sure yet whether it's
>> correct. (The slow answer is correct, once it comes in.)
>
> You can create indexes that are useful for this query:
>
>     ON sui.segment(chrom, markerset_id)
>     ON sui.probandset(people_id)
>
> But that probably won't make a big difference, because the sequential
> scans take only a small fraction of your query time.
>
> A little less than half of the query time is spent in the nested loop
> join, and a little more than half is spent doing the GROUP BY.
>
> Perhaps the biggest improvement you can easily make would be to get rid
> of "numeric" for the computation. I suspect that this is where a lot of
> time is spent, since the hash aggregate is over fewer than 15000 rows.
>
> Unless you really need the precision of "numeric", try
>
>     CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
>       RETURNS double precision LANGUAGE sql AS
>     $$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;
>
> Yours,
> Laurenz Albe

In practice, markersets are always aligned with one chromosome, so I would not expect this to have an effect. There is no constraint enforcing this, however, and there can be more than one markerset per chromosome. I have played with indexing on segment.markerset_id. In all the data sets used in the examples (runtimes, explains, etc.) there has been a single people_id across the existing segment data.
Down the road that will of course not be the case, and I can see the value of an index on probandset.people_id eventually; I can certainly add it now for a test. I'm currently writing a probandset loader, hoping to get a test case for the problem with GIN indexing mentioned up-thread. I'm most surprised at the notion that the arithmetic is the problem, and I will happily test your suggestion to force floating-point values. The values can get small (10^-12 on a good day!) but we don't need many digits of precision.

Thanks

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
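[Editor's sketch] Laurenz's "get rid of numeric" suggestion rests on the fact that double precision carries roughly 15 significant decimal digits, which is ample for p-values around 10^-12 when only a few digits matter. A minimal check, reusing the pv() definition quoted above with made-up input values:

```sql
-- Sketch only: the function body is quoted from the thread;
-- the literal arguments below are invented for illustration.
CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
  RETURNS double precision LANGUAGE sql AS
$$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;

-- (1 + 1 + 0) / 1e12: a value on the order of 2e-12, well within
-- double precision's ~15 significant digits.
SELECT pv(999999999998, 1, 1, 0);
```

Floating-point division is also far cheaper per row than numeric arithmetic, which is the point of the suggested change.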
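[Editor's sketch] One way to make the `m.basepos between s.startbase and s.endbase` condition indexable, not discussed in the thread, is a range-type expression index. This assumes startbase/endbase are bigint columns describing a closed interval and basepos is a bigint; names come from the quoted query, the index itself is hypothetical:

```sql
-- Expression index over each segment's interval as an int8range
-- (available since PostgreSQL 9.2).
CREATE INDEX segment_baserange_gist
    ON sui.segment
    USING gist (int8range(startbase, endbase, '[]'));

-- The join condition can then be written as range containment:
--   JOIN sui.segment s
--     ON int8range(s.startbase, s.endbase, '[]') @> m.basepos
-- letting the planner probe the GiST index for matching segments
-- instead of scanning all segments for every marker row.
```

This does not eliminate the nested loop, but it turns the inner side into an index probe rather than an 800,000 x 4,000 comparison.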