Re: [GENERAL] query not scaling
From | Rob Sargent |
---|---|
Subject | Re: [GENERAL] query not scaling |
Date | |
Msg-id | b605e78f-477e-879b-5a49-182d74503f2c@gmail.com |
In reply to | [GENERAL] query not scaling (Rob Sargent <robjsargent@gmail.com>) |
List | pgsql-general |
On 10/31/2017 03:12 AM, Laurenz Albe wrote:
> Rob Sargent wrote:
>>> I think your biggest problem is the join condition
>>> on m.basepos between s.startbase and s.endbase
>>>
>>> That forces a nested loop join, which cannot be performed efficiently.
>>
>> Agree! 800,000 * 4,000 = 3,200,000,000. It's just that I thought I had
>> corralled that problem with indexing, but apparently not. I was hoping
>> some kind soul might point out a missing index or similar correction. I
>> have completely reworked the process, but I'm not sure yet whether it's
>> correct. (The slow answer is correct, once it comes in.)
>
> You can create indexes that are useful for this query:
>
>     ON sui.segment(chrom, markerset_id)
>     ON sui.probandset(people_id)
>
> But that probably won't make a big difference, because the sequential
> scans take only a small fraction of your query time.
>
> A little less than half of the query time is spent in the nested loop
> join, and a little more than half is spent doing the GROUP BY.
>
> Perhaps the biggest improvement you can easily make would be to get rid
> of "numeric" for the computation. I suspect that this is where a lot of
> time is spent, since the hash aggregate is over fewer than 15000 rows.
>
> Unless you really need the precision of "numeric", try
>
>     CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
>       RETURNS double precision LANGUAGE sql AS
>     $$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;
>
> Yours,
> Laurenz Albe

In practice, markersets are always aligned with one chromosome, so I would not expect this to have an effect. There is no constraint enforcing this, however, and there can be more than one markerset per chromosome. I have played with indexing on segment.markerset_id. In all the data sets used in the examples (runtimes, explains, etc.) there has been a single people_id across the existing segment data.
Down the road that will of course not be the case, and I can see the value of an index on probandset.people_id eventually; I can certainly add it now for a test. I'm currently writing a probandset loader, hoping to get a test case for the problem with GIN indexing mentioned up-thread. I'm most surprised at the notion that the arithmetic is the problem, and I will happily test your suggestion to force floating-point values. The values can get small (10^-12 on a good day!) but we don't need many digits of precision.

Thanks

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
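[Editor's sketch] Laurenz's "get rid of numeric" suggestion rests on the fact that double precision carries roughly 15 significant decimal digits, which is ample for p-values around 10^-12 when only a few digits matter. A minimal check, reusing the pv() definition quoted above with made-up input values:

```sql
-- Sketch only: the function body is quoted from the thread;
-- the literal arguments below are invented for illustration.
CREATE OR REPLACE FUNCTION pv(l bigint, e bigint, g bigint, o int)
  RETURNS double precision LANGUAGE sql AS
$$SELECT (g+e+o)::double precision / (l+e+g+o)::double precision$$;

-- (1 + 1 + 0) / 1e12: a value on the order of 2e-12, well within
-- double precision's ~15 significant digits.
SELECT pv(999999999998, 1, 1, 0);
```

Floating-point division is also far cheaper per row than numeric arithmetic, which is the point of the suggested change.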
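[Editor's sketch] One way to make the `m.basepos between s.startbase and s.endbase` condition indexable, not discussed in the thread, is a range-type expression index. This assumes startbase/endbase are bigint columns describing a closed interval and basepos is a bigint; names come from the quoted query, the index itself is hypothetical:

```sql
-- Expression index over each segment's interval as an int8range
-- (available since PostgreSQL 9.2).
CREATE INDEX segment_baserange_gist
    ON sui.segment
    USING gist (int8range(startbase, endbase, '[]'));

-- The join condition can then be written as range containment:
--   JOIN sui.segment s
--     ON int8range(s.startbase, s.endbase, '[]') @> m.basepos
-- letting the planner probe the GiST index for matching segments
-- instead of scanning all segments for every marker row.
```

This does not eliminate the nested loop, but it turns the inner side into an index probe rather than an 800,000 x 4,000 comparison.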