Re: Performance improvement hints + measurement
От | devik@cdi.cz |
---|---|
Тема | Re: Performance improvement hints + measurement |
Дата | |
Msg-id | 39BF8314.8A790C34@cdi.cz обсуждение исходный текст |
Ответ на | Performance improvement hints (devik@cdi.cz) |
Ответы |
Re: Performance improvement hints + measurement
|
Список | pgsql-hackers |
> > But indexscan always lookups actual record in heap even if > > all needed attributes are contained in the index. > > Oracle and even MSSQL reads attributes directly from index > > without looking for actual tuple at heap. > > Doesn't work in Postgres' storage management scheme --- the heap > tuple must be consulted to see if it's still valid. yes, I just spent another day by looking into sources and it seems that we need xmin, xmax stuff. What do you think about this approach: 1) add all validity & tx fields from heap tuple into index tuple too 2) when generating plan for index scan try to determine whether we can satisfy target list using only data from index tuples,if yes then compute cost without accounting random heap page reads - it will lead into much lower cost 3) whenever you update/delete heap tuple's tx fields, update then also in indices (you don't have to delete them from index) It will cost more storage space and slightly more work when updating indices but should give excelent performance when index is used. Measurements: I've table with about 2 mil. rows declared as bigrel(namex varchar(50),cnt integer,sale datetime). I regulary need to run this query against it: select nazev,sum(cnt) from bigrel group by name; It took (in seconds): Server\Index YES NO pg7.01 linux 58 264 MSSQL7 winnt 17 22 I compared on the same machine (PII/375,128RAM) using WINNT under VMWARE and native linux 2.2. pq was vaccum analyzed. Why is pgsql so slow ? The mssql plan without index uses hash aggregating but pg sorts while relation. With index, in pg there is a big overhead of heap tuple reading - mssql uses data directly from scanned index. Also I noticed another problem, when I added where nazev<'0' it took 110ms on pg when I used set enable_seqscan=on;. Without is, planner still tried to use seqscan+sort which took 27s in this case. I'm not sure how complex the proposed changes are. Another way would be to implement another aggregator like HashAgg which will use hashing. But it could be even more complicated as one has to use temp relation to store all hash buckets .. Still I think that direct index reads should give us huge speed improvement for all indexed queries. I'm prepared to implement it but I'd like to know your hints/complaints. Regards, devik
В списке pgsql-hackers по дате отправления: