Re: count(*) performance improvement ideas
От | Tom Lane |
---|---|
Тема | Re: count(*) performance improvement ideas |
Дата | |
Msg-id | 12841.1205813890@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: count(*) performance improvement ideas ("Stephen Denne" <Stephen.Denne@datamail.co.nz>) |
Ответы |
Re: count(*) performance improvement ideas
Re: count(*) performance improvement ideas |
Список | pgsql-hackers |
"Stephen Denne" <Stephen.Denne@datamail.co.nz> writes: > I initially thought that what you meant by "having transactions enter "delta" entries" was that I have a trigger that wouldcreate a row each time it was called, writing how many records where inserted or deleted. I didn't understand how thiswould be much of an improvement, as each of my rows would contain either +1 or -1. Well, ideally you'd aggregate all the deltas caused by a particular transaction into one entry in the counting table. Whether or not that happens, though, the point of the concept is that some background task aggregates all the deltas from long-gone transactions into just one base row, and then deletes the old delta entries. To get a valid value of COUNT(*), what onlookers must do is SUM() the base row and delta records from all transactions that they can "see" under MVCC rules. The amount of work involved is proportional to the number of recent updates, not the total size of the underlying table. > However I'm not after a fast count(*) from table, but more like a fast > select grouping_id, count(*) from my_table group by grouping_id You could apply the same technique across each group id, though this certainly is getting beyond what any built-in feature might offer. > Can you clarify the lack of MVCC problems? The point there is that the "right answer" actually depends on the observer, since each observer might have a different snapshot and therefore be able to "see" a different set of committed rows in the underlying table. The multiple-delta table handles this automatically, because you can "see" a delta entry if and only if you could "see" the underlying-table changes it represents. > Does this idea apply with the same efficiency in pre 8.3, non-HOT implementations? I didn't claim it was amazingly efficient in any implementation ;-). HOT in particular is nearly useless since most rows in the count table will never be updated, only inserted and eventually deleted. You might get some mileage on the base row, but that'd be about it. The count table will need frequent vacuums as well as frequent aggregation scans. It should beat scanning a large underlying table, but it's hardly gonna be free. regards, tom lane
В списке pgsql-hackers по дате отправления: