Re: why does count take so long?
From | Christopher Browne
Subject | Re: why does count take so long?
Date |
Msg-id | m3ekypvn5i.fsf@chvatal.cbbrowne.com
Response to | why does count take so long? (Joseph Shraibman <jks@selectacast.net>)
List | pgsql-general
In the last exciting episode, jllachan@nsd.ca (Jean-Luc Lachance) wrote:
> How about keeping counts of inserts, deletes and updates per table per
> transaction as part of the live statistics?

Aye, there's the rub. That's exactly what _won't_ work, and that's exactly the case that is somewhat pathological under MVCC.

With MVCC, data "appears as if by magic" when its transaction COMMITs, thereby revealing the rows to the world. Unfortunately, there's no simple way of making updates to counts "simply appear" when they take effect, not without turning the updates into a concurrency bottleneck.

Here's a bit of a wild thought... Assume a table with schema as follows:

  create table pg_count (
    xid integer,  -- Actually, it's not an integer, right, Andrew? :-(
    reltype oid,
    count integer
  );

Every transaction, "xid," affects some number of tuples. So for a transaction #2134151 that adds 5 rows to the table with oid 12345 and deletes 4 rows from the table with oid 45678, part of the transaction would include inserting these rows:

  insert into pg_count (xid, reltype, count) values (2134151, 12345, 5);
  insert into pg_count (xid, reltype, count) values (2134151, 45678, -4);

In order to get the count for table 12345, you could then go to pg_count and request:

  select sum(count) from pg_count where reltype = 12345;

The size of this table would increase every time a transaction gets committed, so presumably part of VACUUM TABLE would be a collection/summarization, thus...

  -- Collect existing stats into 1 row
  insert into pg_count (xid, reltype, count)
    values (currxid, currtable,
            (select sum(count) from pg_count where reltype = currtable));

  -- Delete the old stats
  delete from pg_count where reltype = currtable and xid <> currxid;

This will cope with concurrency reasonably well (modulo having to make sure that the "collect/delete" transaction is a serialized one).
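To make the idea above concrete, here is a minimal sketch of the pg_count bookkeeping simulated in Python with SQLite standing in for PostgreSQL. The table name and the example xid/oid values come from the post; the helper functions (record_delta, table_count, summarize) are hypothetical names introduced only for illustration, and of course SQLite has none of PostgreSQL's MVCC semantics, so this shows the arithmetic, not the concurrency behavior:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table pg_count (
        xid integer,      -- transaction id (not really an integer in PG)
        reltype integer,  -- oid of the table the delta applies to
        count integer     -- net rows added by this xid (negative = deletes)
    )
""")

def record_delta(xid, reltype, delta):
    """Each committing transaction inserts one delta row per table touched."""
    conn.execute(
        "insert into pg_count (xid, reltype, count) values (?, ?, ?)",
        (xid, reltype, delta))

def table_count(reltype):
    """COUNT(*) for a table becomes a sum over its delta rows."""
    (total,) = conn.execute(
        "select coalesce(sum(count), 0) from pg_count where reltype = ?",
        (reltype,)).fetchone()
    return total

def summarize(currxid, reltype):
    """VACUUM-style collapse: fold all deltas into one row, drop the rest."""
    conn.execute(
        """insert into pg_count (xid, reltype, count)
           values (?, ?, (select sum(count) from pg_count
                          where reltype = ?))""",
        (currxid, reltype, reltype))
    conn.execute(
        "delete from pg_count where reltype = ? and xid <> ?",
        (reltype, currxid))

# Transaction 2134151 adds 5 rows to oid 12345 and deletes 4 from oid 45678;
# a later transaction adds 3 more rows to oid 12345.
record_delta(2134151, 12345, 5)
record_delta(2134151, 45678, -4)
record_delta(2134152, 12345, 3)

print(table_count(12345))   # 8, summed across two delta rows
summarize(2134153, 12345)
print(table_count(12345))   # still 8, now held in a single row
```

The point of summarize() is exactly the trade-off described in the post: the sum stays correct whether or not it has run, but after it runs the count query touches one row instead of one row per committed transaction.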
Unfortunately, if a table is updated frequently, the summary

  select sum(count) from pg_count where reltype = 12345;

will have to collect together quite a large number of entries, which makes this "less cheap." That suggests an optimization: any time the COUNT is selected, the old stats can and should be collected into 1 row and the old data deleted.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.