[PoC] pgstattuple2: block sampling to reduce physical read
От | Satoshi Nagayasu |
---|---|
Тема | [PoC] pgstattuple2: block sampling to reduce physical read |
Дата | |
Msg-id | 51EE1FBB.2050502@uptime.jp обсуждение исходный текст |
Ответы |
Re: [PoC] pgstattuple2: block sampling to reduce physical
read
|
Список | pgsql-hackers |
Hi, I've been working on new pgstattuple function to allow block sampling [1] in order to reduce block reads while scanning a table. A PoC patch is attached. [1] Re: [RFC] pgstattuple/pgstatindex enhancement http://www.postgresql.org/message-id/CA+TgmoaxJhGZ2c4AYfbr9muUVNhGWU4co-cthqpZRwwDtamvhw@mail.gmail.com This new function, pgstattuple2(), samples only 3,000 blocks (which accounts 24MB) from the table randomly, and estimates several parameters of the entire table. The function calculates the averages of the samples, estimates the parameters (averages and SDs), and shows "standard errors (SE)" to allow estimating status of the table with statistical approach. And, of course, it reduces number of physical block reads while scanning a bigger table. The following example shows that new pgstattuple2 function runs x100 faster than the original pgstattuple function with well-estimated results. ---------------------------------------------- postgres=# select * from pgstattuple('pgbench_accounts'); -[ RECORD 1 ]------+----------- table_len | 1402642432 tuple_count | 10000000 tuple_len | 1210000000 tuple_percent | 86.27 dead_tuple_count | 182895 dead_tuple_len | 22130295 dead_tuple_percent | 1.58 free_space | 21012328 free_percent | 1.5 Time: 1615.651 ms postgres=# select * from pgstattuple2('pgbench_accounts'); NOTICE: pgstattuple2: SE tuple_count 2376.47, tuple_len 287552.58, dead_tuple_count 497.63, dead_tuple_len 60213.08, free_space 289752.38 -[ RECORD 1 ]------+----------- table_len | 1402642432 tuple_count | 9978074 tuple_len | 1207347074 tuple_percent | 86.08 dead_tuple_count | 187315 dead_tuple_len | 22665208 dead_tuple_percent | 1.62 free_space | 23400431 free_percent | 1.67 Time: 15.026 ms postgres=# ---------------------------------------------- In addition to that, see attached chart to know how pgstattuple2 estimates well during repeating (long-running) pgbench. I understand that pgbench would generate "random" transactions, and those update operations might not have any skew over the table, so estimating table status seems to be easy in this test. However, I'm still curious to know whether it would work in "real-world" worklaod. Is it worth having this? Any comment or suggestion? Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Вложения
В списке pgsql-hackers по дате отправления: