Re: REPACK and naming
From | David Rowley |
---|---|
Subject | Re: REPACK and naming |
Date | |
Msg-id | CAApHDvoeW9ecNxbaaXX0QDUS3i3u+Q684C+qU80paa8qqPHzxA@mail.gmail.com |
In response to | Re: REPACK and naming (Álvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: REPACK and naming |
List | pgsql-hackers |
On Thu, 18 Sept 2025 at 03:03, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> So there two operations here.  One is
>   REPACK tab USING INDEX idx
> which we currently call CLUSTER, and there is also
>   REPACK TAB
> (no index specified) which we currently call VACUUM FULL.

I was just thinking about how much of a heap-ism cluster using an index is. If we were to ever have an index organised table AM, what would it mean to REPACK tab USING INDEX idx? Would that "secondary" index then go away and the table would become that index? Or would both continue to exist and the secondary index would be surplus?

I do understand that heap is well ingrained in our code (still), but at least things like system catalogue tables/columns can evolve over time. e.g. pg_index.indisclustered I could imagine evolving (or disappearing) if we had an IOT-AM. I do think locking in syntax is going to be quite a bit more permanent and needs to be considered very carefully.

Something like

  REPACK tab ORDER BY col1;

seems a bit more future proof. table_relation_copy_for_cluster() does support both use of an index to get presorted results and sorting by the index's key columns, so it doesn't seem impossible that the ability to cluster a table *specifically* by an index could go away at some point. Locking us deeper into a syntax for that, I do have concerns for. But maybe you've thought about all this already and I'm just not aware...

I'm also trying to keep something like a column store in mind here where you might not have any indexes, and efficient filtering is done via the pruning of "chunks", which works by each chunk recording the min/max (or maybe a dictionary of) values it contains for the columns. I imagine something like that very much would want the ability to have something like

  REPACK tbl ORDER BY col;

if you think about how efficient run-length encoding would be for some orders and how inefficient it could be for other orders.
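To make the ordering point concrete, here is a toy sketch in Python. It is not taken from PostgreSQL or any real column store; the function names, the chunk size, and the data are all made up for illustration. It counts how many run-length-encoded runs a column needs, and how many chunks survive min/max pruning, for the same values in two different physical orders.

```python
from itertools import groupby


def rle_runs(values):
    """Number of (value, run_length) pairs run-length encoding would store."""
    return sum(1 for _ in groupby(values))


def chunks_to_scan(values, chunk_size, target):
    """Count chunks whose recorded [min, max] range could contain target."""
    scanned = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        # A chunk can only be skipped if target lies outside its min/max.
        if min(chunk) <= target <= max(chunk):
            scanned += 1
    return scanned


# Hypothetical column with 4 distinct values, interleaved in arrival order.
unordered = [i % 4 for i in range(16)]  # 0, 1, 2, 3, 0, 1, 2, 3, ...
# The same rows after something like an ORDER BY on this column.
ordered = sorted(unordered)

print(rle_runs(unordered), rle_runs(ordered))
print(chunks_to_scan(unordered, 4, 2), chunks_to_scan(ordered, 4, 2))
```

With this made-up data, the ordered column needs 4 runs instead of 16, and a lookup for the value 2 has to touch 1 chunk instead of 4.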
Anyway, I'm not intentionally trying to make your job here any more complex. I'm just trying to help make sure we don't end up with some new syntax that also won't stand up to the test of time.

David