Re: REPACK and naming

From	David Rowley
Subject	Re: REPACK and naming
Date	19 September 03:58:57
Msg-id	CAApHDvoeW9ecNxbaaXX0QDUS3i3u+Q684C+qU80paa8qqPHzxA@mail.gmail.com
In reply to	Re: REPACK and naming (Álvaro Herrera <alvherre@alvh.no-ip.org>)
Replies	Re: REPACK and naming
List	pgsql-hackers

On Thu, 18 Sept 2025 at 03:03, Álvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> So there are two operations here.  One is
> REPACK tab USING INDEX idx
> which we currently call CLUSTER, and there is also
> REPACK tab
> (no index specified) which we currently call VACUUM FULL.

I was just thinking about how much of a heap-ism clustering by an
index is. If we were ever to have an index-organised table AM, what
would it mean to REPACK tab USING INDEX idx? Would that "secondary"
index then go away and the table become that index? Or would both
continue to exist, with the secondary index being surplus?

I do understand that heap is (still) well ingrained in our code, but
at least things like system catalogue tables/columns can evolve over
time. For example, I could imagine pg_index.indisclustered evolving
(or disappearing) if we had an IOT AM. Locking in syntax, however, is
going to be a good deal more permanent and needs to be considered very
carefully. Something like REPACK tab ORDER BY col1; seems a bit more
future-proof. table_relation_copy_for_cluster() supports both using an
index to get presorted results and sorting by the index's key columns,
so it doesn't seem impossible that the ability to cluster a table
*specifically* by an index could go away at some point. I do have
concerns about locking us deeper into a syntax for that. But maybe
you've thought about all this already and I'm just not aware...

I'm also trying to keep something like a column store in mind here,
where you might not have any indexes and efficient filtering is done
by pruning "chunks": each chunk records the min/max values (or maybe a
dictionary of the values) it contains for each column. I imagine
something like that would very much want the ability to do something
like REPACK tbl ORDER BY col; given how efficient run-length encoding
would be for some orders and how inefficient it could be for others.
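
A rough illustration of why row order matters for such a column store, under assumed parameters: a column is split into fixed-size chunks, each chunk records its (min, max), and each chunk is run-length encoded. Sorting the column before chunking both reduces the number of runs and lets more chunks be skipped for a range predicate. The chunk size, data, and function names are hypothetical; no actual table AM is modelled.

```python
from itertools import groupby

CHUNK_SIZE = 4  # hypothetical number of values per chunk


def chunk(values, size=CHUNK_SIZE):
    """Split a column into fixed-size chunks."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def rle(values):
    """Run-length encode a list as (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]


def chunk_minmax(chunks):
    """Per-chunk (min, max) summary, as a pruning structure might record."""
    return [(min(c), max(c)) for c in chunks]


def chunks_to_scan(summaries, lo, hi):
    """Indexes of chunks whose [min, max] overlaps the predicate lo <= v <= hi."""
    return [i for i, (mn, mx) in enumerate(summaries) if mx >= lo and mn <= hi]


def describe(values, lo, hi):
    """Return (total RLE runs, chunks that must be scanned for lo..hi)."""
    chunks = chunk(values)
    total_runs = sum(len(rle(c)) for c in chunks)
    scanned = chunks_to_scan(chunk_minmax(chunks), lo, hi)
    return total_runs, len(scanned)


unordered = [1, 3, 2, 1, 3, 2, 1, 3, 2, 1, 3, 2]
ordered = sorted(unordered)

print(describe(unordered, 3, 3))  # (12, 3): every value is its own run, no chunk pruned
print(describe(ordered, 3, 3))    # (3, 1): one run per chunk, two of three chunks pruned
```

With the same twelve values, the unordered layout needs 12 runs and cannot prune any chunk for `v = 3`, while the sorted layout needs 3 runs and scans a single chunk.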

Anyway, I'm not intentionally trying to make your job here any more
complex. I'm just trying to help make sure we don't end up with some
new syntax that also won't stand up to the test of time.

David
