Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation

From: Andres Freund
Subject: Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation
Date:
Msg-id: 20191030190255.hxtqtxcjset3l3pz@alap3.anarazel.de
In reply to: Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Thoughts on nbtree with logical/varwidth table identifiers, v12 on-disk representation (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

Hi,

On 2019-10-30 11:33:21 -0700, Peter Geoghegan wrote:
> On Mon, Apr 22, 2019 at 9:35 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-04-21 17:46:09 -0700, Peter Geoghegan wrote:
> > > Andres has suggested that I work on teaching nbtree to accommodate
> > > variable-width, logical table identifiers, such as those required for
> > > indirect indexes, or clustered indexes, where secondary indexes must
> > > use a logical primary key value instead of a heap TID.
> 
> I'm revisiting this thread now because it may have relevance to the
> nbtree deduplication patch. If nothing else, the patch further commits
> us to the current heap TID format by making assumptions about the
> width of posting lists with 6 byte TIDs.

I'd much rather not entrench this further, even leaving global indexes
aside. The 4 byte block number is a significant limitation for heap
tables too, and we should lift that at some point not too far away.
Then there are also other AMs that could really use a wider TID space.
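
For reference, the arithmetic behind that limit can be sketched as follows. The 6-byte layout (a 4-byte block number kept as two 16-bit halves, followed by a 2-byte offset number) is the heap TID format under discussion; the 8 kB block size is PostgreSQL's default and is an assumption here, and the Python names (`pack_tid`, `unpack_tid`, `max_table_bytes`) are purely illustrative, not PostgreSQL APIs.

```python
import struct

BLOCK_SIZE = 8192          # assumed default block size, in bytes
BLOCK_NUMBER_BITS = 32     # width of the heap block number
OFFSET_NUMBER_BITS = 16    # width of the per-page offset number


def pack_tid(block, offset):
    """Pack a heap-style TID into 6 bytes: block hi/lo halves, then offset."""
    if not 0 <= block < 2 ** BLOCK_NUMBER_BITS:
        raise ValueError("block number does not fit in 32 bits")
    if not 0 <= offset < 2 ** OFFSET_NUMBER_BITS:
        raise ValueError("offset number does not fit in 16 bits")
    # "=" means native byte order with no padding; three unsigned 16-bit fields.
    return struct.pack("=HHH", block >> 16, block & 0xFFFF, offset)


def unpack_tid(raw):
    """Inverse of pack_tid: return (block, offset)."""
    hi, lo, offset = struct.unpack("=HHH", raw)
    return (hi << 16) | lo, offset


def max_table_bytes(block_bits=BLOCK_NUMBER_BITS, block_size=BLOCK_SIZE):
    """Largest table addressable with block_bits-wide block numbers."""
    return (2 ** block_bits) * block_size


print(len(pack_tid(123456789, 42)))      # 6
print(max_table_bytes() // 2 ** 40)      # 32 (TiB)
```

With these assumptions the ceiling is 2^32 blocks of 8 kB each, i.e. 32 TiB per table, which is where the "bigger than 32TB" case quoted further down comes from.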


> Though I suppose a posting list almost has to have fixed width TIDs to
> perform acceptably.

Hm. It's not clear to me why that is?


> > I think it's two more cases:
> >
> > - table AMs that want to support tables that are bigger than 32TB. That
> >   used to be unrealistic, but it's not anymore. Especially when the need
> >   to VACUUM etc is largely removed / reduced.
> 
> Can we steal some bits that are currently used for offset number
> instead? 16 bits is far more than we ever need to use for heap offset
> numbers in practice.

I think that's a terrible idea. For one, some AMs will have significantly
higher limits, especially taking compression and larger block sizes into
account. Also, not all AMs need identifiers tied so closely to a disk
position, e.g. zedstore does not.  We shouldn't hack ever more
information into the offset, given that background.
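
To make the trade-off concrete, here is a rough sketch of what moving bits from the offset number to the block number would buy and cost. It assumes the TID stays 6 bytes wide and the default 8 kB block size; `steal_bits` is a hypothetical helper for illustration, not anything in PostgreSQL.

```python
TID_BITS = 48        # 6-byte TID: 32-bit block number + 16-bit offset number
OFFSET_BITS = 16     # current width of the offset number
BLOCK_SIZE = 8192    # assumed default block size, in bytes


def steal_bits(stolen, block_size=BLOCK_SIZE):
    """Return (max table size in bytes, max offsets per block) after moving
    `stolen` bits from the offset number to the block number."""
    if not 0 <= stolen <= OFFSET_BITS:
        raise ValueError("can only steal between 0 and 16 bits")
    block_bits = TID_BITS - OFFSET_BITS + stolen
    offset_bits = OFFSET_BITS - stolen
    return (2 ** block_bits) * block_size, 2 ** offset_bits


for stolen in (0, 4, 8):
    size, offsets = steal_bits(stolen)
    print(f"{stolen} bits stolen: {size // 2 ** 40} TiB addressable, "
          f"{offsets} offsets per block")
# 0 bits stolen: 32 TiB addressable, 65536 offsets per block
# 4 bits stolen: 512 TiB addressable, 4096 offsets per block
# 8 bits stolen: 8192 TiB addressable, 256 offsets per block
```

Every bit gained for the block number halves the number of items addressable within a block, which is the constraint that bites for AMs that pack many rows into a compressed or larger block.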


> (I wonder if this would also have benefits for the representation of
> in-memory bitmaps?)

Hm. Not sure how?


> > - global indexes (for cross-partition unique constraints and such),
> >   which need a partition identifier as part of the tid (or as part of
> >   the index key, but I think that actually makes interaction with
> >   indexam from other layers more complicated - the inside of the index
> >   may want to represent it as a column, but to the outside that
> >   ought not to be visible)
> 
> Can we just use an implementation level attribute for this? Would it
> be so bad if we weren't able to jump straight to the partition number
> without walking through the tuple when the tuple has varwidth
> attributes? (If that isn't acceptable, then we can probably make it
> work for global indexes without having to generalize everything.)

Having to walk through the index tuple might be acceptable - in all
likelihood we'll have to do so anyway.  It does not, however, *really*
resolve the issue that we still need to pass something TID-like back from
the indexam, so we can fetch the associated tuple from the heap, or add the
TID to a bitmap. But that could be done separately from the index
internal data structures.


> Generalizing the nbtree AM to be able to work with an arbitrary type
> of table row identifier that isn't at all like a TID raises tricky
> definitional questions. It would have to work in a way that made the
> new variety of table row identifier stable, which is a significant new
> requirement (and one that zheap is clearly not interested in).

Hm. I don't see why a different type of TID would imply it being
stable?


> I am not suggesting that these issues are totally insurmountable. What
> I am saying is this: If we already had "stable logical" TIDs instead
> of "mostly physical TIDs", then generalizing nbtree index tuples to
> store arbitrary table row identifiers would more or less be all about
> the data structure managed by nbtree. But that isn't the case, and
> that strongly discourages me from working on this -- we shouldn't talk
> about the problem as if it is mostly just a matter of settling on the
> best index tuple format.



> Frankly I am not very enthusiastic about working on a project that has
> unclear scope and unclear benefits for users.

Why would properly supporting AMs like zedstore, global indexes,
"indirect" indexes etc. not benefit users?

Greetings,

Andres Freund


