Re: [Proposal] Global temporary tables

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: [Proposal] Global temporary tables
Дата
Msg-id f0c8abe7-2407-513a-f901-d2de8535be85@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [Proposal] Global temporary tables  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: [Proposal] Global temporary tables  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers

On 05.02.2020 17:10, Robert Haas wrote:
> On Wed, Feb 5, 2020 at 2:28 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> There is very important reason (from my point of view): allow other
>> sessions to use created index and
>> so provide compatible behavior with regular tables (and with Oracle).
>> So we should be able to populate index with existed GTT data.
>> And ambuild will do it.
> I don't understand. A global temporary table, as I understand it, is a
> table for which each session sees separate contents. So you would
> never need to populate it with existing data.
Session 1:
create global temp table gtt(x integer);
insert into gtt values (generate_series(1,100000));

Session 2:
insert into gtt values (generate_series(1,200000));

Session1:
create index on gtt(x);
explain select * from gtt where x = 1;

Session2:
explain select * from gtt where x = 1;
??? Should we use index here?

My answer is - yes.
Just because:
- Such behavior is compatible with regular tables. So it will not 
confuse users and doesn't require some complex explanations.
- It is compatible with Oracle.
- It is what DBA usually want when creating index.
-
There are several arguments against such behavior:
- Concurrent building of index in multiple sessions can consume a lot of 
memory
- Building index can increase query execution time (which can be not 
expected by clients)

I have discussion about it with Pavel here in Pgcon Moscow but we can 
not convince each other.
May be we should provide a choice to the user, by means of GUC or index 
creating parameter.


>
> Besides, even if you did, how are you going to get the data for the
> table? If you get the table data by flat-copying the table, then you
> could copy the index files too. And you would want to, because if the
> table contains a large amount of data, building indexes will be
> expensive. If the index is *empty*, a file copy will not be much
> cheaper than calling ambuild(), but if it's got a lot of data in it,
> it will.

Sorry, I do not understand you.
ambuild is called locally by each backend on first access to the GTT index.
It is done at the moment of building query execution plan when we check 
whether index is valid.
May be it will be sensible to postpone this check and do it for indexes 
which are actually used in query execution plan.

>
>> Sorry, I do not understand the benefits of such optimization. It seems
>> to be very rare situation when session will try to access temp table
>> which was not previously filled with data. But even if it happen,
>> keeping "master" copy will not safe much: we in any case have shared
>> metadata and no data. Yes, with current approach, first access to GTT
>> will cause creation of empty indexes. But It is just initialization of
>> 1-3 pages. I do not think that delaying index initialization can be
>> really useful.
> You might be right, but you're misunderstanding the nature of my
> concern. We probably can't allow DDL on a GTT unless no sessions are
> attached. Having sessions that just read the empty GTT be considered
> as "not attached" might make it easier for some users to find a time
> when no backend is attached and thus DDL is possible.

Ok, now I understand the problem your are going to address.
But still I never saw use cases when empty temp tables are accessed.
Usually we save in temp table some intermediate results of complex query.
Certainly it can happen that query returns empty result.
But usually temp table are used when we expect huge result (otherwise 
materializing result in temp table is not needed).
So I do not think that such optimization can help much in performing DDL 
for GTT.



>
>> In any case, calling ambuild is the simplest and most universal
>> approach, providing desired and compatible behavior.
> Calling ambuild is definitely not simpler than a plain file copy. I
> don't know how you can contend otherwise.
>

This is code fragment whichbuild GTT index on demand:

     if (index->rd_rel->relpersistence == RELPERSISTENCE_SESSION)
     {
         Buffer metapage = ReadBuffer(index, 0);
         bool isNew = PageIsNew(BufferGetPage(metapage));
         ReleaseBuffer(metapage);
         if (isNew)
         {
             Relation heap;
DropRelFileNodeAllLocalBuffers(index->rd_smgr->smgr_rnode.node);
             heap = RelationIdGetRelation(index->rd_index->indrelid);
             index->rd_indam->ambuild(heap, index, BuildIndexInfo(index));
             RelationClose(heap);
         }
     }

That is all - just 10 line of code.
I can make a bet that maintaining separate fork for indexes and copying 
data from it will require much more coding.







В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Is custom MemoryContext prohibited?
Следующее
От: Chapman Flack
Дата:
Сообщение: Re: Is custom MemoryContext prohibited?