Re: [HACKERS] Declarative partitioning - another take
От | Amit Langote |
---|---|
Тема | Re: [HACKERS] Declarative partitioning - another take |
Дата | |
Msg-id | 1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp обсуждение исходный текст |
Ответ на | Re: [HACKERS] Declarative partitioning - another take (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: [HACKERS] Declarative partitioning - another take
Re: [HACKERS] Declarative partitioning - another take |
Список | pgsql-hackers |
Sorry about the delay in replying. On 2016/12/23 8:08, Robert Haas wrote: > On Thu, Dec 22, 2016 at 3:35 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >> While working on that, I discovered yet-another-bug having to do with the >> tuple descriptor that's used as we route a tuple down a partition tree. If >> attnums of given key attribute(s) are different on different levels, it >> would be incorrect to use the original slot's (one passed by ExecInsert()) >> tuple descriptor to inspect the original slot's heap tuple, as we go down >> the tree. It might cause spurious "partition not found" at some level due >> to looking at incorrect field in the input tuple because of using the >> wrong tuple descriptor (root table's attnums not always same as other >> partitioned tables in the tree). Patch 0001 fixes that including a test. > > I committed this, but I'm a bit uncomfortable with it: should the > TupleTableSlot be part of the ModifyTableState rather than the EState? Done that way in 0001 of the attached patches. So, instead of making the standalone partition_tuple_slot a field of EState (with the actual TupleTableSlot in its tupleTable), it is now allocated within ModifyTableState and CopyState, and released when ModifyTable node or CopyFrom finishes, respectively. >> It also addresses the problem I mentioned previously that once >> tuple-routing is done, we failed to switch to a slot with the leaf >> partition's tupdesc (IOW, continued to use the original slot with root >> table's tupdesc causing spurious failures due to differences in attums >> between the leaf partition and the root table). >> >> Further patches 0002, 0003 and 0004 fix bugs that I sent one-big-patch for >> in my previous message. Each patch has a test for the bug it's meant to fix. > > Regarding 0002, I think that this is kind of a strange fix. Wouldn't > it be better to get hold of the original tuple instead of reversing > the conversion? And what of the idea of avoiding the conversion in > the (probably very common) case where we can? To get hold of the original tuple, how about adding an argument orig_slot to ExecConstraints()? I've implemented that approach in the new 0002. Regarding the possibility of avoiding the conversion in very common cases, I think that could be done considering the following: If the mapping from the attribute numbers of the parent table to that of a child table is an identity map, we don't need to convert tuples. Currently however, convert_tuples_by_name() also requires tdtypeid of the input and output TupleDescs to be equal. The reason cited for that is that we may fail to "inject the right OID into the tuple datum" if the types don't match. In case of partitioning, hasoid status must match between the parent and its partitions at all times, so the aforementioned condition is satisfied without requiring that tdtypeid are same. And oid column (if present) is always located at a given position in HeapTuple, so need not map that. Based on the above argument, patch 0006 teaches convert_tuples_by_name() to *optionally* not require tdtypeid of input and output tuple descriptors to be equal. It's implemented by introducing a new argument to convert_tuples_by_name() named 'consider_typeid'. We pass 'false' only for the partitioning cases. (Perhaps, the following should be its own new thread) I noticed that ExecProcessReturning() doesn't work properly after tuple routing (example shows how returning tableoid currently fails but I mention some other issues below): create table p (a int, b int) partition by range (a); create table p1 partition of p for values from (1) to (10); insert into p values (1) returning tableoid::regclass, *; tableoid | a | b ----------+---+--- - | 1 | (1 row) INSERT 0 1 I tried to fix that in 0007 to get: insert into p values (1) returning tableoid::regclass, *; tableoid | a | b ----------+---+--- p | 1 | (1 row) INSERT 0 1 But I think it *may* be wrong to return the root table OID for tuples inserted into leaf partitions, because with select we get partition OIDs: select tableoid::regclass, * from p; tableoid | a | b ----------+---+--- p1 | 1 | (1 row) If so, that means we should build the projection info (corresponding to the returning list) for each target partition somehow. ISTM, that's going to have to be done within the planner by appropriate inheritance translation of the original returning targetlist. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
- 0001-Allocate-partition_tuple_slot-in-respective-nodes.patch
- 0002-Make-ExecConstraints-show-the-correct-row-in-error-m.patch
- 0003-Fix-a-bug-in-how-we-generate-partition-constraints.patch
- 0004-Fix-a-bug-of-insertion-into-an-internal-partition.patch
- 0005-Add-some-more-tests-for-tuple-routing.patch
- 0006-Avoid-tuple-coversion-in-common-partitioning-cases.patch
- 0007-Fix-RETURNING-to-work-correctly-after-tuple-routing.patch
В списке pgsql-hackers по дате отправления: