Обсуждение: [HACKERS] Effect of dropping a partitioned table's column over time

Поиск
Список
Период
Сортировка

[HACKERS] Effect of dropping a partitioned table's column over time

От
Thomas Munro
Дата:
Hi hackers,

If you drop a column from a partitioned table then it has a TupleDesc
that matches existing partitions, but new partitions created after
that have non-same TupleDescs (according to convert_tuples_by_name)
because they don't have the dropped column.  That means that inserts
to partitions created later need to go via the deform->remap->form
code path in tupconvert.c.  If you're using a time-based partitioning
scheme where you add a new partition for each month and mostly insert
into the current month, as is very common, then after dropping a
column you'll eventually finish up sending ALL your inserts through
tupconvert.c for the rest of time.

For example, having hacked my tree to print out a message to tell me
if it had to convert a tuple:

postgres=# create table parent (a int, b int) partition by list (b);
CREATE TABLE
postgres=# create table child1 partition of parent for values in (1);
CREATE TABLE
postgres=# alter table parent drop column a;
ALTER TABLE
postgres=# create table child2 partition of parent for values in (2);
CREATE TABLE
postgres=# insert into parent values (1);
NOTICE:  no map
INSERT 0 1
postgres=# insert into parent values (2);
NOTICE:  map!
INSERT 0 1

Of course there are other usage patterns where you might prefer it
this way, because you'll mostly be inserting into partitions created
before the change.  In general, would it be better for the partitioned
table's TupleDesc to match partitions created before or after a
change?  Since partitioned tables have no storage themselves, is there
any technical reason we couldn't remove a partitioned table's dropped
pg_attribute so that its TupleDesc matches partitions created later?
Is there some way that tupconvert.c could make this type of difference
moot?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Effect of dropping a partitioned table's column over time

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Since partitioned tables have no storage themselves, is there
> any technical reason we couldn't remove a partitioned table's dropped
> pg_attribute so that its TupleDesc matches partitions created later?

You'd break views referring to the partitioned table, or at least to
any columns after the dropped one.

There's been talk of separating column identity (think OID) from column
logical and physical positions.  If we did that, and had Vars using the
column identity number while tupdescs were sorted according to physical
position, then what you're thinking of could be made to work.  But a
couple of people have attacked that problem and been unable to finish
it :-(
        regards, tom lane



Re: [HACKERS] Effect of dropping a partitioned table's column overtime

От
Amit Langote
Дата:
Hi Thomas,

On 2017/08/07 10:58, Thomas Munro wrote:
> Hi hackers,
> 
> If you drop a column from a partitioned table then it has a TupleDesc
> that matches existing partitions, but new partitions created after
> that have non-same TupleDescs (according to convert_tuples_by_name)
> because they don't have the dropped column.  That means that inserts
> to partitions created later need to go via the deform->remap->form
> code path in tupconvert.c.  If you're using a time-based partitioning
> scheme where you add a new partition for each month and mostly insert
> into the current month, as is very common, then after dropping a
> column you'll eventually finish up sending ALL your inserts through
> tupconvert.c for the rest of time.

That's good observation.

> For example, having hacked my tree to print out a message to tell me
> if it had to convert a tuple:
> 
> postgres=# create table parent (a int, b int) partition by list (b);
> CREATE TABLE
> postgres=# create table child1 partition of parent for values in (1);
> CREATE TABLE
> postgres=# alter table parent drop column a;
> ALTER TABLE
> postgres=# create table child2 partition of parent for values in (2);
> CREATE TABLE
> postgres=# insert into parent values (1);
> NOTICE:  no map
> INSERT 0 1
> postgres=# insert into parent values (2);
> NOTICE:  map!
> INSERT 0 1
> 
> Of course there are other usage patterns where you might prefer it
> this way, because you'll mostly be inserting into partitions created
> before the change.  In general, would it be better for the partitioned
> table's TupleDesc to match partitions created before or after a
> change?  Since partitioned tables have no storage themselves, is there
> any technical reason we couldn't remove a partitioned table's dropped
> pg_attribute so that its TupleDesc matches partitions created later?

That means the parent's TupleDesc will begin mismatching that of all of
the existing partitions and they will suddenly need a map where they
didn't before.

I guess you considered it, but optimizing for the common case of range
partitioning where most of the inserts go to the newest partition will
hurt the other partitioning methods, like hash, where that won't
necessarily be true.

> Is there some way that tupconvert.c could make this type of difference
> moot?

Do you mean the difference arising due to dropped columns in either the
partitioned table or the table attached as a partition?

Thanks,
Amit




Re: [HACKERS] Effect of dropping a partitioned table's column over time

От
Thomas Munro
Дата:
On Mon, Aug 7, 2017 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Since partitioned tables have no storage themselves, is there
>> any technical reason we couldn't remove a partitioned table's dropped
>> pg_attribute so that its TupleDesc matches partitions created later?
>
> You'd break views referring to the partitioned table, or at least to
> any columns after the dropped one.

I will put a huge sign up next to my desk: "What about the rules?"

> There's been talk of separating column identity (think OID) from column
> logical and physical positions.  If we did that, and had Vars using the
> column identity number while tupdescs were sorted according to physical
> position, then what you're thinking of could be made to work.  But a
> couple of people have attacked that problem and been unable to finish
> it :-(

Hmm, yeah I see.  I have seen that[1] and I hope it comes back.  It
seems like it might be a step on the path towards incremental
materialized views (at least in one proposal) which is why I asked
about it on this list recently[2].

[1] https://www.postgresql.org/message-id/flat/20141209174146.GP1768@alvh.no-ip.org
[2]
https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Effect of dropping a partitioned table's column over time

От
Thomas Munro
Дата:
On Mon, Aug 7, 2017 at 2:35 PM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/08/07 10:58, Thomas Munro wrote:
>> Of course there are other usage patterns where you might prefer it
>> this way, because you'll mostly be inserting into partitions created
>> before the change.  In general, would it be better for the partitioned
>> table's TupleDesc to match partitions created before or after a
>> change?  Since partitioned tables have no storage themselves, is there
>> any technical reason we couldn't remove a partitioned table's dropped
>> pg_attribute so that its TupleDesc matches partitions created later?
>
> That means the parent's TupleDesc will begin mismatching that of all of
> the existing partitions and they will suddenly need a map where they
> didn't before.

True, that doesn't sound great, but eventually you'd stop doing it in
common usage patterns.

> I guess you considered it, but optimizing for the common case of range
> partitioning where most of the inserts go to the newest partition will
> hurt the other partitioning methods, like hash, where that won't
> necessarily be true.

Right, you wouldn't want to do it there.  I guess with hash
partitioning you wouldn't typically be rotating partitions (dropped
old ones and creating new ones), so the TupleDescs stay in lock-step.

>> Is there some way that tupconvert.c could make this type of difference
>> moot?
>
> Do you mean the difference arising due to dropped columns in either the
> partitioned table or the table attached as a partition?

Yeah.  I can't think of any way.  I thought for a moment about fast
column removal path involving sliding memory rather than full
deform/reform, but that's uninspiring.  I am not actually proposing
any change here since I have no evidence that there's any real
practical problem.  I just wanted to share the realisation I had
during an off-list discussion about tuple deforming, when considering
whether we actually expect to hit the tuple conversion case often.
Depending on the history of your schema the answer may be: never,
sometimes or always, and once you reach this always state you'll never
get out of it.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Effect of dropping a partitioned table's column over time

От
Craig Ringer
Дата:
On 7 August 2017 at 11:25, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Mon, Aug 7, 2017 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> Since partitioned tables have no storage themselves, is there
>> any technical reason we couldn't remove a partitioned table's dropped
>> pg_attribute so that its TupleDesc matches partitions created later?
>
> You'd break views referring to the partitioned table, or at least to
> any columns after the dropped one.

I will put a huge sign up next to my desk: "What about the rules?"

> There's been talk of separating column identity (think OID) from column
> logical and physical positions.  If we did that, and had Vars using the
> column identity number while tupdescs were sorted according to physical
> position, then what you're thinking of could be made to work.  But a
> couple of people have attacked that problem and been unable to finish
> it :-(

Hmm, yeah I see.  I have seen that[1] and I hope it comes back.  It
seems like it might be a step on the path towards incremental
materialized views (at least in one proposal) which is why I asked
about it on this list recently[2].

Can we instead create the new partitions with the same dropped columns?

Ensure that every partition, parent and child, has the same column-set? 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Effect of dropping a partitioned table's column over time

От
Robert Haas
Дата:
On Sun, Aug 6, 2017 at 11:38 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Can we instead create the new partitions with the same dropped columns?
>
> Ensure that every partition, parent and child, has the same column-set?

We could, but that has costs of its own.  It means that those calls
are stored in the tuple as nulls, which means that there will be a
null bitmap where you wouldn't otherwise have one, and every time you
deform a tuple you'll have to worry about those columns even though
they're never used for anything.

It's not clear whether this cost is more or less than the tuple
conversion cost.  It seems ugly to me, though, either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company