Обсуждение: Partitioning WIP patch (was: Partitioning: issues/ideas)

Поиск
Список
Период
Сортировка

Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Amit Langote
Дата:
On 21-01-2015 PM 07:26, Amit Langote wrote:
>
> Ok, I will limit myself to focusing on following things at the moment:
>
> * Provide syntax in CREATE TABLE to declare partition key
> * Provide syntax in CREATE TABLE to declare a table as partition of a
> partitioned table and values it contains
> * Arrange to have partition key and values stored in appropriate
> catalogs (existing or new)
> * Arrange to cache partitioning info of partitioned tables in relcache
>

Here is an experimental patch that attempts to implement this.

It implements the following syntax:

* Syntax for defining partition key:
CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);

where key_spec consists of partition key column names and optional
operator class per column. Currently, there are restrictions on the
key_spec such as allowing only column names (not arbitrary expressions
of them), only one column for list strategy, etc.

* Syntax for declaring a table as partition of a partitioned table:
CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause;

where values_clause can be:

IN (list_of_values), or
BETWEEN (range_lower_bounds) AND (range_upper_bounds);

The semantics for a range is [range_lower_bounds,range_upper_bounds),
that is, lower inclusive, upper exclusive. (this might later change
subject to choice regarding preferred/desired syntax)

Additionally, a partition can itself be further partitioned (though I
have not worked on the implementation details of multilevel partitioning
yet):

CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
{RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

There are two system catalogs pg_partitioned_rel and pg_partition which
store partition key spec and partition value spec, respectively.
(remember to initdb if interested in trying)

Please note that the above syntax and/or catalog may not be very
appealing nor that they won't change/disappear. I am posting the patch
more for examining the approach of internal representation of the
metadata for partitioning and get some general comments.

The approach I have implemented in this patch consists of loading
the partition key info and a list of partitions into the relation
descriptor for a partitioned table. In case of range partitioning, this
list is sorted on the range max bound. To see if that works any good, I
hacked ExecInsert() to make it find a partition for a tuple by
adding a ExecFindPartition() just before heap_insert(). It accepts
resultRelInfo of the parent and a tuple slot. It binary-searches for and
returns the descriptor of the chosen partition which ExecInsert() then
uses to perform heap_insert() and inserting index tuples. If no
partition is found, parent relation itself is used. heap_insert() and
ExecInsertIndexTuples() are the only things for which partition relation
is used. All of this is just experimental and most probably wrong in
details; but is done just to see what kind of performance to expect from
the chosen internal representation. Another thing is the approach that
tuple-routing (& partition-pruning) is a function of partitioned
relation and the tuple (or restrict quals). It will be more significant
when we'll get to implementing a partition-pruning function.

See below an example to show that having an extra ExecFindPartition()
does not degrade the performance of inserting a tuple much:

-- a plain table
CREATE TABLE parent_monthly(year int, month int, day int);

-- a partitioned table
-- xxxxx: number of partitions
CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

-- partitions
CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);

CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);

<snip>

CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);

-- insert 1 tuple into the plain table
INSERT INTO parent_monthly VALUES (2013, 12, 01);
INSERT 0 1
Time: 3.303 ms


-- insert 1 tuple into the partitioned table
-- #part: number of partitions
-- case 1: find no valid partition
-- case 2: find a valid partition

 #parts      case 1      case 2
========    ========    ========
10        3.248 ms    3.509 ms
100        3.546 ms    3.269 ms
500        3.497 ms    3.048 ms
1000        3.364 ms    5.379 ms
10000        4.943 ms    5.076 ms

Thoughts?

Thanks,
Amit

Вложения

Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Amit Langote
Дата:
On 24-02-2015 PM 05:13, Amit Langote wrote:
> -- partitions
> CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
> parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);
> 
> CREATE TABLE parent_monthly_xxxxx_201402 PARTITION OF
> parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3);
> 
> CREATE TABLE parent_monthly_xxxxx_201403 PARTITION OF
> parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4);
> 
> <snip>
> 
> CREATE TABLE parent_monthly_xxxxx_yyyymm PARTITION OF
> parent_monthly_00100_yyyymm FOR VALUES BETWEEN (yyyy, mm AND (yyyy, mm);
> 

Oops, hand-edited "PARTITION OF parent_monthly_" wrongly. Supposed to be -

CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_xxxxx FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

and so on.

Thanks,
Amit




Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Amit Langote
Дата:
On 24-02-2015 PM 05:13, Amit Langote wrote:
> Additionally, a partition can itself be further partitioned (though I
> have not worked on the implementation details of multilevel partitioning
> yet):
> 
> CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
> {RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

One more blunder, supposed to be:

CREATE TABLE table_name PARTITION OF parent_name FOR VALUES
values_clause PARTITION BY {RANGE|LIST} ON(key_columns);

Thanks,
Amit




Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Corey Huinker
Дата:
I think it's confusing to use BETWEEN to mean [low,high) when it already means [low,high] in WHERE clauses.

Why not leverage range notation instead?

CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
parent_monthly_xxxxx FOR VALUES IN RANGE '[2014-04-01,2014-05-01)'

"IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else.

Clearly, this example above assumes that the partitioning is on a single column.

For partitioning on a set of columns you're essentially creating a custom composite type with major-minor collation, could that custom type be created at table creation time? Could an existing composite type be declared as the partition key?

CREATE TYPE year_month( year int, month int );

(CREATE OPERATOR... for < = > )

CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY RANGE ON year_month(year, month); 






On Tue, Feb 24, 2015 at 5:53 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 24-02-2015 PM 05:13, Amit Langote wrote:
> Additionally, a partition can itself be further partitioned (though I
> have not worked on the implementation details of multilevel partitioning
> yet):
>
> CREATE TABLE table_name PARTITION OF parent_name PARTITION BY
> {RANGE|LIST} ON(key_columns) FOR VALUES values_clause;

One more blunder, supposed to be:

CREATE TABLE table_name PARTITION OF parent_name FOR VALUES
values_clause PARTITION BY {RANGE|LIST} ON(key_columns);

Thanks,
Amit



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Amit Langote
Дата:
On 25-02-2015 AM 01:13, Corey Huinker wrote:
> I think it's confusing to use BETWEEN to mean [low,high) when it already
> means [low,high] in WHERE clauses.
> 

Yeah, I'm not really attached to that syntax.

> Why not leverage range notation instead?
> 
> CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
> parent_monthly_xxxxx FOR VALUES IN RANGE '[2014-04-01,2014-05-01)'
> 
> "IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else.
> 
> Clearly, this example above assumes that the partitioning is on a single
> column.
> 
> For partitioning on a set of columns you're essentially creating a custom
> composite type with major-minor collation, could that custom type be
> created at table creation time? Could an existing composite type be
> declared as the partition key?
> 

The answer to the latter is yes as long as there is an operator class
that supports a strategy compatible with the chosen partitioning
strategy. For example, record/composite type has built-in support for
various btree strategies via record_ops.

As for the former, I tend to think creating new user-space
types/operators transparently might not be a very good idea. One concern
would be pg_dump related. Though I also agree there is some duplication
of major-minor collation logic in case of multi-column keys.

> CREATE TYPE year_month( year int, month int );
> 
> (CREATE OPERATOR... for < = > )
> 
> CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY
> RANGE ON year_month(year, month);
> 

Regards,
Amit




Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Amit Langote
Дата:
On 24-02-2015 PM 05:13, Amit Langote wrote:
> On 21-01-2015 PM 07:26, Amit Langote wrote:
>>
>> Ok, I will limit myself to focusing on following things at the moment:
>>
>> * Provide syntax in CREATE TABLE to declare partition key
>> * Provide syntax in CREATE TABLE to declare a table as partition of a
>> partitioned table and values it contains
>> * Arrange to have partition key and values stored in appropriate
>> catalogs (existing or new)
>> * Arrange to cache partitioning info of partitioned tables in relcache
>>
>
> Here is an experimental patch that attempts to implement this.

I divided the patch into two for convenience:

1) 0001_partition_syntax_catalog - adds commands, catalog and
partitioned table relation descriptor related WIP code

2) 0002_tuple-routing-poc - an experimental patch to test how well
binary search approach works for tuple routing in ExecInsert().

Please take a look.

Thanks,
Amit

Вложения

Re: Partitioning WIP patch

От
Josh Berkus
Дата:
On 02/24/2015 12:13 AM, Amit Langote wrote:
> Here is an experimental patch that attempts to implement this.

This looks awesome.  I would love to have it for 9.5, but I guess the
patch isn't nearly baked enough for that?

> It implements the following syntax:
> 
> * Syntax for defining partition key:
> CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec);
> 
> where key_spec consists of partition key column names and optional
> operator class per column. Currently, there are restrictions on the
> key_spec such as allowing only column names (not arbitrary expressions
> of them), only one column for list strategy, etc.

What's the obstacle to supporting expressions and/or IMMUTABLE
functions?  I think it's fine to add this feature without them
initially, I'm just asking about the roadmap for eventually supporting
expressions in the key spec.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Partitioning WIP patch (was: Partitioning: issues/ideas)

От
Jim Nasby
Дата:
On 2/24/15 2:13 AM, Amit Langote wrote:
> -- a plain table
> CREATE TABLE parent_monthly(year int, month int, day int);
>
> -- a partitioned table
> -- xxxxx: number of partitions
> CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
> RANGE ON(year, month);

To be clear, in this example parent_table_xxxxx is in no way related to 
parent_monthly, just like a normal CREATE TABLE (LIKE table), right?

> -- partitions
> CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
> parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

And the partitions are still inheritance children?

Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still 
operate the same as today? I'd like to see us continue to support that, 
but perhaps it would be wise to not paint ourselves into that corner 
just yet.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 26-02-2015 AM 09:28, Jim Nasby wrote:
> On 2/24/15 2:13 AM, Amit Langote wrote:
>> -- a plain table
>> CREATE TABLE parent_monthly(year int, month int, day int);
>>
>> -- a partitioned table
>> -- xxxxx: number of partitions
>> CREATE TABLE parent_monthly_xxxxx(LIKE parent_monthly) PARTITION BY
>> RANGE ON(year, month);
> 
> To be clear, in this example parent_table_xxxxx is in no way related to
> parent_monthly, just like a normal CREATE TABLE (LIKE table), right?
> 

Yes, there is no relation at all. I was maybe just trying to save few
keystrokes. Sorry, that may be confusing.

parent_monthly is just a regular table, part of the example.

>> -- partitions
>> CREATE TABLE parent_monthly_xxxxx_201401 PARTITION OF
>> parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);
> 
> And the partitions are still inheritance children?
> 

At this point, they *almost* are; more for the sake of Append. Though
sooner than later, we will have to invent a version of Append for
partitioned tables that does more than just append the outputs of
underlying plans. For example, it would use partitioninfo cached in
relation descriptor of the parent to drive partition-pruning for
starters. Pruning child relations individually by way of
constraint_exclusion doesn't scale as is well known.

To clarify things a bit more, transformCreateStmt() transforms PARTITION
OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append
to work, I have made ATExecAddInherit() to do some of the things
ATExecAttachPartition() does. Again, that is a temporary arrangement.

> Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
> operate the same as today? I'd like to see us continue to support that,
> but perhaps it would be wise to not paint ourselves into that corner
> just yet.

Nothing prevents that from working, at least at the moment.

CREATE TABLE parent_monthly_00012(LIKE parent_monthly) PARTITION BY
RANGE ON(year, month);

CREATE TABLE parent_monthly_00012_201401 PARTITION OF
parent_monthly_00012 FOR VALUES BETWEEN (2014, 1) AND (2014, 2);

<snip>

CREATE TABLE parent_monthly_00012_201412 PARTITION OF
parent_monthly_00012 FOR VALUES BETWEEN (2014, 12) AND (2015, 1);

# INSERT INTO parent_monthly_00012 VALUES (2014, 07, 01);
INSERT 0 1

# SELECT * FROM parent_monthly_00012;year | month | day
------+-------+-----2014 |     7 |   1
(1 row)

# INSERT INTO parent_monthly_00012 VALUES (2014, 08, 01);
INSERT 0 1

# ALTER TABLE parent_monthly_00012_201408 ADD COLUMN hour int;
ALTER TABLE

# INSERT INTO parent_monthly_00012_201408 VALUES (2014, 08, 01, 10);
INSERT 0 1

# SELECT * FROM parent_monthly_00012;year | month | day
------+-------+-----2014 |     7 |   12014 |     8 |   12014 |     8 |   1
(3 rows)

# SELECT * FROM parent_monthly_00012_201408;year | month | day | hour
------+-------+-----+------2014 |     8 |   1 |2014 |     8 |   1 |   10
(2 rows)

Thanks,
Amit




Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 26-02-2015 AM 10:24, Amit Langote wrote:
> To clarify things a bit more, transformCreateStmt() transforms PARTITION
> OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append
> to work, I have made ATExecAddInherit() to do some of the things
> ATExecAttachPartition() does. Again, that is a temporary arrangement.
> 

I misspoke. Should have said:

... for Append to work, I have made ATExecAttachPartition() to do some
of the things ATExecAddInherit() does. Again, that is a temporary
arrangement.

Thanks,
Amit




Re: Partitioning WIP patch

От
Jim Nasby
Дата:
On 2/25/15 7:24 PM, Amit Langote wrote:
>> >Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
>> >operate the same as today? I'd like to see us continue to support that,
>> >but perhaps it would be wise to not paint ourselves into that corner
>> >just yet.
> Nothing prevents that from working, at least at the moment.

Ok, but is that what we really want? If we release it that way we'll be 
stuck with it forever.

I would certainly prefer that we support the capabilities of inheritance 
along with partitioning (because in some cases you want both). But it's 
going to limit what we can do internally.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 26-02-2015 AM 10:31, Jim Nasby wrote:
> On 2/25/15 7:24 PM, Amit Langote wrote:
>>> >Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
>>> >operate the same as today? I'd like to see us continue to support that,
>>> >but perhaps it would be wise to not paint ourselves into that corner
>>> >just yet.
>> Nothing prevents that from working, at least at the moment.
> 
> Ok, but is that what we really want? If we release it that way we'll be
> stuck with it forever.
> 

AIUI, as far as we stay with a design where partitions (children) are
individually planned, that might be OK. But, I guess things will get
more complicated. I think the role of a parent in planning would remain
limited to drive partition-pruning. Am I missing something?

> I would certainly prefer that we support the capabilities of inheritance
> along with partitioning (because in some cases you want both). But it's
> going to limit what we can do internally.

Just to clarify are you referring to inheritance relationship between a
partitioned table and partitions?

With explicit partitioning, shouldn't we go in direction where we remove
some restrictions imposed by inheritance (think multiple inheritance)? I
recall a link Alvaro had started the discussion with think link to a
Tom's remark about something very related:

http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us

Thanks,
Amit




Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 26-02-2015 AM 05:15, Josh Berkus wrote:
> On 02/24/2015 12:13 AM, Amit Langote wrote:
>> Here is an experimental patch that attempts to implement this.
> 
> This looks awesome. 

Thanks!

> I would love to have it for 9.5, but I guess the
> patch isn't nearly baked enough for that?
> 

I'm not quite sure what would qualify as baked enough for 9.5 though we
can surely try to reach some consensus on various implementation aspects
and perhaps even get it ready in time for 9.5.

>> where key_spec consists of partition key column names and optional
>> operator class per column. Currently, there are restrictions on the
>> key_spec such as allowing only column names (not arbitrary expressions
>> of them), only one column for list strategy, etc.
> 
> What's the obstacle to supporting expressions and/or IMMUTABLE
> functions?  I think it's fine to add this feature without them
> initially, I'm just asking about the roadmap for eventually supporting
> expressions in the key spec.
> 

Only one concern I can remember someone had raised is that having to
evaluate an expression for every row during bulk-inserts may end up
being pretty expensive. Though, we might have to live with that.

I think one idea is to learn from ability to use expressions in indexes.

Thanks,
Amit




Re: Partitioning WIP patch

От
Jim Nasby
Дата:
On 2/25/15 7:57 PM, Amit Langote wrote:
> On 26-02-2015 AM 10:31, Jim Nasby wrote:
>> On 2/25/15 7:24 PM, Amit Langote wrote:
>>>>> Does ALTER TABLE parent_monthly_xxxxx_201401 ADD COLUMN foo still
>>>>> operate the same as today? I'd like to see us continue to support that,
>>>>> but perhaps it would be wise to not paint ourselves into that corner
>>>>> just yet.
>>> Nothing prevents that from working, at least at the moment.
>>
>> Ok, but is that what we really want? If we release it that way we'll be
>> stuck with it forever.
>>
>
> AIUI, as far as we stay with a design where partitions (children) are
> individually planned, that might be OK. But, I guess things will get
> more complicated. I think the role of a parent in planning would remain
> limited to drive partition-pruning. Am I missing something?

Isn't the point of adding explicit partitioning to make it faster than 
plain inheritance? Presumably as part of that we'll eventually want to 
NOT plan children individually.

>> I would certainly prefer that we support the capabilities of inheritance
>> along with partitioning (because in some cases you want both). But it's
>> going to limit what we can do internally.
>
> Just to clarify are you referring to inheritance relationship between a
> partitioned table and partitions?

Yes. If it helps, the exact use-case I have in mind is using list-based 
partitioning + additional columns in some/all children (different 
between children). For example, if you need to track different types of 
customer payment methods, you'd have a payment parent table, a list 
partition for credit & debit cards, a different list partition for bank 
accounts, etc.

The reason I'd like to do this with partitioning vs plain inheritance is 
presumably as we build out partitioning we'll get very useful things 
like the ability to have FKs to properly partitioned tables. Insert 
tuple routing could also be useful.

> With explicit partitioning, shouldn't we go in direction where we remove
> some restrictions imposed by inheritance (think multiple inheritance)? I
> recall a link Alvaro had started the discussion with think link to a
> Tom's remark about something very related:
>
> http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us

That post looks like Tom figured out a way to eliminate a problem that 
hurts inheritance, so that's good.

My fear is that at some point we'll hit a problem with partitioning that 
we can't solve in the inheritance model. If we allow inheritance 
features into partitioning now we'll painted into a corner. If we 
disallow those features now we can always re-enable them if we get to 
the point where we're in the clear.

Does that make sense?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 26-02-2015 PM 05:20, Jim Nasby wrote:
> On 2/25/15 7:57 PM, Amit Langote wrote:
>> AIUI, as far as we stay with a design where partitions (children) are
>> individually planned, that might be OK. But, I guess things will get
>> more complicated. I think the role of a parent in planning would remain
>> limited to drive partition-pruning. Am I missing something?
> 
> Isn't the point of adding explicit partitioning to make it faster than
> plain inheritance? Presumably as part of that we'll eventually want to
> NOT plan children individually.
> 

Yes, we'd definitely want to get to a point where planning children
individually is not necessary. But I am afraid we will have to get there
a step at a time. IMHO, solving one problem of partition-pruning would
be a good start. And that will definitely be part of parent's planning
using partition bounds list (not pruning children one-by-one with
relation_excluded_by_constraints()).

>>> I would certainly prefer that we support the capabilities of inheritance
>>> along with partitioning (because in some cases you want both). But it's
>>> going to limit what we can do internally.
>>
>> Just to clarify are you referring to inheritance relationship between a
>> partitioned table and partitions?
> 
> Yes. If it helps, the exact use-case I have in mind is using list-based
> partitioning + additional columns in some/all children (different
> between children). For example, if you need to track different types of
> customer payment methods, you'd have a payment parent table, a list
> partition for credit & debit cards, a different list partition for bank
> accounts, etc.
> 
> The reason I'd like to do this with partitioning vs plain inheritance is
> presumably as we build out partitioning we'll get very useful things
> like the ability to have FKs to properly partitioned tables. Insert
> tuple routing could also be useful.
> 

Unless I'm missing something again, isn't allowing partitions to have
heterogeneous rowtypes a problem in the long run? I'm afraid I'm
confused as to your stand regarding inheritance vs. new partitioning. To
be specific, children with heterogeneous schemas sounds much like what
inheritance would be good for as you say. But then isn't that why we
have to plan children individually which you said new partitioning
should get away from?

>> With explicit partitioning, shouldn't we go in direction where we remove
>> some restrictions imposed by inheritance (think multiple inheritance)? I
>> recall a link Alvaro had started the discussion with think link to a
>> Tom's remark about something very related:
>>
>> http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us
> 
> That post looks like Tom figured out a way to eliminate a problem that
> hurts inheritance, so that's good.
> 
> My fear is that at some point we'll hit a problem with partitioning that
> we can't solve in the inheritance model. If we allow inheritance
> features into partitioning now we'll painted into a corner. If we
> disallow those features now we can always re-enable them if we get to
> the point where we're in the clear.
> 
> Does that make sense?

Yes, it does. In fact, I do intend to keep them separate the first
attempt of which is to choose to NOT transform a PARTITION OF parent
clause into INHERITS parent. Any code that may look like it's trying to
do that is because the patch is not fully baked yet.

Regards,
Amit




Re: Partitioning WIP patch

От
Andres Freund
Дата:
On 2015-02-26 02:20:21 -0600, Jim Nasby wrote:
> The reason I'd like to do this with partitioning vs plain inheritance is
> presumably as we build out partitioning we'll get very useful things like
> the ability to have FKs to properly partitioned tables. Insert tuple routing
> could also be useful.

The problem there imo isn't so much inheritance, but lack of working
unique checks across partitions. That's something we can implement
independent of this, it's just not trivial.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Partitioning WIP patch

От
Jim Nasby
Дата:
On 2/26/15 3:09 AM, Amit Langote wrote:
>> Yes. If it helps, the exact use-case I have in mind is using list-based
>> >partitioning + additional columns in some/all children (different
>> >between children). For example, if you need to track different types of
>> >customer payment methods, you'd have a payment parent table, a list
>> >partition for credit & debit cards, a different list partition for bank
>> >accounts, etc.
>> >
>> >The reason I'd like to do this with partitioning vs plain inheritance is
>> >presumably as we build out partitioning we'll get very useful things
>> >like the ability to have FKs to properly partitioned tables. Insert
>> >tuple routing could also be useful.
>> >
> Unless I'm missing something again, isn't allowing partitions to have
> heterogeneous rowtypes a problem in the long run? I'm afraid I'm
> confused as to your stand regarding inheritance vs. new partitioning. To
> be specific, children with heterogeneous schemas sounds much like what
> inheritance would be good for as you say. But then isn't that why we
> have to plan children individually which you said new partitioning
> should get away from?

Apologies if I haven't been clear enough. What I'd like to see is the 
best of both worlds; fast partitioning when not using inheritance, and 
perhaps somewhat slower when using inheritance, but still with the 
features partitioning gives you.

But my bigger concern from a project standpoint is that we not put 
ourselves in a position of supporting something that we really don't 
want to support (a partitioning system that's got inheritance mixed in). 
As much as I'd personally like to have both features together, I think 
it would be bad for the community to go down that road without careful 
thought.

>>> >>With explicit partitioning, shouldn't we go in direction where we remove
>>> >>some restrictions imposed by inheritance (think multiple inheritance)? I
>>> >>recall a link Alvaro had started the discussion with think link to a
>>> >>Tom's remark about something very related:
>>> >>
>>> >>http://www.postgresql.org/message-id/1598.1399826841@sss.pgh.pa.us
>> >
>> >That post looks like Tom figured out a way to eliminate a problem that
>> >hurts inheritance, so that's good.
>> >
>> >My fear is that at some point we'll hit a problem with partitioning that
>> >we can't solve in the inheritance model. If we allow inheritance
>> >features into partitioning now we'll painted into a corner. If we
>> >disallow those features now we can always re-enable them if we get to
>> >the point where we're in the clear.
>> >
>> >Does that make sense?
> Yes, it does. In fact, I do intend to keep them separate the first
> attempt of which is to choose to NOT transform a PARTITION OF parent
> clause into INHERITS parent. Any code that may look like it's trying to
> do that is because the patch is not fully baked yet.

Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN 
on a partition. If we release something without that being restricted 
it'll probably cause trouble later on.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Partitioning WIP patch

От
Jim Nasby
Дата:
On 2/26/15 3:22 AM, Andres Freund wrote:
> On 2015-02-26 02:20:21 -0600, Jim Nasby wrote:
>> The reason I'd like to do this with partitioning vs plain inheritance is
>> presumably as we build out partitioning we'll get very useful things like
>> the ability to have FKs to properly partitioned tables. Insert tuple routing
>> could also be useful.
>
> The problem there imo isn't so much inheritance, but lack of working
> unique checks across partitions. That's something we can implement
> independent of this, it's just not trivial.

There's been discussion of allowing for uniqueness when we can guarantee 
no overlap between partitions, and the partition key is part of the 
unique constraint. That's the particular use case I was thinking of.

I suspect there's other partitioning features that would be useful in a 
generic inheritance setup as well; that's why I'd love to see both 
features work together... but I fear there's enough work to get there 
that it may not happen, and I don't want us to accidentally start mixing 
the two and have users start relying on it.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Partitioning WIP patch

От
Josh Berkus
Дата:
On 02/25/2015 07:15 PM, Amit Langote wrote:
> On 26-02-2015 AM 05:15, Josh Berkus wrote:
>> On 02/24/2015 12:13 AM, Amit Langote wrote:
>>> Here is an experimental patch that attempts to implement this.
>>
>> This looks awesome. 
> 
> Thanks!
> 
>> I would love to have it for 9.5, but I guess the
>> patch isn't nearly baked enough for that?
>>
> 
> I'm not quite sure what would qualify as baked enough for 9.5 though we
> can surely try to reach some consensus on various implementation aspects
> and perhaps even get it ready in time for 9.5.

Well, we don't have long at all to do that.  I guess I'm asking what
kind of completeness of code we have; is this basically done pending API
changes and bugs, or are there major bits (like, say, pg_dump  and
EXPLAIN support) which are completely unimplemented?

>>> where key_spec consists of partition key column names and optional
>>> operator class per column. Currently, there are restrictions on the
>>> key_spec such as allowing only column names (not arbitrary expressions
>>> of them), only one column for list strategy, etc.
>>
>> What's the obstacle to supporting expressions and/or IMMUTABLE
>> functions?  I think it's fine to add this feature without them
>> initially, I'm just asking about the roadmap for eventually supporting
>> expressions in the key spec.
>>
> 
> Only one concern I can remember someone had raised is that having to
> evaluate an expression for every row during bulk-inserts may end up
> being pretty expensive. Though, we might have to live with that.

Well, it's not more expensive than having to materialize the value from
a trigger and store it on disk.  The leading one here would be functions
over timestamp; for example, the data has a timestamptz, but you want to
partition by week.

> 
> I think one idea is to learn from ability to use expressions in indexes.

Sure.  So a feature to implement for the 2nd release.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Partitioning WIP patch

От
Andres Freund
Дата:
On 2015-02-26 12:15:17 +0900, Amit Langote wrote:
> On 26-02-2015 AM 05:15, Josh Berkus wrote:
> > On 02/24/2015 12:13 AM, Amit Langote wrote:
> >> Here is an experimental patch that attempts to implement this.

> > I would love to have it for 9.5, but I guess the
> > patch isn't nearly baked enough for that?

> I'm not quite sure what would qualify as baked enough for 9.5 though we
> can surely try to reach some consensus on various implementation aspects
> and perhaps even get it ready in time for 9.5.

I think it's absolutely unrealistic to get this into 9.5. There's barely
been any progress on the current (last!) commitfest - where on earth
should the energy come to make this patch ready? And why would that be
fair against all the others that have submitted in time?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 27-02-2015 AM 03:24, Andres Freund wrote:
> On 2015-02-26 12:15:17 +0900, Amit Langote wrote:
>> On 26-02-2015 AM 05:15, Josh Berkus wrote:
>>> I would love to have it for 9.5, but I guess the
>>> patch isn't nearly baked enough for that?
> 
>> I'm not quite sure what would qualify as baked enough for 9.5 though we
>> can surely try to reach some consensus on various implementation aspects
>> and perhaps even get it ready in time for 9.5.
> 
> I think it's absolutely unrealistic to get this into 9.5. There's barely
> been any progress on the current (last!) commitfest - where on earth
> should the energy come to make this patch ready? And why would that be
> fair against all the others that have submitted in time?
> 

I realize and I apologize that it was irresponsible of me to have said
that; maybe got a bit too excited. I do not want to unduly draw people's
time on something that's not quite ready while there are other things
people have worked hard on to get in time. In all earnestness, I say we
spend time perfecting those things.

I'll add this into CF-JUN'15. I will keep posting updates meanwhile so
that when that commitfest finally starts, we will have something worth
considering.

Thanks,
Amit




Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 27-02-2015 AM 03:18, Josh Berkus wrote:
> On 02/25/2015 07:15 PM, Amit Langote wrote:
>> I'm not quite sure what would qualify as baked enough for 9.5 though we
>> can surely try to reach some consensus on various implementation aspects
>> and perhaps even get it ready in time for 9.5.
> 
> Well, we don't have long at all to do that.  I guess I'm asking what
> kind of completeness of code we have; is this basically done pending API
> changes and bugs, or are there major bits (like, say, pg_dump  and
> EXPLAIN support) which are completely unimplemented?
> 

I would say I am not entirely sure/satisfied about some decisions I have
made (or not) when writing even the basic patch. Yes,
pg_dump/EXPLAIN/psql, etc. are not touched. So, it seems it might not be
fair to claim it's actually something for 9.5. Let me just call it WIP
for a while while keep I working on it and receive feedback.

>> Only one concern I can remember someone had raised is that having to
>> evaluate an expression for every row during bulk-inserts may end up
>> being pretty expensive. Though, we might have to live with that.
> 
> Well, it's not more expensive than having to materialize the value from
> a trigger and store it on disk.  The leading one here would be functions
> over timestamp; for example, the data has a timestamptz, but you want to
> partition by week.
> 
>>
>> I think one idea is to learn from ability to use expressions in indexes.
> 
> Sure.  So a feature to implement for the 2nd release.

Actually, I'm trying to add that and see how it works. I will post an
updated patch soon if it looks good enough.

Thanks,
Amit





Re: Partitioning WIP patch

От
Michael Paquier
Дата:
On Fri, Feb 27, 2015 at 3:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-02-26 12:15:17 +0900, Amit Langote wrote:
>> On 26-02-2015 AM 05:15, Josh Berkus wrote:
>> > On 02/24/2015 12:13 AM, Amit Langote wrote:
>> >> Here is an experimental patch that attempts to implement this.
>
>> > I would love to have it for 9.5, but I guess the
>> > patch isn't nearly baked enough for that?
>
>> I'm not quite sure what would qualify as baked enough for 9.5 though we
>> can surely try to reach some consensus on various implementation aspects
>> and perhaps even get it ready in time for 9.5.
>
> I think it's absolutely unrealistic to get this into 9.5. There's barely
> been any progress on the current (last!) commitfest - where on earth
> should the energy come to make this patch ready? And why would that be
> fair against all the others that have submitted in time?

+1. There are many other patches pending the in CF app waiting for
feedback, while this one showed up after the last CF deadline for 9.5
and needs design and spec decisions that should not be taken lightly
at the end of a major release development cycle. Please let's not rush
into something we may regret.
-- 
Michael



Re: Partitioning WIP patch

От
Amit Langote
Дата:
On 27-02-2015 AM 03:01, Jim Nasby wrote:
> On 2/26/15 3:09 AM, Amit Langote wrote:
>> Unless I'm missing something again, isn't allowing partitions to have
>> heterogeneous rowtypes a problem in the long run? I'm afraid I'm
>> confused as to your stand regarding inheritance vs. new partitioning. To
>> be specific, children with heterogeneous schemas sounds much like what
>> inheritance would be good for as you say. But then isn't that why we
>> have to plan children individually which you said new partitioning
>> should get away from?
> 
> Apologies if I haven't been clear enough. What I'd like to see is the
> best of both worlds; fast partitioning when not using inheritance, and
> perhaps somewhat slower when using inheritance, but still with the
> features partitioning gives you.
> 

I get the distinction, thanks.

Actually I wasn't quite thinking of altering the way any part of the
current partitioning based on inheritance works nor am I proposing to
get rid of it. It all stays as is. Not sure how we could say if it will
support features of the new partitioning before those features actually
begin to materialize.

> But my bigger concern from a project standpoint is that we not put
> ourselves in a position of supporting something that we really don't
> want to support (a partitioning system that's got inheritance mixed in).
> As much as I'd personally like to have both features together, I think
> it would be bad for the community to go down that road without careful
> thought.
> 
>> Yes, it does. In fact, I do intend to keep them separate the first
>> attempt of which is to choose to NOT transform a PARTITION OF parent
>> clause into INHERITS parent. Any code that may look like it's trying to
>> do that is because the patch is not fully baked yet.
> 
> Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN
> on a partition. If we release something without that being restricted
> it'll probably cause trouble later on.

Yes, I agree. More generally, I think the patch/approach is in need of a
clear separation of internal implementation concerns and user-facing
notions even at this point. This may be one of them. For example, with
the patch, a partition is defined as:

CREATE "TABLE" name PARTITION OF parent ...

Unless that turns into something like:

CREATE PARTITION name OF parent ...

we may not be able to put all the restrictions we'd want to put on a
partition for the sake of what would be partitioning internals.

Thanks,
Amit




Re: Partitioning WIP patch

От
Bruce Momjian
Дата:
On Fri, Feb 27, 2015 at 09:09:35AM +0900, Amit Langote wrote:
> On 27-02-2015 AM 03:24, Andres Freund wrote:
> > On 2015-02-26 12:15:17 +0900, Amit Langote wrote:
> >> On 26-02-2015 AM 05:15, Josh Berkus wrote:
> >>> I would love to have it for 9.5, but I guess the
> >>> patch isn't nearly baked enough for that?
> > 
> >> I'm not quite sure what would qualify as baked enough for 9.5 though we
> >> can surely try to reach some consensus on various implementation aspects
> >> and perhaps even get it ready in time for 9.5.
> > 
> > I think it's absolutely unrealistic to get this into 9.5. There's barely
> > been any progress on the current (last!) commitfest - where on earth
> > should the energy come to make this patch ready? And why would that be
> > fair against all the others that have submitted in time?
> > 
> 
> I realize and I apologize that it was irresponsible of me to have said
> that; maybe got a bit too excited. I do not want to unduly draw people's
> time on something that's not quite ready while there are other things
> people have worked hard on to get in time. In all earnestness, I say we
> spend time perfecting those things.
> 
> I'll add this into CF-JUN'15. I will keep posting updates meanwhile so
> that when that commitfest finally starts, we will have something worth
> considering.

I am _very_ glad you have started on this.  There is a huge need for
this, and I am certainly excited about it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +