Обсуждение: TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

Поиск
Список
Период
Сортировка

TODO item: WAL replay of CREATE TABLESPACE with differing directory structure

От
Patrick Krecker
Дата:
Hi Hackers --

I am searching for a way to make a contribution to Postgres and I came
across this TODO item (I realize there has been some controversy
around the TODO list [1], and I hope that my use of it doesn't spark
another discussion about removing it altogether):

"Allow WAL replay of CREATE TABLESPACE to work when the directory
structure on the recovery computer is different from the original"

Currently it looks like tablespaces have to live inside the data
directory on the replica, notwithstanding administrator intervention
by manipulating the tablespace directory with symlinks after (or even
before?) it has been created via replay.

Is the idea behind this task to allow the master to instruct the
replica where to put the tablespace on its filesystem, so as to allow
it to live outside of the data directory without direct manipulation
of the filesystem?

If this task is a worthwhile endeavor, I would be happy to take it on.
If not, I am open to other ideas :)

Thanks,
Patrick

[1] https://www.postgresql.org/message-id/CA+TgmoZC3CyzZDY1fWChRNOY-5SBjUkB1w=C6y6JmQHtAVukjw@mail.gmail.com


Re: TODO item: WAL replay of CREATE TABLESPACE with differingdirectory structure

От
Michael Paquier
Дата:
On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote:
> I am searching for a way to make a contribution to Postgres and I came
> across this TODO item (I realize there has been some controversy
> around the TODO list [1], and I hope that my use of it doesn't spark
> another discussion about removing it altogether):

Well, it will point out again that TODO items are hard, complicated and
mostly impossible projects.

> "Allow WAL replay of CREATE TABLESPACE to work when the directory
> structure on the recovery computer is different from the original"
>
> Currently it looks like tablespaces have to live inside the data
> directory on the replica, notwithstanding administrator intervention
> by manipulating the tablespace directory with symlinks after (or even
> before?) it has been created via replay.

Let's be clear here. There is no hard restriction with tablespace paths
within the data directory, though you should not do that, and you get a
nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
This also causes pg_basebackup to fail.  It is also bad design to create
tablespaces within the data directory as those are aimed at making hot
paths work on different partitions with different I/O properties.

> Is the idea behind this task to allow the master to instruct the
> replica where to put the tablespace on its filesystem, so as to allow
> it to live outside of the data directory without direct manipulation
> of the filesystem?

WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec)
register the location where a tablespace is located.  The location of a
tablespace is not saved in the system catalogs, which offers flexibility
in the way the symlink from pg_tblspc can be handled.  This is where the
tablespace path remapping of pg_basebackup becomes handy, because you
can repurpose paths easily when taking a base backup, but this forces
you to create tablespaces first, and then create standbys.  We have also
a set of existing problems:
1) If a primary and its standby are on the same server and you issue a
CREATE TABLESPACE, then they would try to write to the same paths.
2) How do we design at DDL level a command which allows for specifying
different paths depending on the node where the recovery happens.

You would need in both cases a sort of ability to define a node name, so
as for 1) you append the node name to the path and both primary and
standby can use the same tablespace path, but with different sub-paths.
And for 2), you can enforce a patch name by defining as well a path
associated to a node name so as when xl_tblspc_create_rec records are
replayed at recovery, you know which path to create.  Just designing
that the right way as its own set of complications.

> If this task is a worthwhile endeavor, I would be happy to take it on.
> If not, I am open to other ideas :)

This is part of the difficult, perhaps-not-worth doing impossible
problems.  As a first contribution, you may want something easier.
--
Michael

Вложения

Re: TODO item: WAL replay of CREATE TABLESPACE with differingdirectory structure

От
Patrick Krecker
Дата:
On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Feb 13, 2018 at 01:44:34PM -0800, Patrick Krecker wrote:
>> I am searching for a way to make a contribution to Postgres and I came
>> across this TODO item (I realize there has been some controversy
>> around the TODO list [1], and I hope that my use of it doesn't spark
>> another discussion about removing it altogether):
>
> Well, it will point out again that TODO items are hard, complicated and
> mostly impossible projects.
>
>> "Allow WAL replay of CREATE TABLESPACE to work when the directory
>> structure on the recovery computer is different from the original"
>>
>> Currently it looks like tablespaces have to live inside the data
>> directory on the replica, notwithstanding administrator intervention
>> by manipulating the tablespace directory with symlinks after (or even
>> before?) it has been created via replay.
>
> Let's be clear here. There is no hard restriction with tablespace paths
> within the data directory, though you should not do that, and you get a
> nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
> This also causes pg_basebackup to fail.  It is also bad design to create
> tablespaces within the data directory as those are aimed at making hot
> paths work on different partitions with different I/O properties.

Sorry, my language was imprecise here. What I meant is that the
pg_tablespace directory contains no symlinks when a tablespace
creation is streamed to a replica, i.e. the data files reside within
pg_tablespace on the replica.

>> Is the idea behind this task to allow the master to instruct the
>> replica where to put the tablespace on its filesystem, so as to allow
>> it to live outside of the data directory without direct manipulation
>> of the filesystem?
>
> WAL records associated to CREATE TABLESPACE (xl_tblspc_create_rec)
> register the location where a tablespace is located.  The location of a
> tablespace is not saved in the system catalogs, which offers flexibility
> in the way the symlink from pg_tblspc can be handled.  This is where the
> tablespace path remapping of pg_basebackup becomes handy, because you
> can repurpose paths easily when taking a base backup, but this forces
> you to create tablespaces first, and then create standbys.  We have also
> a set of existing problems:
> 1) If a primary and its standby are on the same server and you issue a
> CREATE TABLESPACE, then they would try to write to the same paths.
> 2) How do we design at DDL level a command which allows for specifying
> different paths depending on the node where the recovery happens.
>
> You would need in both cases a sort of ability to define a node name, so
> as for 1) you append the node name to the path and both primary and
> standby can use the same tablespace path, but with different sub-paths.
> And for 2), you can enforce a patch name by defining as well a path
> associated to a node name so as when xl_tblspc_create_rec records are
> replayed at recovery, you know which path to create.  Just designing
> that the right way as its own set of complications.
>
>> If this task is a worthwhile endeavor, I would be happy to take it on.
>> If not, I am open to other ideas :)
>
> This is part of the difficult, perhaps-not-worth doing impossible
> problems.  As a first contribution, you may want something easier.

Thank you for the response. I would suggest that we link to it from
the wiki so as to provide clarification to future readers of the todo
list.

> --
> Michael


Re: TODO item: WAL replay of CREATE TABLESPACE with differingdirectory structure

От
Michael Paquier
Дата:
On Sun, Feb 18, 2018 at 04:43:38PM -0800, Patrick Krecker wrote:
> On Tue, Feb 13, 2018 at 8:24 PM, Michael Paquier <michael@paquier.xyz> wrote:
>> Let's be clear here. There is no hard restriction with tablespace paths
>> within the data directory, though you should not do that, and you get a
>> nice warning when trying to do so with CREATE TABLESPACE (see 33cb8ff6).
>> This also causes pg_basebackup to fail.  It is also bad design to create
>> tablespaces within the data directory as those are aimed at making hot
>> paths work on different partitions with different I/O properties.
>
> Sorry, my language was imprecise here. What I meant is that the
> pg_tablespace directory contains no symlinks when a tablespace
> creation is streamed to a replica, i.e. the data files reside within
> pg_tablespace on the replica.

There is nothing preventing you to do so I think, and base backups
should work properly as basebackup.c just loops through the paths of the
tablespace links, which leads to errors if the links are within the data
folder itself.  I don't think that we would want a mode where CREATE
TABLESPACE does not create a link at recovery as well, be it controlled
by a system-wide GUC or a switch at DDL level.  That's more likely to
trap users by putting hot data on the same partition as the data folder.

> Thank you for the response. I would suggest that we link to it from
> the wiki so as to provide clarification to future readers of the todo
> list.

Good idea!  I have just updated the wiki page with a link to my previous
post.
--
Michael

Вложения