Обсуждение: [HACKERS] Server Crashes if try to provide slot_name='none' at the time ofcreating subscription.

Поиск
Список
Период
Сортировка
Hi,

Server Crashes if we try to provide slot_name='none' at the time of 
creating subscription -

postgres=#  create subscription sub2 connection 'dbname=postgres 
port=5000 user=centos password=f' publication abc with (slot_name='none');
NOTICE:  synchronized table states
server closed the connection unexpectedly    This probably means the server terminated abnormally    before or while
processingthe request.
 
The connection to the server was lost. Attempting reset: Failed.
!>

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> Hi,
>
> Server Crashes if we try to provide slot_name='none' at the time of creating
> subscription -
>
> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
> user=centos password=f' publication abc with (slot_name='none');
> NOTICE:  synchronized table states
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

Thank you for reporting.
I think create_slot and enabled should be set to false forcibly when
slot_name = 'none'. Attached patch fixes it, more test and regression
test case are needed though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>> Hi,
>>
>> Server Crashes if we try to provide slot_name='none' at the time of creating
>> subscription -
>>
>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>> user=centos password=f' publication abc with (slot_name='none');
>> NOTICE:  synchronized table states
>> server closed the connection unexpectedly
>>     This probably means the server terminated abnormally
>>     before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>
> Thank you for reporting.
> I think create_slot and enabled should be set to false forcibly when
> slot_name = 'none'. Attached patch fixes it, more test and regression
> test case are needed though.
>
I think the patch doesn't solve the problem completely. For example,
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(slot_name='none',create_slot=true);
ERROR:  slot_name = NONE and create slot are mutually exclusive options
postgres=# create subscription sub3 connection 'dbname=postgres
port=5432 user=edb password=edb' publication abc with
(create_slot=true,slot_name='none');
ERROR:  could not connect to the publisher: could not connect to
server: Connection refused
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Changing the order of subscription parameter changes the output. I
think the modifications should be done at the end of the function.
Attached a patch for the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>> Hi,
>>>
>>> Server Crashes if we try to provide slot_name='none' at the time of creating
>>> subscription -
>>>
>>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>>> user=centos password=f' publication abc with (slot_name='none');
>>> NOTICE:  synchronized table states
>>> server closed the connection unexpectedly
>>>     This probably means the server terminated abnormally
>>>     before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> !>
>>>
>>
>> Thank you for reporting.
>> I think create_slot and enabled should be set to false forcibly when
>> slot_name = 'none'. Attached patch fixes it, more test and regression
>> test case are needed though.
>>
> I think the patch doesn't solve the problem completely. For example,
> postgres=# create subscription sub3 connection 'dbname=postgres
> port=5432 user=edb password=edb' publication abc with
> (slot_name='none',create_slot=true);
> ERROR:  slot_name = NONE and create slot are mutually exclusive options
> postgres=# create subscription sub3 connection 'dbname=postgres
> port=5432 user=edb password=edb' publication abc with
> (create_slot=true,slot_name='none');
> ERROR:  could not connect to the publisher: could not connect to
> server: Connection refused
>     Is the server running locally and accepting
>     connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>
> Changing the order of subscription parameter changes the output. I
> think the modifications should be done at the end of the function.
> Attached a patch for the same.
>

Yes, you're patch fixes it correctly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

While testing with logical replication, I've found that the server
hangs if we create a subscription to the same server. For example,

$ ./pg_ctl -D Test start -o "-p 5432"
$ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
with (publish='delete');"
$ ./psql -p 5432 postgres -c "create subscription sub connection
'dbname=postgres port=5432 user=edb password=edb' publication abc with
(slot_name='abcslot');"
NOTICE:  synchronized table states
LOG:  logical decoding found initial starting point at 0/162DF18
DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.

And, it hangs. Is this an expected behavior?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



On Mon, May 15, 2017 at 8:22 PM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> While testing with logical replication, I've found that the server
> hangs if we create a subscription to the same server. For example,
>
> $ ./pg_ctl -D Test start -o "-p 5432"
> $ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
> with (publish='delete');"
> $ ./psql -p 5432 postgres -c "create subscription sub connection
> 'dbname=postgres port=5432 user=edb password=edb' publication abc with
> (slot_name='abcslot');"
> NOTICE:  synchronized table states
> LOG:  logical decoding found initial starting point at 0/162DF18
> DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.
>
> And, it hangs. Is this an expected behavior?

I guess this is what we're discussing on [1].
Petr answered on that,
===
Yes that's result of how logical replication slots work, the transaction
that needs to finish is your transaction. It can be worked around by
creating the slot manually via the SQL interface for example and create
the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
===

[1] https://www.postgresql.org/message-id/20170426165954.GK14000%40momjian.us

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



On Mon, May 15, 2017 at 5:04 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, May 15, 2017 at 8:22 PM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>> On Mon, May 15, 2017 at 4:39 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>
>> While testing with logical replication, I've found that the server
>> hangs if we create a subscription to the same server. For example,
>>
>> $ ./pg_ctl -D Test start -o "-p 5432"
>> $ ./psql -p 5432 postgres -c "CREATE PUBLICATION abc for all tables
>> with (publish='delete');"
>> $ ./psql -p 5432 postgres -c "create subscription sub connection
>> 'dbname=postgres port=5432 user=edb password=edb' publication abc with
>> (slot_name='abcslot');"
>> NOTICE:  synchronized table states
>> LOG:  logical decoding found initial starting point at 0/162DF18
>> DETAIL:  Waiting for transactions (approximately 1) older than 560 to end.
>>
>> And, it hangs. Is this an expected behavior?
>
> I guess this is what we're discussing on [1].
> Petr answered on that,
> ===
> Yes that's result of how logical replication slots work, the transaction
> that needs to finish is your transaction. It can be worked around by
> creating the slot manually via the SQL interface for example and create
> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot') .
> ===
>
> [1] https://www.postgresql.org/message-id/20170426165954.GK14000%40momjian.us
>
Ohh I see. Thank you.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



On Mon, May 15, 2017 at 8:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
>> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>>> Hi,
>>>>
>>>> Server Crashes if we try to provide slot_name='none' at the time of creating
>>>> subscription -
>>>>
>>>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>>>> user=centos password=f' publication abc with (slot_name='none');
>>>> NOTICE:  synchronized table states
>>>> server closed the connection unexpectedly
>>>>     This probably means the server terminated abnormally
>>>>     before or while processing the request.
>>>> The connection to the server was lost. Attempting reset: Failed.
>>>> !>
>>>>
>>>
>>> Thank you for reporting.
>>> I think create_slot and enabled should be set to false forcibly when
>>> slot_name = 'none'. Attached patch fixes it, more test and regression
>>> test case are needed though.
>>>
>> I think the patch doesn't solve the problem completely. For example,
>> postgres=# create subscription sub3 connection 'dbname=postgres
>> port=5432 user=edb password=edb' publication abc with
>> (slot_name='none',create_slot=true);
>> ERROR:  slot_name = NONE and create slot are mutually exclusive options
>> postgres=# create subscription sub3 connection 'dbname=postgres
>> port=5432 user=edb password=edb' publication abc with
>> (create_slot=true,slot_name='none');
>> ERROR:  could not connect to the publisher: could not connect to
>> server: Connection refused
>>     Is the server running locally and accepting
>>     connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>>
>> Changing the order of subscription parameter changes the output. I
>> think the modifications should be done at the end of the function.
>> Attached a patch for the same.
>>
>
> Yes, you're patch fixes it correctly.
>

I've updated Kuntal's patch, added regression test for option
combination and updated documentation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
Added to open item lists.

On Tue, May 16, 2017 at 6:35 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, May 15, 2017 at 8:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Mon, May 15, 2017 at 8:06 PM, Kuntal Ghosh
>> <kuntalghosh.2007@gmail.com> wrote:
>>> On Mon, May 15, 2017 at 4:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Mon, May 15, 2017 at 6:41 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Server Crashes if we try to provide slot_name='none' at the time of creating
>>>>> subscription -
>>>>>
>>>>> postgres=#  create subscription sub2 connection 'dbname=postgres port=5000
>>>>> user=centos password=f' publication abc with (slot_name='none');
>>>>> NOTICE:  synchronized table states
>>>>> server closed the connection unexpectedly
>>>>>     This probably means the server terminated abnormally
>>>>>     before or while processing the request.
>>>>> The connection to the server was lost. Attempting reset: Failed.
>>>>> !>
>>>>>
>>>>
>>>> Thank you for reporting.
>>>> I think create_slot and enabled should be set to false forcibly when
>>>> slot_name = 'none'. Attached patch fixes it, more test and regression
>>>> test case are needed though.
>>>>
>>> I think the patch doesn't solve the problem completely. For example,
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (slot_name='none',create_slot=true);
>>> ERROR:  slot_name = NONE and create slot are mutually exclusive options
>>> postgres=# create subscription sub3 connection 'dbname=postgres
>>> port=5432 user=edb password=edb' publication abc with
>>> (create_slot=true,slot_name='none');
>>> ERROR:  could not connect to the publisher: could not connect to
>>> server: Connection refused
>>>     Is the server running locally and accepting
>>>     connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
>>>
>>> Changing the order of subscription parameter changes the output. I
>>> think the modifications should be done at the end of the function.
>>> Attached a patch for the same.
>>>
>>
>> Yes, you're patch fixes it correctly.
>>
>
> I've updated Kuntal's patch, added regression test for option
> combination and updated documentation.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



On 05/16/2017 06:35 AM, Masahiko Sawada wrote:
> I've updated Kuntal's patch, added regression test for option
> combination and updated documentation.
While testing the patch - I  found that after dump/restore , we are 
getting an error in the log file once we enable the subscription

\\create subscription

postgres=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 
' PUBLICATION qdd WITH (slot_name='none');
NOTICE:  synchronized table states
CREATE SUBSCRIPTION

\\take the dump
[centos@centos-cpula bin]$ ./pg_dump -Fp  -p 9000 postgres > /tmp/d.c
\\check the syntax
[centos@centos-cpula bin]$ cat /tmp/d.c |grep 'create subsc*' -i
CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' 
PUBLICATION qdd WITH (connect = false, slot_name = '');
\\execute this same syntax against a new database
postgres=# create database  test;
CREATE DATABASE
postgres=# \c test
You are now connected to database "test" as user "centos".
test=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' 
PUBLICATION qdd WITH (connect = false, slot_name = '');
WARNING:  tables were not subscribed, you will have to run ALTER 
SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
CREATE SUBSCRIPTION

test=# alter subscription m1 refresh publication ;
ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled 
subscriptions
test=# alter subscription m1 enable ;
ALTER SUBSCRIPTION

Check the message in  log file

017-05-16 14:04:48.373 BST [18219] LOG:  logical replication apply for 
subscription m1 started
2017-05-16 14:04:48.381 BST [18219] ERROR:  could not start WAL 
streaming: ERROR:  replication slot name "" is too short
2017-05-16 14:04:48.382 BST [17843] LOG:  worker process: logical 
replication worker for subscription 16386 (PID 18219) exited with exit 
code 1
2017-05-16 14:04:53.388 BST [17850] LOG:  starting logical replication 
worker for subscription "m1"
2017-05-16 14:04:53.396 BST [18224] LOG:  logical replication apply for 
subscription m1 started
2017-05-16 14:04:53.403 BST [18224] ERROR:  could not start WAL 
streaming: ERROR:  replication slot name "" is too short

Is this error message (ERROR:  replication slot name "" is too short ) 
is expected now ?

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




On Tue, May 16, 2017 at 10:06 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 05/16/2017 06:35 AM, Masahiko Sawada wrote:
>>
>> I've updated Kuntal's patch, added regression test for option
>> combination and updated documentation.
>
> While testing the patch - I  found that after dump/restore , we are getting
> an error in the log file once we enable the subscription
>
> \\create subscription
>
> postgres=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 '
> PUBLICATION qdd WITH (slot_name='none');
> NOTICE:  synchronized table states
> CREATE SUBSCRIPTION
>
> \\take the dump
> [centos@centos-cpula bin]$ ./pg_dump -Fp  -p 9000 postgres > /tmp/d.c
> \\check the syntax
> [centos@centos-cpula bin]$ cat /tmp/d.c |grep 'create subsc*' -i
> CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 ' PUBLICATION
> qdd WITH (connect = false, slot_name = '');
> \\execute this same syntax against a new database
> postgres=# create database  test;
> CREATE DATABASE
> postgres=# \c test
> You are now connected to database "test" as user "centos".
> test=# CREATE SUBSCRIPTION m1 CONNECTION 'dbname=postgres port=5000 '
> PUBLICATION qdd WITH (connect = false, slot_name = '');
> WARNING:  tables were not subscribed, you will have to run ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION to subscribe the tables
> CREATE SUBSCRIPTION
>
> test=# alter subscription m1 refresh publication ;
> ERROR:  ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
> subscriptions
> test=# alter subscription m1 enable ;
> ALTER SUBSCRIPTION
>
> Check the message in  log file
>
> 017-05-16 14:04:48.373 BST [18219] LOG:  logical replication apply for
> subscription m1 started
> 2017-05-16 14:04:48.381 BST [18219] ERROR:  could not start WAL streaming:
> ERROR:  replication slot name "" is too short
> 2017-05-16 14:04:48.382 BST [17843] LOG:  worker process: logical
> replication worker for subscription 16386 (PID 18219) exited with exit code
> 1
> 2017-05-16 14:04:53.388 BST [17850] LOG:  starting logical replication
> worker for subscription "m1"
> 2017-05-16 14:04:53.396 BST [18224] LOG:  logical replication apply for
> subscription m1 started
> 2017-05-16 14:04:53.403 BST [18224] ERROR:  could not start WAL streaming:
> ERROR:  replication slot name "" is too short
>
> Is this error message (ERROR:  replication slot name "" is too short ) is
> expected now ?
>

I think there are two bugs; pg_dump should dump slot_name = NONE
instead of '' and subscription should not be created if given slot
name is invalid. The validation check for replication slot name is
done when creating it actually but I think it's more safer to check
when CREATE SUBSCRIPTION. The bug related to setting slot_name = NONE
should be fixed by attached 001 patch, and 002 patch prevents to be
specified invalid replication slot name when CREATE SUBSCRIPTION and
ALTER SUBSCRIPTION SET.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения
On 5/16/17 22:21, Masahiko Sawada wrote:
> I think there are two bugs; pg_dump should dump slot_name = NONE
> instead of '' and subscription should not be created if given slot
> name is invalid. The validation check for replication slot name is
> done when creating it actually but I think it's more safer to check
> when CREATE SUBSCRIPTION. The bug related to setting slot_name = NONE
> should be fixed by attached 001 patch, and 002 patch prevents to be
> specified invalid replication slot name when CREATE SUBSCRIPTION and
> ALTER SUBSCRIPTION SET.

I have worked through these issues and came up with slightly different
patches.

The issue in pg_dump should have been checking PQgetisnull() before
reading the slot name.  That is now fixed.

The issue with slot_name = NONE causing a crash was fixed by adding
additional error checking.  I did not change it so that slot_name = NONE
would change the defaults of enabled and create_slot.  I think it's
confusing if options have dependencies like that.  In any case, creating
a subscription with slot_name = NONE is probably not useful anyway, so
as long as it doesn't crash, we don't need to make it excessively
user-friendly.

I don't think the subscription side should check the validity of the
replication slot name.  That is the responsibility of the publication
side.  The rules may well be different if you replicate between
different versions or different build options.  This is currently
working correctly: If the publication side doesn't like the slot you
specify, either because the name is invalid or the slot doesn't exist,
you get an appropriate error message.

Please check whether everything is working OK for you now.  I think this
open item is closed now.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Thu, May 18, 2017 at 10:33 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/16/17 22:21, Masahiko Sawada wrote:
>> I think there are two bugs; pg_dump should dump slot_name = NONE
>> instead of '' and subscription should not be created if given slot
>> name is invalid. The validation check for replication slot name is
>> done when creating it actually but I think it's more safer to check
>> when CREATE SUBSCRIPTION. The bug related to setting slot_name = NONE
>> should be fixed by attached 001 patch, and 002 patch prevents to be
>> specified invalid replication slot name when CREATE SUBSCRIPTION and
>> ALTER SUBSCRIPTION SET.
>
> I have worked through these issues and came up with slightly different
> patches.
>
> The issue in pg_dump should have been checking PQgetisnull() before
> reading the slot name.  That is now fixed.

Agreed.

>
> The issue with slot_name = NONE causing a crash was fixed by adding
> additional error checking.  I did not change it so that slot_name = NONE
> would change the defaults of enabled and create_slot.  I think it's
> confusing if options have dependencies like that.  In any case, creating
> a subscription with slot_name = NONE is probably not useful anyway, so
> as long as it doesn't crash, we don't need to make it excessively
> user-friendly.

Also agreed.

> I don't think the subscription side should check the validity of the
> replication slot name.  That is the responsibility of the publication
> side.  The rules may well be different if you replicate between
> different versions or different build options.  This is currently
> working correctly: If the publication side doesn't like the slot you
> specify, either because the name is invalid or the slot doesn't exist,
> you get an appropriate error message.

Yeah, now I understood. Agreed.

> Please check whether everything is working OK for you now.  I think this
> open item is closed now.

I've checked these changes, everything is working fine. Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center