Обсуждение: BUG #15074: psql client never returns when creating index (longrunning operation)

Поиск
Список
Период
Сортировка

BUG #15074: psql client never returns when creating index (longrunning operation)

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15074
Logged by:          Lars Vonk
Email address:      lars.vonk@gmail.com
PostgreSQL version: 9.6.7
Operating system:   ubuntu 14.04 LTS
Description:

When creating an index on a table containing approx. 100M records the psql
client never returns when the index is eventually created on the server.  
This happens both in a psql client (9.6.6) on mac osx and using the libpq
client via the ruby pg gem when using active records migrations. 

When running the command directly op the postgres database server using the
psql client it works as expected.

The logs from the server:
2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
LOG:  duration: 999618.165 ms
2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
LOG:  could not receive data from client: Connection reset by peer

The psql client just hangs, when trying to cancel the query client side
(ctrl-c) the server logs says:
2018-02-16 16:13:53 CET:82.161.205.254(59961):[unknown]@[unknown]:[25738]:
LOG:  PID 23141 in cancel request did not match any process


Re: BUG #15074: psql client never returns when creating index (longrunning operation)

От
Michael Paquier
Дата:
On Mon, Feb 19, 2018 at 08:26:48AM +0000, PG Bug reporting form wrote:
> When creating an index on a table containing approx. 100M records the psql
> client never returns when the index is eventually created on the server.
> This happens both in a psql client (9.6.6) on mac osx and using the libpq
> client via the ruby pg gem when using active records migrations.

Are you sure that the process got stuck?  Perhaps it was spending a lot
of time processing and you did not wait for it to finish.

> When running the command directly op the postgres database server using the
> psql client it works as expected.
>
> The logs from the server:
> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
> LOG:  duration: 999618.165 ms
> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
> LOG:  could not receive data from client: Connection reset by peer
>
> The psql client just hangs, when trying to cancel the query client side
> (ctrl-c) the server logs says:
> 2018-02-16 16:13:53 CET:82.161.205.254(59961):[unknown]@[unknown]:[25738]:
> LOG:  PID 23141 in cancel request did not match any process

How is defined your schema?  It could be possible that a
CHECK_FOR_INTERRUPTS() added in the code paths where the type of the
index you are creating is involved could help in making the system more
responsive to cancel requests.
--
Michael

Вложения

Re: BUG #15074: psql client never returns when creating index (long running operation)

От
Andrew Gierth
Дата:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> When creating an index on a table containing approx. 100M records
 PG> the psql client never returns when the index is eventually created
 PG> on the server. This happens both in a psql client (9.6.6) on mac
 PG> osx and using the libpq client via the ruby pg gem when using
 PG> active records migrations.

 PG> When running the command directly op the postgres database server
 PG> using the psql client it works as expected.

 PG> The logs from the server:
 PG> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
 PG> LOG:  duration: 999618.165 ms
 PG> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
 PG> LOG:  could not receive data from client: Connection reset by peer

This looks like a network issue; most likely you have a NAT or stateful
firewall in between the client and server which is dropping idle
connections, with a timeout set shorter than your keepalive interval on
either client or server.

The server gets "connection reset" because it tried to send the result
of the statement, plus the "ready for query" message, and the
NAT/firewall sent back a reset because it no longer recognized the
connection as valid. The psql client itself continues to wait for input
(and will do so until the keepalive timeout is reached if keepalive is
enabled, otherwise indefinitely) because the NAT/firewall doesn't
typically send any proactive reset messages.

-- 
Andrew (irc:RhodiumToad)


Re: BUG #15074: psql client never returns when creating index (longrunning operation)

От
Lars Vonk
Дата:
Hi,

Thanks for replying. I will double check the network settings, but couldn't find anything.

Are there any psql / postgres settings I can check, I couldn't find any that might be affecting this.

Lars

Op ma 19 feb. 2018 om 13:27 schreef Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>> "PG" == PG Bug reporting form <noreply@postgresql.org> writes:

 PG> When creating an index on a table containing approx. 100M records
 PG> the psql client never returns when the index is eventually created
 PG> on the server. This happens both in a psql client (9.6.6) on mac
 PG> osx and using the libpq client via the ruby pg gem when using
 PG> active records migrations.

 PG> When running the command directly op the postgres database server
 PG> using the psql client it works as expected.

 PG> The logs from the server:
 PG> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
 PG> LOG:  duration: 999618.165 ms
 PG> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
 PG> LOG:  could not receive data from client: Connection reset by peer

This looks like a network issue; most likely you have a NAT or stateful
firewall in between the client and server which is dropping idle
connections, with a timeout set shorter than your keepalive interval on
either client or server.

The server gets "connection reset" because it tried to send the result
of the statement, plus the "ready for query" message, and the
NAT/firewall sent back a reset because it no longer recognized the
connection as valid. The psql client itself continues to wait for input
(and will do so until the keepalive timeout is reached if keepalive is
enabled, otherwise indefinitely) because the NAT/firewall doesn't
typically send any proactive reset messages.

--
Andrew (irc:RhodiumToad)

Re: BUG #15074: psql client never returns when creating index (longrunning operation)

От
Lars Vonk
Дата:
Hi Michael,

Thanks for replying. We waited as least as long as it took to create the index. It is reproducible when creating this particular index. To be clear, the index was created, the client just did not get notified.

> How is defined your schema?  

What schema are you referring to?

> It could be possible that a CHECK_FOR_INTERRUPTS() added in the code paths where the type of the
index you are creating is involved could help in making the system more
responsive to cancel requests.

I am not sure I understand what this means? It is just psql that we are using connecting to a remote database.

Op ma 19 feb. 2018 om 10:21 schreef Michael Paquier <michael@paquier.xyz>:
On Mon, Feb 19, 2018 at 08:26:48AM +0000, PG Bug reporting form wrote:
> When creating an index on a table containing approx. 100M records the psql
> client never returns when the index is eventually created on the server.
> This happens both in a psql client (9.6.6) on mac osx and using the libpq
> client via the ruby pg gem when using active records migrations.

Are you sure that the process got stuck?  Perhaps it was spending a lot
of time processing and you did not wait for it to finish.

> When running the command directly op the postgres database server using the
> psql client it works as expected.
>
> The logs from the server:
> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
> LOG:  duration: 999618.165 ms
> 2018-02-16 16:11:05 CET:82.161.205.254(59584):REDACTED@REDACTED_DB:[23141]:
> LOG:  could not receive data from client: Connection reset by peer
>
> The psql client just hangs, when trying to cancel the query client side
> (ctrl-c) the server logs says:
> 2018-02-16 16:13:53 CET:82.161.205.254(59961):[unknown]@[unknown]:[25738]:
> LOG:  PID 23141 in cancel request did not match any process

How is defined your schema?  It could be possible that a
CHECK_FOR_INTERRUPTS() added in the code paths where the type of the
index you are creating is involved could help in making the system more
responsive to cancel requests.
--
Michael

Re: BUG #15074: psql client never returns when creating index (long running operation)

От
Andrew Gierth
Дата:
>>>>> "Lars" == Lars Vonk <lars.vonk@gmail.com> writes:

 Lars> Hi,
 Lars> Thanks for replying. I will double check the network settings,
 Lars> but couldn't find anything.

I've seen this so many times that I'd bet a fair sum on it.

 Lars> Are there any psql / postgres settings I can check, I couldn't
 Lars> find any that might be affecting this.

The only relevant pg settings are the tcp_keepalives_* ones, which (on
operating systems that support this, i.e. not Windows) cause pg to
request a different keepalive timeout from the OS's default. Note that
OS defaults themselves default to a bit over 2 hours unless you
explicitly configure something else.

Likewise on the client side, if you are using an OS that supports it,
you can use keepalives_idle= in the connection string.

-- 
Andrew (irc:RhodiumToad)