Discussion: Batches of single-insert statements vs batches of multi-insert statements


Batches of single-insert statements vs batches of multi-insert statements

From: Christopher Deckers
Date:
Hi all,

I tried all sorts of query rewrites to speed up a huge sequence of insert statements, and here are my findings. Everything was run on a local Postgres 9.5.3, on a 64-bit Windows box with 32 GB RAM and an i7 @ 3.6 GHz.

1. Batches of single insert statements (original case):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)
-> 52 seconds

2. Batches of single insert statements to an unlogged table, then copy:
CREATE UNLOGGED TABLE SomeUnloggedTable AS SELECT * FROM SomeTable WHERE 1 = 0
Batch of: INSERT INTO SomeUnloggedTable (Col1, Col2) VALUES (?, ?)
INSERT INTO SomeTable SELECT * FROM SomeUnloggedTable
DROP TABLE SomeUnloggedTable
-> 43 seconds (15 seconds in the copy from unlogged to target table)

3. Batches of 100-row multi-insert statements (followed by batches of single inserts for the remainder):
Batch of: INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?), (?, ?), (?, ?)...
-> 30 seconds

So, rewriting the batch of single insert statements to batches of 100 multi-insert statements almost doubled the speed.
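For reference, here is a minimal sketch of what cases 1 and 3 look like on our side (the connection URL, credentials, table and row counts below are placeholders, not our actual setup):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertComparison {
    static final int ROWS = 100_000;      // placeholder row count
    static final int ROWS_PER_STMT = 100; // case 3: rows per multi-insert

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret")) {
            // Case 1: one row per statement, sent as a JDBC batch.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)")) {
                for (int i = 0; i < ROWS; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, i);
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            // Case 3: 100 rows per statement; a remainder of
            // ROWS % ROWS_PER_STMT rows would fall back to case 1.
            StringBuilder sql = new StringBuilder(
                    "INSERT INTO SomeTable (Col1, Col2) VALUES (?, ?)");
            for (int i = 1; i < ROWS_PER_STMT; i++) {
                sql.append(", (?, ?)");
            }
            try (PreparedStatement ps = conn.prepareStatement(sql.toString())) {
                for (int row = 0; row + ROWS_PER_STMT <= ROWS; ) {
                    for (int p = 1; p <= 2 * ROWS_PER_STMT; p += 2, row++) {
                        ps.setInt(p, row);
                        ps.setInt(p + 1, row);
                    }
                    ps.addBatch();
                }
                ps.executeBatch();
            }
        }
    }
}

The only structural difference is the width of the VALUES list; everything else (prepared statement, addBatch/executeBatch) stays the same.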

Could someone educate me as to what happens internally to the JDBC single-insert batch that makes it so different from a multi-insert batch?

Rewriting our code to use multi-value inserts is more cumbersome and not applicable everywhere. It also makes the code database-specific, since not all of the engines we support handle this syntax.
These considerations actually led me to this old thread:
https://www.postgresql.org/message-id/55130DC8.2070508%40redhat.com

Anyway, these tests definitely show that the performance of the standard JDBC batch API could be improved in Postgres internally with a bit of magic :)
Isn't it possible (either in the driver or in the engine) to aggregate the single-insert statements of a batch to turn them into a multi-insert equivalent? It would not have to handle all cases: simply handling basic commands like "INSERT INTO X (a, b, c, ...) VALUES (?, ?, ?, ...)" would help a lot.
If this should not be done in the JDBC driver, then in which area could such handling be done?
Or have I missed something?

If needed, I am ready to offer my assistance with testing (I can tweak parameters, install dev drivers, modify my code to run alternate statements, experiment, etc.).

Cheers,
-Christopher

Re: Batches of single-insert statements vs batches of multi-insert statements

From: Vladimir Sitnikov
Date:
Hi Christopher,

>Could someone educate me as to what happens internally to the JDBC single-insert batch that makes it so different from a multi-insert batch?

There is significant overhead at the database level itself to initialize and clean up the resources required for an "insert values" statement.
This is a known issue for backend developers; however, it does not seem to have an easy solution.

>If this should not be done in the JDBC driver, then in which area could such handling be done?
>Or have I missed something?

Believe it or not, pgjdbc already has that feature implemented.

The feature is disabled by default (so everybody can give it a try before we make it the default), and we plan to activate it by default.
To activate multi-inserts, you should add the reWriteBatchedInserts=true connection parameter.
You need the 9.4.1209-SNAPSHOT version (you can grab it from Maven Central or build from sources).
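For example, something like this should enable it (host, database and credentials are placeholders):

// Only the reWriteBatchedInserts parameter matters here.
String url = "jdbc:postgresql://localhost:5432/test?reWriteBatchedInserts=true";
Connection conn = DriverManager.getConnection(url, "user", "secret");
// Batches of single-row INSERTs on this connection are then
// rewritten by the driver into multi-row INSERTs.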

We all would be grateful if you could test it and let us know if it works for you.

Vladimir 

Re: Batches of single-insert statements vs batches of multi-insert statements

From: Christopher Deckers
Date:
Hi Vladimir,

Thanks for your answer! It explains the current situation well.

> Believe it or not, pgjdbc already has that feature implemented.

Oh good! But it definitely needs more testing and fixing :)
I got: "Batch entry 0 INSERT INTO XXXX".
Next exception: "VALUES lists must all be the same length".

I narrowed down the issue to a simple test case, see attached file.

Funnily enough, if I change the name of the table, it seems to work. I used the naming convention that we have in our production code.
Note that this test case does not produce any error if I remove the "reWriteBatchedInserts=true" parameter.

Please let me know if I can be of any help!
-Christopher

Attachments

Re: Batches of single-insert statements vs batches of multi-insert statements

From: Thomas Kellerer
Date:
Vladimir Sitnikov wrote on 05.06.2016 at 20:33:
> The feature is disabled by default (so everybody can give it a try before we make it the default), and we plan to activate it by default.
> To activate multi-inserts, you should add the reWriteBatchedInserts=true connection parameter.
> You need the 9.4.1209-SNAPSHOT version (you can grab it from Maven Central or build from sources).

Is there a reason why that build is not available on https://jdbc.postgresql.org/download.html ?




Re: Batches of single-insert statements vs batches of multi-insert statements

From: Dave Cramer
Date:
It's a snapshot, not a release.


On 5 June 2016 at 16:48, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Is there a reason why that build is not available on https://jdbc.postgresql.org/download.html ?

PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: "jingzhi.zhang@outlook.com"
Date:
Hi,


I use JDBC to create an index on a table; the 'create index' SQL takes about 30 minutes.
Eventually I found that the JDBC client thread is blocked at PreparedStatement.execute() and never finishes.
On the server side, querying pg_stat_activity shows the connection state is already 'idle' (index creation finished).

Any suggestions? How should we deal with the client thread blocking (waiting to read a response message?) and the connection leak?


Thanks a lot.


Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Dave Cramer
Date:
There is a cancel statement, but I presume you want the create index to succeed. You may have to run execute() in a background thread so as not to block the main thread.
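A minimal sketch of that approach (the connection details and the index DDL are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundIndexBuild {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> indexBuild = executor.submit(() -> {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/test", "user", "secret");
                 Statement st = conn.createStatement()) {
                // The long-running DDL runs here without blocking the caller.
                st.execute("CREATE INDEX idx_some_col ON some_table (some_col)");
                return null;
            }
        });
        // The main thread is free to do other work; get() rethrows any
        // SQLException (wrapped in an ExecutionException) when the DDL ends.
        indexBuild.get();
        executor.shutdown();
    }
}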


Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Vladimir Sitnikov
Date:
Jingzhi> I use JDBC to create an index on a table; the 'create index' SQL takes about 30 minutes.
Jingzhi> Eventually I found that the JDBC client thread is blocked at PreparedStatement.execute() and never finishes.
Jingzhi> On the server side, querying pg_stat_activity shows the connection state is already 'idle' (index creation finished).

Jingzhi, do you have a firewall between the app and the database?
Could it terminate the connection while the index is being built?

Dave> You may have to run execute() in a background thread so as not to block the main thread.

Dave, I'm afraid it looks like we need keep-alive messages for such long-running transactions.
There's tcpKeepAlive; however, it has no way to set a specific timeout value.
It could help to detect a "broken connection" on the Java side, though.

It boils down to a plug-and-pray kind of solution, so everybody should make sure there's no firewall that could kill long-lasting TCP connections.

What if we could teach pgjdbc to send a dummy command once in a while?
The drawback is that it could fill up the buffer on the backend side; however, sending a sync message once every 10-15 minutes sounds quite innocent.

Vladimir



Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Dave Cramer
Date:
On 7 June 2016 at 08:01, jingzhi.zhang@outlook.com <jingzhi.zhang@outlook.com> wrote:
> I think there's no firewall in our test environments. However, there is network control software on the client machine.
> The network control software occasionally loses the connection.
>
> My question is: if the network connection is lost, should the JDBC client return an IOException immediately, or block forever until the TCP connection is killed by the operating system?

Well, the problem is that we don't know the connection has failed until the TCP connection has been killed by the OS.

Vladimir is proposing keep-alive messages so that either it won't fail, or we will know about it sooner if it does.

Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Sehrope Sarkuni
Date:
On Tue, Jun 7, 2016 at 8:17 AM, Dave Cramer <pg@fastcrypt.com> wrote:
> Well, the problem is that we don't know the connection has failed until the TCP connection has been killed by the OS.
> Vladimir is proposing keep-alive messages so that either it won't fail, or we will know about it sooner if it does.

I don't think it's a good idea to flood the server with dummy messages for the purpose of keeping the connection alive. Besides the usual problems of filling up the send buffer, you'd also have to do it in a separate thread, as the usual sender is already blocked waiting for the command response.

This is an OS issue, not a driver issue. The correct thing to do is to enable TCP keep-alives at the OS level. Here is some info on doing so on Linux: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

The defaults for TCP keep-alives are very high (I think two hours before the first attempt), so it's a good idea to reduce them if you're dealing with long-running WAN connections. Also, from personal experience, it's usually residential networking devices (i.e. your WiFi router) that drop inactive NATed connections. This is less of an issue in a server-to-server environment in a data center. Either way, tuning the TCP keep-alives works in both situations.
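For example, reducing them on Linux would look something like this (illustrative values; run as root, and see the HOWTO above for the details):

# Start probing after 5 minutes of idle instead of 2 hours,
# probe every 60 seconds, and give up after 5 failed probes.
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.ipv4.tcp_keepalive_intvl=60
sysctl -w net.ipv4.tcp_keepalive_probes=5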

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Vitalii Tymchyshyn
Date:

Hi.

But do we already use SO_KEEPALIVE at the socket layer?
Also, I am not really sure the socket was closed on the server end. The OP said he could see the server backend in the idle state. Also, the backend server did not fail, so unless there was some network glitch we should have received a TCP reset.
BTW: are there any stateful firewalls or NATs between server and client?

Best regards, Vitalii Tymchyshyn

On Tue, 7 Jun 2016 at 08:17, Dave Cramer <pg@fastcrypt.com> wrote:
> Well, the problem is that we don't know the connection has failed until the TCP connection has been killed by the OS.
> Vladimir is proposing keep-alive messages so that either it won't fail, or we will know about it sooner if it does.

Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: Vladimir Sitnikov
Date:
Vitalii> But do we already use SO_KEEPALIVE at the socket layer?
Vitalii> Also, I am not really sure the socket was closed on the server end.
Vitalii> The OP said he could see the server backend in the idle state. Also, the backend server did not fail, so unless there was some network glitch we should have received a TCP reset.

The tcpKeepAlive connection setting is false by default. I think it's better to make "true" the default value.
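For now one has to opt in explicitly, e.g. (the URL and credentials are placeholders):

Properties props = new Properties();
props.setProperty("user", "user");
props.setProperty("password", "secret");
props.setProperty("tcpKeepAlive", "true"); // off unless explicitly enabled
Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost/test", props);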

Sehrope> I don't think it's a good idea to flood the server with dummy messages for the purpose of keeping the connection alive.
Sehrope> Besides the usual problems of filling up the send buffer, you'd also have to do it in a separate thread,
Sehrope> as the usual sender is already blocked waiting for the command response.

Sending a sync message once every 10 minutes will hardly fill up the buffers.
On the other hand, it could make the connection more resilient.

Having separate threads (e.g. one for reads and one for writes) would resolve the "deadlock when buffers get full" issue, so it is good to have regardless of whether we add a "ping" there.

Vladimir

Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: "jingzhi.zhang@outlook.com"
Date:
Hi Dave,

Thanks :)
But I need to clarify the question:

Because the 'create index' SQL statement execution (via JDBC) lasted nearly 30 minutes, the execute() method blocked forever.
On the PG server side, about 30 minutes later, 'create index' successfully finished; then the connection state changed to 'idle'.
From the JDBC client side, it seems 'create index' has not finished, but the operation actually finished on the server side long ago.
This causes the client thread to block and the connection to leak.


On 7 June 2016 at 19:21, Dave Cramer <pg@fastcrypt.com> wrote:
> There is a cancel statement, but I presume you want the create index to succeed. You may have to run execute() in a background thread so as not to block the main thread.





Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: "jingzhi.zhang@outlook.com"
Date:
Vladimir,

Thanks :)

I think there's no firewall in our test environments. However, there is network control software on the client machine.
The network control software occasionally loses the connection.

My question is: if the network connection is lost, should the JDBC client return an IOException immediately,
or block forever until the TCP connection is killed by the operating system?



On 7 June 2016 at 19:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
> Jingzhi, do you have a firewall between the app and the database?
> Could it terminate the connection while the index is being built?




Re: PrepareStatement.execute() blocked because of long time 'create index' operation, connection leak

From: "jingzhi.zhang@outlook.com"
Date:
I learnt a lot. Many thanks.

Questions:
1. How long does it take for a (JDBC) connection to change from the 'alive' to the 'idle' state if there is no data exchange between client and server?
And how do we configure this at the OS level? (Does PG use the OS-level connection state to determine the JDBC connection state?)
2. Are there any solutions on the JDBC client side, such as setting a socket timeout?


On 7 June 2016 at 20:31, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> This is an OS issue, not a driver issue. The correct thing to do is to enable TCP keep-alives at the OS level.
> The defaults for TCP keep-alives are very high (I think two hours before the first attempt), so it's a good idea to reduce them if you're dealing with long-running WAN connections.

Re: Batches of single-insert statements vs batches of multi-insert statements

From: Jeremy Whiting
Date:
Hi Christopher,
 I suggest you avoid hard-coding data values in the statement SQL. Instead, use parameter placeholders (?) and use the API to bind the values. You will find it works much better for you. Like this:

String insert = "INSERT INTO T_AbCd (T_AbCd_ID, SomeDate, ASmallInt) VALUES (?, ?, ?)";
PreparedStatement pst = conn.prepareStatement(insert,
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
// Bind the first row and queue it.
pst.setLong(1, 24202712L);
pst.setDate(2, java.sql.Date.valueOf("2014-03-21"));
pst.setInt(3, 0);
pst.addBatch();
// Bind the second row and queue it.
pst.setLong(1, 24202713L);
pst.setDate(2, java.sql.Date.valueOf("2014-03-22"));
pst.setInt(3, 0);
pst.addBatch();
// Send both rows in a single batch.
pst.executeBatch();



 The reason is that the batched re-write feature dynamically builds the SQL issued to the back end, using parameter placeholders. So you can see that hard-coding data values isn't going to work without some horribly complex statement parsing added to the driver: this time parsing data and recognizing all data values and type permutations. Yikes.
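 For example, for a two-row batch of the statement above, the driver would issue something like this (a sketch of the generated SQL):

INSERT INTO T_AbCd (T_AbCd_ID, SomeDate, ASmallInt) VALUES (?, ?, ?), (?, ?, ?)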

 I'd say you have these options if you want to enable batched re-writing:

a) Change your code to use the same style of coding as the above example, re-deploying to production.
b) If a re-deploy to production isn't possible in the short term, try using Byteman [1]. Byteman will unobtrusively mend your existing code. See the attached Byteman rule; it fixes your test class.
c) Provide a patch implementing SQL statement parsing support for hard-coded data. I see you have already started on this [2].

Regards,
Jeremy

[1] http://byteman.jboss.org/
[2] https://github.com/pgjdbc/pgjdbc/pull/580

On 05/06/16 21:39, Christopher Deckers wrote:
> Oh good! But it definitely needs more testing and fixing :)
> I got: "Batch entry 0 INSERT INTO XXXX".
> Next exception: "VALUES lists must all be the same length".
> I narrowed down the issue to a simple test case, see attached file.





-- 
Jeremy Whiting
Senior Software Engineer, Middleware Performance Team
Red Hat

------------------------------------------------------------
Registered Address: Red Hat UK Ltd, 64 Baker Street, 4th Floor, London. W1U 7DF. United Kingdom.
Registered in England and Wales under Company Registration No. 03798903. Directors: Michael Cunningham (US), Michael O'Neill (Ireland), Eric Shander (US)
Attachments

Re: Batches of single-insert statements vs batches of multi-insert statements

From: Christopher Deckers
Date:
Hi Jeremy,

> I suggest you avoid hard-coding data values in the statement SQL.

The resulting SQL is not hand-crafted; it is built through all sorts of layers depending on various rules. The end result that I showed does not tell how it was built.

Note that, from a theoretical point of view, if I were to add parameters for fixed values, then I would send that parameter over the wire many times, whereas if it is part of the statement the driver can decide to send only the dynamic parameters. Also, we do have external multi-insert logic for that specific part of the code, and it also improves speed with jTDS; but SQL Server is limited to 2000 dynamic parameters, so if we waste those parameters on static values, fewer rows can go through at once.

In any case, it must be supported by the batch rewrite feature, whether people like the style or not :)
 
> c) Provide a patch implementing SQL statement parsing support for hard-coded data. I see you have already started on this [2].

In fact, the latest version of my patch hooks into the existing query parsing and just has to record the start and end of the VALUES block. That, combined with the positions of the parameters, made the latest version reliable.
That said, there is currently no safeguard if we send batches that overflow Postgres's limits, which is what I will try to work on when I get some free time.

Cheers,
-Christopher
