Discussion: BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)

BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      15700
Logged by:          Matthias Otterbach
Email address:      mo@otterbach.eu
PostgreSQL version: 11.2
Operating system:   Ubuntu 18.04.2 LTS
Description:

I am currently testing an application, which I previously ran on PostgreSQL
10, against the current PostgreSQL 11.2 release. During the tests I
experienced out-of-memory errors in my PostgreSQL database which I could not
explain (actually one session runs out of memory, which causes the server to
terminate all connections). They seem to affect tables containing BYTEA data
(many rows; some have no content at all in the BYTEA column, some use up to
25 MB).

I was able to narrow them down to a small example query where I also
experienced a much higher memory usage with PostgreSQL 11 compared to
previous versions (also tested with 10.7).

For my comparison I worked on a local VM with only 4 GB of memory configured,
a plain Ubuntu 18.04.2 installation (including all updates as of last week)
and the current PostgreSQL 10.7 and 11.2 packages installed from
http://apt.postgresql.org/pub/repos/apt/dists/bionic-pgdg/. Almost all
configuration options should be pretty much at their defaults (at least on my
testing VM), e.g. work_mem = 4MB, shared_buffers = 128MB. I created a table
containing a large amount of BYTEA data using these statements:

> create table public.a (b bytea);
> insert into public.a select repeat('0', 1024 * 1024 * 100)::bytea from generate_series(1, 300);
> select pg_size_pretty(sum(length(b))) from public.a;

In total I now have ~29 GB of data (300 rows of ~100 MB each) and start
selecting all of it using a small Java program (current JDBC driver 42.2.5):

String sql = "SELECT b FROM public.a";
try (Connection connection = DriverManager.getConnection(
    "jdbc:postgresql://127.0.0.1:5432/postgres?currentSchema=public",
    "username", "password")) {
  connection.setAutoCommit(false);
  try (PreparedStatement ps = connection.prepareStatement(sql)) {
    ps.setFetchSize(5); // we do not want to run out of Java heap space
    try (ResultSet rs = ps.executeQuery()) {
      int i = 0;
      while (rs.next()) {
        // just loop over all data, get it and do something with it
        // (we print a line every 10 rows containing the length; the other
        // conditions are never true with my test data)
        i++;
        byte[] b = rs.getBytes(1);
        if (i % 10 == 0 || b == null || b.length <= i) {
          System.err.println("Row " + i + ": " + (b != null ? b.length : null));
        }
      }
    }
  }
}

The Java program actually just executes "SELECT b FROM public.a" and keeps
streaming more rows, doing something with the content; the execution plan
for my query is also fairly simple - actually it seems to be just a
sequential scan (with both versions).

With PostgreSQL 10.7 the program went through fine (with plenty of free
memory on my database VM; including the OS, never more than 1 GB was in use
on the VM). With PostgreSQL 11.2 the memory of my postgres process (the pid
of my session) keeps increasing, and the backend finally crashes after I have
fetched only about 8 GB of the data (about 80 rows):

2019-03-13 10:05:03.386 UTC [1303] LOG:  database system is ready to accept
connections
2019-03-13 10:05:03.927 UTC [1311] [unknown]@[unknown] LOG:  incomplete
startup packet
TopMemoryContext: 67424 total in 5 blocks; 12656 free (10 chunks); 54768
used
  TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096
used
  TopTransactionContext: 8192 total in 1 blocks; 7744 free (1 chunks); 448
used
  pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 1456 free
(0 chunks); 6736 used
  RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296
used
  MessageContext: 8192 total in 1 blocks; 6752 free (1 chunks); 1440 used
  Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632
used
  smgr relation table: 16384 total in 2 blocks; 4600 free (2 chunks); 11784
used
  TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks);
256 used
  Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  TopPortalContext: 8192 total in 1 blocks; 7392 free (0 chunks); 800 used
    PortalContext: 1024 total in 1 blocks; 552 free (0 chunks); 472 used:
C_1
      ExecutorState: 4294976384 total in 17 blocks; 4080 free (0 chunks);
4294972304 used
        printtup: 314581120 total in 3 blocks; 7936 free (8 chunks);
314573184 used
        ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256
used
    PortalContext: 1024 total in 1 blocks; 744 free (1 chunks); 280 used: 
  Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872
used
  CacheMemoryContext: 524288 total in 7 blocks; 133056 free (5 chunks);
391232 used
    index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used:
pg_toast_16384_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_index_indrelid_index
    CachedPlan: 2048 total in 2 blocks; 200 free (0 chunks); 1848 used:
SELECT b FROM public.a
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_statistic_ext_relid_index
    CachedPlanSource: 4096 total in 3 blocks; 2296 free (5 chunks); 1800
used: SELECT b FROM public.a
      unnamed prepared statement: 8192 total in 1 blocks; 3664 free (10
chunks); 4528 used
    CachedPlan: 1024 total in 1 blocks; 264 free (0 chunks); 760 used:
BEGIN
    index info: 2048 total in 2 blocks; 728 free (2 chunks); 1320 used:
pg_db_role_setting_databaseid_rol_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_opclass_am_name_nsp_index
    index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
pg_foreign_data_wrapper_name_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_enum_oid_index
    index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used:
pg_class_relname_nsp_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_foreign_server_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_publication_pubname_index
    index info: 2048 total in 2 blocks; 592 free (3 chunks); 1456 used:
pg_statistic_relid_att_inh_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_cast_source_target_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_language_name_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_transform_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_collation_oid_index
    index info: 3072 total in 2 blocks; 1216 free (3 chunks); 1856 used:
pg_amop_fam_strat_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_index_indexrelid_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_ts_template_tmplname_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_ts_config_map_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_opclass_oid_index
    index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
pg_foreign_data_wrapper_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_event_trigger_evtname_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_statistic_ext_name_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_publication_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_ts_dict_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_event_trigger_oid_index
    index info: 3072 total in 2 blocks; 1216 free (3 chunks); 1856 used:
pg_conversion_default_index
    index info: 3072 total in 2 blocks; 1216 free (3 chunks); 1856 used:
pg_operator_oprname_l_r_n_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_trigger_tgrelid_tgname_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_enum_typid_label_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_ts_config_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_user_mapping_oid_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_opfamily_am_name_nsp_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_foreign_table_relid_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_type_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_aggregate_fnoid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_constraint_oid_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_rewrite_rel_rulename_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_ts_parser_prsname_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_ts_config_cfgname_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_ts_parser_oid_index
    index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used:
pg_publication_rel_prrelid_prpubid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_operator_oid_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_namespace_nspname_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_ts_template_oid_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_amop_opr_fam_index
    index info: 2048 total in 2 blocks; 672 free (3 chunks); 1376 used:
pg_default_acl_role_nsp_obj_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_collation_name_enc_nsp_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_publication_rel_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_range_rngtypid_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_ts_dict_dictname_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_type_typname_nsp_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_opfamily_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_statistic_ext_oid_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_class_oid_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_proc_proname_args_nsp_index
    index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
pg_partitioned_table_partrelid_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_transform_type_lang_index
    index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used:
pg_attribute_relid_attnum_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_proc_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_language_oid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_namespace_oid_index
    index info: 3072 total in 2 blocks; 1136 free (2 chunks); 1936 used:
pg_amproc_fam_proc_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_foreign_server_name_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_attribute_relid_attnam_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_conversion_oid_index
    index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used:
pg_user_mapping_user_server_index
    index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used:
pg_subscription_rel_srrelid_srsubid_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_sequence_seqrelid_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_conversion_name_nsp_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_authid_oid_index
    index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used:
pg_auth_members_member_role_index
    index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used:
pg_subscription_oid_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_tablespace_oid_index
    index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used:
pg_shseclabel_object_index
    index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
pg_replication_origin_roname_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_database_datname_index
    index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used:
pg_subscription_subname_index
    index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used:
pg_replication_origin_roiident_index
    index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used:
pg_auth_members_role_member_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_database_oid_index
    index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used:
pg_authid_rolname_index
  WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks);
43400 used
  PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used
  MdSmgr: 8192 total in 1 blocks; 7720 free (0 chunks); 472 used
  LOCALLOCK hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used
  Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used
  ErrorContext: 8192 total in 1 blocks; 7936 free (3 chunks); 256 used
Grand total: 4610628736 bytes in 192 blocks; 315312 free (137 chunks);
4610313424 used
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres ERROR:  out of memory
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres DETAIL:  Failed on request of size 268435456 in memory context "ExecutorState".
2019-03-13 10:11:54.628 UTC [1350] postgres@postgres STATEMENT:  SELECT b FROM public.a

I know that I'm also using the JDBC driver here which of course could also
cause the bug, it might be a JDBC driver bug for which I should report a bug
elsewhere. But as I used the same driver to test both PostgreSQL 10.7 and
11.2, I actually suspect that a change in PostgreSQL itself, and not the JDBC
driver, causes this memory usage. I also suspect a bug in PostgreSQL itself
because it is the database which runs out of memory (which in my opinion
should not happen when I have only one connection and work_mem configured
pretty low).


PG Bug reporting form <noreply@postgresql.org> writes:
> [ apparent backend memory leak when processing wide rows ]

FWIW, I tried to reproduce this without success, on both HEAD and v11
branch tip.  It's possible that this is a bug somebody fixed since
11.2, but I see no mention of relevant memory leaks in the commit log.
Still, it's hard to interpret this:

>       ExecutorState: 4294976384 total in 17 blocks; 4080 free (0 chunks);
> 4294972304 used

as anything but a leak in ExecutorState.  For comparison, when I try
this same example here, I get a fairly steady value of

      ExecutorState: 134225984 total in 2 blocks; 5144 free (1 chunks); 134220840 used

> I know that I'm also using the JDBC driver here which of course could also
> cause the bug, it might be a JDBC driver bug for which I should report a bug
> elsewhere.

I confess that I'm not actually trying this with JDBC, but with
libpq (via psql and pgbench).  One could imagine that JDBC is
presenting the query sufficiently differently to cause different
backend behavior, say by asking for binary not text output ---
but I tried that, no change.

I think there must be some other moving parts you haven't mentioned.
Maybe you have some PG extension(s) installed?

            regards, tom lane


On Mon, Mar 18, 2019 at 7:01 AM PG Bug reporting form <noreply@postgresql.org> wrote:

> The Java program actually just executes "SELECT b FROM public.a" and keeps
> streaming more rows, doing something with the content; the execution plan
> for my query is also fairly simple - actually it seems to be just a
> sequential scan (with both versions).

Could you include as an attachment a complete, compilable Java program, including the imports, the main public class definition, etc.?

Cheers,

Jeff

Re: BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)

From:
Matthias Otterbach
Date:
Dear Tom Lane, dear pgsql-bugs list,

thanks for your reply and attempt to reproduce my memory issue.

On 2019-03-18 17:33, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> 
>> I know that I'm also using the JDBC driver here which of course could also
>> cause the bug, it might be a JDBC driver bug for which I should report a bug
>> elsewhere.
> 
> I confess that I'm not actually trying this with JDBC, but with
> libpq (via psql and pgbench).  One could imagine that JDBC is
> presenting the query sufficiently differently to cause different
> backend behavior, say by asking for binary not text output ---
> but I tried that, no change.

Can I do anything to figure out how the JDBC driver presents the query? 
Is there any information available if I increase a log level or enable 
any debugging output in the PostgreSQL server?

I tried increasing log levels a little bit, but actually I do not know 
if it gives any more information:

log_min_messages = debug5
log_min_error_statement = debug5

debug_print_parse = on
debug_print_rewritten = on
debug_print_plan = on
debug_pretty_print = on
log_connections = on
log_disconnections = on
log_duration = on
log_error_verbosity = verbose

You can find an extract from the logfile at 
https://pastebin.com/5eB0t0uS (filtered for the process id that caused 
the crash; there were not many other processes anyway, as I use this 
virtual machine solely for this test query).

> I think there must be some other moving parts you haven't mentioned.
> Maybe you have some PG extension(s) installed?

Actually I hope not; that is why I tried to reproduce the problem with a 
vanilla Ubuntu and PostgreSQL installation in a virtual machine - to 
make sure that no other extensions or anything else influence my issue. 
Actually "postgres=# select * from pg_extension;" returns:

 extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition
---------+----------+--------------+----------------+------------+-----------+--------------
 plpgsql |       10 |           11 | f              | 1.0        |           |
(1 row)

Best regards, Matthias Otterbach


Re: BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)

From:
Matthias Otterbach
Date:
Dear Jeff, dear pgsql-bugs list,

thanks for your reply.

On 2019-03-18 19:15, Jeff Janes wrote:
> On Mon, Mar 18, 2019 at 7:01 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> 
>> The Java program actually just executes "SELECT b FROM public.a" and keeps
>> streaming more rows, doing something with the content; the execution plan
>> for my query is also fairly simple - actually it seems to be just a
>> sequential scan (with both versions).
> 
> Could you include as an attachment a complete, compilable Java program,
> including the imports, the main public class definition, etc.?

Of course that is possible; you will find the program attached.

You should be able to compile and execute it using the following command 
lines (maybe change the host, username and password before compilation 
as they are unfortunately hard-coded currently):

- Compilation: /your/path/to/javac TestMemoryIssue.java
- Execution: /your/path/to/java -cp "/your/path/to/postgresql-42.2.5.jar;." TestMemoryIssue

If it matters, I used

openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

for compilation and execution of the program. But as the same program 
worked against a PG 10 database I suppose that the Java version should 
not matter.

Best regards,
Matthias Otterbach
Attachments
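For reference, a minimal self-contained reconstruction of the attached test program, based on the snippet in the original report (only the class wrapper, imports and main method are added here; the JDBC URL and credentials are the placeholder values from that snippet):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class TestMemoryIssue {

  public static void main(String[] args) throws SQLException {
    String sql = "SELECT b FROM public.a";
    String url = "jdbc:postgresql://127.0.0.1:5432/postgres?currentSchema=public";
    try (Connection connection = DriverManager.getConnection(url, "username", "password")) {
      // Cursor-based fetching requires autocommit to be off.
      connection.setAutoCommit(false);
      try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setFetchSize(5); // stream 5 rows at a time so the JVM does not run out of heap
        try (ResultSet rs = ps.executeQuery()) {
          int i = 0;
          while (rs.next()) {
            i++;
            byte[] b = rs.getBytes(1);
            // print a line every 10 rows; the other conditions never hold with the test data
            if (i % 10 == 0 || b == null || b.length <= i) {
              System.err.println("Row " + i + ": " + (b != null ? b.length : null));
            }
          }
        }
      }
    }
  }
}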
On Mon, Mar 18, 2019 at 2:26 PM Matthias Otterbach <mo@otterbach.eu> wrote:
>> I confess that I'm not actually trying this with JDBC, but with
>> libpq (via psql and pgbench).  One could imagine that JDBC is
>> presenting the query sufficiently differently to cause different
>> backend behavior, say by asking for binary not text output ---
>> but I tried that, no change.
>
> Can I do anything to figure out how the JDBC driver presents the query?

You can set log_statement to 'all' to get a view of the statements issued.

Doing that, I can reproduce the problem, and it seems to be using an implicitly declared cursor (which I have not run into before).  That might ring a bell for Tom.

1600 [unknown] BEGIN 00000 2019-03-18 14:36:00.120 EDT LOG:  execute S_1: BEGIN
1600 [unknown] SELECT 00000 2019-03-18 14:36:00.130 EDT LOG:  execute S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:00.785 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:01.359 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:01.948 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:02.451 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:03.000 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:03.483 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
1600 [unknown] SELECT 00000 2019-03-18 14:36:04.029 EDT LOG:  execute fetch from S_2/C_3: SELECT b FROM public.a
....

Cheers,

Jeff
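For context on the implicit cursor Jeff observed: the PostgreSQL JDBC driver only switches to portal-based (cursor) fetching when several conditions hold at once, which is why the Java test program ends up on this code path while a plain psql run does not. The sketch below summarizes those conditions as documented for pgjdbc 42.2.x; the class and method names are illustrative, not part of the driver.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative sketch (not driver code): the documented conditions under which
// pgjdbc fetches a query through a named portal, as seen in the log above
// ("execute S_2/C_3" followed by repeated "execute fetch from S_2/C_3").
public class CursorFetchExample {

  static ResultSet openStreamingQuery(Connection conn, String sql, int fetchSize)
      throws SQLException {
    // 1. Autocommit must be off, so the portal can stay open across fetches.
    conn.setAutoCommit(false);
    // 2. The ResultSet must be TYPE_FORWARD_ONLY (the default here).
    PreparedStatement ps = conn.prepareStatement(sql);
    // 3. A non-zero fetch size must be set.
    ps.setFetchSize(fetchSize);
    // 4. The SQL must be a single statement, not a semicolon-separated batch.
    // With all of the above, each batch of rows is retrieved by an Execute
    // message limited to fetchSize rows on the same portal.
    return ps.executeQuery();
  }
}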
Jeff Janes <jeff.janes@gmail.com> writes:
> Doing that, I can reproduce the problem, and it seems to be using an
> implicitly declared cursor (which I have not run into before).

Hm, I'd tried a manually declared cursor w/ FETCH commands, but apparently
this way is different.  Thanks for the confirmation.

            regards, tom lane


On Mon, Mar 18, 2019 at 2:50 PM Jeff Janes <jeff.janes@gmail.com> wrote:

> Doing that, I can reproduce the problem, and it seems to be using an implicitly declared cursor (which I have not run into before).


The culprit is f2dec34e19d3969ddd6.  Also, it doesn't depend on bytea, it leaks for text as well.

I looked at the patch and nothing in it jumps out to me as causing a leak.

commit f2dec34e19d3969ddd616e671fe9a7b968bec812
Author: Andres Freund <andres@anarazel.de>
Date:   Wed Oct 11 16:26:35 2017 -0700

    Use one stringbuffer for all rows printed in printtup.c.

Cheers,

Jeff
Hi,

On 2019-03-18 16:44:02 -0400, Jeff Janes wrote:
> On Mon, Mar 18, 2019 at 2:50 PM Jeff Janes <jeff.janes@gmail.com> wrote:
> 
> >
> > Doing that, I can reproduce the problem, and it seems to be using an
> > implicitly declared cursor (which I have not run into before).
> >
> >
> The culprit is f2dec34e19d3969ddd6.  Also, it doesn't depend on bytea, it
> leaks for text as well.
> 
> I looked at the patch and nothing in it jumps out to me as causing a leak.
> 
> commit f2dec34e19d3969ddd616e671fe9a7b968bec812
> Author: Andres Freund <andres@anarazel.de>
> Date:   Wed Oct 11 16:26:35 2017 -0700
> 
>     Use one stringbuffer for all rows printed in printtup.c.

Thanks for figuring that out. On a quick scan I don't see anything
obvious. Let me try this out.

- Andres


Jeff Janes <jeff.janes@gmail.com> writes:
> The culprit is f2dec34e19d3969ddd6.

Hmm.  I bet the problem is that the buf isn't being freed in
printtup_shutdown.

> Also, it doesn't depend on bytea, it
> leaks for text as well.

Yeah, I'd expected that it wasn't particularly bytea's fault.

            regards, tom lane


I wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> The culprit is f2dec34e19d3969ddd6.

> Hmm.  I bet the problem is that the buf isn't being freed in
> printtup_shutdown.

Yup; the attached fixes it.  Will push after a bit more testing.

            regards, tom lane

diff --git a/src/backend/access/common/printtup.c b/src/backend/access/common/printtup.c
index e4ee5c9..5eb928b 100644
*** a/src/backend/access/common/printtup.c
--- b/src/backend/access/common/printtup.c
*************** printtup_create_DR(CommandDest dest)
*** 91,96 ****
--- 91,97 ----
       */
      self->sendDescrip = (dest == DestRemote);

+     self->buf.data = NULL;
      self->attrinfo = NULL;
      self->nattrs = 0;
      self->myinfo = NULL;
*************** printtup_shutdown(DestReceiver *self)
*** 547,552 ****
--- 548,557 ----
      if (myState->tmpcontext)
          MemoryContextDelete(myState->tmpcontext);
      myState->tmpcontext = NULL;
+
+     if (myState->buf.data)
+         pfree(myState->buf.data);
+     myState->buf.data = NULL;
  }

  /* ----------------

Hi,

On 2019-03-18 16:55:21 -0400, Tom Lane wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
> > The culprit is f2dec34e19d3969ddd6.
> 
> Hmm.  I bet the problem is that the buf isn't being freed in
> printtup_shutdown.

That's indeed part of it, and there's very little reason (besides there
not being a convenient function for it, but it's obviously trivial to
open code) to not free it.

This specific issue can be trivially resolved by adding something like
    /* free memory allocated by initStringInfo */
    if(myState->buf.data)
        pfree(myState->buf.data);
    myState->buf.data = NULL;

to printtup_shutdown(). I'm inclined to just leave it open coded for
now.


But are we actually prepared to assume that nothing ever leaks in
receivers? That's, as far as I can tell, not a documented
assumption.

What we're getting here is a stream of:

2019-03-18 14:47:46.159 PDT [30035][3/4] LOG:  duration: 0.269 ms  parse <unnamed>: SELECT b FROM public.a
2019-03-18 14:47:46.159 PDT [30035][3/4] LOG:  duration: 0.235 ms  bind <unnamed>/C_1: SELECT b FROM public.a
2019-03-18 14:47:50.705 PDT [30035][3/4] LOG:  duration: 4545.495 ms  execute <unnamed>/C_1: SELECT b FROM public.a
2019-03-18 14:48:01.126 PDT [30035][3/4] LOG:  duration: 9613.291 ms  execute fetch from <unnamed>/C_1: SELECT b FROM public.a
2019-03-18 14:48:06.488 PDT [30035][3/4] LOG:  duration: 4564.857 ms  execute fetch from <unnamed>/C_1: SELECT b FROM public.a
2019-03-18 14:48:11.700 PDT [30035][3/4] LOG:  duration: 4449.060 ms  execute fetch from <unnamed>/C_1: SELECT b FROM public.a

(ignore the timing, I had a debugger attached)

So we're assuming that all dest receivers never leak.

We're also assuming that we don't leak into MessageContext over such
cycles, which seems wrong. At the very least things like
errdetail_params() are happy to leak into MessageContext.

Greetings,

Andres Freund


I wrote:
> Yup; the attached fixes it.  Will push after a bit more testing.

Done now, but I forgot to credit Jeff in the commit message.
My apologies for that (too bad it's impossible to edit commit
messages after the fact).

            regards, tom lane


Andres Freund <andres@anarazel.de> writes:
> But are we actually prepared to assume that nothing ever leaks in
> receivers? That's, as far as I can tell, not a documented
> assumption.

There's a lot that's not documented there :-(

> We're also assuming that we don't leak into MessageContext over such
> cycles, which seems wrong. At the very least things like
> errdetail_params() are happy to leak into MessageContext.

This leak isn't in MessageContext; if it were, there likely wouldn't
have been a noticeable problem.  It's leaking in the executor's
context over repeat ExecutorRun cycles in the same execution state.
We do have an expectation that there won't be per-row leakage in
the ExecutorState.

            regards, tom lane


Hi,

On 2019-03-18 18:03:40 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > We're also assuming that we don't leak into MessageContext over such
> > cycles, which seems wrong. At the very least things like
> > errdetail_params() are happy to leak into MessageContext.
> 
> This leak isn't in MessageContext; if it were, there likely wouldn't
> have been a noticeable problem.

I know that this leak wasn't into MessageContext - I was (wrongly)
thinking that there also might be an issue related to MessageContext
too, but I was wrong on that.


> It's leaking in the executor's
> context over repeat ExecutorRun cycles in the same execution state.

What I don't quite get is why we're doing

    if (sendTuples)
        dest->rStartup(dest, operation, queryDesc->tupDesc);

inside the per-query context, given that the lifetime of the started
dest receiver can, as in this example, be considerably shorter than
the query execution.  I guess given the current interface we can't
really do much better, as we'd also not want to leak into the calling
context, and allocating yet another memory context wouldn't be cheap :(

Greetings,

Andres Freund


Re: BUG #15700: PG 10 vs. 11: Large increase in memory usage when selecting BYTEA data (maybe memory leak)

From:
Matthias Otterbach
Date:
Dear all,

On 2019-03-18 23:00, Tom Lane wrote:
> I wrote:
>> Yup; the attached fixes it.  Will push after a bit more testing.
> 
> Done now, but I forgot to credit Jeff in the commit message.
> My apologies for that (too bad it's impossible to edit commit
> messages after the fact).

Thanks a lot for your replies and the quick fix. I re-tested just now 
(compiling from the current 11 branch) and could not reproduce my memory 
issue anymore.

Thanks again, best regards
Matthias Otterbach