Discussion: Out of memory
Hi,

I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now getting further out of memory issues during the same stage of the plpgsql function as mentioned before.

The function itself is run as part of a larger transaction which does the following:

1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental load will maintain about 10,000 rows.

2/ When each one of these tables is updated, an AFTER trigger is fired that maintains an associated table revision table.

3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised tables. These tables total about 20GB. Each one of these tables is compared against the previous table revision to determine its row changes. It's in this function that the out of memory exception is occurring.

The server log error message I'm getting in the function is here: http://pastebin.com/346zi2sS. It's very long and contains the top transaction memory debug info.

My initial observation about this error is that maybe PostgreSQL is encountering a memory corruption error, because the amount of OS memory does not seem to run out. The plpgsql function uses functions from both PostGIS and pgc_checksum (http://pgfoundry.org/projects/pg-comparator), so maybe they are the cause of the problem. Or maybe I have configured something wrong...

I did some memory logging during the execution of the function. It shows for the majority of the transaction execution that the actual memory used is about 1GB (grows from the initial 600MB) with about 6.5GB cached by the OS:

             total       used       free     shared    buffers     cached
Mem:          8004       7839        165          0          0       6802
-/+ buffers/cache:        1037       6967
Swap:          397           0        397

But just before the out of memory error occurs there is a spike to 2.5GB of used memory, although there is still 4.5GB cached by the OS:

             total       used       free     shared    buffers     cached
Mem:          8004       7702        301          0          0       4854
-/+ buffers/cache:        2848       5156
Swap:          397           0        397

Then after the error the memory slowly returns to this state:

             total       used       free     shared    buffers     cached
Mem:          8004       1478       6526          0          0       1133
-/+ buffers/cache:         345       7659
Swap:          397           0        397

The OS I'm running is:

Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux.

It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running other than cacti, ssh and ftp server daemons.

The main OS parameters I have tuned are:

vm.swappiness = 0
vm.overcommit_memory = 2
kernel.shmmax = 4196769792
kernel.shmall = 1024602

And the PostgreSQL is:

PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.

The main changed postgresql.conf parameters I've tuned are:

shared_buffers = 512MB
maintenance_work_mem = 512MB
temp_buffers = 256MB
work_mem = 1MB
wal_buffers = 16MB
effective_cache_size = 4094MB

The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is used for loading external data, managing revision table information, and generating and outputting de-normalised datasets, so it does not have a high number of transactions running. Typically 1 large one per day.

Two questions:

1) Have I set the OS and PostgreSQL parameters to sensible values given the hardware and database utilization?

2) Can anyone help me make sense of the top transaction memory error to help track down the issue?

Any other suggestions would be greatly appreciated.

Thanks
Jeremy

________________________________________
From: Jeremy Palmer
Sent: Saturday, 26 March 2011 9:57 p.m.
To: Scott Marlowe
Cc: pgsql-general@postgresql.org
Subject: RE: [GENERAL] Out of memory

Hi Scott,

It was the work_mem that was set too high. I reduced it to 32MB and the function executed.

Just so I understand this: every time a sort is performed within a function, the sort memory is allocated and then not released until the function completes? Rather than deallocating the memory after each sort operation has completed?

Thanks,
Jeremy

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Friday, 25 March 2011 5:04 p.m.
To: Jeremy Palmer
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Out of memory

On Thu, Mar 24, 2011 at 9:23 PM, Jeremy Palmer <JPalmer@linz.govt.nz> wrote:
> I've been getting database out of memory failures with some queries which
> deal with a reasonable amount of data.
>
> I was wondering what I should be looking at to stop this from happening.
>
> The typical messages I've been getting are like this:
> http://pastebin.com/Jxfu3nYm
>
> The OS is:
>
> Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC
> 2011 x86_64 GNU/Linux.
>
> It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is
> dedicated to PostgreSQL. The main parameters I have tuned are:
>
> work_mem = 200MB

That's a really big work_mem. I have mainline db servers with 128G of
ram that have work_mem set to 16M and that is still considered a little
high in my book. If you drop work_mem down to 1MB does the out of memory
go away?

work_mem is how much memory EACH sort can use on its own. If you have a
plpgsql procedure that keeps running query after query, it could use a
LOT of memory really fast.
On Tue, 2011-04-05 at 21:50 +1200, Jeremy Palmer wrote:
> Hi,
>
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a
> message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now
> getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of a larger transaction which does the following:
Where is the source to the function?
Regards,
Jeff Davis
On 04/05/11 2:50 AM, Jeremy Palmer wrote:
> I've been having repeated troubles trying to get a PostgreSQL app to play nicely on Ubuntu. I recently posted a
> message on this list about an out of memory error and got a resolution by reducing the work_mem setting. However I'm now
> getting further out of memory issues during the same stage of the plpgsql function as mentioned before.
>
> The function itself is run as part of a larger transaction which does the following:
>
> 1/ Maintains 104 tables (15 PostGIS tables) by loading and applying incremental table changes. A typical incremental
> load will maintain about 10,000 rows.
>
> 2/ When each one of these tables is updated, an AFTER trigger is fired that maintains an associated table revision
> table.
>
> 3/ After all of the tables are maintained, a plpgsql function is called to build/maintain a set of de-normalised
> tables. These tables total about 20GB. Each one of these tables is compared against the previous table revision to
> determine its row changes. It's in this function that the out of memory exception is occurring.
a few random questions...
Does that all really have to be a single transaction?
Do you really need to use triggers for your revision tracking, and can't
rely on your daily update cycle to manually set the revision information?
Is it really necessary to generate massive denormalized tables, rather
than using views to join the data?
> shared_buffers = 512MB
> maintenance_work_mem = 512MB
> temp_buffers = 256MB
> work_mem = 1MB
> wal_buffers = 16MB
> effective_cache_size = 4094MB
>
> The size of the database is 350GB. The typical number of users connected to the database is 1 or 2. This database is
> used for loading external data, managing revision table information and generating and outputting de-normalised
> datasets, so it does not have a high number of transactions running. Typically 1 large one per day.
With only 1-2 connections, you certainly could increase the work_mem.
Alternatively, this single giant transaction could manually set a larger
work_mem which would only apply to it (see the sketch below). Personally,
given your 8GB system and what you've described, I think I'd set the
tuning parameters to something like...
shared_buffers = 1GB
maintenance_work_mem = 128MB
temp_buffers = 64MB
work_mem = 16MB
wal_buffers = 16MB
effective_cache_size = 4094MB
adjust effective_cache_size to somewhat less than the 'cached' value
shown in `free -m` after your system has been running for awhile.
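As for setting work_mem just for the big transaction, the nightly job could do something along these lines (a sketch only; the function call is a placeholder, not the real one):

BEGIN;
-- larger sort memory for this transaction only
SET LOCAL work_mem = '256MB';
SELECT run_nightly_maintenance();   -- placeholder for the real maintenance call
COMMIT;

SET LOCAL reverts automatically at COMMIT or ROLLBACK, so every other session keeps the conservative server-wide default.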
Hi John,

> Does that all really have to be a single transaction?

Yes - I need to ensure that all of the changesets and denormalised tables are created in the same transaction, so that if an error occurs the database is rolled back to the last successfully applied changeset. I don't want to get into the business of splitting it into separate transactions and then having to revert changes that were applied in a previous transaction step.

> Do you really need to use triggers for your revision tracking, and can't
> rely on your daily update cycle to manually set the revision information?

They are not necessary, but they give the cleanest code implementation and make the revision maintenance of the tables almost transparent. If they are causing the problem I could change the logic...

> Is it really necessary to generate massive denormalized tables, rather
> than using views to join the data?

Yes - creating the tables is complex, and often involves complex functions and multiple temp tables. The overall time to create these tables is somewhere in the area of 3 hours on this server. I'm also unloading these tables multiple times for separate purposes, so they would need to be materialised anyway.

> with only 1-2 connections, you certainly could increase the work_mem.

I can't increase this value at the moment on this server because I was getting out of memory errors with the initial population of the database (which builds the denormalised tables, but does not determine the changeset to the previous table revision). I tried values from 256MB down to 2MB and could only get the query to run with 1MB.

I suspect even this was pushing the boundary, so when I got to the next stage in my testing - applying incremental updates - the memory issue raised its head again.

Regards,
Jeremy
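PS. For reference, the revision triggers discussed above are broadly of this shape (a generic sketch only; the table and column names are hypothetical, not the actual linz_bde_uploader code):

CREATE OR REPLACE FUNCTION record_table_revision() RETURNS trigger AS $$
BEGIN
    -- log which row changed, the operation, and when (schema is illustrative)
    INSERT INTO table_revision (table_name, row_id, op, changed_at)
    VALUES (TG_TABLE_NAME, NEW.id, TG_OP, now());
    RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH ROW triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_revision_trg
    AFTER INSERT OR UPDATE ON some_table
    FOR EACH ROW EXECUTE PROCEDURE record_table_revision();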
Hi Jeff,

> Where is the source to the function?

The source is located here: https://github.com/linz/linz_bde_uploader

The main function LDS_MaintainSimplifiedLayers that is being called is on line 37 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql.

The actual out of memory exception was caught within the bde_GetTableDifferences function, at line 3263 of https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql.

When I was getting an out of memory issue while creating the tables (not maintaining them), the query that seemed to kill the transaction was the one located at line 1463 of https://github.com/linz/linz_bde_uploader/blob/master/sql/lds_layer_functions.sql. After I dropped the work_mem to 1MB it got past that and completed ok. But during the maintenance of the tables the row differences need to be calculated and then applied to the table - see the LDS_ApplyTableDifferences function on line 353.

Regards,
Jeremy
Well after a few days of further investigation I still can't track the issue down. The main problem is that I can only reproduce the error by running the whole transaction, so I can't isolate the problem down to a simple use case or even a smaller subset of the transaction, which would have been nice for posting to this list.

Does anyone have an idea of how I might go about trying to tackle this problem now? Should I try further reducing the memory settings? Or install a debug version of PostgreSQL and get some further information about what is going on before the error?

Any advice would be greatly appreciated.

Regards,
Jeremy
Hi All,
I'm running PostgreSQL 9.0.3 and am getting an out of memory error while running a big transaction. This error does not
crash the backend.
The nature of this transaction is that it sequentially applies data updates to a large number (104) of tables; then,
after applying those updates, a series of materialised views are created or updated using a set of pl/pgsql functions. I
need this operation to be one transaction so I can roll back the update if any issue occurs.
The size of the database is 350GB, and I would characterise this system as a data warehouse.
At this stage I can't isolate the problem down to a simple use case or even a smaller subset of the transaction, which
would have been nice for posting to this list. I can only replicate the error when I run the "entire" transaction -
running the table update part or the materialised view parts separately works fine. To make matters worse, I do not see
any context around my "out of memory" error message - which makes the message useless.
The actual error occurs during a table compare operation within a function which uses 2 cursors to scan for differences.
However, directly before this I have a query that generates a temp table that I had trouble with earlier (when initially
tuning the server) and had to set the work_mem to 1MB so it would run:
CREATE TEMP TABLE tmp_titles AS
SELECT
TTL.audit_id AS id,
TTL.title_no,
TTL.status,
TTLT.char_value AS type,
LOC.name AS land_district,
TTL.issue_date,
TTLG.char_value AS guarantee_status,
string_agg(
DISTINCT(
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
),
E'\r\n'
ORDER BY
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
) AS estate_description,
string_agg(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END,
', '
ORDER BY
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END ASC
) AS owners,
count(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END
) AS number_owners,
TPA.title_no IS NOT NULL AS part_share,
-- With Postgis 1.5.2 the ST_Collect aggregate returns a truncated
-- collection when a null value is found. To fix this the shapes
-- are order so all null shapes row are at the end of input list.
ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM
crs_title TTL
LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
LEFT JOIN (
SELECT
title_no
FROM
tmp_parcel_titles
GROUP BY
title_no
HAVING
count(*) > 1
) TPA ON TTL.title_no = TPA.title_no
LEFT JOIN (
SELECT
id,
(ST_Dump(shape)).geom AS shape
FROM
crs_parcel
WHERE
status = 'CURR' AND
ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
) PAR ON LGP.par_id = PAR.id
JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
WHERE
TTL.status IN ('LIVE', 'PRTC') AND
TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
GROUP BY
TTL.audit_id,
TTL.title_no,
TTL.status,
TTLT.char_value,
LOC.name,
TTL.issue_date,
TTLG.char_value,
TPA.title_no;
This query does a lot of string concatenation and uses the new 9.0 string_agg function. The result set is about
2 million rows.
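As a trivial, self-contained illustration of the DISTINCT/ORDER BY form of string_agg used above (unrelated to the schema itself):

SELECT string_agg(DISTINCT name, ', ' ORDER BY name) AS names
FROM (VALUES ('b'), ('a'), ('b')) AS t(name);
-- returns: 'a, b'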
So down to questions:
1) Is there anything I can do to get more information out about this memory error message? A debug build maybe, or attach a
debugger and set some breakpoints?
2) Is there anything I can do to optimise the memory so this transaction might be able to run?
3) I noticed the Top Memory Context dump in the server log contained a lot of memory contexts for table indexes/PKs (about
850) - see http://pastebin.com/346zi2sS. Is this a problem? Could these contexts be cleaned up part way through the
transaction? Maybe use a savepoint? Or do they exist for the life of the session or transaction?
4) Can anyone help me make sense of the top transaction memory error to help track down the issue? What number should I
be looking for?
Well, I'm pretty much at the end of the line in terms of getting PostgreSQL to work for this task, so any guidance
would be so much appreciated.
Thanks again,
Jeremy
PS. My config:
The OS I'm running is:
Linux TSTLHAPP01 2.6.32-29-server #58-Ubuntu SMP Fri Feb 11 21:06:51 UTC 2011 x86_64 GNU/Linux.
It's running on VMware and has 2 CPUs and 8GB of RAM. This VM is dedicated to PostgreSQL; not much else is running
other than cacti, ssh and ftp server daemons. The main OS parameters I have tuned are:
vm.swappiness=0
vm.overcommit_memory=2
kernel.shmmax = 4196769792
kernel.shmall = 1024602
And the PostgreSQL is:
PostgreSQL 9.0.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit.
The main changed postgresql.conf parameters I've tuned are:
shared_buffers = 512MB
maintenance_work_mem = 512MB
temp_buffers = 256MB
work_mem = 1MB
wal_buffers = 16MB
effective_cache_size = 4094MB
The typical number of users connected to the database is 1 or 2.
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> I'm running PostgreSQL 9.0.3 and am getting an out of memory error while running a big transaction. This error does not
> crash the backend.
If it's a standard "out of memory" message, there should be a memory
context map dumped to postmaster's stderr. (Which is inconvenient for
some logging arrangements, but it's hard to improve that without risking
not being able to print the map for lack of memory ...) If you could
show us the map it might help to figure out what's happening.
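If stderr isn't currently being captured, a postgresql.conf fragment along these lines - only a sketch; adjust names and paths as needed - will route the dump into the collected server log:

logging_collector = on           # capture postmaster/backend stderr into log files
log_destination = 'stderr'
log_directory = 'pg_log'         # relative to the data directory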
regards, tom lane
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok I have attached the map, or at least what I think the map is.
Yup, that's what I was after. It looks like the main problem is here:
> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
> ExprContext: 2622363000 total in 9 blocks; 21080 free (15 chunks); 2622341920 used
You've evidently got a leak during execution of a query that's being run
in a "portal", which most likely is a cursor or plpgsql FOR-IN-SELECT
query. Unfortunately there's not enough information here to tell which
query that is, but maybe you can narrow it down now. I'm guessing that
some specific function or operator you're using in that query is leaking
memory with successive executions.
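One way to narrow it down is to run each candidate query on its own in a cursor and watch the backend's memory while fetching - a sketch only, with a placeholder where the suspect query goes:

BEGIN;
DECLARE leak_probe NO SCROLL CURSOR FOR
    SELECT ...;                         -- one of the suspect queries goes here
FETCH FORWARD 100000 FROM leak_probe;   -- repeat; watch the backend's RSS grow
ROLLBACK;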
regards, tom lane
Hi Tom,
Wow thank you so much for the hint!
The plpgsql code that could be to blame is in the below snippet. I had a look and I'm not sure why it might be
leaking. Is it because I assign v_id1 and v_id2 to the return table 'id' record, return it, and then assign to v_id1
or v_id2 again from the cursor?
CREATE OR REPLACE FUNCTION bde_control.bde_gettabledifferences(p_table1 regclass, p_table2 regclass, p_compare_key name)
RETURNS TABLE("action" character, id bigint) AS
...
...
FETCH FIRST FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
FETCH FIRST FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
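-- Both cursors return rows ordered by id, so the loop below is a merge-style
-- comparison: 'D' = id only in table1 (delete), 'I' = id only in table2 (insert),
-- 'X' = unique columns differ, 'U' = checksum differs (update).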
WHILE v_id1 IS NOT NULL AND v_id2 IS NOT NULL LOOP
IF v_id1 < v_id2 THEN
action := 'D';
id := v_id1;
RETURN NEXT;
FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
CONTINUE;
ELSIF v_id2 < v_id1 THEN
action := 'I';
id := v_id2;
RETURN NEXT;
FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
CONTINUE;
ELSIF v_uniq1 <> v_uniq2 THEN
action := 'X';
id := v_id1;
RETURN NEXT;
ELSIF v_check1 <> v_check2 THEN
action := 'U';
id := v_id1;
RETURN NEXT;
END IF;
FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
END LOOP;
WHILE v_id1 IS NOT NULL LOOP
action := 'D';
id := v_id1;
RETURN NEXT;
FETCH NEXT FROM v_table_cur1 INTO v_id1, v_check1, v_uniq1;
END LOOP;
WHILE v_id2 IS NOT NULL LOOP
action := 'I';
id := v_id2;
RETURN NEXT;
FETCH NEXT FROM v_table_cur2 INTO v_id2, v_check2, v_uniq2;
END LOOP;
CLOSE v_table_cur1;
CLOSE v_table_cur2;
RETURN;
The full function can be read in full here:
https://github.com/linz/linz_bde_uploader/blob/master/sql/bde_control_functions.sql#L3263
The actual query for both cursors in the case of the table that was probably causing the error looks like this
(v_table_cur1 and v_table_cur2 only differ by the table referenced):
SELECT
id AS ID,
COALESCE('V|' || CAST(T.estate_description AS TEXT), '|N') || '|V' ||
CAST(T.guarantee_status AS TEXT) || '|V' ||
CAST(T.issue_date AS TEXT) || '|V' ||
CAST(T.land_district AS TEXT) || '|V' ||
CAST(T.number_owners AS TEXT) || '|V' ||
CAST(T.part_share AS TEXT) ||
COALESCE('V|' || CAST(T.shape AS TEXT), '|N') || '|V' ||
CAST(T.status AS TEXT) || '|V' ||
CAST(T.title_no AS TEXT) || '|V' ||
CAST(T.type AS TEXT) AS check_sum,
'' AS check_uniq
FROM
lds.titles AS T
ORDER BY
id ASC;
The definition for the table looks like this:
CREATE TABLE titles (
id INTEGER NOT NULL PRIMARY KEY,
title_no VARCHAR(20) NOT NULL,
status VARCHAR(4) NOT NULL,
type TEXT NOT NULL,
land_district VARCHAR(100) NOT NULL,
issue_date TIMESTAMP NOT NULL,
guarantee_status TEXT NOT NULL,
estate_description TEXT,
number_owners INT8 NOT NULL,
part_share BOOLEAN NOT NULL,
shape GEOMETRY
);
CREATE INDEX shx_title_shape ON titles USING gist (shape);
Thanks,
Jeremy
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> The plpgsql code that could be to blame is in the below snippet. I had a look and I'm not sure why it might be
> leaking. Is it because I assign v_id1 and v_id2 to the return table 'id' record, return it, and then assign to v_id1
> or v_id2 again from the cursor?
No, given the info from the memory map I'd have to say that the leakage
is in the cursor not in what you do in the plpgsql function. The cursor
query looks fairly unexciting except for the cast from geometry to text.
I don't have PostGIS installed here so I can't do any testing, but I
wonder whether the leak goes away if you remove that part of the query
(ie, leave the shape out of the "checksum" for testing purposes).
If so, you probably ought to file the issue as a PostGIS bug.
regards, tom lane
> No, given the info from the memory map I'd have to say that the leakage
> is in the cursor not in what you do in the plpgsql function. The cursor
> query looks fairly unexciting except for the cast from geometry to text.
> I don't have PostGIS installed here so I can't do any testing, but I
> wonder whether the leak goes away if you remove that part of the query
> (ie, leave the shape out of the "checksum" for testing purposes).
> If so, you probably ought to file the issue as a PostGIS bug.
Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory.
I'm still seeing the same error message as well:
PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used
So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.
One thing that has got me interested now is the query that executes directly before (see SQL below). If I remove the
geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor
query completes and the transaction also runs to completion.

Is there any way that ST_Collect could be leaking memory into a context that does not get cleaned up after the query
runs? Or do I have two leaks going on here?!
Cheers,
Jeremy
CREATE TEMP TABLE tmp_titles AS
SELECT
TTL.audit_id AS id,
TTL.title_no,
TTL.status,
TTLT.char_value AS type,
LOC.name AS land_district,
TTL.issue_date,
TTLG.char_value AS guarantee_status,
string_agg(
DISTINCT(
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '')
),
E'\r\n'
ORDER BY
ETTT.char_value || ', ' ||
ETT.share || COALESCE(', ' || LGD.legal_desc_text, '') ||
COALESCE(', ' || to_char(ROUND(LGD.total_area, 0), 'FM9G999G999G999G999') || ' m2', '') ASC
) AS estate_description,
string_agg(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END,
', '
ORDER BY
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END ASC
) AS owners,
count(
DISTINCT
CASE PRP.type
WHEN 'CORP' THEN PRP.corporate_name
WHEN 'PERS' THEN COALESCE(PRP.prime_other_names || ' ', '') || PRP.prime_surname
END
) AS number_owners,
TPA.title_no IS NOT NULL AS part_share,
-- With Postgis 1.5.2 the ST_Collect aggregate returns a truncated
-- collection when a null value is found. To fix this the shapes
-- are order so all null shapes row are at the end of input list.
ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM
crs_title TTL
LEFT JOIN crs_title_estate ETT ON TTL.title_no = ETT.ttl_title_no AND ETT.status = 'REGD'
LEFT JOIN crs_estate_share ETS ON ETT.id = ETS.ett_id AND ETT.status = 'REGD'
LEFT JOIN crs_proprietor PRP ON ETS.id = PRP.ets_id AND PRP.status = 'REGD'
LEFT JOIN crs_legal_desc LGD ON ETT.lgd_id = LGD.id AND LGD.type = 'ETT' AND LGD.status = 'REGD'
LEFT JOIN crs_legal_desc_prl LGP ON LGD.id = LGP.lgd_id
LEFT JOIN (
SELECT
title_no
FROM
tmp_parcel_titles
GROUP BY
title_no
HAVING
count(*) > 1
) TPA ON TTL.title_no = TPA.title_no
LEFT JOIN (
SELECT
id,
(ST_Dump(shape)).geom AS shape
FROM
crs_parcel
WHERE
status = 'CURR' AND
ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
) PAR ON LGP.par_id = PAR.id
JOIN crs_locality LOC ON TTL.ldt_loc_id = LOC.id
JOIN crs_sys_code TTLG ON TTL.guarantee_status = TTLG.code AND TTLG.scg_code = 'TTLG'
JOIN crs_sys_code TTLT ON TTL.type = TTLT.code AND TTLT.scg_code = 'TTLT'
LEFT JOIN crs_sys_code ETTT ON ETT.type = ETTT.code AND ETTT.scg_code = 'ETTT'
WHERE
TTL.status IN ('LIVE', 'PRTC') AND
TTL.title_no NOT IN (SELECT title_no FROM tmp_excluded_titles)
GROUP BY
TTL.audit_id,
TTL.title_no,
TTL.status,
TTLT.char_value,
LOC.name,
TTL.issue_date,
TTLG.char_value,
TPA.title_no;
Jeremy Palmer <JPalmer@linz.govt.nz> writes:
> Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory.
> I'm still seeing the same error message as well:
> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
> ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used
> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.
OK, so that was a wrong guess.
> One thing that has got me interested now is the query that executes directly before (see SQL below). If I remove the
> geometry column that is generated using the ST_Collect aggregate function, the subsequent function involving the cursor
> query completes and the transaction also runs to completion.
Hrm. We were pretty much guessing as to which query was running in that
portal, I think. It seems entirely plausible that this other query is
the one at fault instead. It might be premature to blame ST_Collect per
se though --- in particular I'm wondering about the ORDER BY on the
ST_Collect's input. But if this line of thought is correct, you ought
to be able to exhibit a memory leak using just that sub-part of that
query, without the surrounding function or any other baggage. Maybe the
leak wouldn't drive the backend to complete failure without that
additional overhead; but a leak of a couple gig ought to be pretty
obvious when watching the process with "top" or similar tool.
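For instance, something like the following - a sketch cut down from the tmp_titles query, grouping by parcel id purely to exercise the aggregate - run on its own while watching the backend in top should show whether the ordered ST_Collect is what accumulates memory:

SELECT ST_Multi(ST_Collect(PAR.shape ORDER BY PAR.shape ASC)) AS shape
FROM (
    SELECT id, (ST_Dump(shape)).geom AS shape
    FROM crs_parcel
    WHERE status = 'CURR'
      AND ST_GeometryType(shape) IN ('ST_MultiPolygon', 'ST_Polygon')
) PAR
GROUP BY PAR.id;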
regards, tom lane
>> Ok I removed the geometry column from the cursor query within the function and the session still runs out of memory.
>> I'm still seeing the same error message as well:
>> PortalHeapMemory: 16384 total in 4 blocks; 5944 free (0 chunks); 10440 used
>> ExecutorState: 122880 total in 4 blocks; 63984 free (8 chunks); 58896 used
>> ExprContext: 2496819768 total in 9 blocks; 21080 free (15 chunks); 2496798688 used
>> So I guess it's not likely to be the PostGIS geometry to text cast that is leaking the memory.

> OK, so that was a wrong guess.

Hi Tom,

I finally tracked down the issue! The query that was generating the temp table which was used as input into the 2 cursor queries was generating an invalid, very large geometry (around 200MB).

It turned out I had a bug in the previous testing I was doing, and I didn't actually remove the geometry column from the second cursor. So I guess a 200MB geometry being cast to text used too much memory. Not sure if there is still a leak... but I guess that depends on whether the geometry expands to over 1 GB when converted to text.

Anyway I would like to personally thank you for your time in helping me with this issue.

Regards,
Jeremy
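PS. For the record, a quick size/validity check over the temp table (a sketch against the tmp_titles table above) makes an oversized or invalid geometry easy to spot:

SELECT id,
       ST_IsValid(shape)      AS valid,
       ST_NPoints(shape)      AS points,
       pg_column_size(shape)  AS bytes
FROM tmp_titles
WHERE shape IS NOT NULL
ORDER BY pg_column_size(shape) DESC
LIMIT 10;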