Обсуждение: BUG #17906: Segmentation fault and database crash during procedure call
BUG #17906: Segmentation fault and database crash during procedure call
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 17906
Logged by: Václav Pink
Email address: vaclav.pink@tietoevry.com
PostgreSQL version: 13.10
Operating system: running on linux server RHEL 7.9 (x64)
Description:
Good afternoon gentlemans,
we are fighting more than 3 weeks with error: server process (PID 10028) was
terminated by signal 11: Segmentation fault
This error occur when I call db procedure (whole procedure is below).
Database is in cluster, version of Postgresql is 13.10 and it's runnig on
linux server RHEL 7.9 (x64).
We can't find reason why it fails and whole database crashes.
We will be thankful of any comments, hints, suggestions how to fix it.
If I can provide any other info what can help you, please let me know.
Thank you very much!
Best regards
Vaclav Pink
details from log:
2023-04-21 08:15:58 UTC [10028]: [8-1] user=dm_admin,db=dm_data CONTEXT:
PL/pgSQL function dm_dev.get_domains(character varying,character
varying,character varying,character varying,jsonb) line 93 at RAISE
2023-04-21 08:15:58 UTC [8848]: [10-1] user=,db= LOG: server process (PID
10028) was terminated by signal 11: Segmentation fault
2023-04-21 08:15:58 UTC [8848]: [11-1] user=,db= DETAIL: Failed process was
running: CALL dm_dev.get_domains (
'te_mf',
null,
null,
null,
null
);
2023-04-21 08:15:58 UTC [8848]: [12-1] user=,db= LOG: terminating any other
active server processes
2023-04-21 08:15:58 UTC [20494]: [3-1] user=snowmirror,db=dm_data WARNING:
terminating connection because of crash of another server process
2023-04-21 08:15:58 UTC [20494]: [4-1] user=snowmirror,db=dm_data DETAIL:
The postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally and
possibly corrupted shared memory.
/var/log/messages :
Apr 21 10:15:58 tfiubb4dmrepo1 kernel: postmaster[10028]: segfault at
2e19000 ip 00007fbac32f857f sp 00007fff722d8c48 error 4 in
libc-2.17.so[7fbac31a2000+1c4000]
procedure definiton:
CREATE OR REPLACE PROCEDURE dm_dev.get_domains(
user_name character varying,
query character varying,
INOUT response_code character varying,
INOUT response_message character varying,
INOUT response_body jsonb)
LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
user_internal boolean;
user_pii boolean;
user_domain_path character varying (40);
user_fixed_filter1 character varying (200);
user_fixed_filter2 character varying (200);
user_fixed_filter3 character varying (200);
user_domain character varying (40);
record_count integer;
dd_count integer;
r_data_row jsonb;
sql_command character varying (2000) DEFAULT '';
hlp_object json;
BEGIN
CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog (
id serial,
unique_call_id character varying (100),
r_data jsonb
);
CREATE UNLOGGED TABLE IF NOT EXISTS record_temp_unlog2 (
id serial,
unique_call_id character varying (100),
r_data jsonb
);
SELECT internal, pii, domain, domain_path, fixed_filter1, fixed_filter2,
fixed_filter3
INTO user_internal, user_pii, user_domain, user_domain_path,
user_fixed_filter1, user_fixed_filter2, user_fixed_filter3
FROM dm.apim_users
WHERE username = user_name;
response_code := '200';
response_message := 'Returns list of ServiceNow domains';
sql_command := sql_command ||
'INSERT INTO record_temp_unlog (r_data)
SELECT row_to_json (r)
FROM (
SELECT sys_id AS "id",
name AS "name",
active AS active,
type AS type,
json_build_object( ''id'', parent_id, ''name'', parent_name) AS
parent,
sys_domain_path AS domain_path,
full_name AS full_name,
level AS level,
sn_instance_code AS sn_instance_code,
sn_instance AS sn_instance_label,
case when sys_domain = ' ||''''|| user_domain ||'''' || ' then true
else false end as domain_flag
FROM dm.domains dom
WHERE dom.sys_domain_path LIKE ' || '''' || user_domain_path ||'%'
||''''
|| ' AND ' || COALESCE(user_fixed_filter1, ' 1 = 1 ')
|| ' AND ' || COALESCE(user_fixed_filter2, ' 1 = 1 ')
|| ' AND ' || COALESCE(user_fixed_filter3, ' 1 = 1 ')
|| ' LIMIT 10 ) r';
RAISE NOTICE 'sql_string (%)', sql_command;
EXECUTE sql_command;
SELECT COUNT(*)
INTO record_count
FROM record_temp_unlog;
SELECT MAX(order_by)
INTO dd_count
FROM dm.data_dictionary
WHERE entity = 'domains';
response_body := '{}'::jsonb;
BEGIN
FOR i IN 1..record_count LOOP
SELECT r_data
INTO r_data_row
FROM record_temp_unlog
WHERE id = i;
RAISE NOTICE 'pocet radku (%)', r_data_row::text;
BEGIN
FOR j IN 1..dd_count LOOP
SELECT CASE
WHEN (r_data_row ->> 'domain_flag')::boolean IS TRUE
THEN r_data_row
WHEN (dd.internal IS TRUE AND user_internal IS FALSE) OR (dd.pii IS
TRUE AND user_pii IS FALSE) THEN jsonb_set_lax(r_data_row,
string_to_array(dd.name, ','), NULL, false, 'use_json_null')
ELSE r_data_row
END
INTO r_data_row
FROM dm.data_dictionary AS dd
WHERE dd.entity = 'domains'
AND dd.order_by = j;
r_data_row := jsonb_set_lax(r_data_row,
string_to_array(r_data_row->>'domain_flag', ','), NULL, false,
'use_json_null');
END LOOP;
END;
INSERT INTO record_temp_unlog2 (id, r_data)
VALUES (i, r_data_row::jsonb);
RAISE NOTICE 'pocet radku po smycce (%)', r_data_row::text;
END LOOP;
END;
--$do$;
SELECT array_to_json(array_agg(jsonb_strip_nulls(r_data) ORDER BY
r_data->>'id' ))
INTO response_body
FROM record_temp_unlog2;
END $BODY$;
PG Bug reporting form <noreply@postgresql.org> writes: > we are fighting more than 3 weeks with error: server process (PID 10028) was > terminated by signal 11: Segmentation fault > This error occur when I call db procedure (whole procedure is below). Can't help you with such an incomplete bug report. If you could reduce the problem to a self-contained test case, we'd be happy to look at it. See https://wiki.postgresql.org/wiki/Guide_to_reporting_problems regards, tom lane
Re: BUG #17906: Segmentation fault and database crash during procedure call
От
Michael Paquier
Дата:
On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote: > Can't help you with such an incomplete bug report. If you could reduce > the problem to a self-contained test case, we'd be happy to look at it. > See FYI, an isolated test case would require to know what's behind the definitions of the tables apim_users, data_dictionary and domains which are used as part of the procedure you are seeing to fail. Likely this would require a sample of the data that fails. Being able to get a backtrace of the crash could provide hints, though without a data sample that may be difficult. If you send a sample of data, also make sure to mask anything sensitive. -- Michael
Вложения
Good afternoon guys, Our DB admin tryed reproduce the issue in OCP (single node, cluster similar to original environment), but everything wasOK. Regarding the tables which are used in the procedure - apim_users - basic table at this time with cca 10 rows, 10 columns, datatypes varchar and 2 boolean data_dictionary - small table with cca 10 columns and 150 rows, procedure use only 11 rows (cut by where condition) - datatypesvarchar and 2 boolean domains - materialized view with 11 columns and cca 3500 rows. Datatypes varchar. I'm sorry for not so much details about data, but we have very strict rules... And regarding logs and info - Db admin sayed that nothing more from crash time, only messages which I sent earlier. If something come on your mind, where can be problem, what can solve the issue, it will be very helpful. Thank you very much. Vaclav -----Original Message----- From: Michael Paquier <michael@paquier.xyz> Sent: Monday, May 1, 2023 9:41 AM To: Tom Lane <tgl@sss.pgh.pa.us> Cc: Vaclav Pink <vaclav.pink@tietoevry.com>; pgsql-bugs@lists.postgresql.org Subject: Re: BUG #17906: Segmentation fault and database crash during procedure call On Fri, Apr 21, 2023 at 10:09:00AM -0400, Tom Lane wrote: > Can't help you with such an incomplete bug report. If you could > reduce the problem to a self-contained test case, we'd be happy to look at it. > See FYI, an isolated test case would require to know what's behind the definitions of the tables apim_users, data_dictionaryand domains which are used as part of the procedure you are seeing to fail. Likely this would require a sample of the data that fails. Being able to get a backtrace of the crash could provide hints, though without a data sample that may be difficult. If yousend a sample of data, also make sure to mask anything sensitive. -- Michael
Vaclav Pink <vaclav.pink@tietoevry.com> writes:
> I'm sorry for not so much details about data, but we have very strict rules...
It's unlikely that the specific data involved is significant.
See if you can build a publishable test case using dummy data.
regards, tom lane