Обсуждение: BUG #17990: PSQL Process hangs in parallel mode

Поиск
Список
Период
Сортировка

BUG #17990: PSQL Process hangs in parallel mode

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17990
Logged by:          ess bee
Email address:      ess.bee59@gmail.com
PostgreSQL version: 14.8
Operating system:   Ubuntu-2204-jammy-amd64-base
Description:

Dear postgresql team!

I am using postgresql/postgis to calculate (better "estimate") pseudo-osm
tags to enhance the route-calculation in the "Brouter" for bikers .
Each continent is calculated in a separate database (Europe, north-america,
etc..)

For the planet the sequential calculation currently need 26 hours. My goal
is now to reduce this duration.

Unfortunately the "long" SQL´s (joins) used are running as mono-process
(they are only 1 or 2 % of the time running in 3 parallel processes).
That for I tried to start 2 parallel psql processes  (Europe parallel to the
rest of the planet)

The problem:
By the first 2 tests one of the 2 psql Process was  "hanging" hours / not
ending on a select/join! (first hang appears after 2 hours of processing,
the second after 6 hours) 
No message at all in the postgresql or Linux logs!
Just the CPU load indicated that the process was still "running".
Also after the successful termination of the other process, the hanging
process remained in that state.

  
My configuration:
==> PostgreSQL 14.8 (Ubuntu 14.8-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, 64-bit
==> postgres@Ubuntu-2204-jammy-amd64-base

It is difficult to document the full situation, here just the SQL
"hanging"
SELECT
    m.osm_id losmid, m.highway lhighway, q.highway as qhighway,
q.maxspeed_class,
    case 
    when q.highway in ('motorway', 'motorway_link','trunk','trunk_link') and
q.maxspeed_class < 1.1 then
         st_area(st_intersection(m.way, ST_Union( q.way)))      /
st_area(m.way)
    when q.highway in ('motorway', 'motorway_link','trunk','trunk_link')
then
         st_area(st_intersection(m.way, ST_Union( q.way)))        / (1.5 *
st_area(m.way))

    when q.highway in ('primary','primary_link')                        and
q.maxspeed_class < 2.1  then
    st_area(st_intersection(m.way, ST_Union( q.way)))        / (2 *
st_area(m.way))
    when q.highway in ('primary','primary_link')      then
    st_area(st_intersection(m.way, ST_Union( q.way)))     / (3 *
st_area(m.way))

    when q.highway in ('secondary')                                    and
q.maxspeed_class < 2.1 then
    st_area(st_intersection(m.way, ST_Union( q.way)))        / (3 *
st_area(m.way))
    when q.highway in ('secondary') then 
    st_area(st_intersection(m.way, ST_Union( q.way)))        / (5 *
st_area(m.way))
    end
as noise_factor
into table noise_tmp
FROM osm_line_buf_50 AS m
INNER JOIN osm_line_buf_50 AS q ON ST_Intersects(m.way, q.way) 
WHERE m.highway is not null
and q.highway in ('motorway',
'motorway_link','trunk','trunk_link','primary','primary_link','secondary')
GROUP BY losmid, lhighway, m.way, q.highway, q.maxspeed_class
order by noise_factor desc;

My questions: (as I could not find a corresponding bug in the pgsql-bugs)
Is a similar issue allready known?
Can anybody recommend a change (update, patch, configuration parameter)?
What could I provide for further analysis (pg_stat_activity) ?

Regards (and thank for the fantastic software!)
Ess Bee


Re: BUG #17990: PSQL Process hangs in parallel mode

От
Thomas Munro
Дата:
On Fri, Jun 23, 2023 at 8:47 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> My questions: (as I could not find a corresponding bug in the pgsql-bugs)
> Is a similar issue allready known?
> Can anybody recommend a change (update, patch, configuration parameter)?
> What could I provide for further analysis (pg_stat_activity) ?

It sounds like something may be wrong, because it should respond to
cancellation promptly (unless perhaps the system is completely swamped
due to resource issues and not responding to anything but it didn't
sound like that from your description).  Before and after you cancel
the query, what do the 'hanging' processes show in the wait_event
column of pg_stat_activity?  Are they still consuming CPU after you
cancel?  Can you get a backtrace of the running processes, with
something like gstack (it may help to install the 'debug symbol'
package for PostgreSQL).  What does the query plan look like, ie
EXPLAIN SELECT ...?