Discussion: PostgreSQL 10.0 SELECT LIMIT performance problem


PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:

Hi!

 

We have a serious performance problem with SELECTs when a LIMIT is added. For example, execution time without LIMIT is ~250 ms (316 rows returned); with LIMIT 20, execution time is 15 to 50 seconds.

 

We have a select with a subselect filter:

select a.id, a.jdata
from oss_alarms a
where a.jdata->>'dn' in
      (
          select o.jdata->>'ossDn'
          from oss_objects o, tvc_entity e
          where e.jtype='object'
            and o.jdata->>'sid'=e.jdata->>'siteId'
            and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
      )
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;

 

The select is used to fetch data for a user-interface table view. The view offers several filters that users can apply, and the subselect implements one of them. The subselect's execution time is always fine, but the main select has serious performance problems. When the subselect is replaced with static values, the select executes quickly.

The oss_alarms table is updated very often, but only with new records: there are about 10,000 to 30,000 new records per day, and once they are processed there are no further changes.

 

Version string    PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit

 

I have gathered some technical information about the DB structure and execution (see attachment).

 

 

 

Mareks Kalnačs

Software Engineer

Software Development Department

T:             +371 67628888

M:            +371 26479242
@:            Mareks.Kalnacs@datakom.lv

www.datakom.lv, SIA Datakom, Vienības gatve 109, Rīga, Latvija, LV-1058

 





Attachments

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Victor Yegorov
Date:
On Wed, 12 Sep 2018 at 11:39, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv> wrote:

We have a serious performance problem with SELECTs when a LIMIT is added. For example, execution time without LIMIT is ~250 ms (316 rows returned); with LIMIT 20, execution time is 15 to 50 seconds.


Hi, Mareks.

As this is not actually a bug, it's better to use pgsql-performance or pgsql-general for such questions next time.


We have a select with a subselect filter:

select a.id, a.jdata
from oss_alarms a
where a.jdata->>'dn' in
      (
          select o.jdata->>'ossDn'
          from oss_objects o, tvc_entity e
          where e.jtype='object'
            and o.jdata->>'sid'=e.jdata->>'siteId'
            and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
      )
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;


As the execution plan without the limit shows, your join conditions yield 316 rows; an explicit sort is fast for that many rows.

For the plan with the limit, the planner prefers to use the `idx_oss_alarms_alarm_time` index to avoid the extra sort.
This leads to:
        Rows Removed by Join Filter: 10717797

That is, you're reading 10M rows via the index scan and throwing them away later, because they do not match your join condition:
        Join Filter: ((a.jdata ->> 'dn'::text) = (o.jdata ->> 'ossDn'::text))

It looks like a.jdata->'dn' and a.jdata->'alarmTime' are correlated, although the planner doesn't know that.
PostgreSQL 10 has `CREATE STATISTICS`, but it works on table columns, not expressions.
You can try disabling the `idx_oss_alarms_alarm_time` index by using an expression in `ORDER BY`:

    order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) + INTERVAL '0' desc

But be warned — this can help in some situations and make things worse in others.
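Applied to the query in question, the rewrite would look like this (a sketch; the added + INTERVAL '0' makes the sort expression differ from the indexed one, so the planner can no longer use idx_oss_alarms_alarm_time to return pre-sorted rows):

select a.id, a.jdata
from oss_alarms a
where a.jdata->>'dn' in
      (
          select o.jdata->>'ossDn'
          from oss_objects o, tvc_entity e
          where e.jtype='object'
            and o.jdata->>'sid'=e.jdata->>'siteId'
            and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
      )
-- same result order, but the expression no longer matches the index:
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) + INTERVAL '0' desc
limit 20;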

I would recommend moving the JOIN and ORDER BY columns out of JSON and making them direct table columns.
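A minimal sketch of that extraction (column and index names are illustrative; it assumes tvc_convert_array_to_date returns a timestamp, and on PostgreSQL 10 the new columns must be kept in sync by the application or a trigger):

ALTER TABLE oss_alarms
    ADD COLUMN dn         text,
    ADD COLUMN alarm_time timestamp;

-- backfill from the existing JSON
UPDATE oss_alarms
   SET dn         = jdata ->> 'dn',
       alarm_time = tvc_convert_array_to_date(jdata -> 'alarmTime');

CREATE INDEX idx_oss_alarms_dn_col         ON oss_alarms (dn);
CREATE INDEX idx_oss_alarms_alarm_time_col ON oss_alarms (alarm_time);
ANALYZE oss_alarms;

With plain columns, ANALYZE gathers per-column statistics directly, and the query can filter on a.dn and order by a.alarm_time.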

 
--
Victor Yegorov
Attachments

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Pavel Stehule
Date:


2018-09-12 10:31 GMT+02:00 Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>:

Hi!

 

We have a serious performance problem with SELECTs when a LIMIT is added. For example, execution time without LIMIT is ~250 ms (316 rows returned); with LIMIT 20, execution time is 15 to 50 seconds.

 

We have a select with a subselect filter:

select a.id, a.jdata
from oss_alarms a
where a.jdata->>'dn' in
      (
          select o.jdata->>'ossDn'
          from oss_objects o, tvc_entity e
          where e.jtype='object'
            and o.jdata->>'sid'=e.jdata->>'siteId'
            and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
      )
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
limit 20;

 

The select is used to fetch data for a user-interface table view. The view offers several filters that users can apply, and the subselect implements one of them. The subselect's execution time is always fine, but the main select has serious performance problems. When the subselect is replaced with static values, the select executes quickly.

The oss_alarms table is updated very often, but only with new records: there are about 10,000 to 30,000 new records per day, and once they are processed there are no further changes.

 

Version string    PostgreSQL 10.0 on powerpc64le-unknown-linux-gnu, compiled by gcc (GCC) 6.3.1 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065., 64-bit

 

I have gathered some technical information about the DB structure and execution (see attachment).


Sometimes the LIMIT clause can confuse the optimizer when the data are not uniform.

You can try the OFFSET 0 trick:

Original query: SELECT * FROM t ORDER BY c LIMIT 10

transformed to:

SELECT * FROM (original query without limit OFFSET 0) x LIMIT 10;
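Applied to this thread's query, the sketch below wraps the unlimited query in a subselect with OFFSET 0, which acts as an optimization fence: the planner produces and sorts the 316 matching rows first, and only then applies the LIMIT:

SELECT *
FROM (
    select a.id, a.jdata
    from oss_alarms a
    where a.jdata->>'dn' in
          (
              select o.jdata->>'ossDn'
              from oss_objects o, tvc_entity e
              where e.jtype='object'
                and o.jdata->>'sid'=e.jdata->>'siteId'
                and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%zaube%')
          )
    order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) desc
    offset 0
) x
LIMIT 20;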



 

 

 




Attachments

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:

Hi Pavel!

 

We know how to trick SQL, but this is not a solution. We have different filters on fields with different distributions: for example, jdata->>'dn' has a distribution of 2.8% but jdata->>'tech' has 25%. That means we would have to run a count first and modify the select based on it, and that is the DB server's job, not the application server's.

 

Best regards,

 

Mareks Kalnačs


 

 

 

 

From: Pavel Stehule <pavel.stehule@gmail.com>
Sent: Wednesday, 12 September 2018 15:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 

 

 



Attachments

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:

Hi Victor!

 

I will disagree with you about this not being a bug. I understand that foreign keys will solve some issues, and we are already making changes to add FKs, but this only bypasses the problem rather than solving it; under different conditions we would hit the problem again. The main issue is that the optimizer does not take index distribution into account. In the current case the main cause is actually the subselect. I played around a little with indexes and different select conditions and got very strange behavior. What we did: we added a new index on the tvc_entity table (idx_tvc_entity_sid_u) and ran a full analyze on all tables, and got the problem in reverse; now the optimizer thinks it will get fewer rows from the subselect.

 

Subselect:

select o.jdata->>'ossDn'
from oss_objects o, tvc_entity e
where e.jtype='object'
  and e.jdata->>'siteId' IS NOT NULL
  and o.jdata->>'sid'=e.jdata->>'siteId'
  and tvc_unaccent(e.jdata->>'name') like tvc_unaccent('%za%')

 

oss_objects.jdata->>'sid' has a distribution of 4.2%, a null value count of 319, and 31693 total rows, but the execution plan always thinks fewer rows will be returned than actually are:

Gather  (cost=1014.16..5280.19 rows=9 width=32) (actual time=1.304..18.982 rows=1291 loops=1)
  ->  Nested Loop  (cost=14.16..4279.28 rows=5 width=32) (actual time=3.172..14.497 rows=646 loops=2)
        ->  Parallel Bitmap Heap Scan on tvcis.tvc_entity e  (cost=13.87..1694.08 rows=149 width=208) (actual time=3.017..11.597 rows=36 loops=2)
        ->  Index Scan using idx_oss_objects_sid on tvcis.oss_objects o  (cost=0.29..17.12 rows=23 width=951) (actual time=0.015..0.054 rows=18 loops=71)

 

If oss_objects.jdata->>'sid' has a distribution of 4.2%, that is 31693 * 4.2% ≈ 1331 distinct values, so every unique tvc_entity.jdata->>'siteId' should match about 31693 / 1331 ≈ 23 rows. This means the join must return more rows than the select from tvc_entity alone, which is the real result, but the optimizer thinks the reverse.
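One way to see what the planner believes here is to query pg_stats; statistics that ANALYZE gathers for expression indexes are stored under the index name, so a sketch along these lines (using the index names from this thread) shows the assumed distinctness:

SELECT tablename, attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename IN ('idx_oss_objects_sid', 'idx_tvc_entity_sid_u');

-- n_distinct > 0 is an absolute number of distinct values;
-- n_distinct < 0 is minus the fraction of rows that are distinct.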

 

We have another example where we don't understand how to solve it, and this one may not be a bug. Here we have a full-text-search index within one table and, when we hit a value with a low hit count, the select runs slowly:

select a.id, a.jdata
from tvcis.oss_alarms a
where to_tsvector('simple'::regconfig, a.ts_vector_fields)
      @@ to_tsquery('simple', ' (decodedType5b6f0753ossAlarmsGenerator) ')
order by (tvc_convert_array_to_date(a.jdata -> 'alarmTime')) DESC
limit 20;

 

Bad case (low hit count):
Sort  (cost=8787.37..8799.58 rows=4885 width=809) (actual time=7.351..7.423 rows=586 loops=1)
Without limit:
Planning time: 0.301 ms
Execution time: 7.574 ms
With limit:
Planning time: 0.312 ms
Execution time: 4972.245 ms

Good case (high hit count; node taken from the count query):
->  Parallel Bitmap Heap Scan on tvcis.oss_alarms a  (cost=5080.23..455246.09 rows=248615 width=0) (actual time=1142.280..175706.097 rows=199195 loops=3)
Without limit:
Too many rows to test; I ran a count instead.
With limit:
Planning time: 0.370 ms
Execution time: 43.131 ms

 

But the count is a nightmare:
Planning time: 0.236 ms
Execution time: 176343.730 ms

 

A couple of example explain plans are attached.

 

Best regards,

Mareks Kalnačs


 

 

 

From: Victor Yegorov <vyegorov@gmail.com>
Sent: Wednesday, 12 September 2018 15:32
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 



Attachments

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Victor Yegorov
Date:
On Wed, 12 Sep 2018 at 19:21, Mareks Kalnačs <Mareks.Kalnacs@datakom.lv> wrote:

I will disagree with you about this not being a bug. I understand that foreign keys will solve some issues, and we are already making changes to add FKs, but this only bypasses the problem rather than solving it; under different conditions we would hit the problem again. The main issue is that the optimizer does not take index distribution into account. In the current case the main cause is actually the subselect. I played around a little with indexes and different select conditions and got very strange behavior. What we did: we added a new index on the tvc_entity table (idx_tvc_entity_sid_u) and ran a full analyze on all tables, and got the problem in reverse; now the optimizer thinks it will get fewer rows from the subselect.


That the planner is not accurate in its estimates for JSON internal keys is expected: PostgreSQL does not parse JSON values when gathering stats.
You cannot expect the planner to handle every possible corner case; that would make planning time enormous.
That is the reason I suggested that the important keys should be extracted into plain columns.

If you still consider this a bug, please send an isolated, reproducible test case that demonstrates it.

--
Victor Yegorov
Attachments

Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Tom Lane
Date:
Victor Yegorov <vyegorov@gmail.com> writes:
> That the planner is not accurate in its estimates for JSON internal keys
> is expected: PostgreSQL does not parse JSON values when gathering stats.
> You cannot expect the planner to handle every possible corner case; that
> would make planning time enormous.

Right.  The fact that it doesn't make the right guesses without help
can't be considered to be a bug in all cases.  These are engineering
tradeoffs we have to make.

> That is the reason I suggested that the important keys should be extracted
> into plain columns.

If that seems infeasible from an application standpoint, another
possibility is to make expression indexes on those important keys.
ANALYZE will gather stats on the values of indexed expressions, and then
perhaps the planner will have enough info to make better decisions.
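A sketch of that suggestion for the tables in this thread (the oss_objects index name here is illustrative; the indexed expression must be spelled exactly as it appears in the queries):

CREATE INDEX idx_oss_alarms_dn     ON oss_alarms  ((jdata ->> 'dn'));
CREATE INDEX idx_oss_objects_ossdn ON oss_objects ((jdata ->> 'ossDn'));

-- ANALYZE then also collects statistics on the indexed expressions:
ANALYZE oss_alarms;
ANALYZE oss_objects;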

In the other case mentioned, where the problem is a poor guess about
the selectivity of

where (
                                to_tsvector(
                                                'simple'::regconfig,
                                                a.ts_vector_fields
                                ) @@ to_tsquery(
                                                'simple',
                                                ' (decodedType5b6f0753ossAlarmsGenerator) '
                                )
)

it's the same problem: the planner has no stats that would let it
figure out the selectivity.  It can't reasonably extract an
estimate on the fly --- if it did, you'd be complaining that
planning time was too long.  The only way to get reasonable behavior
is to set things up so that ANALYZE will accumulate stats about
the values of "to_tsvector('simple',a.ts_vector_fields)".  You
can arrange that either by extracting that into a column, or by
making an index on it.
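For the full-text case, a minimal sketch (assuming ts_vector_fields is a plain text column; the index name is illustrative) is an expression index on exactly the tsvector expression used in the WHERE clause:

CREATE INDEX idx_oss_alarms_fts ON oss_alarms
    USING gin (to_tsvector('simple'::regconfig, ts_vector_fields));

ANALYZE oss_alarms;  -- gathers stats on the indexed tsvector expression

Besides feeding ANALYZE, the GIN index can also serve the @@ match itself.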

            regards, tom lane


Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Adrien Nayrat
Date:
On 9/12/18 7:04 PM, Tom Lane wrote:
> Victor Yegorov <vyegorov@gmail.com> writes:
>> That the planner is not accurate in its estimates for JSON internal keys
>> is expected: PostgreSQL does not parse JSON values when gathering stats.
>> You cannot expect the planner to handle every possible corner case; that
>> would make planning time enormous.
> Right.  The fact that it doesn't make the right guesses without help
> can't be considered to be a bug in all cases.  These are engineering
> tradeoffs we have to make.
>
>> That is the reason I suggested that the important keys should be extracted
>> into plain columns.
> If that seems infeasible from an application standpoint, another
> possibility is to make expression indexes on those important keys.
> ANALYZE will gather stats on the values of indexed expressions, and then
> perhaps the planner will have enough info to make better decisions.


Hi,

For what it's worth, I wrote an article explaining this:
https://blog.anayrat.info/en/2017/11/26/postgresql---jsonb-and-statistics/

Regards,


Attachments

RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:

Hi!

 

But we are using value indexes, not a JSON index:

 

CREATE INDEX idx_oss_alarms_dn
  ON oss_alarms
  USING btree
  ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");

CREATE INDEX idx_oss_objects_sid
  ON oss_objects
  USING btree
  ((jdata ->> 'sid'::text) COLLATE pg_catalog."default");

CREATE UNIQUE INDEX idx_tvc_entity_sid_u
  ON tvc_entity
  USING btree
  ((jdata ->> 'siteId'::text) COLLATE pg_catalog."default")
  WHERE (jdata ->> 'siteId'::text) IS NOT NULL;

 

Maybe we don't understand this index behavior?

 

Mareks

 

From: Victor Yegorov <vyegorov@gmail.com>
Sent: Wednesday, 12 September 2018 19:43
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 



RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:

Hi!

 

My point is: why does the planner ignore the per-table estimates and, when joining the tables, think it will get fewer rows than in reality:

Table1: Index Scan using idx_tvc_entity_sid_u on tvcis.tvc_entity e  (cost=0.28..1856.72 rows=253 width=208) (actual time=0.091..13.944 rows=71 loops=1)
Table2: Index Scan using idx_oss_objects_sid on tvcis.oss_objects o  (cost=0.29..16.84 rows=23 width=951) (actual time=0.009..0.033 rows=18 loops=71)
Join: Nested Loop  (cost=0.57..6175.55 rows=9 width=951) (actual time=0.112..17.355 rows=1291 loops=1)

 

Why doesn't the planner take the results from each table into account, i.e. join rows = Table1 rows * Table2 rows = 253 * 23 = 5819? In that case the difference would be 22% (1291 actual vs. 5819 estimated), but the planner estimated 9 rows, which is only 0.7% of the actual result.
As we can see, the actual row count is almost exactly the product of the per-table actuals: 71 * 18 = 1278 ≈ 1291.
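For reference, the planner does not simply multiply the two per-table estimates; for an equality join it multiplies them by an estimated join selectivity, which in the standard textbook model (roughly what PostgreSQL's eqjoinsel is based on, assuming uniform distributions and ignoring most-common-value refinements) is:

    join_rows ≈ outer_rows * inner_rows / max(n_distinct(outer key), n_distinct(inner key))

Working backwards from this plan, 253 * 23 / x = 9 would imply the planner assumed roughly 650 distinct join values, so the low estimate comes from a bad n_distinct on the JSON expressions rather than from the multiplication itself.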

 

Mareks

 

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Wednesday, 12 September 2018 20:04
To: Victor Yegorov <vyegorov@gmail.com>
Cc: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>; pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

 



Re: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Tom Lane
Date:
Mareks Kalnačs <Mareks.Kalnacs@datakom.lv> writes:
> But we are using value indexes not a json index:

> CREATE INDEX idx_oss_alarms_dn
>   ON oss_alarms
>   USING btree
>   ((jdata ->> 'dn'::text) COLLATE pg_catalog."default");

I think you're outsmarting yourself by including those COLLATE clauses.
They don't do anything, since they're just selecting the default behavior
--- but they're enough to make the planner not realize that stats
collected on the index expression would be applicable to a plain
reference to oss_alarms.jdata ->> 'dn'.  In general you want the index
expression to be spelled exactly the same way that you refer to the
value in queries, else the system may not realize it's relevant.
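If the index really was created with an explicit COLLATE clause, following this advice would mean recreating it spelled exactly like the query expression, along these lines (a sketch):

DROP INDEX idx_oss_alarms_dn;
CREATE INDEX idx_oss_alarms_dn ON oss_alarms ((jdata ->> 'dn'));
ANALYZE oss_alarms;  -- refresh statistics on the index expression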

            regards, tom lane


RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs
Date:
Actually Postgres is trying to outsmart itself; I did not add COLLATE ;)

The original statement text:
CREATE INDEX idx_oss_alarms_dn ON oss_alarms ((jdata ->> 'dn'));

Mareks

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Thursday, 13 September 2018 20:42
To: Mareks Kalnačs <Mareks.Kalnacs@datakom.lv>
Cc: Victor Yegorov <vyegorov@gmail.com>; pgsql-bugs@postgresql.org; Māris Rucis <Maris.Rucis@datakom.lv>; Pāvels Koržs
<Pavels.Korzs@datakom.lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem
