Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.

Поиск
Список
Период
Сортировка
От Emil Iggland
Тема Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.
Дата
Msg-id 4eacd5f7-1f8d-1e0d-ad9f-61f6c5ab706e@metrima.com
обсуждение исходный текст
Ответы Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Dear mailing list

We are investigating a strange performance issue with our database.

Our use case is a sensor reading database where we have sensor location 
(called channels), parameter settings (called valueseries) and reading 
(called datavalues). Datavalues is partitioned per month.
Like this: channel <-(FK)- valueseries <-(FK)- datavalues

We have a query that returns the latest sensor reading. When we have no 
readings in datavalues the query is SLOW, when we have 1 or more 
readings in datavalues the query is FAST. (slow being ~1second, fast 
~5ms) It isn't the slowness that is the main problem, but rather the odd 
behaviour.

The query that is giving us issues is the following, channel 752433 has 
NO values, 752431 has values.
(Channel 752433 only has valueseries 752434)

select * from datavalue
where dataview in ( select id from valueseries where channel = 
%channel_idx%)
ORDER BY VALUETIMESTAMP DESC
FETCH FIRST ROW only;
        
Running explain analyze shows strange numbers, 52'000 rows are being 
returned but there are no rows there.

For channel 752433
->  Index Scan Backward using 
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 
datavalue_6  (cost=0.42..7166.19 rows=119673 width=226) (actual 
time=0.008..32.831 rows=119601 loops=1)
->  Index Scan Backward using 
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 
datavalue_7  (cost=0.29..4002.79 rows=52499 width=227) (actual 
time=0.011..15.005 rows=52499 loops=1)
    
For channel 752431
->  Index Scan Backward using 
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 
datavalue_6  (cost=0.42..7166.19 rows=119673 width=226) (actual 
time=0.008..0.008 rows=1 loops=1)
->  Index Scan Backward using 
datavalue_2022_04_valuetimestamp_dataview_idx on datavalue_2022_04 
datavalue_7  (cost=0.29..4002.79 rows=52499 width=227) (actual 
time=0.011..0.011 rows=1 loops=1)        
    
Inserting even a single row changes the offending rows to the expected 
values:

insert into maclient.datavalue (dataview, valuetimestamp, datavalue) 
values (752434, '2022-03-01 00:00:00', 234);
->  Index Scan Backward using 
datavalue_2022_03_valuetimestamp_dataview_idx on datavalue_2022_03 
datavalue_6  (cost=0.42..7166.19 rows=119673 width=226) (actual 
time=0.006..0.006 rows=1 loops=1)
                
Full explain analyze on https://paste.depesz.com/s/ZwJ
with buffers and track_io_timing: https://paste.depesz.com/s/Ss

Disabling indexscan (set enable_indexscan=false;) hides the problem, it 
does not show up with a bitmap index scan.
Running autovacuum analyze doesn't seem to help, the results are the same.


SELECT version();
"PostgreSQL 14.2, compiled by Visual C++ build 1914, 64-bit"


Can anyone explain what is going on here.
* Why is the database returning 52'000+ rows when it should be returning 0?
* Is my query badly formulated?
* Is there something wrong with the indexes and I need to rebuild them?
    
We are stumped, and would greatly appreciate any input.

Regards
Emil



В списке pgsql-performance по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Query Planner not taking advantage of HASH PARTITION
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Performance differential when 0 values present vs when 1 values present. Planner return 52k rows when 0 expected.