Query with order by and limit is very slow - wrong index used

Поиск
Список
Период
Сортировка
От Nowak Michał
Тема Query with order by and limit is very slow - wrong index used
Дата
Msg-id 6B8FADC8-9692-439F-8AA4-0CAA287FCEB0@me.com
обсуждение исходный текст
Ответы Re: Query with order by and limit is very slow - wrong index used  (Gregg Jaskiewicz <gryzman@gmail.com>)
Re: Query with order by and limit is very slow - wrong index used  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Query with order by and limit is very slow - wrong index used  (Marcin Mańk <marcin.mank@gmail.com>)
Список pgsql-performance
Since it's my first on this list, I'd like to say "Hi guys" :)

Here is definition of my table:
a9-dev=> \d records;
                             Table "public.records"
                Column                |            Type             | Modifiers
--------------------------------------+-----------------------------+-----------
 id                                   | bigint                      | not null
 checksum                             | character varying(32)       | not null
 data                                 | text                        | not null
 delete_date                          | timestamp without time zone |
 last_processing_date                 | timestamp without time zone |
 object_id                            | character varying(255)      | not null
 processing_path                      | character varying(255)      | not null
 schema_id                            | character varying(255)      | not null
 source_id                            | character varying(255)      | not null
 source_object_last_modification_date | timestamp without time zone | not null
Indexes:
    "records_pkey" PRIMARY KEY, btree (id)
    "unq_records_0" UNIQUE, btree (object_id, schema_id, source_id, processing_path)
    "length_processing_path_id_idx" btree (length(processing_path::text), id)
    "length_processing_path_idx" btree (length(processing_path::text))
    "object_id_id_idx" btree (object_id, id)
    "schema_id_id_idx" btree (schema_id, id)
    "schema_id_idx" btree (schema_id)
    "source_id_id_idx" btree (source_id, id)
    "source_id_idx" btree (source_id)
    "source_object_last_modification_date_id_idx" btree (source_object_last_modification_date, id)
    "source_object_last_modification_date_idx" btree (source_object_last_modification_date)

Average length of value of "data" column =  2991.7947061626100466

When I perform query such as this:  "select * from records where source_id = 'XXX' order by id limit 200;" I expect DB
touse index source_id_id_idx  with XXX as filter. It is true for all but one values of XXX - when I ask for records
withmost common source_id, records_pkey index is used instead and performance is terrible! Explain analyze results
below.

a9-dev=> explain analyze select * from records where source_id ='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'
orderby id limit 200;     

                                                                         QUERY PLAN
                                

---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..755.61 rows=200 width=1127) (actual time=75.292..684.582 rows=200 loops=1)
   ->  Index Scan using source_id_id_idx on records  (cost=0.00..1563542.89 rows=413849 width=1127) (actual
time=75.289..684.495rows=200 loops=1) 
         Index Cond: ((source_id)::text = 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'::text)
 Total runtime: 690.358 ms
(4 rows)

a9-dev=> explain analyze select * from records where source_id
='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id limit 200;    

                                                                            QUERY PLAN
                                     

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..466.22 rows=200 width=1127) (actual time=124093.485..124095.540 rows=200 loops=1)
   ->  Index Scan using records_pkey on records  (cost=0.00..2333280.84 rows=1000937 width=1127) (actual
time=124093.484..124095.501rows=200 loops=1) 
         Filter: ((source_id)::text = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'::text)
 Total runtime: 124130.247 ms
(4 rows)


Some info about data distrubution:

a9-dev=> select min(id) from records;
  min
--------
 190830
(1 row)

a9-dev=> select min(id), max(id) from records where source_id='http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml';
   min   |   max
---------+---------
 1105217 | 3811326
(1 row)
a9-dev=> select min(id), max(id) from records where source_id='http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml';
   min   |   max
---------+---------
 1544991 | 3811413
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'order by id limit 200) as a; 
  min   |   max
---------+---------
1105217 | 1105416
(1 row)

a9-dev=> select min(id), max(id) from (select id from records where source_id =
'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'order by id limit 200) as a; 
  min   |   max
---------+---------
1544991 | 1545190
(1 row)



a9-dev=> select source_id, count(*) from records where source_id =
'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml'or source_id =
'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'group by source_id; 
                       source_id                        | count
--------------------------------------------------------+--------
 http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml    | 427254
 http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
(2 rows)

a9-dev=> select count(*) from records;
  count
---------
 3620311
(1 row)


DB settings:

a9-dev=>  SELECT
a9-dev->    'version'::text AS "name",
a9-dev->    version() AS "current_setting"
a9-dev->  UNION ALL
a9-dev->  SELECT
a9-dev->    name,current_setting(name)
a9-dev->  FROM pg_settings
a9-dev->  WHERE NOT source='default' AND NOT name IN
a9-dev->    ('config_file','data_directory','hba_file','ident_file',
a9-dev(>    'log_timezone','DateStyle','lc_messages','lc_monetary',
a9-dev(>    'lc_numeric','lc_time','timezone_abbreviations',
a9-dev(>    'default_text_search_config','application_name',
a9-dev(>    'transaction_deferrable','transaction_isolation',
a9-dev(>    'transaction_read_only');
           name           |                                                 current_setting
                    

--------------------------+-----------------------------------------------------------------------------------------------------------------
 version                  | PostgreSQL 9.0.4 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.6.0 20110530 (Red
Hat4.6.0-9), 64-bit 
 lc_collate               | en_US.UTF-8
 lc_ctype                 | en_US.UTF-8
 listen_addresses         | *
 log_rotation_age         | 1d
 log_rotation_size        | 0
 log_truncate_on_rotation | on
 logging_collector        | on
 max_connections          | 100
 max_stack_depth          | 2MB
 port                     | 5432
 server_encoding          | UTF8
 shared_buffers           | 24MB
 TimeZone                 | Poland
(14 rows)


This query was always slow. Autovacuum is on, and I ran VACUUM ANALYZE manually few minutes before writing this email.

Please help me with my problem. I'll be happy to provide any additional information if needed.
Michal Nowak


В списке pgsql-performance по дате отправления:

Предыдущее
От: David Boreham
Дата:
Сообщение: Re: Suggestions for Intel 710 SSD test
Следующее
От: Gregg Jaskiewicz
Дата:
Сообщение: Re: Query with order by and limit is very slow - wrong index used