Обсуждение: Slow query with 'or' clause
Hi
I'm using posstgresql 8.1.4 on linux 2.6
shared_buffers = tested with 3000 and 10000
temp_buffers = 1000
work_mem = 4096
effective_cache_size = 65536
random_page_cost = 2
I have a query which I think is anormaly slow with à 'OR'
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx') OR
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
if I split this query in 2 query like this
first
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx')
second
select count(*) from client_contact
left join client_company using(cli_id)
where (lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
each query are under 100 ms
Why postgresql think scanning index on cli_nom and cli_mail is not a good thing
with the OR clause ?
I hope you can help me understanding the problem
regards,
explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx') OR
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=37523.98..37523.99 rows=1 width=0) (actual
time=3871.086..3871.087 rows=1 loops=1)
-> Merge Left Join (cost=0.00..36719.10 rows=321952 width=0) (actual
time=3871.058..3871.058 rows=0 loops=1)
Merge Cond: ("outer".cli_id = "inner".cli_id)
Filter: ((("outer".cli_mail)::text = 'xxx@xxx.xx'::text) OR
((lower(("outer".cli_nom)::text) = 'xxxxxx'::text) AND (("inner".zipcode)::text
= '10001'::text)))
-> Index Scan using client_pkey on client_contact
(cost=0.00..14801.29 rows=321952 width=38) (actual time=0.110..1130.134
rows=321152 loops=1)
-> Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..13891.30 rows=321114 width=12) (actual time=0.097..1171.905
rows=321152 loops=1)
Total runtime: 3871.443 ms
explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx')
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2750.11..2750.12 rows=1 width=0) (actual time=23.930..23.932
rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..2750.08 rows=11 width=0) (actual
time=23.918..23.918 rows=0 loops=1)
-> Index Scan using email_client on client_contact
(cost=0.00..2711.33 rows=11 width=4) (actual time=23.913..23.913 rows=0
loops=1)
Index Cond: ((cli_mail)::text = 'xxx@xxx.xx'::text)
-> Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..3.51 rows=1 width=4) (never executed)
Index Cond: ("outer".cli_id = client_company.cli_id)
Total runtime: 24.018 ms
explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=107.18..107.19 rows=1 width=0) (actual time=84.935..84.936
rows=1 loops=1)
-> Nested Loop (cost=0.00..107.17 rows=1 width=0) (actual
time=84.928..84.928 rows=0 loops=1)
-> Index Scan using client_contact_cli_nom_idx on client_contact
(cost=0.00..40.19 rows=19 width=4) (actual time=84.832..84.835 rows=1 loops=1)
Index Cond: (lower((cli_nom)::text) = 'xxxxxx'::text)
-> Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..3.51 rows=1 width=4) (actual time=0.083..0.083 rows=0 loops=1)
Index Cond: ("outer".cli_id = client_company.cli_id)
Filter: ((zipcode)::text = '10001'::text)
Total runtime: 85.013 ms
philippe wrote: > explain analyse > select count(*) from client_contact > left join client_company using(cli_id) > where (cli_mail = 'xxx@xxx.xx') OR > (lower(cli_nom) = 'xxxxxx' and zipcode = '10001'); > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=37523.98..37523.99 rows=1 width=0) (actual > time=3871.086..3871.087 rows=1 loops=1) > -> Merge Left Join (cost=0.00..36719.10 rows=321952 width=0) (actual > time=3871.058..3871.058 rows=0 loops=1) This is the root of the problem - it's expecting to match over 320000 rows rather than 0. I'm guessing there's a lot of correlation between cli_mail and cli_nom (you're expecting them to match the same clients) but the planner doesn't know this. If this is a common query, you could try an index on zipcode - that might cut down the other side. However, I have to ask why you're using a left-join? Do you really have rows in client_contact without a matching cli_id in client_company? -- Richard Huxton Archonet Ltd
Selon Richard Huxton <dev@archonet.com>: > philippe wrote: > > explain analyse > > select count(*) from client_contact > > left join client_company using(cli_id) > > where (cli_mail = 'xxx@xxx.xx') OR > > (lower(cli_nom) = 'xxxxxx' and zipcode = '10001'); > > > > QUERY PLAN > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Aggregate (cost=37523.98..37523.99 rows=1 width=0) (actual > > time=3871.086..3871.087 rows=1 loops=1) > > -> Merge Left Join (cost=0.00..36719.10 rows=321952 width=0) (actual > > time=3871.058..3871.058 rows=0 loops=1) > > This is the root of the problem - it's expecting to match over 320000 > rows rather than 0. > > I'm guessing there's a lot of correlation between cli_mail and cli_nom > (you're expecting them to match the same clients) but the planner > doesn't know this. > > If this is a common query, you could try an index on zipcode - that > might cut down the other side. > > However, I have to ask why you're using a left-join? Do you really have > rows in client_contact without a matching cli_id in client_company? > You are right, I was focused on server perf and I should have analysed my query. Query time is ok now. thanks you !! > -- > Richard Huxton > Archonet Ltd > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >