Обсуждение: Query Optimization

Поиск
Список
Период
Сортировка

Query Optimization

От
Zach Calvert
Дата:
Sorry for the cross post - but I'm not sure my original posting to the
performance mailing list was the right place to send my question.  So,
let me try again at the novice list.

I have a query and I have run
explain analyze
select count(*)
from score
where leaderboardid=35 and score <= 6841 and active

The result is
"Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
time=38.878..38.878 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 38.937 ms"

I have an index on score, I have an index on score, leaderboardid, and
active and still it does a sequential scan.  I can't seem to figure
out how to create an index that will
turn that "Seq Scan" into an index scan. The biggest problem is that
the query degrades very quickly with a lot more rows and I will be
getting A LOT MORE rows.  What can I do to improve the performance of
this query?



Thanks a bunch,
ZC

Re: Query Optimization

От
Luiz Eduardo Cantanhede Neri
Дата:
From what I noticed yout problem is the seq_scan
Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"

You'll always should void scans, table scan, index scan, etc...

On Wed, May 27, 2009 at 11:28 AM, Zach Calvert <zachcalvert@hemerasoftware.com> wrote:
Sorry for the cross post - but I'm not sure my original posting to the
performance mailing list was the right place to send my question.  So,
let me try again at the novice list.

I have a query and I have run
explain analyze
select count(*)
from score
where leaderboardid=35 and score <= 6841 and active

The result is
"Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
time=38.878..38.878 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 38.937 ms"

I have an index on score, I have an index on score, leaderboardid, and
active and still it does a sequential scan.  I can't seem to figure
out how to create an index that will
turn that "Seq Scan" into an index scan. The biggest problem is that
the query degrades very quickly with a lot more rows and I will be
getting A LOT MORE rows.  What can I do to improve the performance of
this query?



Thanks a bunch,
ZC

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice

Re: Query Optimization

От
Sean Davis
Дата:


On Wed, May 27, 2009 at 10:50 AM, Luiz Eduardo Cantanhede Neri <lecneri@gmail.com> wrote:
From what I noticed yout problem is the seq_scan
Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"

You'll always should void scans, table scan, index scan, etc...

On Wed, May 27, 2009 at 11:28 AM, Zach Calvert <zachcalvert@hemerasoftware.com> wrote:
Sorry for the cross post - but I'm not sure my original posting to the
performance mailing list was the right place to send my question.  So,
let me try again at the novice list.

I have a query and I have run
explain analyze
select count(*)
from score
where leaderboardid=35 and score <= 6841 and active

The result is
"Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
time=38.878..38.878 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
(actual time=0.012..30.760 rows=38571 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 38.937 ms"

I have an index on score, I have an index on score, leaderboardid, and
active and still it does a sequential scan.  I can't seem to figure
out how to create an index that will
turn that "Seq Scan" into an index scan. The biggest problem is that
the query degrades very quickly with a lot more rows and I will be
getting A LOT MORE rows.  What can I do to improve the performance of
this query?

Postgresql is aware of the "cost" associated with each query.  In the case of a small table with an index that is not very discriminative, it may choose a sequential scan.  However, as you add more rows, the index scan may become more effective and may be used instead.  One thing to keep in mind is that an index scan is NOT always faster than a sequential scan.

Sean
 

Re: Query Optimization

От
Tom Lane
Дата:
Sean Davis <sdavis2@mail.nih.gov> writes:
>> zachcalvert@hemerasoftware.com> wrote:
>>> I have a query and I have run
>>> explain analyze
>>> select count(*)
>>> from score
>>> where leaderboardid=35 and score <= 6841 and active
>>>
>>> The result is
>>> "Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
>>> time=38.878..38.878 rows=1 loops=1)"
>>> "  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
>>> (actual time=0.012..30.760 rows=38571 loops=1)"
>>> "        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
>>> "Total runtime: 38.937 ms"
>>>
>>> I have an index on score, I have an index on score, leaderboardid, and
>>> active and still it does a sequential scan.

> Postgresql is aware of the "cost" associated with each query.  In the case
> of a small table with an index that is not very discriminative, it may
> choose a sequential scan.  However, as you add more rows, the index scan may
> become more effective and may be used instead.  One thing to keep in mind is
> that an index scan is NOT always faster than a sequential scan.

A crude rule of thumb is that you need the query to fetch less than
ten percent of the rows before a bitmap scan is going to be a win,
and less than one percent before a plain indexscan is going to be a win.
(If your database is entirely cached in memory then the crossover
percentages are higher, and you need to adjust the planner's cost
parameters so that it gets this right.)  It's not clear exactly how
big this table is, but I'm betting the query is fetching more than
ten percent of it.

One point worth making is that if this is the typical set of conditions
in your queries, then the best index would be one on (leaderboardid,
score) not (score, leaderboardid, active).  (I'm betting that the
condition active = true is so nonselective it's not worth keeping it in
the index at all.)  You want equality conditions on the leading
column(s) and inequalities on the trailing columns.  To see why this is,
think about the index sort ordering and the portion of the index that
the query will have to scan.  In the latter case the set of index
entries matching this query is a contiguous group; in the former, not.

Our fine manual has a reasonable amount of detail about proper index
design:
http://www.postgresql.org/docs/8.3/static/indexes.html

            regards, tom lane

Re: Query Optimization

От
Zach Calvert
Дата:
I've added the leaderboardid, score index and it still does the
sequential scan.  I'm going to try inserting a few hundred thousand
rows and seeing if it switches to index scan.

On Wed, May 27, 2009 at 10:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sean Davis <sdavis2@mail.nih.gov> writes:
>>> zachcalvert@hemerasoftware.com> wrote:
>>>> I have a query and I have run
>>>> explain analyze
>>>> select count(*)
>>>> from score
>>>> where leaderboardid=35 and score <= 6841 and active
>>>>
>>>> The result is
>>>> "Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual
>>>> time=38.878..38.878 rows=1 loops=1)"
>>>> "  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0)
>>>> (actual time=0.012..30.760 rows=38571 loops=1)"
>>>> "        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
>>>> "Total runtime: 38.937 ms"
>>>>
>>>> I have an index on score, I have an index on score, leaderboardid, and
>>>> active and still it does a sequential scan.
>
>> Postgresql is aware of the "cost" associated with each query.  In the case
>> of a small table with an index that is not very discriminative, it may
>> choose a sequential scan.  However, as you add more rows, the index scan may
>> become more effective and may be used instead.  One thing to keep in mind is
>> that an index scan is NOT always faster than a sequential scan.
>
> A crude rule of thumb is that you need the query to fetch less than
> ten percent of the rows before a bitmap scan is going to be a win,
> and less than one percent before a plain indexscan is going to be a win.
> (If your database is entirely cached in memory then the crossover
> percentages are higher, and you need to adjust the planner's cost
> parameters so that it gets this right.)  It's not clear exactly how
> big this table is, but I'm betting the query is fetching more than
> ten percent of it.
>
> One point worth making is that if this is the typical set of conditions
> in your queries, then the best index would be one on (leaderboardid,
> score) not (score, leaderboardid, active).  (I'm betting that the
> condition active = true is so nonselective it's not worth keeping it in
> the index at all.)  You want equality conditions on the leading
> column(s) and inequalities on the trailing columns.  To see why this is,
> think about the index sort ordering and the portion of the index that
> the query will have to scan.  In the latter case the set of index
> entries matching this query is a contiguous group; in the former, not.
>
> Our fine manual has a reasonable amount of detail about proper index
> design:
> http://www.postgresql.org/docs/8.3/static/indexes.html
>
>                        regards, tom lane
>

Re: Query Optimization

От
Tom Lane
Дата:
Zach Calvert <zachcalvert@hemerasoftware.com> writes:
> I've added the leaderboardid, score index and it still does the
> sequential scan.  I'm going to try inserting a few hundred thousand
> rows and seeing if it switches to index scan.

You didn't answer the critical question, which is what fraction of the
table rows this query selects.  If it's a large fraction then the
planner never will switch to indexscan, no matter what the table size
--- and it'll be doing the right thing.  Indexscans aren't magic.

            regards, tom lane

Re: Query Optimization

От
Zach Calvert
Дата:
Ah, I thought it was the other way.  I thought a significant count of
rows WOULD switch to an index scan.  Yes, this query can easily obtain
around 30-50% of the rows in the table.  My apologies.



Zach Calvert

On Wed, May 27, 2009 at 2:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Zach Calvert <zachcalvert@hemerasoftware.com> writes:
>> I've added the leaderboardid, score index and it still does the
>> sequential scan.  I'm going to try inserting a few hundred thousand
>> rows and seeing if it switches to index scan.
>
> You didn't answer the critical question, which is what fraction of the
> table rows this query selects.  If it's a large fraction then the
> planner never will switch to indexscan, no matter what the table size
> --- and it'll be doing the right thing.  Indexscans aren't magic.
>
>                        regards, tom lane
>