Обсуждение: Why is sorting on two columns so slower than sorting on one column?

Поиск
Список
Период
Сортировка

Why is sorting on two columns so slower than sorting on one column?

От
Jie Li
Дата:
<font size="2">Hi,<br /><br />Here is the test table, <br /><br />postgres=# \d big_wf <br />    Table
"public.big_wf"<br/> Column |  Type   | Modifiers <br />--------+---------+-----------<br /> age    | integer | <br
/> id    | integer | <br /><br />postgres=# \dt+ big_wf <br />                     List of relations<br /> Schema | 
Name | Type  |  Owner   |  Size  | Description <br />--------+--------+-------+----------+--------+-------------<br
/> public| big_wf | table | workshop | 142 MB | <br /><br /><br />The first query sorting on one column:<br
/>postgres=#explain analyze select * from big_wf order by age;<br
/>                                                      QUERY
PLAN                                                       <br />
-------------------------------------------------------------------------------------------------------------------------<br
/> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=11228.155..16427.149 rows=4100000 loops=1)<br />
  Sort Key: age<br />   Sort Method:  external sort  Disk: 72112kB<br />   ->  Seq Scan on big_wf 
(cost=0.00..59142.00rows=4100000 width=8) (actual time=6.196..4797.620 rows=4100000 loops=1)<br /> Total runtime:
19530.452ms<br /> (5 rows)<br /><br />The second query sorting on two columns:<br />postgres=# explain analyze select *
frombig_wf order by age,id;<br />                                                       QUERY
PLAN                                                       <br />
-------------------------------------------------------------------------------------------------------------------------<br
/> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=37544.779..48206.702 rows=4100000 loops=1)<br />
  Sort Key: age, id<br />   Sort Method:  external merge  Disk: 72048kB<br />   ->  Seq Scan on big_wf 
(cost=0.00..59142.00rows=4100000 width=8) (actual time=6.796..5518.663 rows=4100000 loops=1)<br /> Total runtime:
51258.000ms<br /> (5 rows)<br /><br />The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the first
column(age)of all the tuples are of the same value, so the second column(id)<span
id="internal-source-marker_0.431774702563559"style="font-family: Arial; color: rgb(0, 0, 0); background-color:
transparent;font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;"></span> is
alwaysneeded for comparison.<span id="internal-source-marker_0.431774702563559" style="font-family: Arial; color:
rgb(0,0, 0); background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;
vertical-align:baseline;">  While the first sorting takes about only 6 seconds, the second one takes over 30 seconds, 
Isthis too much than expected? Is there any </span></font><font size="2"><span
id="internal-source-marker_0.431774702563559"style="font-family: Arial; color: rgb(0, 0, 0); background-color:
transparent;font-weight: normal; font-style: normal; text-decoration: none; vertical-align: baseline;">possible
</span></font><spanid="internal-source-marker_0.431774702563559" style="font-size: 11pt; font-family: Arial; color:
rgb(0,0, 0); background-color: transparent; font-weight: normal; font-style: normal; text-decoration: none;
vertical-align:baseline;"><font size="2">optimization ?</font><br /><br />Thanks,<br />Li Jie<br /></span> 

Re: Why is sorting on two columns so slower than sorting on one column?

От
Kenneth Marshall
Дата:
On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
> Hi,
> 
> Here is the test table,
> 
> postgres=# \d big_wf
>     Table "public.big_wf"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  age    | integer |
>  id     | integer |
> 
> postgres=# \dt+ big_wf
>                      List of relations
>  Schema |  Name  | Type  |  Owner   |  Size  | Description
> --------+--------+-------+----------+--------+-------------
>  public | big_wf | table | workshop | 142 MB |
> 
> 
> The first query sorting on one column:
> postgres=# explain analyze select * from big_wf order by age;
>                                                        QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual
> time=11228.155..16427.149 rows=4100000 loops=1)
>    Sort Key: age
>    Sort Method:  external sort  Disk: 72112kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8)
> (actual time=6.196..4797.620 rows=4100000 loops=1)
>  Total runtime: 19530.452 ms
> (5 rows)
> 
> The second query sorting on two columns:
> postgres=# explain analyze select * from big_wf order by age,id;
>                                                        QUERY
> PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual
> time=37544.779..48206.702 rows=4100000 loops=1)
>    Sort Key: age, id
>    Sort Method:  external merge  Disk: 72048kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8)
> (actual time=6.796..5518.663 rows=4100000 loops=1)
>  Total runtime: 51258.000 ms
> (5 rows)
> 
> The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
> first column(age) of all the tuples are of the same value, so the second
> column(id) is always needed for comparison.  While the first sorting takes
> about only 6 seconds, the second one takes over 30 seconds,  Is this too
> much than expected? Is there any possible optimization ?
> 
> Thanks,
> Li Jie

Hi Li,

If I understand your description, in the first query the sort does
not actually have to do anything because the column values for "age"
are all degenerate. In the second query, you actually need to sort
the values which is why it takes longer. If the first column values
are the same, then simply sorting by "id" alone would be faster.
You could also bump up work_mem for the query to perform the sort
in memory.

Regards,
Ken



Re: Why is sorting on two columns so slower than sorting on one column?

От
Marti Raudsepp
Дата:
On Thu, Dec 23, 2010 at 09:33, Jie Li <jay23jack@gmail.com> wrote:
> While the first sorting takes
> about only 6 seconds, the second one takes over 30 seconds,  Is this too
> much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti


Re: Why is sorting on two columns so slower than sortingon one column?

От
Kenneth Marshall
Дата:
On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> Hi Ken,
> 
> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are
random.
> 
> postgres=# explain analyze select * from big_wf order by id;
>                                                       QUERY PLAN
 
 
>
-------------------------------------------------------------------------------------------------------------------------
> Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
>   Sort Key: id
>   Sort Method:  external merge  Disk: 72048kB
>   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000
loops=1)
> 
> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm
confusedis that, why is additional comparison really so expensive?  Does it incur additional I/O? From the cost model,
itseems not, all the "cost" are the same (575775.45).
 
> 
> Thanks,
> Li Jie

In the first query, the cost is basically the I/O cost to read the
table from disk. The actual sort does not do anything since the
sort values are the same. In the second query, the sort has to
swap things in memory/disk to get them in the correct order for
the result. This actually takes CPU and possibly additional I/O
which is why it is slower. In the case of sorting by just the "id"
column, the size of the sorted values is smaller which would need
fewer batches to complete the sort since the sort is bigger than
the work_mem.

Cheers,
Ken


Re: Why is sorting on two columns so slower thansortingon one column?

От
Kenneth Marshall
Дата:
On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
> ----- Original Message ----- 
> From: "Kenneth Marshall" <ktm@rice.edu>
> To: "Li Jie" <jay23jack@gmail.com>
> Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
> Sent: Thursday, December 23, 2010 10:30 PM
> Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?
> 
> 
> > On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> >> Hi Ken,
> >> 
> >> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are
random.
> >> 
> >> postgres=# explain analyze select * from big_wf order by id;
> >>                                                       QUERY PLAN
    
 
> >>
-------------------------------------------------------------------------------------------------------------------------
> >> Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
> >>   Sort Key: id
> >>   Sort Method:  external merge  Disk: 72048kB
> >>   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000
loops=1)
> >> 
> >> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm
confusedis that, why is additional comparison really so expensive?  Does it incur additional I/O? From the cost model,
itseems not, all the "cost" are the same (575775.45).
 
> >> 
> >> Thanks,
> >> Li Jie
> > 
> > In the first query, the cost is basically the I/O cost to read the
> > table from disk. The actual sort does not do anything since the
> > sort values are the same. In the second query, the sort has to
> > swap things in memory/disk to get them in the correct order for
> > the result. This actually takes CPU and possibly additional I/O
> > which is why it is slower. In the case of sorting by just the "id"
> > column, the size of the sorted values is smaller which would need
> > fewer batches to complete the sort since the sort is bigger than
> > the work_mem.
> > 
> > Cheers,
> > Ken
> 
> Hi Ken,
> 
> Thanks for your analysis.
> 
> But in the last query that sorts on "id",  since the query selects all the columns for output, the actual sorted size
isthe same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the
comparison.Am I right?
 
> 
> Thanks,
> Li Jie

I think you are right. Sorry for the confusion.

Ken


Re: Why is sorting on two columns so slower thansortingon one column?

От
Tom Lane
Дата:
Kenneth Marshall <ktm@rice.edu> writes:
> On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
>> But in the last query that sorts on "id",  since the query selects all the columns for output, the actual sorted
sizeis the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the
comparison.Am I right?
 

> I think you are right. Sorry for the confusion.

I doubt the cost of comparing two integers is the issue here; rather
it's more likely one of how many merge passes were needed.  You could
find out instead of just speculating by turning on trace_sort and
comparing the log outputs.
        regards, tom lane


Re: Why is sorting on two columns so slower thansortingon one column?

От
"Li Jie"
Дата:
----- Original Message ----- 
From: "Kenneth Marshall" <ktm@rice.edu>
To: "Li Jie" <jay23jack@gmail.com>
Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Thursday, December 23, 2010 10:30 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?


> On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
>> Hi Ken,
>> 
>> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are
random.
>> 
>> postgres=# explain analyze select * from big_wf order by id;
>>                                                       QUERY PLAN
  
 
>>
-------------------------------------------------------------------------------------------------------------------------
>> Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
>>   Sort Key: id
>>   Sort Method:  external merge  Disk: 72048kB
>>   ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000
loops=1)
>> 
>> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm
confusedis that, why is additional comparison really so expensive?  Does it incur additional I/O? From the cost model,
itseems not, all the "cost" are the same (575775.45).
 
>> 
>> Thanks,
>> Li Jie
> 
> In the first query, the cost is basically the I/O cost to read the
> table from disk. The actual sort does not do anything since the
> sort values are the same. In the second query, the sort has to
> swap things in memory/disk to get them in the correct order for
> the result. This actually takes CPU and possibly additional I/O
> which is why it is slower. In the case of sorting by just the "id"
> column, the size of the sorted values is smaller which would need
> fewer batches to complete the sort since the sort is bigger than
> the work_mem.
> 
> Cheers,
> Ken

Hi Ken,

Thanks for your analysis.

But in the last query that sorts on "id",  since the query selects all the columns for output, the actual sorted size
isthe same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the
comparison.Am I right?
 

Thanks,
Li Jie

Re: Why is sorting on two columns so slower than sorting on one column?

От
"Li Jie"
Дата:
Hi Marti,

Thanks for your help! I guess I understand what you mean, a clustered index will make sorting as cheap as a seq scan,
right?

But what I meant is, is there any potential optimization for the backend implementation? Intuitively, if sorting on one
columnor two columns will incur the same I/O costs, why should there be so much difference?
 

Thanks,
Li Jie

----- Original Message ----- 
From: "Marti Raudsepp" <marti@juffo.org>
To: "Jie Li" <jay23jack@gmail.com>
Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Thursday, December 23, 2010 10:17 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?


On Thu, Dec 23, 2010 at 09:33, Jie Li <jay23jack@gmail.com> wrote:
> While the first sorting takes
> about only 6 seconds, the second one takes over 30 seconds, Is this too
> much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti

Re: Why is sorting on two columns so slower than sortingon one column?

От
"Li Jie"
Дата:
Hi Ken,

Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are
random.

postgres=# explain analyze select * from big_wf order by id;                                                     QUERY
PLAN                                                       
 

-------------------------------------------------------------------------------------------------------------------------
Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1) Sort
Key:id Sort Method:  external merge  Disk: 72048kB ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8)
(actualtime=8.595..5569.500 rows=4100000 loops=1)
 

Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm
confusedis that, why is additional comparison really so expensive?  Does it incur additional I/O? From the cost model,
itseems not, all the "cost" are the same (575775.45).
 

Thanks,
Li Jie

----- Original Message ----- 
From: "Kenneth Marshall" <ktm@rice.edu>
To: "Jie Li" <jay23jack@gmail.com>
Cc: "pgsql-hackers" <pgsql-hackers@postgresql.org>
Sent: Thursday, December 23, 2010 10:03 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sortingon one column?


> On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
>> Hi,
>> 
>> Here is the test table,
>> 
>> postgres=# \d big_wf
>>     Table "public.big_wf"
>>  Column |  Type   | Modifiers
>> --------+---------+-----------
>>  age    | integer |
>>  id     | integer |
>> 
>> postgres=# \dt+ big_wf
>>                      List of relations
>>  Schema |  Name  | Type  |  Owner   |  Size  | Description
>> --------+--------+-------+----------+--------+-------------
>>  public | big_wf | table | workshop | 142 MB |
>> 
>> 
>> The first query sorting on one column:
>> postgres=# explain analyze select * from big_wf order by age;
>>                                                        QUERY
>> PLAN
>>
-------------------------------------------------------------------------------------------------------------------------
>>  Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=11228.155..16427.149 rows=4100000 loops=1)
>>    Sort Key: age
>>    Sort Method:  external sort  Disk: 72112kB
>>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.196..4797.620 rows=4100000 loops=1)
>>  Total runtime: 19530.452 ms
>> (5 rows)
>> 
>> The second query sorting on two columns:
>> postgres=# explain analyze select * from big_wf order by age,id;
>>                                                        QUERY
>> PLAN
>>
-------------------------------------------------------------------------------------------------------------------------
>>  Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=37544.779..48206.702 rows=4100000 loops=1)
>>    Sort Key: age, id
>>    Sort Method:  external merge  Disk: 72048kB
>>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.796..5518.663 rows=4100000 loops=1)
>>  Total runtime: 51258.000 ms
>> (5 rows)
>> 
>> The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
>> first column(age) of all the tuples are of the same value, so the second
>> column(id) is always needed for comparison.  While the first sorting takes
>> about only 6 seconds, the second one takes over 30 seconds,  Is this too
>> much than expected? Is there any possible optimization ?
>> 
>> Thanks,
>> Li Jie
> 
> Hi Li,
> 
> If I understand your description, in the first query the sort does
> not actually have to do anything because the column values for "age"
> are all degenerate. In the second query, you actually need to sort
> the values which is why it takes longer. If the first column values
> are the same, then simply sorting by "id" alone would be faster.
> You could also bump up work_mem for the query to perform the sort
> in memory.
> 
> Regards,
> Ken
>

Re: Why is sorting on two columns so slower thansortingon one column?

От
Jie Li
Дата:


On Thu, Dec 23, 2010 at 10:26 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Kenneth Marshall <ktm@rice.edu> writes:
> On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
>> But in the last query that sorts on "id",  since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?

> I think you are right. Sorry for the confusion.

I doubt the cost of comparing two integers is the issue here; rather
it's more likely one of how many merge passes were needed.  You could
find out instead of just speculating by turning on trace_sort and
comparing the log outputs.

                       regards, tom lane

I follow your advice and set the trace_sort, here is the two queries result, the log outputs looks similar and  they seem to use the same number of tapes. But the time is different. Could you have a look:

postgres=# explain analyze select * from big_wf order by id;
                                                      QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=28102.985..39351.309 rows=4100000 loops=1)
  Sort Key: id
  Sort Method:  external merge  Disk: 72048kB
  ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=9.190..7262.789 rows=4100000 loops=1)
Total runtime: 42953.855 ms
(5 rows)
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  begin tuple sort: nkeys = 1, workMem = 20480, randomAccess = f
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  switching to external sort with 74 tapes: CPU 0.29s/0.28u sec elapsed 0.71 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 1 to tape 0: CPU 0.68s/2.12u sec elapsed 3.41 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 2 to tape 1: CPU 1.22s/4.24u sec elapsed 6.53 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 3 to tape 2: CPU 1.67s/6.38u sec elapsed 9.67 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 4 to tape 3: CPU 2.22s/8.61u sec elapsed 12.93 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 5 to tape 4: CPU 2.72s/10.80u sec elapsed 16.09 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 6 to tape 5: CPU 3.23s/12.86u sec elapsed 19.12 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 7 to tape 6: CPU 3.81s/15.02u sec elapsed 23.84 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 8 to tape 7: CPU 4.36s/17.13u sec elapsed 27.05 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  performsort starting: CPU 4.38s/17.20u sec elapsed 27.15 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing run 9 to tape 8: CPU 4.39s/17.88u sec elapsed 27.97 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  finished writing final run 10 to tape 9: CPU 4.39s/17.88u sec elapsed 27.97 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  performsort done (except 10-way final merge): CPU 4.39s/17.98u sec elapsed 28.10 sec
STATEMENT:  explain analyze select * from big_wf order by id;
LOG:  external sort ended, 9006 disk blocks used: CPU 8.01s/27.02u sec elapsed 42.92 sec
STATEMENT:  explain analyze select * from big_wf order by id;


postgres=# explain analyze select * from big_wf order by age,id;
Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=43709.851..57048.645 rows=4100000 loops=1)
  Sort Key: age, id
  Sort Method:  external merge  Disk: 72048kB
  ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=10.090..7075.208 rows=4100000 loops=1)
Total runtime: 60721.824 ms

STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  begin tuple sort: nkeys = 2, workMem = 20480, randomAccess = f
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  switching to external sort with 74 tapes: CPU 0.28s/0.30u sec elapsed 0.67 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 1 to tape 0: CPU 0.71s/3.63u sec elapsed 4.90 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 2 to tape 1: CPU 1.30s/7.53u sec elapsed 10.30 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 3 to tape 2: CPU 1.88s/11.36u sec elapsed 15.39 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 4 to tape 3: CPU 2.43s/15.20u sec elapsed 20.51 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 5 to tape 4: CPU 3.05s/18.96u sec elapsed 25.44 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 6 to tape 5: CPU 3.68s/22.74u sec elapsed 30.47 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 7 to tape 6: CPU 4.24s/26.63u sec elapsed 36.61 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 8 to tape 7: CPU 4.78s/30.41u sec elapsed 41.56 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  performsort starting: CPU 4.81s/30.56u sec elapsed 41.75 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing run 9 to tape 8: CPU 4.84s/32.07u sec elapsed 43.56 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  finished writing final run 10 to tape 9: CPU 4.84s/32.07u sec elapsed 43.56 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  performsort done (except 10-way final merge): CPU 4.85s/32.16u sec elapsed 43.70 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;
LOG:  external sort ended, 9006 disk blocks used: CPU 8.60s/41.93u sec elapsed 60.73 sec
STATEMENT:  explain analyze select * from big_wf order by age,id;

Re: Why is sorting on two columns so slower thansortingon one column?

От
Simon Riggs
Дата:
On Fri, 2010-12-24 at 00:27 -0500, Jie Li wrote:

>         I doubt the cost of comparing two integers is the issue here;
>         rather
>         it's more likely one of how many merge passes were needed.
>          You could
>         find out instead of just speculating by turning on trace_sort
>         and
>         comparing the log outputs.

> postgres=# explain analyze select * from big_wf order by id;

> LOG:  switching to external sort with 74 tapes: CPU 0.29s/0.28u sec
> elapsed 0.71 sec
> LOG:  external sort ended, 9006 disk blocks used: CPU 8.01s/27.02u
> sec 

> elapsed 42.92 sec
> STATEMENT:  explain analyze select * from big_wf order by id;


> STATEMENT:  explain analyze select * from big_wf order by age,id;

> LOG:  begin tuple sort: nkeys = 2, workMem = 20480, randomAccess = f
> STATEMENT:  explain analyze select * from big_wf order by age,id;
> LOG:  switching to external sort with 74 tapes: CPU 0.28s/0.30u sec
> elapsed 0.67 sec
> LOG:  external sort ended, 9006 disk blocks used: CPU 8.60s/41.93u sec
> elapsed 60.73 sec
> STATEMENT:  explain analyze select * from big_wf order by age,id;

I think the answer is that only the first column comparison is
optimised. Second and subsequent comparisons are not optimised.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: Why is sorting on two columns so slower thansortingon one column?

От
Robert Haas
Дата:
On Mon, Dec 27, 2010 at 3:58 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think the answer is that only the first column comparison is
> optimised. Second and subsequent comparisons are not optimised.

What sort of optimization are you referring to here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company