optimisation? collation "C" sorting for GroupAggregate for all deterministic collations
| От | Maxim Ivanov |
|---|---|
| Тема | optimisation? collation "C" sorting for GroupAggregate for all deterministic collations |
| Дата | |
| Msg-id | fU99oVcSherOv0kdkueSvHPdlcPmOt_BrQQYTphlHqQtm12InTe2uJJLiJpxL1UKBF95XSXayj4nLkfzu7Xd2Q5cwWJa8cxRjQ0kh_9q1v0=@yamlcoder.me обсуждение исходный текст |
| Ответы |
Re: optimisation? collation "C" sorting for GroupAggregate for alldeterministic collations
|
| Список | pgsql-hackers |
Hi All,
It is known, that collation "C" significantly speeds up string comparisons and as a result sorting. I was wondering, whether it is possible to use it regardless of collation set on a column in sorts not visible to users?
Example I have in mind is sorting performed for GroupAggregate. Purpose of that sort is to bring equal values next to each other, so as long as:
1) user didn't request ORDER BY in addition to GROUP BY
2) source column has any deterministic collation (as per docs all builtin collations are deterministic)
it seems to be possible to do sorting with any deterministic collation, regardless of what user specifid for the column being sorted. "C" collation is deterministic and fastest.
Example I have in mind is sorting performed for GroupAggregate. Purpose of that sort is to bring equal values next to each other, so as long as:
1) user didn't request ORDER BY in addition to GROUP BY
2) source column has any deterministic collation (as per docs all builtin collations are deterministic)
it seems to be possible to do sorting with any deterministic collation, regardless of what user specifid for the column being sorted. "C" collation is deterministic and fastest.
In other words, couldn't PostgreSQL convert this:
Group Key: ec_180days.msn, ec_180days.to_date_time
-> Sort (cost=15726557.87..15906557.89 rows=72000008 width=113) (actual time=490094.849..524854.662 rows=72000000 loops=1)
Sort Key: ec_180days.msn, ec_180days.to_date_time
Sort Method: external merge Disk: 7679136kB
To this:
Group Key: ec_180days.msn, ec_180days.to_date_time
-> Sort (cost=14988274.87..15168274.89 rows=72000008 width=92) (actual time=140489.807..174228.722 rows=72000000 loops=1)
Sort Key: ec_180days.msn COLLATE "C", ec_180days.to_date_time
Sort Method: external merge Disk: 7679136kB
which is 3 times faster in my tests.
В списке pgsql-hackers по дате отправления: