Re: Disk-based hash aggregate's cost model
From | Peter Geoghegan |
---|---|
Subject | Re: Disk-based hash aggregate's cost model |
Date | |
Msg-id | CAH2-WzmSOS9O_ko_pkgHJS0WfA-SOMWATUZuaVGc_ktPoK_DQg@mail.gmail.com |
In response to | Re: Disk-based hash aggregate's cost model (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Disk-based hash aggregate's cost model |
List | pgsql-hackers |
On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis <pgsql@j-davis.com> wrote:
> create table text10m(t text collate "C.UTF-8", i int, n numeric);
> insert into text10m select s.g::text, s.g, s.g::numeric from (select
> (random()*1000000000)::int as g from generate_series(1,10000000)) s;
> explain analyze select distinct t from text10m;

Note that you won't get what Postgres considers to be the C collation unless you specify "collate C" -- "C.UTF-8" is the C collation exposed by glibc. The difference matters a lot, because only the former can use abbreviated keys (unless you manually #define TRUST_STRXFRM). And even without abbreviated keys it's probably still significantly faster for other reasons.

This doesn't undermine your point, because we don't take the difference into account in cost_sort() -- even though abbreviated keys will regularly make text sorts 2x-3x faster. My point is only that it would be more accurate to say that the costing unfairly boosts sorts on collated texts specifically. Though maybe not when an ICU collation is used (since abbreviated keys will be enabled generally).

--
Peter Geoghegan