Re: Feature request for adoptive indexes
От | Tomas Vondra |
---|---|
Тема | Re: Feature request for adoptive indexes |
Дата | |
Msg-id | a2183996-8eba-70be-c121-f76810717308@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Feature request for adoptive indexes (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Feature request for adoptive indexes
|
Список | pgsql-hackers |
On 11/1/21 21:06, Robert Haas wrote: > On Tue, Oct 26, 2021 at 11:11 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> If I get what you propose, you want to have a "top" tree for (job, nlp, >> year), which "splits" the data set into subsets of ~5000-7000 rows. And >> then for each subset you want a separate "small" trees on each of the >> other columns, so in this case three trees. >> >> Well, the problem with this is pretty obvious - each of the small trees >> requires separate copies of the leaf pages. And remember, in a btree the >> internal pages are usually less than 1% of the index, so this pretty >> much triples the size of the index. And if you insert a row into the >> index, it has to insert the item pointer into each of the small trees, >> likely requiring a separate I/O for each. >> >> So I'd bet this is not any different from just having three separate >> indexes - it doesn't save space, doesn't save I/O, nothing. > > I agree. In a lot of cases, it's actually useful to define the index > on fewer columns, like (job, nlp, year) or even just (job, nlp) or > even just (job) because it makes the index so much smaller and that's > pretty important. If you have enough duplicate entries in a (job, nlp, > year) index to justify create a (job, nlp, year, sequence) index, the > number of distinct (job, nlp, year) tuples has to be small compared to > the number of (job, nlp, year, sequence) tuples - and that means that > you wouldn't actually save much by combining your (job, nlp, year, > sequence) index with a (job, nlp, year, other-stuff) index. As you > say, the internal pages aren't the problem. > > I don't intend to say that there's no possible use for this kind of > technology. Peter G. says that people are writing research papers > about that and they probably wouldn't be doing that unless they'd > found some case where it's a big win. But this example seems extremely > unconvincing. > I actually looked at the use case mentioned by Peter G, i.e. merged indexes with master-detail clustering (see e.g. [1]), but that seems like a rather different thing. The master-detail refers to storing rows from multiple tables, interleaved in a way that allows faster joins. So it's essentially a denormalization tool. Perhaps there's something we could learn about efficient storage of the small trees, but I haven't found any papers describing that (I haven't spent much time on the search, though). [1] Algorithms for merged indexes, Goetz Graefe https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.7709 regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: