Re: [HACKERS] Bizarre coding in _bt_binsrch
От | Bruce Momjian |
---|---|
Тема | Re: [HACKERS] Bizarre coding in _bt_binsrch |
Дата | |
Msg-id | 199911292224.RAA15252@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Bizarre coding in _bt_binsrch (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] Bizarre coding in _bt_binsrch
|
Список | pgsql-hackers |
Tom, I assume you have dealt with this, right? > I have been puzzling out the coding in _bt_binsrch() in > backend/access/nbtree/nbtsearch.c, with an eye to speeding it up for > the many-equal-keys case. I have finally worked out exactly what it's > doing, to wit: > > * On a leaf page, we always return the first key >= scan key > * (which could be the last slot + 1). > * > * On a non-leaf page, there are special cases: > * > * For an insertion (srchtype != BT_DESCENT and natts == keysz) > * always return first key >= scan key (which could be off the end). > * > * For a standard search (srchtype == BT_DESCENT and natts == keysz) > * return the first equal key if one exists, else the last lesser key > * if one exists, else the first slot on the page. > * > * For a partial-match search (srchtype == BT_DESCENT and natts < keysz) > * return the last lesser key if one exists, else the first slot. > > This strikes me as a tad bizarre --- in particular, the discrepancy > between treatment of equal keys in the normal and partial search cases. > > I think I understand why the partial-match code works that way: there > could be matching keys in the sub-page belonging to the last lesser key. > For example, if our scan target is (x = 2) and we have internal keys > (x = 1, y = 2) > (x = 2, y = 1) > then we need to look at the first key's subpages, where we might find > matching keys like (x = 2, y = 0). > > The full-width case appears to assume that that can't happen: if we > have a given key in an upper page, there can be *no* equal keys in > subpages to its left. That's a rather strong assumption about how > page splitting is done; is it correct? > > Even more to the point, *should* it be correct? If we always returned > the last lesser key, then the code would work for any correctly > sequenced b-tree, but this code looks like it works only if splits occur > only at the leftmost of a group of equal keys. If there are a lot of > equal keys, that could result in a badly unbalanced tree, no? Maybe > that's the real reason why performance seems to be so bad for many > equal keys... maybe the split algorithm needs to be relaxed? > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: