Re: Constant time insertion into highly non-unique
| From | Jim C. Nasby |
|---|---|
| Subject | Re: Constant time insertion into highly non-unique |
| Date | |
| Msg-id | 20050415163144.GS58835@decibel.org |
| In reply to | Re: Constant time insertion into highly non-unique (Simon Riggs <simon@2ndquadrant.com>) |
| List | pgsql-hackers |
On Thu, Apr 14, 2005 at 06:36:44PM +0100, Simon Riggs wrote:
> I'd suggest a move right probability of 97% (divide by 16) for itemsz
> > 16 bytes and 94% (divide by 32) when itemsz >= 128
>
> Though I think functional indexes are the way to go there.

Why?

On a somewhat different note, what about storing or caching info about which page has free space on it? ISTM that would be much more performant than any kind of search, though you still need to be careful about bulk inserts. My thought is this:

- Go to the last page that we knew to have free space on it (last_free_page).
- If that page is now full, continue scanning right.
- If we hit the end of appropriate pages, do a page split.

Finally, when vacuum frees space in the index, it needs to set last_free_page to the left-most page with free space, so that all free space gets used.

That would be the 'greediest' version of this algorithm possible. In reality, it probably makes more sense to combine this with the current behavior, so that you have a random chance of stopping your scan to the right and just doing a page split wherever you are in the scan. This would still be a win, because most of the time you'd go straight to a page with free space on it.

As for storing the extra info, there are two possibilities that come to mind: you can either store it in the index itself, or cache it in shared memory. The advantage of just caching it is that you don't need to change the on-disk index format, and you wouldn't need to update something on disk every time last_free_page changes. The disadvantage is that it's probably more complex to code, and you obviously lose last_free_page info on restart, as well as for index values that see fewer inserts, as the info for those values falls out of the cache.

In either case, you only want to worry about doing this for index values that span more than a few pages (maybe the magic 3 that means you've got at least one index page that's nothing but values for this index).
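The hinted scan combined with a random early split might be sketched roughly like this. This is only an illustrative sketch, not PostgreSQL code: the `Page` class, the capacity numbers, and the flat `SPLIT_PROBABILITY` constant are all invented here; the real probability would presumably depend on item size, as in Simon's suggestion.

```python
import random

SPLIT_PROBABILITY = 1 / 16  # assumed flat chance of abandoning the rightward scan

class Page:
    """Toy model of a btree leaf page holding items for one index value."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.items = 0

    def has_room(self):
        return self.items < self.capacity

def choose_insert_page(pages, last_free_page):
    """Pick a leaf page for a new item among the pages spanned by one key.

    Start at the cached last_free_page hint; if that page turned out to
    be full, scan right, with a random chance at each step of stopping
    and splitting wherever we are (the current-behavior fallback).
    Returns (page_index, needs_split, new_hint).
    """
    i = last_free_page
    while i < len(pages):
        if pages[i].has_room():
            return i, False, i        # hint was (or is again) useful
        if random.random() < SPLIT_PROBABILITY:
            return i, True, i         # give up early: split right here
        i += 1                        # keep scanning right
    # ran off the end of this key's pages: split the last one
    return len(pages) - 1, True, len(pages) - 1
```

On vacuum, the hint would be reset to the left-most page it freed space on, so the greedy scan still finds all reclaimed space.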
For more unique indexes, the extra overhead doesn't make sense.

Can someone think of a way to check the performance of this idea without actually coding it?
--
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
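On checking the idea without coding it: one low-effort option is a standalone simulation of the page-choice loop. The sketch below is entirely hypothetical (the `simulate` helper, page capacity, and insert counts are made up); it only compares the average number of pages visited per insert with and without a last_free_page hint, ignoring the random-early-split refinement.

```python
def simulate(n_inserts, capacity=100, use_hint=True):
    """Toy simulation: average pages visited per insert into one index
    value's run of leaf pages, with or without a last_free_page hint."""
    pages = [0]        # item count per page
    hint = 0
    visits = 0
    for _ in range(n_inserts):
        i = hint if use_hint else 0   # without the hint, scan from the left edge
        while True:
            visits += 1
            if pages[i] < capacity:
                pages[i] += 1
                break
            if i + 1 == len(pages):
                pages.append(0)       # crude stand-in for a page split
            i += 1
        if use_hint:
            hint = i                  # remember where we last found room
    return visits / n_inserts
```

With a workload of pure inserts and no vacuum, the hinted version stays near one page visit per insert, while scanning from the left grows with the number of pages the value spans; a real test would also need to model deletes and vacuum resetting the hint.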