Re: Constant time insertion into highly non-unique indexes
От | Tom Lane |
---|---|
Тема | Re: Constant time insertion into highly non-unique indexes |
Дата | |
Msg-id | 372.1113489340@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Constant time insertion into highly non-unique indexes (Simon Riggs <simon@2ndquadrant.com>) |
Ответы |
Re: Constant time insertion into highly non-unique
|
Список | pgsql-hackers |
Simon Riggs <simon@2ndquadrant.com> writes: > Recent discussions on PERFORM have made me look into some aspects of > B-tree index code, especially with regard to bulk loading high volumes > of data. Have you read the archived discussions that led up to the current algorithm? I don't think it's nearly as bad as you believe. In particular, I think you missed the point that the move-or-split decision is *random* and is therefore made differently by each inserter. Therefore the probability that the earlier pages get split rises rapidly as more and more insertions are made --- and it only takes one split decision to take the pressure off. It's entirely possible that the 0.99 figure needs some fine tuning. IIRC, the experiments we did to choose that number were done with pretty simple test indexes --- probably just int4. Thinking about the behavior, it seems plausible that the figure needs to drop as the number of entries per page drops ... but we have not tested that. In an int4 index on Intel-ish hardware (MAXALIGN 4), you can fit about 500 entries per page. So consider a case where the first 2 pages for a given value are full and the third is half full. To fill the third completely will require 250 insertions, by which time there is a very good chance (more than 90% if I did the math right) that someone will have decided to split rather than move right at the second page. After that the second page fills, and then we are back to the original state (because the new third page will be half full). So I claim that in fact the behavior *is* constant time: most insertions will succeed on either the second or third page, indefinitely. However, obviously if there are only a few values per page, you would get much worse behavior. (OTOH, the wider the index values, the lower the probability of exact duplicates anyway, I'd think, so we may be wasting our time to worry about the behavior there.) regards, tom lane
В списке pgsql-hackers по дате отправления: