Re: Performance on inserts
From | Jules Bean |
---|---|
Subject | Re: Performance on inserts |
Date | |
Msg-id | 20000826114855.A1524@pear.presence.net.uk |
In response to | Re: Performance on inserts (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Performance on inserts (Matthew Kirkwood <matthew@hairy.beasts.org>); Re: Performance on inserts (Tom Lane <tgl@sss.pgh.pa.us>); Re: Performance on inserts (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
On Fri, Aug 25, 2000 at 07:00:22PM -0400, Tom Lane wrote:
> The algorithm does seem to work quite nicely just the way I described
> it, although it turns out I was off base about a good probability
> setting.  I find that something up around 0.99 seems to be good.
> Using the same (perhaps overly simplistic) test case:
>
> # tuples inserted          6.5           current+random hack @ 0.99
>                       Time    index size      Time    index size
> 1536                  <1sec   90112           <1sec   106496
> 3072                  1.56    163840          <1sec   188416
> 6144                  3.70    286720          1.40    376832
> 12288                 9.73    532480          2.65    688128
> 24576                 93.26   1024000         5.22    1368064
> 49152                 363.23  2007040         10.34   2727936
> 98304                                         22.07   5545984
> 196608                                        45.60   11141120
> 393216                                        92.53   22290432
>
> I tried probabilities from 0.67 to 0.999 and found that runtimes didn't
> vary a whole lot (though this is near the minimum), while index size
> consistently got larger as the probability of moving right decreased.
> The runtime is nicely linear throughout the range.

That looks brilliant!! (Bearing in mind that I have over 10 million
tuples in my table, you can imagine what performance was like for me!)
Is there any chance you could generate a patch against released 7.0.2
to add just this functionality... It would be the kiss of life for my
code!

(Not in a hurry, I'm not back in work until Wednesday, as it happens)

And, of course, what would /really/ get my code going speedily would
be the partial indices mentioned elsewhere in this thread. If the
backend could automagically drop keys containing > 10% (tunable) of
the rows from the index, then my index would be (a) about 70% smaller!
and (b) only used when it's faster. [This means it would have to
update some simple histogram data. However, I can't see that being
much of an overhead]

For the short term, if I can get a working version of the above
randomisation patch, I think I shall 'fake' a partial index by
manually setting 'enable_seqscan=off' for all but the 4 or 5 most
common categories.
Those two factors combined will speed up my bulk inserts a lot.

One other idea, though:

Is there any simple way for Pg to combine inserts into one bulk?
Specifically, their effect on the index files. It has always seemed
to me to be one of the (many) glaring flaws in SQL that the INSERT
statement only takes one row at a time. But, using INSERT ... SELECT,
I can imagine that it might be possible to do 'bulk' index updating,
so that the scanning process is done once per 'batch'.

If I can make an analogy with sorted files (which indices are rather
like), if I wanted to add another 100 lines to a 1000 line sorted
file, I'd sort the 100 first, and then 'merge' them in. Whilst I
realise that indices aren't stored sorted (no need), I think it ought
to be possible to construct an efficient algorithm for merging two
btrees?

Jules

-- 
Jules Bean                          |        Any sufficiently advanced
jules@debian.org                    |  technology is indistinguishable
jules@jellybean.co.uk               |               from a perl script
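
A minimal sketch of the sorted-file analogy in the message, assuming plain in-memory Python lists rather than btrees. It says nothing about how PostgreSQL's index code works, and the function names are hypothetical. It contrasts inserting keys one at a time (one search per key) with sorting the batch first and merging it in a single pass:

```python
import bisect
import heapq


def insert_one_at_a_time(sorted_keys, new_keys):
    """Baseline: every new key performs its own search of the existing data."""
    result = list(sorted_keys)
    for key in new_keys:
        bisect.insort(result, key)
    return result


def insert_as_batch(sorted_keys, new_keys):
    """Sort the batch first, then merge it with the existing data in one pass."""
    batch = sorted(new_keys)
    return list(heapq.merge(sorted_keys, batch))


existing = [10, 20, 30, 40, 50]
incoming = [35, 5, 20, 45]

# Both strategies must produce the same sorted result.
assert insert_as_batch(existing, incoming) == insert_one_at_a_time(existing, incoming)
print(insert_as_batch(existing, incoming))  # [5, 10, 20, 20, 30, 35, 40, 45, 50]
```

For a flat sorted file the batch version reads the existing data once per batch instead of once per row; whether an equally cheap merge exists for two btrees is the open question the message raises.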