Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)
От | Peter Geoghegan |
---|---|
Тема | Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters) |
Дата | |
Msg-id | CAH2-Wzm9EQJdOsQRuus293QG64rHcC1hOFAZ5+_8JNm35m1c1w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters) (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: "Write amplification" is made worse by "getting tired" whileinserting into nbtree secondary indexes (Was: Why B-Tree suffix truncation matters)
|
Список | pgsql-hackers |
On Tue, Jul 17, 2018 at 1:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> This seems like really interesting and important work. I wouldn't
> have foreseen that the "getting tired" code would have led to this
> kind of bloat (even if I had known about it at all).

Thanks! I'm glad that I can come up with concrete, motivating examples
like this, because it's really hard to talk about the big picture here.
With something like a pgbench workload, there are undoubtedly many
different factors in play, since temporal locality influences many
different things all at once. I don't think that I understand it all
just yet. Taking a holistic view of the problem seems very helpful, but
it's also very frustrating at times.

> I wonder, though, whether it's possible that the reverse could happen
> in some other scenario. It seems to me that with the existing code,
> if you reinsert a value many copies of which have been deleted,
> you'll probably find partially-empty pages whose free space can be
> reused, but if there's one specific place where each tuple needs to
> go, you might end up having to split pages if the new TIDs are all
> larger or smaller than the old TIDs.

That's a legitimate concern. After all, what I've done boils down to
adding a restriction on space utilization that wasn't there before.
This clearly helps because it makes it practical to rip out the
"getting tired" thing, but that's not everything. There are good
reasons for that hack, but if latency magically didn't matter then we
could just tear the hack out without doing anything else. That would
make groveling through pages full of duplicates at least as discerning
about space utilization as my patch manages to be, without any of the
complexity.

There is actually a flipside to that downside, though (i.e. the
downside is also an upside): while not filling up leaf pages that have
free space on them is bad, it's only bad when it doesn't leave the
pages completely empty. Leaving the pages completely empty is actually
good, because then VACUUM is in a position to delete entire pages,
removing their downlinks from parent pages. That's a variety of bloat
that we can reverse completely. I suspect that you'll see far more of
that favorable case in the real world with my patch. It's pretty much
impossible to do page deletions with pages full of duplicates today,
because the roughly-uniform distribution of still-live tuples among
leaf pages fails to exhibit any temporal locality.

So, maybe my patch would still come out ahead of simply ripping out
"getting tired" in this parallel universe where latency doesn't matter
and space utilization is everything.

I made one small mistake with my test case: it actually *is* perfectly
efficient at recycling space, even at the end, since I don't delete
all the duplicates (just 90% of them). Getting tired might have been a
contributing factor there, too.

--
Peter Geoghegan
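For context, the "getting tired" logic discussed above is the
move-right loop in _bt_findinsertloc() in
src/backend/access/nbtree/nbtinsert.c, as it stood before the patch.
The sketch below is a simplified approximation of that loop, not the
verbatim source; the LP_DEAD cleanup step and all buffer pin/lock
management are omitted:

    /*
     * Simplified sketch of the pre-patch move-right loop in
     * _bt_findinsertloc() (an approximation, not verbatim PostgreSQL
     * source).  The incoming duplicate doesn't fit on the current leaf
     * page, so consider stepping right to a later page with the same
     * key value.
     */
    while (PageGetFreeSpace(page) < itemsz)
    {
        /*
         * Stop if this is the rightmost page, or if the high key shows
         * that the run of duplicates ends here; either way this page
         * must be split.
         */
        if (P_RIGHTMOST(lpageop) ||
            _bt_compare(rel, keysz, scankey, page, P_HIKEY) != 0)
            break;

        /*
         * "Getting tired": with probability ~1/100, give up and split
         * the current page rather than grovel through an arbitrarily
         * long chain of pages full of duplicates.
         */
        if (random() <= (MAX_RANDOM_VALUE / 100))
            break;

        /* Otherwise, step right to the next page and retry there. */
        buf = _bt_relandgetbuf(rel, buf, lpageop->btpo_next, BT_WRITE);
        page = BufferGetPage(buf);
        lpageop = (BTPageOpaque) PageGetSpecialPointer(page);
    }

The patch removes the need for this heuristic by treating heap TID as a
tiebreaker key column, so each duplicate has exactly one legal insertion
point -- which is also what makes the scenario Robert describes (new
TIDs all landing before or after the old ones) possible at all.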