Set hint bits upon eviction from BufMgr
From | Merlin Moncure |
---|---|
Subject | Set hint bits upon eviction from BufMgr |
Date | |
Msg-id | AANLkTikiHsQb7ENwcyB0aMmZczgW1KcPSmd2LoS4cU5M@mail.gmail.com |
Responses |
Re: Set hint bits upon eviction from BufMgr
Re: Set hint bits upon eviction from BufMgr |
List | pgsql-hackers |
Maybe I'm being overly simplistic or incorrect here, but I was thinking that there might be a route to reducing the impact of hint bits on the main sufferers of the feature without adding too much pain in the general case. I'm unfortunately convinced there is no getting rid of them -- in fact their utility will become even more apparent as storage gets faster and the pendulum of optimization swings back to the CPU side.

My idea is to reserve a bit in the page header, say PD_ALL_SAME_XMIN, that indicates all the tuples on the page are from the same transaction. Set it when the first inserted tuple hits the page, and unset it when a tuple is added from another xmin or any tuple is touched or deleted. The point here is to set up a cheap check at the page level that we can make when a page is getting evicted from the bufmgr. If the bit is set, we grab the xmin of the first tuple on the page and test it for visibility (assuming the hint bit is not already set). If we get a thumbs up on the transaction, we can lock the page and set all tuple hints during the page evict/sync process. We don't worry about logging/crash safety on the 'all same' hint because it's only interesting to this bufmgr check (it can even be cleared when the page is loaded).

Without this bit, the only way to set hint bits during bufmgr eviction is to do a visibility check on every tuple, which would probably be prohibitively expensive. Since OLTP environments would rarely see this bit, they would not have to pay for the check. Also, we can maybe tweak the bufmgr to prefer not to evict pages with this bit set if it's known they are not yet written out to primary storage. Maybe this is impossible or not logical...just thinking out loud.

Anyways, if this actually works, shared buffers can start to play a role in mitigating hint bit I/O as long as the transaction resolves before pages start jumping out into storage.
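To make the proposed check concrete, here is a minimal toy model in Python, not PostgreSQL code. The `Page`, `HeapTuple`, `XactStatus`, and `set_hints_on_evict` names are hypothetical, the `clog` dictionary stands in for the transaction status lookup, and the two booleans on each tuple stand in for the `HEAP_XMIN_COMMITTED` and `HEAP_XMIN_INVALID` hint bits. Locking and the actual write are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class XactStatus(Enum):
    IN_PROGRESS = 1
    COMMITTED = 2
    ABORTED = 3


@dataclass
class HeapTuple:
    xmin: int
    xmin_committed: bool = False  # stands in for HEAP_XMIN_COMMITTED
    xmin_invalid: bool = False    # stands in for HEAP_XMIN_INVALID


@dataclass
class Page:
    tuples: List[HeapTuple] = field(default_factory=list)
    all_same_xmin: bool = False   # the proposed PD_ALL_SAME_XMIN bit (hypothetical)

    def insert(self, xmin: int) -> None:
        if not self.tuples:
            # First tuple on the page: set the bit.
            self.all_same_xmin = True
        elif self.tuples[0].xmin != xmin:
            # Tuple from a different transaction: unset the bit.
            self.all_same_xmin = False
        self.tuples.append(HeapTuple(xmin))

    def touch(self) -> None:
        # Any update or delete on the page also unsets the bit.
        self.all_same_xmin = False


def set_hints_on_evict(page: Page, clog: Dict[int, XactStatus]) -> int:
    """Run the eviction-time check; return how many status lookups it cost."""
    if not page.all_same_xmin or not page.tuples:
        return 0  # typical OLTP page: no extra work at eviction
    first = page.tuples[0]
    if first.xmin_committed or first.xmin_invalid:
        return 0  # hint bit already set, nothing to do
    # One lookup covers the whole page, since every tuple shares this xmin.
    status = clog.get(first.xmin, XactStatus.IN_PROGRESS)
    if status is XactStatus.COMMITTED:
        for tup in page.tuples:
            tup.xmin_committed = True
    elif status is XactStatus.ABORTED:
        for tup in page.tuples:
            tup.xmin_invalid = True
    # IN_PROGRESS: leave the hints alone; the page goes out unhinted.
    return 1
```

The cost at eviction is at most one status lookup per page, regardless of how many tuples the page holds, and pages without the bit skip the check entirely.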
If you couple this with a facility to do bulk loads that break up transactions at regular intervals, you have a good shot at getting all your hint bits written out properly in large load situations. You might be able to do similar tricks with deletes -- I haven't thought about that. Also there might be some interplay with vacuum or some other deal breaker -- curious to see if I have something worth further thought here.

merlin
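The interaction between load batching and eviction can be illustrated with a small simulation. This is a rough sketch under stated assumptions, not a model of the real buffer manager: the `simulate_load` function and its parameters are hypothetical, eviction is plain FIFO rather than PostgreSQL's clock sweep, every page is filled by a single transaction, and a page is counted as "hinted" if its transaction had committed by the time the page was written out.

```python
from collections import deque
from typing import Deque, Set, Tuple


def simulate_load(total_pages: int, buffer_pages: int,
                  pages_per_xact: int) -> Tuple[int, int]:
    """Return (pages written with hints set, pages written without hints)."""
    committed: Set[int] = set()
    pool: Deque[int] = deque()  # xid that filled each buffered page, oldest first
    hinted = 0
    unhinted = 0

    def write_out(xid: int) -> None:
        nonlocal hinted, unhinted
        # The eviction-time check succeeds only if the xid has resolved.
        if xid in committed:
            hinted += 1
        else:
            unhinted += 1

    xid = 1
    for n in range(total_pages):
        if len(pool) == buffer_pages:
            write_out(pool.popleft())  # buffer pool full: evict the oldest page
        pool.append(xid)
        if (n + 1) % pages_per_xact == 0:
            committed.add(xid)         # batch boundary: commit, start a new xid
            xid += 1
    committed.add(xid)                 # commit any trailing partial batch
    while pool:
        write_out(pool.popleft())      # final flush after the load finishes
    return hinted, unhinted
```

In this toy model, loading 1000 pages through a 100-page pool as one transaction leaves only the last 100 pages hinted, while committing every 50 pages lets every page go out with its hints set, because each batch resolves before its pages reach the front of the eviction queue.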