Re: _mdfd_getseg can be expensive
From | Andres Freund
---|---
Subject | Re: _mdfd_getseg can be expensive
Date |
Msg-id | 20141031223210.GN13584@awork2.anarazel.de
In response to | _mdfd_getseg can be expensive (Andres Freund <andres@2ndquadrant.com>)
Responses | Re: _mdfd_getseg can be expensive
List | pgsql-hackers
Hi,

On 2014-03-31 12:10:01 +0200, Andres Freund wrote:
> I recently have seen some perf profiles in which _mdfd_getseg() was in
> the top #3 when VACUUMing large (~200GB) relations. Called by mdread(),
> mdwrite(). Looking at its implementation, I am not surprised. It
> iterates over all segment entries a relation has, for every read or
> write. That's not painful for smaller relations, but at a couple of
> hundred GB it starts to be annoying. Especially if kernel readahead has
> already read in all data from disk.
>
> I don't have a good idea what to do about this yet, but it seems like
> something that should be fixed mid-term.
>
> The best I can come up with is caching the last mdvec used, but that's
> fairly ugly. Alternatively it might be a good idea to not store MdfdVec
> as a linked list, but as a densely allocated array.

I've seen this a couple of times more since. On larger relations it gets even more pronounced: when sequentially scanning a 2TB relation, _mdfd_getseg() gets up to 80% proportionate CPU time towards the end of the scan.

I wrote the attached patch that gets rid of that essentially quadratic behaviour by replacing the mdfd chain/singly linked list with an array. Since we seldom grow files by a whole segment, I can't see the slightly bigger memory reallocations mattering significantly. In pretty much every other case the array is bound to be a winner.

Does anybody have fundamental arguments against that idea?

With some additional work we could save a bit more memory by getting rid of mdfd_segno, as it's essentially redundant, but that's not entirely trivial and I'm unsure if it's worth it.

I've also attached a second patch that makes PageIsVerified() noticeably faster when the page is new. That's helpful and related, because it makes it easier to test the correctness of the md.c rewrite by faking full 1GB segments. It's also pretty simple, so there's imo little reason not to do it.
Greetings,

Andres Freund

PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments