Re: [HACKERS] vacuum process size
From | Tom Lane
---|---
Subject | Re: [HACKERS] vacuum process size
Date |
Msg-id | 2348.935511622@sss.pgh.pa.us
In reply to | Re: [HACKERS] vacuum process size (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses | Re: [HACKERS] vacuum process size; RE: [HACKERS] vacuum process size
List | pgsql-hackers
I have been looking some more at the vacuum-process-size issue, and I am having a hard time understanding why the VPageList data structure is the critical one. As far as I can see, there should be at most one pointer in it for each disk page of the relation. OK, you were vacuuming a table with something like a quarter million pages, so the end size of the VPageList would have been something like a megabyte, and given the inefficient usage of repalloc() in the original code, a lot more space than that would have been wasted as the list grew. So doubling the array size at each step is a good change.

But there are a lot more tuples than pages in most relations. I see two lists with per-tuple data in vacuum.c, "vtlinks" in vc_scanheap and "vtmove" in vc_rpfheap, that are both being grown with essentially the same technique of repalloc() after every N entries. I'm not entirely clear on how many tuples get put into each of these lists, but it sure seems like in ordinary circumstances they'd be much bigger space hogs than any of the three VPageList lists. I recommend going to a doubling approach for each of these lists as well as for VPageList.

There is a fourth usage of repalloc with the same method, for "ioid" in vc_getindices. This only gets one entry per index on the current relation, so it's unlikely to be worth changing on its own merit. But it might be worth building a single subroutine that expands a growable list of entries (taking sizeof() each entry as a parameter) and applying it in all four places.

regards, tom lane
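The shared subroutine suggested above could be sketched roughly as follows. This is a hypothetical helper (the names `vc_enlarge_list` and `demo_fill` are illustrative, not from vacuum.c), and it uses plain realloc() where the backend would use repalloc(); the point is only the capacity-doubling pattern, which keeps total reallocation cost linear in the final list length instead of quadratic:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable-list helper: make sure the array 'list' has
 * room for at least 'needed' entries of 'entry_size' bytes each.
 * 'capacity' tracks the current allocated entry count and is doubled
 * (rather than bumped by a fixed N) whenever more room is required,
 * so a list of n entries costs O(n) total copying across all growths.
 * In the backend, realloc() here would be repalloc(). */
static void *
vc_enlarge_list(void *list, size_t entry_size,
                size_t needed, size_t *capacity)
{
    if (needed <= *capacity)
        return list;            /* already big enough */
    if (*capacity == 0)
        *capacity = 16;         /* arbitrary initial allocation */
    while (*capacity < needed)
        *capacity *= 2;         /* double until large enough */
    return realloc(list, *capacity * entry_size);
}

/* Demo state and filler, for illustration only. */
static int *demo_list = NULL;
static size_t demo_cap = 0;
static size_t demo_len = 0;

static void
demo_fill(int n)
{
    int i;

    for (i = 0; i < n; i++)
    {
        demo_list = (int *) vc_enlarge_list(demo_list, sizeof(int),
                                            demo_len + 1, &demo_cap);
        demo_list[demo_len++] = i;
    }
}
```

Each of the four call sites ("vpl", "vtlinks", "vtmove", "ioid") would then replace its every-N-entries repalloc() with one call to the helper before appending an entry, passing sizeof() of its own entry type.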