Обсуждение: Changing the autovacuum launcher scheduling; oldest table first algorithm
Hi, I've created the new thread for the changing AV launcher scheduling. The problem of AV launcher scheduling is described on [1] but I summarize it here. If there is even one database that is at risk of wraparound, currently AV launcher selects the database having the oldest datfrozenxid until all tables in that database have been vacuumed. This leads that tables on other database can bloat and not be frozen because other database are not selected by AV launcher. It makes things worse if the database has a large table that takes a long time to be vacuumed. Attached patch changes the AV launcher scheduling algorithm based on the proposed idea by Robert so that it selects a database first that has the oldest table when the database is at risk of wraparound. For detail of the algorithm please refer to [2]. In this algorithm, the running AV workers advertise the next datfrozenxid on the shared memory that we will have. That way, AV launcher can select a database that has the oldest table in the database cluster. However, this algorithm doesn't support the case where the age of next datfrozenxid gets greater than autovacum_vacuum_max_age. This case can happen if users set autovacvuum_vacuum_max_age to small value and vacuuming the whole database takes a long time. But since it's not a common case and it doesn't degrade than current behavior even if this happened, I think it's not a big problem. Feedback is very welcome. [1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 [2] https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Вложения
Re: Changing the autovacuum launcher scheduling; oldest table firstalgorithm
От
Grigory Smolkin
Дата:
On 02/28/2018 12:04 PM, Masahiko Sawada wrote: > Hi, > > I've created the new thread for the changing AV launcher scheduling. > The problem of AV launcher scheduling is described on [1] but I > summarize it here. > > If there is even one database that is at risk of wraparound, currently > AV launcher selects the database having the oldest datfrozenxid until > all tables in that database have been vacuumed. This leads that tables > on other database can bloat and not be frozen because other database > are not selected by AV launcher. It makes things worse if the database > has a large table that takes a long time to be vacuumed. > > Attached patch changes the AV launcher scheduling algorithm based on > the proposed idea by Robert so that it selects a database first that > has the oldest table when the database is at risk of wraparound. For > detail of the algorithm please refer to [2]. > > In this algorithm, the running AV workers advertise the next > datfrozenxid on the shared memory that we will have. That way, AV > launcher can select a database that has the oldest table in the > database cluster. However, this algorithm doesn't support the case > where the age of next datfrozenxid gets greater than > autovacum_vacuum_max_age. This case can happen if users set > autovacvuum_vacuum_max_age to small value and vacuuming the whole > database takes a long time. But since it's not a common case and it > doesn't degrade than current behavior even if this happened, I think > it's not a big problem. > > Feedback is very welcome. > > [1] https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F8A4DC6%40G01JPEXMBYT05 > [2] https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com > > Regards, > > -- > Masahiko Sawada > NIPPON TELEGRAPH AND TELEPHONE CORPORATION > NTT Open Source Software Center Hello! I`ve tried to compile your patch on Fedora24 with gcc 6.3.1 20161221: gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g3 -O0 -I../../../src/include -D_GNU_SOURCE -c -o autovacuum.o autovacuum.c In file included from ../../../src/include/postgres.h:46:0, from autovacuum.c:62: autovacuum.c: In function ‘get_next_xid_bounds’: autovacuum.c:1979:9: warning: implicit declaration of function ‘TransactionIdIsVaild’ [-Wimplicit-function-declaration] Assert(TransactionIdIsVaild(frozenxid)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^~~~~~~~~ autovacuum.c:1979:2: note: in expansion of macro ‘Assert’ Assert(TransactionIdIsVaild(frozenxid)); ^~~~~~ autovacuum.c:1980:28: error: ‘minmutli’ undeclared (first use in this function) Assert(MultiXactIdIsValid(minmutli)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^~~~~~~~~ autovacuum.c:1980:2: note: in expansion of macro ‘Assert’ Assert(MultiXactIdIsValid(minmutli)); ^~~~~~ autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’ Assert(MultiXactIdIsValid(minmutli)); ^~~~~~~~~~~~~~~~~~ autovacuum.c:1980:28: note: each undeclared identifier is reported only once for each function it appears in Assert(MultiXactIdIsValid(minmutli)); ^ ../../../src/include/c.h:713:7: note: in definition of macro ‘Trap’ if (condition) \ ^~~~~~~~~ autovacuum.c:1980:2: note: in expansion of macro ‘Assert’ Assert(MultiXactIdIsValid(minmutli)); ^~~~~~ autovacuum.c:1980:9: note: in expansion of macro ‘MultiXactIdIsValid’ Assert(MultiXactIdIsValid(minmutli)); ^~~~~~~~~~~~~~~~~~ <builtin>: recipe for target 'autovacuum.o' failed make[3]: *** [autovacuum.o] Error 1 -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, On 2018-02-28 18:04:27 +0900, Masahiko Sawada wrote: > I've created the new thread for the changing AV launcher scheduling. > The problem of AV launcher scheduling is described on [1] but I > summarize it here. This is a new patch submitted to CF 2018-03. As that's the last CF for v11, and this patch isn't trivial, I propose to instead move this to the next CF. Greetings, Andres Freund
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Masahiko Sawada
Дата:
On Fri, Mar 2, 2018 at 4:51 PM, Andres Freund <andres@anarazel.de> wrote: > Hi, > > On 2018-02-28 18:04:27 +0900, Masahiko Sawada wrote: >> I've created the new thread for the changing AV launcher scheduling. >> The problem of AV launcher scheduling is described on [1] but I >> summarize it here. > > This is a new patch submitted to CF 2018-03. As that's the last CF for > v11, and this patch isn't trivial, I propose to instead move this to the > next CF. > Thank you for comment. I recently added this patch to the current CF but discussion for design has started from the end of Jan. So I don't think this patch is design phase but should I move it to the next CF? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Masahiko Sawada
Дата:
On Fri, Mar 2, 2018 at 5:00 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Mar 2, 2018 at 4:51 PM, Andres Freund <andres@anarazel.de> wrote: >> Hi, >> >> On 2018-02-28 18:04:27 +0900, Masahiko Sawada wrote: >>> I've created the new thread for the changing AV launcher scheduling. >>> The problem of AV launcher scheduling is described on [1] but I >>> summarize it here. >> >> This is a new patch submitted to CF 2018-03. As that's the last CF for >> v11, and this patch isn't trivial, I propose to instead move this to the >> next CF. >> > > Thank you for comment. > I recently added this patch to the current CF but discussion for > design has started from the end of Jan. So I don't think this patch is > design phase but should I move it to the next CF? > I now see situation of the current CF, and agreed with you. Will move this to the next CF. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Changing the autovacuum launcher scheduling; oldest table firstalgorithm
От
Alvaro Herrera
Дата:
Hello I haven't read your respective patches yet, but both these threads brought to memory a patch I proposed a few years ago that I never completed: https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org In that thread I posted a patch to implement a prioritisation scheme for autovacuum, based on an equation which was still under discussion when I abandoned it. Chris Browne proposed a crazy equation to mix in both XID age and fraction of dead tuples; probably that idea is worth studying further. I tried to implement that in my patch but I probably didn't do it correctly (because, as I recall, it failed to work as expected). Nowadays I think we would also consider the multixact freeze age, too. Maybe that's worth giving a quick look in case some of the ideas there are useful for the patches now being proposed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Masahiko Sawada
Дата:
On Tue, Mar 6, 2018 at 11:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Hello > > I haven't read your respective patches yet, but both these threads > brought to memory a patch I proposed a few years ago that I never > completed: > > https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org Thank you for sharing the thread. > > In that thread I posted a patch to implement a prioritisation scheme for > autovacuum, based on an equation which was still under discussion when > I abandoned it. Chris Browne proposed a crazy equation to mix in both > XID age and fraction of dead tuples; probably that idea is worth > studying further. I tried to implement that in my patch but I probably > didn't do it correctly (because, as I recall, it failed to work as > expected). Nowadays I think we would also consider the multixact freeze > age, too. > > Maybe that's worth giving a quick look in case some of the ideas there > are useful for the patches now being proposed. Yeah, that's definitely useful for the patches. I'll look at this and will discuss that here. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Masahiko Sawada
Дата:
On Wed, Mar 14, 2018 at 11:29 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Tue, Mar 6, 2018 at 11:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Hello >> >> I haven't read your respective patches yet, but both these threads >> brought to memory a patch I proposed a few years ago that I never >> completed: >> >> https://www.postgresql.org/message-id/flat/20130124215715.GE4528%40alvh.no-ip.org > > Thank you for sharing the thread. > >> >> In that thread I posted a patch to implement a prioritisation scheme for >> autovacuum, based on an equation which was still under discussion when >> I abandoned it. Chris Browne proposed a crazy equation to mix in both >> XID age and fraction of dead tuples; probably that idea is worth >> studying further. I tried to implement that in my patch but I probably >> didn't do it correctly (because, as I recall, it failed to work as >> expected). Nowadays I think we would also consider the multixact freeze >> age, too. >> >> Maybe that's worth giving a quick look in case some of the ideas there >> are useful for the patches now being proposed. > > Yeah, that's definitely useful for the patches. I'll look at this and > will discuss that here. > I read that discussion. If I understand correctly, the patch proposed in that discussion mainly focuses on table-selection algorithm for each AV worker. But the patch I proposed also changes it but the main part of this patch is to change the database-selection algorithm by AV launcher (sorry, maybe the subject leads misleading). The problem I'd like to deal with is, when there is at least one database needing anti-wraparound vacuum it concentrates launching new AV workers on one database. If we change the table-selection algorithm to the "oldest table first" algorithm in that case, we can deal with that problem by this patch. However, I think it's worthwhile to consider another table-selection algorithm such as the proposal on that old thread. So I think it would be better to think these issue separately. That is, we can change the database-selection algorithm in order to prevent the concentration problem while not changing table-selection algorithm. This brought me another idea; having AV workers report its autovacuum results at its AV worker slot so that they can teach autovacuum status to AV launcher. For example, each AV worker can report the following information. * Database oid * The number of tables needing vacuum. * The number of vacuumed tables. * The number of skipped tables due to being processed by other AV workers. * Timestamp of last update If there is an up-to-date information meaning either that there is no tables needing vacuum or that there is only table needing vacuum but being vacuumed by other worker, AV launcher can launches new one to other database. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Changing the autovacuum launcher scheduling; oldest table firstalgorithm
От
Michael Paquier
Дата:
On Thu, Jun 28, 2018 at 04:20:53PM +0900, Masahiko Sawada wrote: > If there is an up-to-date information meaning either that there is no > tables needing vacuum or that there is only table needing vacuum but > being vacuumed by other worker, AV launcher can launches new one to > other database. I am not completely sure what we want to do with this patch in particular as there are many approaches and things which can be discussed. For now, the latest patch proposed does not apply, so I am moving it to next CF, waiting for its author. -- Michael
Вложения
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Dmitry Dolgov
Дата:
> On Tue, Oct 2, 2018 at 4:42 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Jun 28, 2018 at 04:20:53PM +0900, Masahiko Sawada wrote: > > If there is an up-to-date information meaning either that there is no > > tables needing vacuum or that there is only table needing vacuum but > > being vacuumed by other worker, AV launcher can launches new one to > > other database. > > I am not completely sure what we want to do with this patch in > particular as there are many approaches and things which can be > discussed. For now, the latest patch proposed does not apply, so I am > moving it to next CF, waiting for its author. Nothing changed since then, but also the patch got not enough review to say that there was substantial feedback. I'll move it to the next CF.
Re: Changing the autovacuum launcher scheduling; oldest table firstalgorithm
От
Michael Paquier
Дата:
On Thu, Nov 29, 2018 at 06:21:34PM +0100, Dmitry Dolgov wrote: > Nothing changed since then, but also the patch got not enough review to say > that there was substantial feedback. I'll move it to the next CF. I would have suggested to mark the patch as returned with feedback instead as the thing does not apply for some time now, and the author has been notified about a rebase. -- Michael
Вложения
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Masahiko Sawada
Дата:
On Fri, Nov 30, 2018 at 10:48 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Nov 29, 2018 at 06:21:34PM +0100, Dmitry Dolgov wrote: > > Nothing changed since then, but also the patch got not enough review to say > > that there was substantial feedback. I'll move it to the next CF. > > I would have suggested to mark the patch as returned with feedback > instead as the thing does not apply for some time now, and the author > has been notified about a rebase. Agreed and sorry for the late reply. The design of this patch would need to be reconsidered based on suggestions and discussions we had before. I'll propose it again after considerations. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Changing the autovacuum launcher scheduling; oldest table first algorithm
От
Dmitry Dolgov
Дата:
> On Fri, Nov 30, 2018 at 3:05 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Nov 30, 2018 at 10:48 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Thu, Nov 29, 2018 at 06:21:34PM +0100, Dmitry Dolgov wrote: > > > Nothing changed since then, but also the patch got not enough review to say > > > that there was substantial feedback. I'll move it to the next CF. > > > > I would have suggested to mark the patch as returned with feedback > > instead as the thing does not apply for some time now, and the author > > has been notified about a rebase. > > Agreed and sorry for the late reply. > > The design of this patch would need to be reconsidered based on > suggestions and discussions we had before. I'll propose it again after > considerations. Well, taking into account this information, then yes, it makes sense and I'll mark it as "Returned with feedback", thanks!