Обсуждение: after 9.2.4 patch vacuumdb -avz not analyzing all tables
After patching to 9.2.4 I am noticing some mysterious behavior in my nightly vacuumdb cron job.
I have been running vacuumdb -avz nightly for a while now, and have a script that tells me the next day if all the tables in pg_stat_user_tables have been vacuumed and analyzed in the last 24 hours.
Since the patch some tables do not seem to be getting analyzed. This did not happen after I patched from earlier versions of 9.2 like 9.2.1 to 9.2.2, 9.2.3.
I pipe the output of the vacuumdb command to a log file, and there dont appear to be any errors, its just not analyzing all the tables as it did before. Some tables are still being analyzed ok. Its vacuuming all but a few tables that I know are being blocked by idle in transaction sessions.
Manually analyzing the table with the psql analyze command updates the timestamp in pg_stat_user_tables as expected. I patched about 8 database servers to 9.2.4 and this is the only one with this problem.
Any advice?
Mike Broers <mbroers@gmail.com> writes: > After patching to 9.2.4 I am noticing some mysterious behavior in my > nightly vacuumdb cron job. > I have been running vacuumdb -avz nightly for a while now, and have a > script that tells me the next day if all the tables in pg_stat_user_tables > have been vacuumed and analyzed in the last 24 hours. > Since the patch some tables do not seem to be getting analyzed. This did > not happen after I patched from earlier versions of 9.2 like 9.2.1 to > 9.2.2, 9.2.3. > I pipe the output of the vacuumdb command to a log file, and there dont > appear to be any errors, its just not analyzing all the tables as it did > before. Some tables are still being analyzed ok. Its vacuuming all but a > few tables that I know are being blocked by idle in transaction sessions. [ pokes around ... ] You certain 9.2.3 didn't do this too? This appears to be an intentional behavior of the 9.2.3 patch that made it cancel truncation when there were conflicting lock requests: /* * Report results to the stats collector, too. An early terminated * lazy_truncate_heap attempt suppresses the message and also cancels the * execution of ANALYZE, if that was ordered. */ if (!vacrelstats->lock_waiter_detected) pgstat_report_vacuum(RelationGetRelid(onerel), onerel->rd_rel->relisshared, new_rel_tuples); else vacstmt->options &= ~VACOPT_ANALYZE; However I've got to say that both of those side-effects of exclusive-lock abandonment seem absolutely brain dead now that I see them. Why would we not bother to tell the stats collector what we've done? Why would we think we should not do ANALYZE when we were told to? Would someone care to step forward and defend this behavior? Because it's not going to be there very long otherwise. regards, tom lane
On Thursday, April 11, 2013, Tom Lane wrote:
[ pokes around ... ] You certain 9.2.3 didn't do this too? This
appears to be an intentional behavior of the 9.2.3 patch that made it
cancel truncation when there were conflicting lock requests:
/*
* Report results to the stats collector, too. An early terminated
* lazy_truncate_heap attempt suppresses the message and also cancels the
* execution of ANALYZE, if that was ordered.
*/
if (!vacrelstats->lock_waiter_detected)
pgstat_report_vacuum(RelationGetRelid(onerel),
onerel->rd_rel->relisshared,
new_rel_tuples);
else
vacstmt->options &= ~VACOPT_ANALYZE;
However I've got to say that both of those side-effects of
exclusive-lock abandonment seem absolutely brain dead now that I see
them. Why would we not bother to tell the stats collector what we've
done? Why would we think we should not do ANALYZE when we were told to?
I believe the rationale was so that an autovacuum would still look like it was needed, and get fired again the next naptime, so that it could continue with the truncation attempts. (Rather than waiting for 20% turnover in the table before trying again). I'm not convinced by this argument. If the DBA is desperate to get the space back, they can go do vacuum full. Otherwise, let the space get nibbled away on the ordinary autovac schedule.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > I believe the rationale was so that an autovacuum would still look like it > was needed, and get fired again the next naptime, so that it could continue > with the truncation attempts. (Rather than waiting for 20% turnover in the > table before trying again). I'm not convinced by this argument. If the > DBA is desperate to get the space back, they can go do vacuum full. Well, that's why I think the lock abandonment shouldn't apply to manual plain vacuum. You shouldn't need to do a vacuum full for that; that'd be a huge increase in the cost, not to mention that it'd transiently require twice the disk space, hardly a good thing if you're short. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > However I've got to say that both of those side-effects of > exclusive-lock abandonment seem absolutely brain dead now that I > see them. Why would we not bother to tell the stats collector > what we've done? Why would we think we should not do ANALYZE > when we were told to? > > Would someone care to step forward and defend this behavior? > Because it's not going to be there very long otherwise. I'm pretty sure that nobody involved noticed the impact on VACUUM ANALYZE command; all discussion was around autovacuum impact; and Jan argued that this was leaving things in a status quo for that, so I conceded the point and left it for a follow-on patch if someone felt the behavior needed to change. Sorry for the miss. http://www.postgresql.org/message-id/50BB700E.8060404@Yahoo.com As far as I'm concerned all effects on the explicit command were unintended and should be reverted. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On further review this particular server skipped from 9.2.2 to 9.2.4. This is my most busy and downtime sensitive server and I was waiting on a maintenance window to patch to 9.2.3 when 9.2.4 dropped and bumped up the urgency. However, I have 3 other less busy production servers that were all running 9.2.3 for a while, didnt exhibit the problem, and still dont on 9.2.4.
psql> analyze seems to work ok in the meantime, I'll report back if I notice any problems with that.
Thanks very much for the response and investigation, it is much appreciated!
On Thu, Apr 11, 2013 at 8:48 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:I'm pretty sure that nobody involved noticed the impact on VACUUM
> However I've got to say that both of those side-effects of
> exclusive-lock abandonment seem absolutely brain dead now that I
> see them. Why would we not bother to tell the stats collector
> what we've done? Why would we think we should not do ANALYZE
> when we were told to?
>
> Would someone care to step forward and defend this behavior?
> Because it's not going to be there very long otherwise.
ANALYZE command; all discussion was around autovacuum impact; and
Jan argued that this was leaving things in a status quo for that,
so I conceded the point and left it for a follow-on patch if
someone felt the behavior needed to change. Sorry for the miss.
http://www.postgresql.org/message-id/50BB700E.8060404@Yahoo.com
As far as I'm concerned all effects on the explicit command were
unintended and should be reverted.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Does this behavior only affect the 9.2 branch? Or was it ported to 9.1 or 9.0 or 8.4 as well?
On Thu, Apr 11, 2013 at 7:48 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:I'm pretty sure that nobody involved noticed the impact on VACUUM
> However I've got to say that both of those side-effects of
> exclusive-lock abandonment seem absolutely brain dead now that I
> see them. Why would we not bother to tell the stats collector
> what we've done? Why would we think we should not do ANALYZE
> when we were told to?
>
> Would someone care to step forward and defend this behavior?
> Because it's not going to be there very long otherwise.
ANALYZE command; all discussion was around autovacuum impact; and
Jan argued that this was leaving things in a status quo for that,
so I conceded the point and left it for a follow-on patch if
someone felt the behavior needed to change. Sorry for the miss.
http://www.postgresql.org/message-id/50BB700E.8060404@Yahoo.com
As far as I'm concerned all effects on the explicit command were
unintended and should be reverted.
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
To understand recursion, one must first understand recursion.
Scott Marlowe <scott.marlowe@gmail.com> wrote: > Does this behavior only affect the 9.2 branch? Or was it ported > to 9.1 or 9.0 or 8.4 as well? After leaving it on master for a while to see if anyone reported problems in development, I back-patched as far as 9.0 in time for the 9.2.3 (and related) patches. Prior to that the code was too different for it to be the same patch, and (perhaps not entirely coincidentally) I had not seen the problems before 9.0. From 9.0 on I have seen multiple sites (all using queuing from Slony or a JMS implementation) with recurring problems when the queue temporarily got large, shrank again, and then wrapped around to the beginning of the table's file space. In some cases performance was so impaired that when such an event was triggered they would shut down their application until a manual VACUUM could be run. -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Looks like psql> vacuum (verbose, analyze) is not reflecting in pg_stat_user_tables as well in some cases. In this scenario I run the command, it outputs all the deleted pages etc (unlike the vacuumdb -avz analyze that seemed to be skipped in the log), but it does not update pg_stat_user_tables. Thats probably expected based on the description previously reported, but I wanted to confirm what I was seeing.
On Fri, Apr 12, 2013 at 10:36 AM, Kevin Grittner <kgrittn@ymail.com> wrote:
Scott Marlowe <scott.marlowe@gmail.com> wrote:After leaving it on master for a while to see if anyone reported
> Does this behavior only affect the 9.2 branch? Or was it ported
> to 9.1 or 9.0 or 8.4 as well?
problems in development, I back-patched as far as 9.0 in time for
the 9.2.3 (and related) patches. Prior to that the code was too
different for it to be the same patch, and (perhaps not entirely
coincidentally) I had not seen the problems before 9.0. From 9.0
on I have seen multiple sites (all using queuing from Slony or a
JMS implementation) with recurring problems when the queue
temporarily got large, shrank again, and then wrapped around to the
beginning of the table's file space. In some cases performance was
so impaired that when such an event was triggered they would shut
down their application until a manual VACUUM could be run.