Discussion: Tracking last scan time
Often it is beneficial to review one's schema with a view to removing indexes (and sometimes tables) that are no longer required. It's very difficult to understand when that is the case by looking at the number of scans of a relation as, for example, an index may be used infrequently but may be critical in those times when it is used.
The attached patch against HEAD adds optional tracking of the last scan time for relations. It updates pg_stat_*_tables with new last_seq_scan and last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to help with this.
Due to the use of gettimeofday(), those values are only maintained if a new GUC, track_scans, is set to on. By default, it is off.
--
I did run a 12 hour test to see what the performance impact is. pgbench was run with scale factor 10000 and 75 users across 4 identical bare metal machines running Rocky 8 in parallel which showed roughly a -2% average performance penalty against HEAD with track_scans enabled. Machines were PowerEdge R7525's with 128GB RAM, dual 16C/32T AMD 7302 CPUs, with the data directory on 6 x 800GB 12Gb/s SSD SAS drives in RAID 0. Kernel time source is tsc.
          HEAD          track_scans   Penalty (%)
box1      19582.49735   19341.8881    -1.22869541
box2      19936.55513   19928.07479   -0.04253664659
box3      19631.78895   18649.64379   -5.002830696
box4      19810.86767   19420.67192   -1.969604525
Average   19740.42728   19335.06965   -2.05343896
Doc and test updates included.
Attachments
On Tue, 23 Aug 2022 at 11:00, Dave Page <dpage@pgadmin.org> wrote:
>
> Often it is beneficial to review one's schema with a view to removing
> indexes (and sometimes tables) that are no longer required. It's very
> difficult to understand when that is the case by looking at the number of
> scans of a relation as, for example, an index may be used infrequently but
> may be critical in those times when it is used.

I think this is easy to answer in a prometheus/datadog/etc world since you can consult the history of the count to see when it was last incremented. (Or do effectively that continuously.) I guess that just reinforces the idea that it should be optional.

Perhaps there's room for some sort of general feature for controlling various time series aggregates like max() and min(), sum() or, uhm, timeoflastchange() on whatever stats you want. That would let us remove a bunch of stuff from pg_stat_statements and let users turn on just the ones they want. And also let users enable things like time of last rollback or conflict etc. But that's just something to think about down the road.

> The attached patch against HEAD adds optional tracking of the last scan
> time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
>
> Due to the use of gettimeofday(), those values are only maintained if a new
> GUC, track_scans, is set to on. By default, it is off.

Bikeshedding warning -- "track_scans" could equally apply to almost any stats about scans. I think the really relevant thing here is the times, not the scans. I think the GUC should be "track_scan_times". Or could that still be confused with scan durations? Maybe "track_scan_timestamps"?

You could maybe make the gettimeofday cheaper by doing it less often. Like, skipping the increment if the old timestamp is newer than 1s before the transaction start time (I think that's available free if some other guc is enabled but I don't recall). Or isn't this cb normally happening after transaction end? So xactStopTimestamp might be available already?

--
greg
Hi
On Tue, 23 Aug 2022 at 13:07, Greg Stark <stark@mit.edu> wrote:
On Tue, 23 Aug 2022 at 11:00, Dave Page <dpage@pgadmin.org> wrote:
>
> Often it is beneficial to review one's schema with a view to removing indexes (and sometimes tables) that are no longer required. It's very difficult to understand when that is the case by looking at the number of scans of a relation as, for example, an index may be used infrequently but may be critical in those times when it is used.
I think this is easy to answer in a prometheus/datadog/etc world since
you can consult the history of the count to see when it was last
incremented. (Or do effectively that continously).
Yes. But not every PostgreSQL instance is monitored in that way.
I guess that just reinforces the idea that it should be optional.
Perhaps there's room for some sort of general feature for controlling
various time series aggregates like max() and min() sum() or, uhm,
timeoflastchange() on whatever stats you want. That would let us
remove a bunch of stuff from pg_stat_statements and let users turn on
just the ones they want. And also let users enable things like time of
last rollback or conflict etc. But that's just something to think
about down the road.
It's certainly an interesting idea.
> The attached patch against HEAD adds optional tracking of the last scan time for relations. It updates pg_stat_*_tables with new last_seq_scan and last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to help with this.
>
> Due to the use of gettimeofday(), those values are only maintained if a new GUC, track_scans, is set to on. By default, it is off.
Bikeshedding warning -- "track_scans" could equally apply to almost
any stats about scans. I think the really relevant thing here is the
times, not the scans. I think the GUC should be "track_scan_times". Or
could that still be confused with scan durations? Maybe
"track_scan_timestamps"?
The latter seems reasonable.
You could maybe make the gettimeofday cheaper by doing it less often.
Like, skipping the increment if the old timestamp is newer than 1s
before the transaction start time (I think that's available free if
some other guc is enabled but I don't recall). Or isn't this cb
normally happening after transaction end? So xactStopTimestamp might
be available already?
Something like:
    if (pgstat_track_scan_timestamps && lstats->t_counts.t_numscans &&
        tabentry->lastscan + USECS_PER_SEC < GetCurrentTransactionStopTimestamp())
        tabentry->lastscan = GetCurrentTimestamp();

?
On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing indexes
> (and sometimes tables) that are no longer required. It's very difficult to
> understand when that is the case by looking at the number of scans of a
> relation as, for example, an index may be used infrequently but may be
> critical in those times when it is used.
>
> The attached patch against HEAD adds optional tracking of the last scan time
> for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.

Would it be simpler to allow the sequential and index scan columns to be cleared so you can look later to see if it is non-zero? Should we allow arbitrary clearing of stat columns?

--
Bruce Momjian <bruce@momjian.us>  https://momjian.us
EDB  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
On Wed, 24 Aug 2022 at 15:18, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> Often it is beneficial to review one's schema with a view to removing indexes
> (and sometimes tables) that are no longer required. It's very difficult to
> understand when that is the case by looking at the number of scans of a
> relation as, for example, an index may be used infrequently but may be critical
> in those times when it is used.
>
> The attached patch against HEAD adds optional tracking of the last scan time
> for relations. It updates pg_stat_*_tables with new last_seq_scan and
> last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> help with this.
Would it be simpler to allow the sequential and index scan columns to be
cleared so you can look later to see if it is non-zero? Should we allow
arbitrary clearing of stat columns?
I don't think so, because then stat values wouldn't necessarily correlate with each other, and you wouldn't know when any of them were last reset unless we started tracking each individual reset. At least now you can see when they were all reset, and you know they were reset at the same time.
On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> On Wed, 24 Aug 2022 at 15:18, Bruce Momjian <bruce@momjian.us> wrote:
> > Would it be simpler to allow the sequential and index scan columns to be
> > cleared so you can look later to see if it is non-zero? Should we allow
> > arbitrary clearing of stat columns?
>
> I don't think so, because then stat values wouldn't necessarily correlate with
> each other, and you wouldn't know when any of them were last reset unless we
> started tracking each individual reset. At least now you can see when they
> were all reset, and you know they were reset at the same time.

Yeah, true. I was more asking if these two columns are in some way special or if people would want a more general solution, and if so, is that something we want in core Postgres.
On Wed, 24 Aug 2022 at 16:03, Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> On Wed, 24 Aug 2022 at 15:18, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Tue, Aug 23, 2022 at 10:55:09AM +0100, Dave Page wrote:
> > Often it is beneficial to review one's schema with a view to removing
> > indexes (and sometimes tables) that are no longer required. It's very
> > difficult to understand when that is the case by looking at the number of
> > scans of a relation as, for example, an index may be used infrequently but
> > may be critical in those times when it is used.
> >
> > The attached patch against HEAD adds optional tracking of the last scan
> > time for relations. It updates pg_stat_*_tables with new last_seq_scan and
> > last_idx_scan columns, and pg_stat_*_indexes with a last_idx_scan column to
> > help with this.
>
> Would it be simpler to allow the sequential and index scan columns to be
> cleared so you can look later to see if it is non-zero? Should we allow
> arbitrary clearing of stat columns?
>
> I don't think so, because then stat values wouldn't necessarily correlate with
> each other, and you wouldn't know when any of them were last reset unless we
> started tracking each individual reset. At least now you can see when they were
> all reset, and you know they were reset at the same time.
Yeah, true. I was more asking if these two columns are in some way
special or if people would want a more general solution, and if so, is
that something we want in core Postgres.
They're special in the sense that they're the ones you're most likely to look at to see how much a relation is used, I think (at least, I'd look at them rather than the tuple counts).
There are certainly other things for which a last usage value may be useful. Functions/procedures for example, or views. The benefits to removing unused objects of that type are far, far lower than indexes or tables of course.
There are other potential use cases for similar timestamps, such as object creation times (and creating user), but they are more useful for auditing than monitoring and optimisation.
On Thu, 25 Aug 2022 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
> Yeah, true. I was more asking if these two columns are in some way
> special or if people would want a more general solution, and if so, is
> that something we want in core Postgres.

Back when I used to do a bit of PostgreSQL DBA stuff, I had a nightly job set up to record the state of pg_stat_all_tables and put that into another table along with the current date. I then had a view that did some calculations with col - LAG(col) OVER (PARTITION BY relid ORDER BY date) to fetch the numerical values for each date. I didn't ever want to reset the stats because it messes with autovacuum. If you zero out n_ins_since_vacuum more often than auto-vacuum would trigger, then bad things happen over time (we should really warn about that in the docs).

I don't have a particular opinion about the patch, I'm just pointing out that there are other ways. Even just writing down the numbers on a post-it note and coming back in a month to see if they've changed is enough to tell if the table or index has been used.

We do also need to consider now that stats are stored in shared memory that any fields we add are in RAM.

David
Hi
On Thu, 25 Aug 2022 at 01:44, David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 25 Aug 2022 at 03:03, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Aug 24, 2022 at 04:01:21PM +0100, Dave Page wrote:
> > On Wed, 24 Aug 2022 at 15:18, Bruce Momjian <bruce@momjian.us> wrote:
> > Would it be simpler to allow the sequential and index scan columns to be
> > cleared so you can look later to see if it is non-zero? Should we allow
> >
> > I don't think so, because then stat values wouldn't necessarily correlate with
> > each other, and you wouldn't know when any of them were last reset unless we
> > started tracking each individual reset. At least now you can see when they were
> > all reset, and you know they were reset at the same time.
>
> Yeah, true. I was more asking if these two columns are in some way
> special or if people would want a more general solution, and if so, is
> that something we want in core Postgres.
Back when I used to do a bit of PostgreSQL DBA stuff, I had a nightly
job setup to record the state of pg_stat_all_tables and put that into
another table along with the current date. I then had a view that did
some calculations with col - LAG(col) OVER (PARTITION BY relid ORDER
BY date) to fetch the numerical values for each date. I didn't ever
want to reset the stats because it messes with autovacuum. If you zero
out n_ins_since_vacuum more often than auto-vacuum would trigger, then
bad things happen over time (we should really warn about that in the
docs).
I don't have a particular opinion about the patch, I'm just pointing
out that there are other ways. Even just writing down the numbers on a
post-it note and coming back in a month to see if they've changed is
enough to tell if the table or index has been used.
There are usually other ways to perform monitoring tasks, but there is something to be said for the convenience of having functionality built in and not having to rely on tools, scripts, or post-it notes :-)
We do also need to consider now that stats are stored in shared memory
that any fields we add are in RAM.
That is a fair point. I believe this is both minimal and useful, though.
I've attached a v2 patch that incorporates Greg's suggestions.
Attachments
On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote:
> On Thu, 25 Aug 2022 at 01:44, David Rowley <dgrowleyml@gmail.com> wrote:
> > I don't have a particular opinion about the patch, I'm just pointing
> > out that there are other ways. Even just writing down the numbers on a
> > post-it note and coming back in a month to see if they've changed is
> > enough to tell if the table or index has been used.
>
> There are usually other ways to perform monitoring tasks, but there is
> something to be said for the convenience of having functionality built in and
> not having to rely on tools, scripts, or post-it notes :-)

Should we consider using something cheaper like time() so we don't need a GUC to enable this?
On Tue, 30 Aug 2022 at 19:46, Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Aug 26, 2022 at 02:05:36PM +0100, Dave Page wrote:
> On Thu, 25 Aug 2022 at 01:44, David Rowley <dgrowleyml@gmail.com> wrote:
> I don't have a particular opinion about the patch, I'm just pointing
> out that there are other ways. Even just writing down the numbers on a
> post-it note and coming back in a month to see if they've changed is
> enough to tell if the table or index has been used.
>
>
> There are usually other ways to perform monitoring tasks, but there is
> something to be said for the convenience of having functionality built in and
> not having to rely on tools, scripts, or post-it notes :-)
Should we consider using something cheaper like time() so we don't need
a GUC to enable this?
Interesting idea, but on my mac at least, 100,000,000 gettimeofday() calls takes about 2 seconds, whilst 100,000,000 time() calls takes 14(!) seconds.
On Wed, Aug 31, 2022 at 05:02:33PM +0100, Dave Page wrote:
> On Tue, 30 Aug 2022 at 19:46, Bruce Momjian <bruce@momjian.us> wrote:
> > Should we consider using something cheaper like time() so we don't need
> > a GUC to enable this?
>
> Interesting idea, but on my mac at least, 100,000,000 gettimeofday() calls
> takes about 2 seconds, whilst 100,000,000 time() calls takes 14(!) seconds.

Wow. I was just thinking you need second-level accuracy, which must be cheap somewhere.
Hi,

On 2022-08-23 10:55:09 +0100, Dave Page wrote:
> Due to the use of gettimeofday(), those values are only maintained if a new
> GUC, track_scans, is set to on. By default, it is off.
>
> I did run a 12 hour test to see what the performance impact is. pgbench was
> run with scale factor 10000 and 75 users across 4 identical bare metal
> machines running Rocky 8 in parallel which showed roughly a -2% average
> performance penalty against HEAD with track_scans enabled.

Based on the size of those numbers this was a r/w pgbench. If it has this noticeable an impact for r/w, with a pretty low number of scans/sec, how's the overhead for r/o (which can have 2 orders of magnitude more scans/sec)? It must be quite bad.

I don't think we should accept this feature with this overhead - but I also think we can do better, by accepting a bit less accuracy. For this to be useful we don't need a perfectly accurate timestamp. The statement start time is probably not accurate enough, but we could just have bgwriter or such update one in shared memory every time we wake up? Or perhaps we could go to an even lower granularity, by putting in the current LSN or such?

Greetings,

Andres Freund
On Wed, 31 Aug 2022 at 18:21, Andres Freund <andres@anarazel.de> wrote:
> Based on the size of those numbers this was a r/w pgbench. If it has this
> noticeable an impact for r/w, with a pretty low number of scans/sec, how's
> the overhead for r/o (which can have 2 orders of magnitude more scans/sec)?
> It must be quite bad.
>
> I don't think we should accept this feature with this overhead - but I also
> think we can do better, by accepting a bit less accuracy. For this to be
> useful we don't need a perfectly accurate timestamp. The statement start time
> is probably not accurate enough, but we could just have bgwriter or such
> update one in shared memory every time we wake up? Or perhaps we could go to
> an even lower granularity, by putting in the current LSN or such?

I don't think that LSN is precise enough. For example, if you're in a (mostly) read-only system, the system may go long times without any meaningful records being written.

As for having a lower granularity and preventing the one-syscall-per-Relation issue, can't we reuse the query_start or state_change timestamps that appear in pg_stat_activity (potentially updated immediately before this stat flush), or some other per-backend timestamp that is already maintained and considered accurate enough for this use?

Regardless, with this patch as it is we get a new timestamp for each relation processed, which I think is a waste of time (heh) even in VDSO-enabled systems.

Apart from the above, I don't have any other meaningful opinion on this patch - it might be a good addition, but I don't consume stats often enough to make a good cost / benefit comparison.

Kind regards,

Matthias van de Meent
On Wed, Aug 31, 2022 at 07:52:49PM +0200, Matthias van de Meent wrote:
> As for having a lower granularity and preventing the
> one-syscall-per-Relation issue, can't we reuse the query_start or
> state_change timestamps that appear in pg_stat_activity

Yeah, query start should be fine, but not transaction start time.
Hi,

On 2022-08-31 19:52:49 +0200, Matthias van de Meent wrote:
> As for having a lower granularity and preventing the
> one-syscall-per-Relation issue, can't we reuse the query_start or
> state_change timestamps that appear in pg_stat_activity (potentially
> updated immediately before this stat flush), or some other per-backend
> timestamp that is already maintained and considered accurate enough
> for this use?

The problem is that it won't change at all for a query that runs for a week - and we'll report the timestamp from a week ago when it finally ends.

But given this is done when stats are flushed, which only happens after the transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if we got to flushing the transaction stats we'll already have computed that.

>     tabentry->numscans += lstats->t_counts.t_numscans;
> +   if (pgstat_track_scans && lstats->t_counts.t_numscans)
> +       tabentry->lastscan = GetCurrentTimestamp();

Besides replacing GetCurrentTimestamp() with GetCurrentTransactionStopTimestamp(), this should then also check if tabentry->lastscan is already newer than the new timestamp.

Greetings,

Andres Freund
On Wed, Aug 31, 2022 at 11:56:29AM -0700, Andres Freund wrote:
> The problem is that it won't change at all for a query that runs for a week -
> and we'll report the timestamp from a week ago when it finally ends.
>
> But given this is done when stats are flushed, which only happens after the
> transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if
> we got to flushing the transaction stats we'll already have computed that.

Oh, good point --- it is safer to show a more recent time than a too-old time.
On Wed, 31 Aug 2022 at 17:13, Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Aug 31, 2022 at 05:02:33PM +0100, Dave Page wrote:
> On Tue, 30 Aug 2022 at 19:46, Bruce Momjian <bruce@momjian.us> wrote:
> > Should we consider using something cheaper like time() so we don't need
> > a GUC to enable this?
>
> Interesting idea, but on my mac at least, 100,000,000 gettimeofday() calls
> takes about 2 seconds, whilst 100,000,000 time() calls takes 14(!) seconds.
Wow. I was just thinking you need second-level accuracy, which must be
cheap somewhere.
Second-level accuracy would indeed be fine for this. Frankly, for my use case just the date would be enough, but I can imagine people wanting greater accuracy than that.
And yes, I was very surprised by the timing results I got as well. I guess it's a quirk of macOS - on a Linux box I get ~4s for gettimeofday() and ~1s for time().
On Thu, Sep 1, 2022 at 09:46:59AM +0100, Dave Page wrote:
> Second-level accuracy would indeed be fine for this. Frankly, for my use case
> just the date would be enough, but I can imagine people wanting greater
> accuracy than that.
>
> And yes, I was very surprised by the timing results I got as well. I guess
> it's a quirk of macOS - on a Linux box I get ~4s for gettimeofday() and ~1s
> for time().

I think we lose 95% of our users if we require it to be enabled so let's work to find a way it can be always enabled.
On Thu, 1 Sept 2022 at 13:04, Bruce Momjian <bruce@momjian.us> wrote:
On Thu, Sep 1, 2022 at 09:46:59AM +0100, Dave Page wrote:
> On Wed, 31 Aug 2022 at 17:13, Bruce Momjian <bruce@momjian.us> wrote:
> > Wow. I was just thinking you need second-level accuracy, which must be
> > cheap somewhere.
>
> Second-level accuracy would indeed be fine for this. Frankly, for my use case
> just the date would be enough, but I can imagine people wanting greater
> accuracy than that.
>
> And yes, I was very surprised by the timing results I got as well. I guess
> it's a quirk of macOS - on a Linux box I get ~4s for gettimeofday() and ~1s
> for time().

I think we lose 95% of our users if we require it to be enabled so let's work to find a way it can be always enabled.
So based on Andres' suggestion, something like this seems like it might work:
    {
        TimestampTz t = GetCurrentTransactionStopTimestamp();

        if (t > tabentry->lastscan)
            tabentry->lastscan = t;
    }
If that seems like a good option, I can run some more benchmarks (and then remove the GUC if it looks good).
On Wed, 31 Aug 2022 at 20:56, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-08-31 19:52:49 +0200, Matthias van de Meent wrote: > > As for having a lower granularity and preventing the > > one-syscall-per-Relation issue, can't we reuse the query_start or > > state_change timestamps that appear in pg_stat_activity (potentially > > updated immediately before this stat flush), or some other per-backend > > timestamp that is already maintained and considered accurate enough > > for this use? > > The problem is that it won't change at all for a query that runs for a week - > and we'll report the timestamp from a week ago when it finally ends. This earlier proposal to reuse pg_stat_activity values is also invalid because those timestamps don't exist when you SET track_activities = OFF. > But given this is done when stats are flushed, which only happens after the > transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if > we got to flushing the transaction stats we'll already have computed that. I'm not entirely happy with that, as that would still add function call overhead, and potentially still call GetCurrentTimestamp() in this somewhat hot loop. As an alternative, we could wire the `now` variable in pgstat_report_stat (generated from GetCurrentTransactionStopTimestamp() into pgstat_flush_pending_entries and then into flush_pending_cb (or, store this in a static variable) so that we can reuse that value, saving any potential function call overhead. > > tabentry->numscans += lstats->t_counts.t_numscans; > > + if (pgstat_track_scans && lstats->t_counts.t_numscans) > > + tabentry->lastscan = GetCurrentTimestamp(); > > Besides replacing GetCurrentTimestamp() with > GetCurrentTransactionStopTimestamp(), this should then also check if > tabentry->lastscan is already newer than the new timestamp. I wonder how important that is. 
This value only gets set in a stats flush, which may skew the stat update by several seconds (up to PGSTAT_MAX_INTERVAL). I don't expect concurrent flushes to take so long that one will set the values to an older timestamp. It is possible, but I think it is extremely unlikely that this is going to be important when you consider that these stat flushes are not expected to run for more than 1 second.
Kind regards,
Matthias van de Meent
Hi,
On 2022-09-01 14:18:42 +0200, Matthias van de Meent wrote:
> On Wed, 31 Aug 2022 at 20:56, Andres Freund <andres@anarazel.de> wrote:
> > But given this is done when stats are flushed, which only happens after the
> > transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if
> > we got to flushing the transaction stats we'll already have computed that.
>
> I'm not entirely happy with that, as that would still add function
> call overhead, and potentially still call GetCurrentTimestamp() in
> this somewhat hot loop.
We already use GetCurrentTransactionStopTimestamp() (as you reference below) before we get to this point, so I doubt that we'll ever call GetCurrentTimestamp(). And it's hard to imagine that the function call overhead of GetCurrentTransactionStopTimestamp() matters compared to acquiring locks etc.
> As an alternative, we could wire the `now` variable in
> pgstat_report_stat (generated from
> GetCurrentTransactionStopTimestamp()) into pgstat_flush_pending_entries
> and then into flush_pending_cb (or store this in a static variable)
> so that we can reuse that value, saving any potential function call
> overhead.
Passing it in doesn't clearly seem an improvement, but I also don't have a strong opinion on it. I am strongly against the static variable approach.
> > > tabentry->numscans += lstats->t_counts.t_numscans;
> > > + if (pgstat_track_scans && lstats->t_counts.t_numscans)
> > > + tabentry->lastscan = GetCurrentTimestamp();
> >
> > Besides replacing GetCurrentTimestamp() with
> > GetCurrentTransactionStopTimestamp(), this should then also check if
> > tabentry->lastscan is already newer than the new timestamp.
>
> I wonder how important that is. This value only gets set in a stats
> flush, which may skew the stat update by several seconds (up to
> PGSTAT_MAX_INTERVAL).
> I don't expect concurrent flushes to take so
> long that one will set the values to an older timestamp. It is possible, but I think it is
> extremely unlikely that this is going to be important when you
> consider that these stat flushes are not expected to run for more than
> 1 second.
I think it'll be confusing if you have values going back and forth, even if just by a little. And it's cheap to defend against, so why not just do that?
Greetings,
Andres Freund
Hi
On Thu, 1 Sept 2022 at 19:35, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-09-01 14:18:42 +0200, Matthias van de Meent wrote:
> On Wed, 31 Aug 2022 at 20:56, Andres Freund <andres@anarazel.de> wrote:
> > But given this is done when stats are flushed, which only happens after the
> > transaction ended, we can just use GetCurrentTransactionStopTimestamp() - if
> > we got to flushing the transaction stats we'll already have computed that.
>
> I'm not entirely happy with that, as that would still add function
> call overhead, and potentially still call GetCurrentTimestamp() in
> this somewhat hot loop.
We already used GetCurrentTransactionStopTimestamp() (as you reference below)
before we get to this point, so I doubt that we'll ever call
GetCurrentTimestamp(). And it's hard to imagine that the function call
overhead of GetCurrentTransactionStopTimestamp() matters compared to acquiring
locks etc.
Vik and I looked at this a little, and found that we actually don't generally have GetCurrentTransactionStopTimestamp() at this point - a simple 'select * from pg_class' will result in 9 passes of this code, none of which have xactStopTimestamp != 0.
After discussing it a little, we came to the conclusion that for the stated use case, xactStartTimestamp is actually accurate enough, provided that we only ever update it with a newer value. It would only likely be in extreme edge-cases where the difference between start and end transaction time would have any bearing on whether or not one might drop a table/index for lack of use.
Doing it this way also means we no longer need the GUC to enable the feature, which as Bruce notes, is likely to lose 95% of users.
Updated patch attached:
- GUC removed.
- The timestamp recorded is xactStartTimestamp.
- Docs updated to make it clear we're recording transaction start time.
Attachments
Hi,
On 2022-09-06 14:15:56 +0100, Dave Page wrote:
> Vik and I looked at this a little, and found that we actually don't
> generally have GetCurrentTransactionStopTimestamp() at this point - a
> simple 'select * from pg_class' will result in 9 passes of this code, none
> of which have xactStopTimestamp != 0.
Huh, pgstat_report_stat() has used GetCurrentTransactionStopTimestamp() for a long time. Wonder when that was broken. Looks like it's set only when a xid is assigned. We should fix this.
> After discussing it a little, we came to the conclusion that for the stated
> use case, xactStartTimestamp is actually accurate enough, provided that we
> only ever update it with a newer value. It would only likely be in extreme
> edge-cases where the difference between start and end transaction time
> would have any bearing on whether or not one might drop a table/index for
> lack of use.
I don't at all agree with this. Since we already use GetCurrentTransactionStopTimestamp() in this path we should fix it.
Greetings,
Andres Freund
At Tue, 6 Sep 2022 08:53:25 -0700, Andres Freund <andres@anarazel.de> wrote in
> Hi,
>
> On 2022-09-06 14:15:56 +0100, Dave Page wrote:
> > Vik and I looked at this a little, and found that we actually don't
> > generally have GetCurrentTransactionStopTimestamp() at this point - a
> > simple 'select * from pg_class' will result in 9 passes of this code, none
> > of which have xactStopTimestamp != 0.
>
> Huh, pgstat_report_stat() has used GetCurrentTransactionStopTimestamp()
> for a long time. Wonder when that was broken. Looks like it's set only when a
> xid is assigned. We should fix this.
/*
 * GetCurrentTransactionStopTimestamp
 *
 * We return current time if the transaction stop time hasn't been set
 * (which can happen if we decide we don't need to log an XLOG record).
So, that seems intentional since 2007 (957d08c81f). It seems to me that the patch assumes that the only other use of the timestamp is pgstats, and it didn't let GetCurrentTransactionStopTimestamp() set the variable for future use.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hi
On Tue, 6 Sept 2022 at 16:53, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-09-06 14:15:56 +0100, Dave Page wrote:
> Vik and I looked at this a little, and found that we actually don't
> generally have GetCurrentTransactionStopTimestamp() at this point - a
> simple 'select * from pg_class' will result in 9 passes of this code, none
> of which have xactStopTimestamp != 0.
Huh, pgstat_report_stat() has used GetCurrentTransactionStopTimestamp()
for a long time. Wonder when that was broken. Looks like it's set only when a
xid is assigned. We should fix this.
> After discussing it a little, we came to the conclusion that for the stated
> use case, xactStartTimestamp is actually accurate enough, provided that we
> only ever update it with a newer value. It would only likely be in extreme
> edge-cases where the difference between start and end transaction time
> would have any bearing on whether or not one might drop a table/index for
> lack of use.
I don't at all agree with this. Since we already use
GetCurrentTransactionStopTimestamp() in this path we should fix it.
I just spent some time looking at this, and as far as I can see, we only set xactStopTimestamp if the transaction needs to be WAL logged (and in those cases, it is set before the stats callback runs). As you note though, we are already calling GetCurrentTransactionStopTimestamp() in the read-only case anyway, and thus already incurring the cost of gettimeofday().
Here's a v4 patch. This reverts to using GetCurrentTransactionStopTimestamp() for the last_scan times, and will set xactStopTimestamp the first time GetCurrentTransactionStopTimestamp() is called, thus avoiding multiple gettimeofday() calls. SetCurrentTransactionStopTimestamp() is removed, as is use of xactStopTimestamp (except when resetting it to 0).
Attachments
On 9/7/22 12:03, Dave Page wrote:
> Here's a v4 patch. This reverts to using
> GetCurrentTransactionStopTimestamp() for the last_scan times, and will
> set xactStopTimestamp the first time GetCurrentTransactionStopTimestamp()
> is called, thus avoiding multiple gettimeofday() calls.
> SetCurrentTransactionStopTimestamp() is removed, as is use
> of xactStopTimestamp (except when resetting it to 0).
This patch looks good to me and has much saner behavior than what it replaces.
As a matter of process, the oid for the new function should be in the 8000-9000 range and the catversion should be bumped by the committer.
Marked as Ready for Committer. Thanks for the patch!
--
Vik Fearing
Hi,
On 2022-09-30 17:58:31 +0200, Vik Fearing wrote:
> On 9/7/22 12:03, Dave Page wrote:
> > Here's a v4 patch. This reverts to using
> > GetCurrentTransactionStopTimestamp() for the last_scan times, and will
> > set xactStopTimestamp the first time GetCurrentTransactionStopTimestamp()
> > is called, thus avoiding multiple gettimeofday() calls.
> > SetCurrentTransactionStopTimestamp() is removed, as is use
> > of xactStopTimestamp (except when resetting it to 0).
>
> This patch looks good to me and has much saner behavior than what it
> replaces.
I agree. However, it seems like a significant enough behavioural change that I'd rather commit it as a separate patch. I agree with Vik's judgement that the patch is otherwise ready. Happy to do that split myself, or you can do it...
Greetings,
Andres Freund
Hi
On Fri, 30 Sept 2022 at 18:58, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-09-30 17:58:31 +0200, Vik Fearing wrote:
> On 9/7/22 12:03, Dave Page wrote:
> > Here's a v4 patch. This reverts to using
> > GetCurrentTransactionStopTimestamp() for the last_scan times, and will
> > set xactStopTimestamp the first time GetCurrentTransactionStopTimestamp()
> > is called, thus avoiding multiple gettimeofday() calls.
> > SetCurrentTransactionStopTimestamp() is removed, as is use
> > of xactStopTimestamp (except when resetting it to 0).
>
> This patch looks good to me and has much saner behavior than what it
> replaces.
I agree. However, it seems like a significant enough behavioural change that
I'd rather commit it as a separate patch. I agree with Vik's judgement that
the patch is otherwise ready. Happy to do that split myself, or you
can do it...
Thanks. It's just the changes in xact.c, so it doesn't seem like it would cause you any more work either way, in which case, I'll leave it to you :-)
FYI, the OID I chose was simply the closest single value to those used for the other related functions (e.g. pg_stat_get_numscans). Seemed like a good way to use up one more random unused value, but I don't care if it gets changed to the 8000+ range.
On Mon, Oct 03, 2022 at 12:55:40PM +0100, Dave Page wrote:
> Thanks. It's just the changes in xact.c, so it doesn't seem like it would
> cause you any more work either way, in which case, I'll leave it to you :-)
Okay, I have just moved the patch to the next CF then, still marked as ready for committer. Are you planning to look at that?
--
Michael
Attachments
On Wed, 12 Oct 2022 at 07:40, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Oct 03, 2022 at 12:55:40PM +0100, Dave Page wrote:
> Thanks. It's just the changes in xact.c, so it doesn't seem like it would
> cause you any more work either way, in which case, I'll leave it to you :-)
Okay, I have just moved the patch to the next CF then, still marked as
ready for committer. Are you planning to look at that?
Thanks. Was the question directed at me or Andres?
On Wed, Oct 12, 2022 at 09:09:46AM +0100, Dave Page wrote:
> On Wed, 12 Oct 2022 at 07:40, Michael Paquier <michael@paquier.xyz> wrote:
>> Okay, I have just moved the patch to the next CF then, still marked as
>> ready for committer. Are you planning to look at that?
>
> Thanks. Was the question directed at me or Andres?
Apologies for the confusion. This question is addressed to Andres.
--
Michael
Attachments
Hi,
On 2022-10-12 15:40:21 +0900, Michael Paquier wrote:
> On Mon, Oct 03, 2022 at 12:55:40PM +0100, Dave Page wrote:
> > Thanks. It's just the changes in xact.c, so it doesn't seem like it would
> > cause you any more work either way, in which case, I'll leave it to you :-)
>
> Okay, I have just moved the patch to the next CF then, still marked as
> ready for committer. Are you planning to look at that?
Yep, doing so right now.
I think this should have at least a basic test in src/test/regress/sql/stats.sql. If I can write one in a few minutes I'll go for that, otherwise will reply detailing difficulties.
> + <para>
> + The time of the last sequential scan of this table, based on the
> + most recent transaction stop time
> + </para></entry>
Related rows seem to say "on this table".
Greetings,
Andres Freund
Hi,
On 2022-10-12 12:50:31 -0700, Andres Freund wrote:
> I think this should have at least a basic test in src/test/regress/sql/stats.sql. If
> I can write one in a few minutes I'll go for that, otherwise will reply
> detailing difficulties.
Took a bit longer (+lunch). Attached.
In the attached 0001, the patch to make GetCurrentTransactionStopTimestamp() set xactStopTimestamp, I added a few comment updates and an Assert() to ensure that CurrentTransactionState->state is TRANS_(DEFAULT|COMMIT|ABORT|PREPARE). I am worried that otherwise we might end up with someone using it in a place before the end of the transaction, which'd then end up recording the wrong timestamp in the commit/abort record.
For 0002, the commit adding lastscan, I added catversion/stats version bumps (because I was planning to commit it already...), a commit message, and that minor docs change mentioned earlier.
0003 adds the tests mentioned above. I plan to merge them with 0002, but left them separate for easier review for now.
To be able to compare timestamps for > not just >= we need to make sure that two subsequent timestamps differ. The attached achieves this by sleeping for 100ms between those points - we do that in other places already. I'd started out with 10ms, which I am fairly sure would suffice, but then decided to copy the existing 100ms sleeps.
I verified tests pass under valgrind, debug_discard_caches and after I make pgstat_report_stat() only flush when force is passed in.
Greetings,
Andres Freund
Attachments
Hi
On Wed, 12 Oct 2022 at 23:52, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-10-12 12:50:31 -0700, Andres Freund wrote:
> I think this should have at least a basic test in src/test/regress/sql/stats.sql. If
> I can write one in a few minutes I'll go for that, otherwise will reply
> detailing difficulties.
Took a bit longer (+lunch). Attached.
In the attached 0001, the patch to make GetCurrentTransactionStopTimestamp()
set xactStopTimestamp, I added a few comment updates and an Assert() to ensure
that CurrentTransactionState->state is TRANS_(DEFAULT|COMMIT|ABORT|PREPARE). I
am worried that otherwise we might end up with someone ending up using it in a
place before the end of the transaction, which'd then end up recording the
wrong timestamp in the commit/abort record.
For 0002, the commit adding lastscan, I added catversion/stats version bumps
(because I was planning to commit it already...), a commit message, and that
minor docs change mentioned earlier.
0003 adds the tests mentioned above. I plan to merge them with 0002, but left
them separate for easier review for now.
To be able to compare timestamps for > not just >= we need to make sure that
two subsequent timestamps differ. The attached achieves this by sleeping for
100ms between those points - we do that in other places already. I'd started
out with 10ms, which I am fairly sure would suffice, but then decided to copy
the existing 100ms sleeps.
I verified tests pass under valgrind, debug_discard_caches and after I make
pgstat_report_stat() only flush when force is passed in.
Thanks for that. It looks good to me, bar one comment (repeated 3 times in the sql and expected files):
fetch timestamps from before the next test
"from " should be removed.
Hi,
On 2022-10-13 14:38:06 +0100, Dave Page wrote:
> Thanks for that. It looks good to me, bar one comment (repeated 3 times in
> the sql and expected files):
>
> fetch timestamps from before the next test
>
> "from " should be removed.
I was trying to say something with that from, but clearly it wasn't understandable :). Removed.
With that I pushed the changes and marked the CF entry as committed.
Thanks for the feature Dave and the reviews everyone.
Greetings,
Andres Freund
On Fri, 14 Oct 2022 at 19:16, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2022-10-13 14:38:06 +0100, Dave Page wrote:
> Thanks for that. It looks good to me, bar one comment (repeated 3 times in
> the sql and expected files):
>
> fetch timestamps from before the next test
>
> "from " should be removed.
I was trying to say something with that from, but clearly it wasn't
understandable :). Removed.
With that I pushed the changes and marked the CF entry as committed.
Thanks!
On Fri, Oct 14, 2022 at 2:55 PM Dave Page <dpage@pgadmin.org> wrote: > On Fri, 14 Oct 2022 at 19:16, Andres Freund <andres@anarazel.de> wrote: >> On 2022-10-13 14:38:06 +0100, Dave Page wrote: >> > Thanks for that. It looks good to me, bar one comment (repeated 3 times in >> > the sql and expected files): >> > >> > fetch timestamps from before the next test >> > >> > "from " should be removed. >> >> I was trying to say something with that from, but clearly it wasn't >> understandable :). Removed. >> >> With that I pushed the changes and marked the CF entry as committed. > > > Thanks! > Hey folks, I was looking at this a bit further (great addition btw) and noticed the following behavior (this is a mre of the original testing that uncovered this): pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count -------+------------+---------+----------+---------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- (0 rows) pagila=# create table x (xx int); CREATE TABLE Time: 2.145 ms pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | 
analyze_count | autoanalyze_count -------+------------+---------+----------+---------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- 16392 | public | x | 0 | [null] | 0 | [null] | [null] | [null] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | [null] | [null] | [null] | [null] | 0 | 0 | 0 | 0 (1 row) pagila=# insert into x select 1; INSERT 0 1 pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count -------+------------+---------+----------+------------------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- 16392 | public | x | 0 | 1999-12-31 19:00:00-05 | 0 | [null] | [null] | [null] | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | [null] | [null] | [null] | [null] | 0 | 0 | 0 | 0 (1 row) Normally we populate "last" columns with a NULL value when the corresponding marker is zero, which seems correct in the first query, but no longer matches in the second. 
I can see an argument that this is a necessary exception to that rule (I'm not sure I agree with it, but I see it) but even in that scenario, ISTM we should avoid populating the table with a "special value", which generally goes against observability best practices, and I believe we've been able to avoid it elsewhere. Beyond that, I also notice the behavior changes when adding a table with a PK, though not necessarily better... pagila=# drop table x; DROP TABLE Time: 2.896 ms pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count -------+------------+---------+----------+---------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- (0 rows) pagila=# create table x (xx int primary key) ; CREATE TABLE pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+-------------------------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- 16400 | public | x | 1 | 2022-10-23 15:53:04.935192-04 | 0 | 0 | [null] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | [null] | [null] | [null] | [null] | 0 | 0 | 0 | 0 (1 row) pagila=# insert into x select 1; INSERT 0 1 pagila=# select * from pg_stat_user_tables ; relid | schemaname | relname | seq_scan | last_seq_scan | seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count -------+------------+---------+----------+-------------------------------+--------------+----------+------------------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+------------------- 16400 | public | x | 1 | 2022-10-23 15:53:04.935192-04 | 0 | 0 | 1999-12-31 19:00:00-05 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | [null] | [null] | [null] | [null] | 0 | 0 | 0 | 0 (1 row) This time the create table both populate a sequential scan and populates the last_seq_scan with a real/correct value. 
However, an insert into the table neither advances the seq_scan nor the last_seq_scan values, which seems like different behavior from my original example, with the added negative that the last_idx_scan is now populated with a special value :-(
I think the simplest fix, which should correspond to existing versions' behavior, would be to just ensure that we replace any "special value" timestamps with a real transaction timestamp, and then maybe note that these fields may be advanced by operations which don't strictly show up as a sequential or index scan.
Robert Treat
https://xzilla.net
FYI, this is not intentional, and I do plan to look into it, however I've been somewhat busy with pgconfeu, and am travelling for the rest of this week as well.
On Sun, 23 Oct 2022 at 21:09, Robert Treat <rob@xzilla.net> wrote:
On Fri, Oct 14, 2022 at 2:55 PM Dave Page <dpage@pgadmin.org> wrote:
> On Fri, 14 Oct 2022 at 19:16, Andres Freund <andres@anarazel.de> wrote:
>> On 2022-10-13 14:38:06 +0100, Dave Page wrote:
>> > Thanks for that. It looks good to me, bar one comment (repeated 3 times in
>> > the sql and expected files):
>> >
>> > fetch timestamps from before the next test
>> >
>> > "from " should be removed.
>>
>> I was trying to say something with that from, but clearly it wasn't
>> understandable :). Removed.
>>
>> With that I pushed the changes and marked the CF entry as committed.
>
>
> Thanks!
>
Hey folks,
I was looking at this a bit further (great addition btw) and noticed
the following behavior (this is a mre of the original testing that
uncovered this):
pagila=# select * from pg_stat_user_tables ;
relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
-------+------------+---------+----------+---------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
(0 rows)
pagila=# create table x (xx int);
CREATE TABLE
Time: 2.145 ms
pagila=# select * from pg_stat_user_tables ;
relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
-------+------------+---------+----------+---------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
16392 | public | x | 0 | [null] |
0 | [null] | [null] | [null] | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | [null] | [null] | [null]
| [null] | 0 | 0 | 0 |
0
(1 row)
pagila=# insert into x select 1;
INSERT 0 1
pagila=# select * from pg_stat_user_tables ;
relid | schemaname | relname | seq_scan | last_seq_scan |
seq_tup_read | idx_scan | last_idx_scan | idx_tup_fetch | n_tup_ins |
n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup |
n_mod_since_analyze | n_ins_since_vacuum | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze | vacuum_count |
autovacuum_count | analyze_count | autoanalyze_count
-------+------------+---------+----------+------------------------+--------------+----------+---------------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+--------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
16392 | public | x | 0 | 1999-12-31 19:00:00-05 |
0 | [null] | [null] | [null] | 1 |
0 | 0 | 0 | 1 | 0 |
1 | 1 | [null] | [null] |
[null] | [null] | 0 | 0 |
0 | 0
(1 row)
Normally we populate "last" columns with a NULL value when the
corresponding marker is zero, which seems correct in the first query,
but no longer matches in the second. I can see an argument that this
is a necessary exception to that rule (I'm not sure I agree with it,
but I see it) but even in that scenario, ISTM we should avoid
populating the table with a "special value", which generally goes
against observability best practices, and I believe we've been able to
avoid it elsewhere.
Beyond that, I also notice the behavior changes when adding a table
with a PK, though not necessarily better...
pagila=# drop table x;
DROP TABLE
Time: 2.896 ms
pagila=# select * from pg_stat_user_tables ;
relid | schemaname | relname | seq_scan | last_seq_scan |
On Mon, 31 Oct 2022 at 07:36, Dave Page <dpage@pgadmin.org> wrote:
> FYI, this is not intentional, and I do plan to look into it, however I've been somewhat busy with pgconfeu, and am travelling for the rest of this week as well.
Here's a patch to fix this issue. Many thanks to Peter Eisentraut who figured it out in a few minutes after I spent far too long looking down rabbit holes in entirely the wrong place.
Thanks for the bug report.
On Sun, 23 Oct 2022 at 21:09, Robert Treat <rob@xzilla.net> wrote:
On Fri, Oct 14, 2022 at 2:55 PM Dave Page <dpage@pgadmin.org> wrote:
> On Fri, 14 Oct 2022 at 19:16, Andres Freund <andres@anarazel.de> wrote:
>> On 2022-10-13 14:38:06 +0100, Dave Page wrote:
>> > Thanks for that. It looks good to me, bar one comment (repeated 3 times in
>> > the sql and expected files):
>> >
>> > fetch timestamps from before the next test
>> >
>> > "from " should be removed.
>>
>> I was trying to say something with that from, but clearly it wasn't
>> understandable :). Removed.
>>
>> With that I pushed the changes and marked the CF entry as committed.
>
>
> Thanks!
>
Hey folks,
I was looking at this a bit further (great addition btw) and noticed
the following behavior (this is an MRE of the original testing that
uncovered this):
pagila=# select * from pg_stat_user_tables ;
(0 rows)
pagila=# create table x (xx int);
CREATE TABLE
Time: 2.145 ms
pagila=# select * from pg_stat_user_tables ;
-[ RECORD 1 ]-------+-------
relid               | 16392
schemaname          | public
relname             | x
seq_scan            | 0
last_seq_scan       | [null]
seq_tup_read        | 0
idx_scan            | [null]
last_idx_scan       | [null]
idx_tup_fetch       | [null]
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         | [null]
last_autovacuum     | [null]
last_analyze        | [null]
last_autoanalyze    | [null]
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
pagila=# insert into x select 1;
INSERT 0 1
pagila=# select * from pg_stat_user_tables ;
-[ RECORD 1 ]-------+-----------------------
relid               | 16392
schemaname          | public
relname             | x
seq_scan            | 0
last_seq_scan       | 1999-12-31 19:00:00-05
seq_tup_read        | 0
idx_scan            | [null]
last_idx_scan       | [null]
idx_tup_fetch       | [null]
n_tup_ins           | 1
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1
n_dead_tup          | 0
n_mod_since_analyze | 1
n_ins_since_vacuum  | 1
last_vacuum         | [null]
last_autovacuum     | [null]
last_analyze        | [null]
last_autoanalyze    | [null]
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
Normally we populate "last" columns with a NULL value when the
corresponding counter is zero, which seems correct in the first query,
but no longer holds in the second. I can see an argument that this is
a necessary exception to that rule (I'm not sure I agree with it, but
I see it), but even in that scenario, ISTM we should avoid populating
the column with a "special value", which generally goes against
observability best practices, and I believe we've been able to avoid
it elsewhere.
Beyond that, I also notice the behavior changes when adding a table
with a PK, though not necessarily for the better...
pagila=# drop table x;
DROP TABLE
Time: 2.896 ms
pagila=# select * from pg_stat_user_tables ;
(0 rows)
pagila=# create table x (xx int primary key) ;
CREATE TABLE
pagila=# select * from pg_stat_user_tables ;
-[ RECORD 1 ]-------+------------------------------
relid               | 16400
schemaname          | public
relname             | x
seq_scan            | 1
last_seq_scan       | 2022-10-23 15:53:04.935192-04
seq_tup_read        | 0
idx_scan            | 0
last_idx_scan       | [null]
idx_tup_fetch       | 0
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         | [null]
last_autovacuum     | [null]
last_analyze        | [null]
last_autoanalyze    | [null]
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
pagila=# insert into x select 1;
INSERT 0 1
pagila=# select * from pg_stat_user_tables ;
-[ RECORD 1 ]-------+------------------------------
relid               | 16400
schemaname          | public
relname             | x
seq_scan            | 1
last_seq_scan       | 2022-10-23 15:53:04.935192-04
seq_tup_read        | 0
idx_scan            | 0
last_idx_scan       | 1999-12-31 19:00:00-05
idx_tup_fetch       | 0
n_tup_ins           | 1
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1
n_dead_tup          | 0
n_mod_since_analyze | 1
n_ins_since_vacuum  | 1
last_vacuum         | [null]
last_autovacuum     | [null]
last_analyze        | [null]
last_autoanalyze    | [null]
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
This time the CREATE TABLE both increments seq_scan and populates
last_seq_scan with a real/correct value. However, an insert into the
table advances neither seq_scan nor last_seq_scan, which seems like
different behavior from my original example, with the added negative
that last_idx_scan is now populated with a special value :-(
I think the simplest fix, which should match the behavior of existing
versions, would be to ensure that we replace any "special value"
timestamps with a real transaction timestamp, and then perhaps note
that these fields may be advanced by operations which don't strictly
show up as a sequential or index scan.
Robert Treat
https://xzilla.net
Attachments
On Thu, Nov 03, 2022 at 04:44:16PM -0400, Dave Page wrote:
> Here's a patch to fix this issue. Many thanks to Peter Eisentraut who
> figured it out in a few minutes after I spent far too long looking down
> rabbit holes in entirely the wrong place.

FWIW, all the other areas of pgstatfuncs.c manipulate timestamptz
fields with a style like the attached. That's a nit, but still in the
interest of consistency with the surroundings.

Anyway, it seems to me that a regression test is in order for the case
where a scan happens just after the relation creation, and the same
problem shows up with last_idx_scan.
--
Michael
Attachments
On Mon, Nov 07, 2022 at 04:54:07PM +0900, Michael Paquier wrote:
> FWIW, all the other areas of pgstatfuncs.c manipulate timestamptz
> fields with a style like the attached. That's a nit, but still in the
> interest of consistency with the surroundings.
>
> Anyway, it seems to me that a regression test is in order for the case
> where a scan happens just after the relation creation, and the same
> problem shows up with last_idx_scan.

Hearing nothing, done this way as of d7744d5. Thanks for the report,
Robert. And thanks for the patch, Dave.
--
Michael
Attachments
On Tue, 8 Nov 2022 at 04:10, Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Nov 07, 2022 at 04:54:07PM +0900, Michael Paquier wrote:
>> FWIW, all the other areas of pgstatfuncs.c manipulate timestamptz
>> fields with a style like the attached. That's a nit, but still in the
>> interest of consistency with the surroundings.
>>
>> Anyway, it seems to me that a regression test is in order for the case
>> where a scan happens just after the relation creation, and the same
>> problem shows up with last_idx_scan.
>
> Hearing nothing, done this way as of d7744d5. Thanks for the report,
> Robert. And thanks for the patch, Dave.
Thank you!