Yeah. The only use-case that's been suggested is detecting an
unresponsive stats collector, and the main timestamp should be plenty for
that.
Lately, I've spent most of my time investigating ways to increase qps. It turned out we were able to triple our throughput by monitoring experiments at highly granular time steps (1 to 2 seconds). Effects that were invisible with 30-second polls of the stats were obvious with 2-second polls.
The problem with taking highly granular snapshots is that the postgres counters are monotonically increasing, but they only advance when stats are published. Currently you have no option except to divide the counter delta by the delta of now() between polling intervals. If you poll every 2 seconds, the maximum error is about 0.5/2, or 25%, which makes those numbers noisy to read. Using (snapshot_timestamp_new - snapshot_timestamp_old) as the denominator in that calculation should smooth out the noise and show a clearer picture.
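To make the idea concrete, here is a minimal sketch (in Python rather than SQL, with made-up numbers) of dividing the counter delta by the snapshot-timestamp delta instead of the wall-clock polling interval:

```python
from datetime import datetime, timedelta

def rate(counter_old, counter_new, ts_old, ts_new):
    """Counter delta divided by the snapshot-timestamp delta, in per-second units."""
    return (counter_new - counter_old) / (ts_new - ts_old).total_seconds()

# Hypothetical example: we poll every 2 s, but the stats snapshots were
# actually published 1.5 s apart, and the counter advanced by 3000.
ts_old = datetime(2024, 1, 1, 12, 0, 0)
ts_new = ts_old + timedelta(seconds=1.5)

naive = 3000 / 2.0                        # divide by polling interval -> 1500.0
smooth = rate(0, 3000, ts_old, ts_new)    # divide by snapshot delta  -> 2000.0
```

The naive version under-reports this interval by 25% (and would over-report the next one, when the stale half-second of counts finally shows up), while dividing by the snapshot timestamps attributes the counts to the window they actually cover.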
However, I'm happy with the committed version. Thanks Tom.
- Matt K.