Обсуждение: Re: Make pg_stat_io view count IOs as bytes instead of blocks

Поиск
Список
Период
Сортировка

Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Melanie Plageman
Дата:
On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
>
> 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the final
request.Additionally, it appears that all IOs are counted in block size.
 

I think this is a great idea. It will allow people to tune
io_combine_limit as you mention below.

> 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].
 

Yep, this makes a lot of sense as a solution.

> To address this, I propose showing the total number of IO requests to the kernel (as smgr function calls) and the
totalnumber of bytes in the IO. To implement this change, the op_bytes column will be removed from the pg_stat_io view.
Instead,the [reads | writes | extends] columns will track the total number of IO requests, and newly added [read |
write| extend]_bytes columns will track the total number of bytes in the IO.
 

smgr API seems like the right place for this.

> Example benefit of this change:
>
> Running query [2], the result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     15.99     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     15.99     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝

I don't understand why background worker is listed here.

> You can rerun the same query [2] after setting io_combine_limit to 32 [3]. The result is:
>
> ╔═══════════════════╦══════════╦══════════╦═══════════════╗
> ║    backend_type   ║  object  ║  context ║ avg_io_blocks ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║   client backend  ║ relation ║ bulkread ║     31.70     ║
> ╠═══════════════════╬══════════╬══════════╬═══════════════╣
> ║ background worker ║ relation ║ bulkread ║     31.60     ║
> ╚═══════════════════╩══════════╩══════════╩═══════════════╝
>
> I believe that having visibility into avg_io_[bytes | blocks] is valuable information that could help optimize
Postgres.

In general, for this example, I think it would be more clear if you
compared what visibility we have in pg_stat_io on master with what
visibility we have with your patch.

I like that you show how io_combine_limit can be tuned using this, but
I don't think the problem statement is clear nor is the full
narrative.

> CREATE TABLE t as select i, repeat('a', 600) as filler from generate_series(1, 10000000) as i;
> SELECT pg_stat_reset_shared('io');
> SELECT * FROM t WHERE i = 0;
> SELECT backend_type, object, context, TRUNC((read_bytes / reads / (SELECT current_setting('block_size')::numeric)),
2)as avg_io_blocks FROM pg_stat_io WHERE reads > 0;
 

I like that you calculate the avg_io_blocks, but I think it is good to
show the raw columns as well.

- Melanie

Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Nov 27, 2024 at 11:08:01AM -0500, Melanie Plageman wrote:
> On Wed, Sep 11, 2024 at 7:19 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> >
> > Currently, in the pg_stat_io view, IOs are counted as blocks. However, there are two issues with this approach:
> >
> > 1- The actual number of IO requests to the kernel is lower because IO requests can be merged before sending the
finalrequest. Additionally, it appears that all IOs are counted in block size.
 
> 
> I think this is a great idea. It will allow people to tune
> io_combine_limit as you mention below.
> 
> > 2- Some IOs may not align with block size. For example, WAL read IOs are done in variable bytes and it is not
possibleto correctly show these IOs in the pg_stat_io view [1].
 
> 
> Yep, this makes a lot of sense as a solution.

Thanks for the patch! I also think it makes sense.

A few random comments:

=== 1

+       /*
+        * If IO done in bytes and byte is <= 0, this means there is an error
+        * while doing an IO. Don't count these IOs.
+        */

s/byte/bytes/?

also:

The pgstat_io_count_checks() parameter is uint64. Does it mean it has to be
changed to int64?

Also from what I can see the calls are done with those values:

- 0
- io_buffers_len * BLCKSZ
- extend_by * BLCKSZ
- BLCKSZ

could io_buffers_len and extend_by be < 0? If not, is the comment correct?

=== 2

+       Assert((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND

and

+       if ((io_op == IOOP_READ || io_op == IOOP_WRITE || io_op == IOOP_EXTEND) &&

What about ordering the enum in IOOp (no bytes/bytes) so that we could check
that io_op >= "our firt bytes enum" instead?

Also we could create a macro on top of that to make it clear. And a comment
would be needed around the IOOp definition.

I think that would be simpler to maintain should we add no bytes or bytes op in
the future.

=== 3

+pgstat_io_count_checks(IOObject io_object, IOContext io_context, IOOp io_op, uint64 bytes)
+{
+       Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES);
+       Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES);
+       Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+       Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));

IOObject and IOContext are passed only for the assertions. What about removing
them from there and put the asserts in other places?

=== 4

+       /* Only IOOP_READ, IOOP_WRITE and IOOP_EXTEND can do IO in bytes. */

Not sure about "can do IO in bytes" (same wording is used in multiple places).

=== 5

/* Convert to numeric. */

"convert to numeric"? to be consistent with others single line comments around.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Bertrand Drouvot
Дата:
Hi,

On Wed, Dec 04, 2024 at 02:49:11PM +0300, Nazir Bilal Yavuz wrote:
> On Thu, 28 Nov 2024 at 16:39, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> You are right, no need to have this check; it can not be less than 0.
> I completely removed the function now.

Thanks! Yeah I think that makes sense. 

> > What about ordering the enum in IOOp (no bytes/bytes) so that we could check
> > that io_op >= "our firt bytes enum" instead?
> >
> > Also we could create a macro on top of that to make it clear. And a comment
> > would be needed around the IOOp definition.
> >
> > I think that would be simpler to maintain should we add no bytes or bytes op in
> > the future.
> 
> I think this is a good idea. I applied all the comments.

Thanks!

+ * non-byte-measured and byte-measured operations. So, that makes is easier
+ * to check whether IO is measured in bytes.

s/that makes is/that makes it/

+ *
+ * Note: If this enum is modified, ensure that both `IOOP_NUM_TYPES` macro
+ * and the `ioop_measured_in_bytes` function are updated accordingly.

Yeah, or?

"Ensure IOOP_EXTEND is the first and IOOP_WRITE the last ones in the measured in
bytes" group and that the groups stay in that order. 

That would make the "checks" local to the enum def should someone modify it.

> Created an
> inline function instead of macro and added this 'Assert((unsigned int)
> io_object < IOOBJECT_NUM_TYPES);' to function.

An inline function seems the right choice for me too.

> Done. I moved these checks to the pgstat_count_io_op_n() function. The
> code looks more like its previous version now.

Thanks! Yeah, more easier to follow now.

> > Not sure about "can do IO in bytes" (same wording is used in multiple places).
> 
> I changed it to 'measured in bytes' but I am not sure if this is
> better, open to suggestions.

I'm tempted to say something around "track", would mean things like:

1.

ioop_measured_in_bytes => is_ioop_tracked_in_bytes

2.

s/If an op does not do IO in bytes/If an op does not track bytes/

3.

s/not every IO measured in bytes/not every IO tracks bytes/

4.

s/non-byte-measured and byte-measured/non-tracking and tracking bytes/

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Jan 09, 2025 at 02:20:16PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Thu, 9 Jan 2025 at 11:11, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Jan 09, 2025 at 10:15:20AM +0300, Nazir Bilal Yavuz wrote:
> > > I am a bit confused, are you suggesting these two alternatives:
> > > 1- Making pgstat_count_io_op_n() static and continuing to use
> > > pgstat_count_io_op() as it is.
> > > 2- Removing pgstat_count_io_op() and instead using
> > > pgstat_count_io_op_n() everywhere.
> >
> > Either of these options is OK by me.  The current state of things just
> > seems a bit strange because we publish a routine that's used nowhere.
> > If you have plans for it in a different patch, that's also fine.
> 
> I followed the second option as it is similar to
> pgstat_count_io_op_time() and also more future proof. I attached it as
> another patch. v7 is attached.

Thanks for the patches!

v7-0001:

+pg_attribute_unused()
+static inline bool
+is_ioop_tracked_in_bytes(IOOp io_op)
+{
+       Assert((unsigned int) io_op < IOOP_NUM_TYPES);
+       return io_op >= IOOP_EXTEND;
+}

and then

+       Assert(is_ioop_tracked_in_bytes(io_op) || bytes == 0);

We first use an Assert in is_ioop_tracked_in_bytes() and then we return
a value "just" to check another Assert. I wonder if it wouldn't make more sense
to get rid of this function and use a macro instead, something like?

#define is_ioop_tracked_in_bytes(io_op) \
    ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)

v7-0002:

I wonder if it wouldn't make more sense to remove pgstat_count_io_op() first
and then implement what currently is in v7-0001. What v7-0002 is removing is
not produced by v7-0001.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Nazir Bilal Yavuz
Дата:
Hi,

On Fri, 10 Jan 2025 at 04:47, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Jan 09, 2025 at 03:30:37PM +0000, Bertrand Drouvot wrote:
> > We first use an Assert in is_ioop_tracked_in_bytes() and then we return
> > a value "just" to check another Assert. I wonder if it wouldn't make more sense
> > to get rid of this function and use a macro instead, something like?
> >
> > #define is_ioop_tracked_in_bytes(io_op) \
> >     ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)
>
> Indeed.  Your suggestion to use a macro makes more sense to me because
> is_ioop_tracked_in_bytes() itself has an Assert(), while being itself
> only used in an assertion, as you say.  Better to document the
> dependency on the ordering of IOOp, even if that's kind of hard to
> miss.

I did not make it like this because now the
pgstat_is_ioop_tracked_in_bytes macro will return false when io_op >=
IOOP_NUM_TYPES. But I agree that having a macro has more benefits,
also there already is a check for the 'io_op < IOOP_NUM_TYPES' in the
pgstat_count_io_op() function.

>
> > v7-0002:
> >
> > I wonder if it wouldn't make more sense to remove pgstat_count_io_op() first
> > and then implement what currently is in v7-0001. What v7-0002 is removing is
> > not produced by v7-0001.
>
> This kind of cleanup should happen first, and it simplifies a bit the
> reading of v7-0001 as we would just append a new argument to the
> count_io_op() routine for the number of bytes.  I was looking at that
> and chose to stick with count_io_op() rather than count_io_op_n() as
> we would have only one routine.
>
>      pgstat_count_io_op_time(IOOBJECT_RELATION, io_context, IOOP_EXTEND,
> -                            io_start, extend_by);
> +                            io_start, 1, extend_by * BLCKSZ);
> [...]
>      pgstat_count_io_op_time(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_EXTEND,
> -                            io_start, extend_by);
> +                            io_start, 1, extend_by * BLCKSZ);
>
> Something worth mentioning.  I had a small doubt about this one for
> temp and persistent relations, as it changes the extend count.
> Anyway, I think that this is better: we are only doing one extend
> batch, worth (extend_by * BLCKSZ).

I agree with you.

>
> Two times my fault today that the main patch does not apply anymore
> (three times at least in total), so I have rebased it myself, and did
> an extra review while on it, switching the code to use a macro.  That
> seems OK here.  Please let me know if you have more comments.

No worries, thank you for all of these!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Bertrand Drouvot
Дата:
Hi,

On Fri, Jan 10, 2025 at 10:15:52AM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> On Fri, 10 Jan 2025 at 04:47, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Jan 09, 2025 at 03:30:37PM +0000, Bertrand Drouvot wrote:
> > > We first use an Assert in is_ioop_tracked_in_bytes() and then we return
> > > a value "just" to check another Assert. I wonder if it wouldn't make more sense
> > > to get rid of this function and use a macro instead, something like?
> > >
> > > #define is_ioop_tracked_in_bytes(io_op) \
> > >     ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)
> >
> > Indeed.  Your suggestion to use a macro makes more sense to me because
> > is_ioop_tracked_in_bytes() itself has an Assert(), while being itself
> > only used in an assertion, as you say.  Better to document the
> > dependency on the ordering of IOOp, even if that's kind of hard to
> > miss.
> 
> But I agree that having a macro has more benefits,
> also there already is a check for the 'io_op < IOOP_NUM_TYPES' in the
> pgstat_count_io_op() function.

Yeah, I think we can remove the "io_op < IOOP_NUM_TYPE" assertion in
pgstat_count_io_op() (but keep this check as part of the macro).

> > That
> > seems OK here.  Please let me know if you have more comments.

Appart from the above, LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Nazir Bilal Yavuz
Дата:
Hi,

On Tue, 14 Jan 2025 at 06:18, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Jan 10, 2025 at 08:23:46AM +0000, Bertrand Drouvot wrote:
> > On Fri, Jan 10, 2025 at 10:15:52AM +0300, Nazir Bilal Yavuz wrote:
> >> But I agree that having a macro has more benefits,
> >> also there already is a check for the 'io_op < IOOP_NUM_TYPES' in the
> >> pgstat_count_io_op() function.
> >
> > Yeah, I think we can remove the "io_op < IOOP_NUM_TYPE" assertion in
> > pgstat_count_io_op() (but keep this check as part of the macro).
> >
> > Appart from the above, LGTM.
>
> Okay, so applied.
>
> And I've somewhat managed to fat-finger the business with
> pgstat_count_io_op() with an incorrect rebase.  Will remove in a
> minute..

Thank you!

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Tom Lane
Дата:
Nazir Bilal Yavuz <byavuz81@gmail.com> writes:
> On Tue, 14 Jan 2025 at 06:18, Michael Paquier <michael@paquier.xyz> wrote:
>> And I've somewhat managed to fat-finger the business with
>> pgstat_count_io_op() with an incorrect rebase.  Will remove in a
>> minute..

> Thank you!

Commit f92c854cf has caused some of the buildfarm members to
spout a new warning, eg at [1]:

ccache clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla
-Werror=unguarded-availability-new-Wendif-labels -Wmissing-format-attribute -Wformat-security
-Wmissing-variable-declarations-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O2 -I. -I.
-I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2   -c -o pgstat_io.o pgstat_io.c 
pgstat_io.c:81:9: warning: comparison of constant 8 with expression of type 'IOOp' (aka 'enum IOOp') is always true
[-Wtautological-constant-out-of-range-compare]
        Assert(pgstat_is_ioop_tracked_in_bytes(io_op) || bytes == 0);
        ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pgstat_io.c:32:11: note: expanded from macro 'pgstat_is_ioop_tracked_in_bytes'
        ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)
                 ^ ~~~~~~~~~~~~~~
../../../../src/include/c.h:829:9: note: expanded from macro 'Assert'
                if (!(condition)) \\
                      ^~~~~~~~~
1 warning generated.

I don't see a reasonable way to alter that check to suppress this;
for instance, "(io_op) <= IOOP_WRITE" would probably still draw the
same warning.  I think most likely we have to remove that check, ie

 #define pgstat_is_ioop_tracked_in_bytes(io_op) \
-    ((io_op) < IOOP_NUM_TYPES && (io_op) >= IOOP_EXTEND)
+    ((io_op) >= IOOP_EXTEND)

I suppose one alternative is to re-order the enum so that the
upper-limit check in this macro *isn't* tautological ... but
that seems a bit silly.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=demoiselle&dt=2025-01-15%2005%3A20%3A59&stg=build



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> I cannot reproduce that, perhaps I'm just missing something with these
> switches.  Do you think that a cast would cool things?  Please see the
> attached for the idea.

There are only three animals showing this warning (ayu, batfish,
demoiselle) so it likely requires particular clang versions
as well as the right -W switches.

Maybe a cast would silence it, but your draft seems
underparenthesized.  (Which raises the question of whether we
should be using a macro for this at all --- probably there's
not much risk of double-evaluation being a problem, but a
static inline would remove the risk.)

            regards, tom lane



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> Just for an assert, I would just remove the macro rather than have an
> inline function.

Oh, I'd not noticed that there is only one caller.

However, the macro does provide a convenient place to hang the
warning comment about keeping it sync'd with the enum.
Personally I'd keep the macro but move it to pgstat.h, close
to the enum declaration, so that there's more than epsilon
chance of someone who's changing the enum noticing they need
to update it.

            regards, tom lane



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> Not completely sure about the number of parenthesis, but I hope that
> this should be enough (extra set around io_op):
> +#define pgstat_is_ioop_tracked_in_bytes(io_op) \
> +    (((unsigned int) (io_op)) < IOOP_NUM_TYPES && \
> +     ((unsigned int) (io_op)) >= IOOP_EXTEND)

Yeah, that's safe parenthesis-wise.  Whether it'll silence
the warning from those old clangs remains to be seen.

(But if it doesn't, maybe it's not worth working harder,
given that they're old.)

            regards, tom lane



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Bertrand Drouvot
Дата:
Hi,

On Thu, Jan 16, 2025 at 12:47:17AM -0500, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > Not completely sure about the number of parenthesis, but I hope that
> > this should be enough (extra set around io_op):
> > +#define pgstat_is_ioop_tracked_in_bytes(io_op) \
> > +    (((unsigned int) (io_op)) < IOOP_NUM_TYPES && \
> > +     ((unsigned int) (io_op)) >= IOOP_EXTEND)
> 
> Yeah, that's safe parenthesis-wise.  Whether it'll silence
> the warning from those old clangs remains to be seen.

Thanks for the report and the proposed "fix".

From what I can see, the above proposal does (at least) silent the warning
here (clang 5.0.1 and same as demoiselle): https://godbolt.org/z/cGosfzGne (we
can see the warning by using the current define and that the warning is gone
with the new define).

Let's see on the BF.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Make pg_stat_io view count IOs as bytes instead of blocks

От
Nazir Bilal Yavuz
Дата:
Hi,

On Thu, 16 Jan 2025 at 10:12, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 16, 2025 at 12:47:17AM -0500, Tom Lane wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> > > Not completely sure about the number of parenthesis, but I hope that
> > > this should be enough (extra set around io_op):
> > > +#define pgstat_is_ioop_tracked_in_bytes(io_op) \
> > > +   (((unsigned int) (io_op)) < IOOP_NUM_TYPES && \
> > > +    ((unsigned int) (io_op)) >= IOOP_EXTEND)
> >
> > Yeah, that's safe parenthesis-wise.  Whether it'll silence
> > the warning from those old clangs remains to be seen.
>
> Thanks for the report and the proposed "fix".
>
> From what I can see, the above proposal does (at least) silent the warning
> here (clang 5.0.1 and same as demoiselle): https://godbolt.org/z/cGosfzGne (we
> can see the warning by using the current define and that the warning is gone
> with the new define).

Thanks all!

I checked clang 4 as well on the link you sent and it also fixes the
warning there.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft