Обсуждение: 2pc leaks fds

Поиск
Список
Период
Сортировка

2pc leaks fds

От
Andres Freund
Дата:
Hi,

Using 2PC with master very quickly leads to:

2020-04-05 19:42:18.368 PDT [2298126][5/2009:0] LOG:  out of file descriptors: Too many open files; release and retry
2020-04-05 19:42:18.368 PDT [2298126][5/2009:0] STATEMENT:  COMMIT PREPARED 'ptx_2';

This started with:

commit 0dc8ead46363fec6f621a12c7e1f889ba73b55a9 (HEAD -> master)
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date:   2019-11-25 15:04:54 -0300

    Refactor WAL file-reading code into WALRead()


I found this while trying to benchmark the effect of my snapshot changes
on 2pc. I just used the attached pgbench file.

andres@awork3:~/build/postgres/dev-assert/vpath$ pgbench -n -s 500 -c 4 -j 4 -T 100000 -P1 -f
~/tmp/pgbench-write-2pc.sql
progress: 1.0 s, 3723.8 tps, lat 1.068 ms stddev 0.305
client 2 script 0 aborted in command 8 query 0: ERROR:  could not seek to end of file "base/14036/16396": Too many open
files
client 1 script 0 aborted in command 8 query 0: ERROR:  could not seek to end of file "base/14036/16396": Too many open
files
client 3 script 0 aborted in command 8 query 0: ERROR:  could not seek to end of file "base/14036/16396": Too many open
files
client 0 script 0 aborted in command 8 query 0: ERROR:  could not seek to end of file "base/14036/16396": Too many open
files
transaction type: /home/andres/tmp/pgbench-write-2pc.sql

I've not yet reviewed the change sufficiently to pinpoint the issue.


It's a bit sad that nobody has hit this in the last few months :(.

Greetings,

Andres Freund

Вложения

Re: 2pc leaks fds

От
Michael Paquier
Дата:
On Sun, Apr 05, 2020 at 07:56:51PM -0700, Andres Freund wrote:
> I found this while trying to benchmark the effect of my snapshot changes
> on 2pc. I just used the attached pgbench file.
>
> I've not yet reviewed the change sufficiently to pinpoint the issue.

Indeed.  It takes seconds to show up.

> It's a bit sad that nobody has hit this in the last few months :(.

2PC shines with the code of xlogreader.c in this case because it keeps
opening and closing XLogReaderState for a short amount of time.  So it
is not surprising to me to see this error only months after the fact
because recovery or pg_waldump just use one XLogReaderState.  From
what I can see, the error is that the code only bothers closing
WALOpenSegment->seg when switching to a new segment, but we need also
to close it when finishing the business in XLogReaderFree().

I am adding an open item, and attached is a patch to take care of the
problem.  Thoughts?
--
Michael

Вложения

Re: 2pc leaks fds

От
Andres Freund
Дата:
Hi,

On 2020-04-06 14:26:48 +0900, Michael Paquier wrote:
> 2PC shines with the code of xlogreader.c in this case because it keeps
> opening and closing XLogReaderState for a short amount of time.  So it
> is not surprising to me to see this error only months after the fact
> because recovery or pg_waldump just use one XLogReaderState.

Well, it doesn't exactly signal that people (including me, up to just
now) are testing their changes all that carefully...


> From what I can see, the error is that the code only bothers closing
> WALOpenSegment->seg when switching to a new segment, but we need also
> to close it when finishing the business in XLogReaderFree().

Yea, I came to the same conclusion and locally fixed it the same way
(except having the close a bit earlier in XLogReaderFree()).


> diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
> index f3fea5132f..7e25e2050a 100644
> --- a/src/backend/access/transam/xlogreader.c
> +++ b/src/backend/access/transam/xlogreader.c
> @@ -144,6 +144,9 @@ XLogReaderFree(XLogReaderState *state)
>      if (state->main_data)
>          pfree(state->main_data);
>  
> +    if (state->seg.ws_file >= 0)
> +        close(state->seg.ws_file);
> +
>      pfree(state->errormsg_buf);
>      if (state->readRecordBuf)
>          pfree(state->readRecordBuf);

But I'm not sure it's quite the right idea. I'm not sure I fully
understand the design of 0dc8ead46, but it looks to me like it's
intended to allow users of the interface to have different ways of
opening files. If we just close() the fd that'd be a bit more limited.

OTOH, I think all but one (XLogPageRead()) of the current users of
XLogReader use WALRead(), which also close()s the fd (before calling the
WALSegmentOpen callback).


The XLogReader code flow has gotten quite complicated
:(. XLogReaderReadRecord()-> state->read_page() ->
logical_read_xlog_page etc -> WALRead() -> wal_segment_open callback etc.

There's been a fair bit of change, making the interface more generic /
powerful / reducing duplication, but not a lot of added / adapted
comments in the header...

Greetings,

Andres Freund



Re: 2pc leaks fds

От
Antonin Houska
Дата:
Andres Freund <andres@anarazel.de> wrote:

> > From what I can see, the error is that the code only bothers closing
> > WALOpenSegment->seg when switching to a new segment, but we need also
> > to close it when finishing the business in XLogReaderFree().
> 
> Yea, I came to the same conclusion and locally fixed it the same way
> (except having the close a bit earlier in XLogReaderFree()).

It's still not quite clear to me why the problem starts to appear after
0dc8ead46.  This patch does not remove any close() call from XLogReaderFree().

> But I'm not sure it's quite the right idea. I'm not sure I fully
> understand the design of 0dc8ead46, but it looks to me like it's
> intended to allow users of the interface to have different ways of
> opening files. If we just close() the fd that'd be a bit more limited.

It should have allowed users to have different ways to *locate the segment*
file. The WALSegmentOpen callback could actually return file path instead of
the file descriptor and let WALRead() perform the opening/closing, but then
the WALRead function would need to be aware whether it is executing in backend
or in frontend (so it can use the correct function to open/close the file).

I was aware of the problem that the correct function should be used to open
the file and that's why this comment was added (although "mandatory" would be
more suitable than "preferred"):

 * BasicOpenFile() is the preferred way to open the segment file in backend
 * code, whereas open(2) should be used in frontend.
 */
typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
                               TimeLineID *tli_p);

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: 2pc leaks fds

От
Andres Freund
Дата:
Hi,

On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> Andres Freund <andres@anarazel.de> wrote:
> 
> > > From what I can see, the error is that the code only bothers closing
> > > WALOpenSegment->seg when switching to a new segment, but we need also
> > > to close it when finishing the business in XLogReaderFree().
> > 
> > Yea, I came to the same conclusion and locally fixed it the same way
> > (except having the close a bit earlier in XLogReaderFree()).
> 
> It's still not quite clear to me why the problem starts to appear after
> 0dc8ead46.  This patch does not remove any close() call from XLogReaderFree().

Before that change the file was also kind of leaked, but would use the
same static variable to store the fd and thus close it.

Greetings,

Andres Freund



Re: 2pc leaks fds

От
Andres Freund
Дата:
Hi,

I pushed a fix. While it might not be the best medium/long term fix, it
unbreaks 2PC.  Perhaps we should add an open item to track whether we
want to fix this differently?


On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> Andres Freund <andres@anarazel.de> wrote:
> It should have allowed users to have different ways to *locate the segment*
> file. The WALSegmentOpen callback could actually return file path instead of
> the file descriptor and let WALRead() perform the opening/closing, but then
> the WALRead function would need to be aware whether it is executing in backend
> or in frontend (so it can use the correct function to open/close the file).
> 
> I was aware of the problem that the correct function should be used to open
> the file and that's why this comment was added (although "mandatory" would be
> more suitable than "preferred"):
> 
>  * BasicOpenFile() is the preferred way to open the segment file in backend
>  * code, whereas open(2) should be used in frontend.
>  */
> typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
>                                TimeLineID *tli_p);

I don't think that BasicOpenFile() really solves anything here? If
anything it *exascerbates* the problem, because it will trigger all of
the "virtual file descriptors" for already opened Files to close() the
underlying OS FDs.  So not even a fully cached table can be seqscanned,
because that tries to check the file size...

Greetings,

Andres Freund



Re: 2pc leaks fds

От
Michael Paquier
Дата:
On Tue, Apr 07, 2020 at 05:12:49PM -0700, Andres Freund wrote:
> I pushed a fix. While it might not be the best medium/long term fix, it
> unbreaks 2PC.  Perhaps we should add an open item to track whether we
> want to fix this differently?

Sounds fine to me.  I have updated the open item that we have now by
adding a comment that the leak has been fixed by 91c4054, but that
we should revisit the refactoring.
--
Michael

Вложения

Re: 2pc leaks fds

От
Antonin Houska
Дата:
Antonin Houska <ah@cybertec.at> wrote:

> Andres Freund <andres@anarazel.de> wrote:
> > But I'm not sure it's quite the right idea. I'm not sure I fully
> > understand the design of 0dc8ead46, but it looks to me like it's
> > intended to allow users of the interface to have different ways of
> > opening files. If we just close() the fd that'd be a bit more limited.
>
> It should have allowed users to have different ways to *locate the segment*
> file. The WALSegmentOpen callback could actually return file path instead of
> the file descriptor and let WALRead() perform the opening/closing, but then
> the WALRead function would need to be aware whether it is executing in backend
> or in frontend (so it can use the correct function to open/close the file).

Well, #ifdef FRONTEND can be used to distinguish the caller of
WALRead(). However now that I tried to adjust the API, I see that
pg_waldump.c:WALDumpOpenSegment uses specific logic to open the file. So if
the callback only returned the file name, there would be no suitable place for
the things that WALDumpOpenSegment does.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: 2pc leaks fds

От
Antonin Houska
Дата:
Andres Freund <andres@anarazel.de> wrote:

> On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > Andres Freund <andres@anarazel.de> wrote:
> > It should have allowed users to have different ways to *locate the segment*
> > file. The WALSegmentOpen callback could actually return file path instead of
> > the file descriptor and let WALRead() perform the opening/closing, but then
> > the WALRead function would need to be aware whether it is executing in backend
> > or in frontend (so it can use the correct function to open/close the file).
> >
> > I was aware of the problem that the correct function should be used to open
> > the file and that's why this comment was added (although "mandatory" would be
> > more suitable than "preferred"):
> >
> >  * BasicOpenFile() is the preferred way to open the segment file in backend
> >  * code, whereas open(2) should be used in frontend.
> >  */
> > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
> >                                TimeLineID *tli_p);
>
> I don't think that BasicOpenFile() really solves anything here? If
> anything it *exascerbates* the problem, because it will trigger all of
> the "virtual file descriptors" for already opened Files to close() the
> underlying OS FDs.  So not even a fully cached table can be seqscanned,
> because that tries to check the file size...

Specifically for 2PC, isn't it better to close some OS-level FD of an
unrelated table scan and then succeed than to ERROR immediately? Anyway,
0dc8ead46 hasn't changed this.

I at least admit that the comment should not recommend particular function,
and that WALRead() should call the appropriate function to close the file,
rather than always calling close().

Can we just pass the existing FD to the callback as an additional argument,
and let the callback close it?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: 2pc leaks fds

От
Ahsan Hadi
Дата:
I have tested with and without the commit from Andres using the pgbench script (below) provided in the initial email. 

pgbench -n -s 500 -c 4 -j 4 -T 100000 -P1 -f pgbench-write-2pc.sql

I am not getting the leak anymore, it seems to be holding up pretty well.


On Wed, Apr 8, 2020 at 12:59 PM Antonin Houska <ah@cybertec.at> wrote:
Andres Freund <andres@anarazel.de> wrote:

> On 2020-04-06 09:12:32 +0200, Antonin Houska wrote:
> > Andres Freund <andres@anarazel.de> wrote:
> > It should have allowed users to have different ways to *locate the segment*
> > file. The WALSegmentOpen callback could actually return file path instead of
> > the file descriptor and let WALRead() perform the opening/closing, but then
> > the WALRead function would need to be aware whether it is executing in backend
> > or in frontend (so it can use the correct function to open/close the file).
> >
> > I was aware of the problem that the correct function should be used to open
> > the file and that's why this comment was added (although "mandatory" would be
> > more suitable than "preferred"):
> >
> >  * BasicOpenFile() is the preferred way to open the segment file in backend
> >  * code, whereas open(2) should be used in frontend.
> >  */
> > typedef int (*WALSegmentOpen) (XLogSegNo nextSegNo, WALSegmentContext *segcxt,
> >                                                        TimeLineID *tli_p);
>
> I don't think that BasicOpenFile() really solves anything here? If
> anything it *exascerbates* the problem, because it will trigger all of
> the "virtual file descriptors" for already opened Files to close() the
> underlying OS FDs.  So not even a fully cached table can be seqscanned,
> because that tries to check the file size...

Specifically for 2PC, isn't it better to close some OS-level FD of an
unrelated table scan and then succeed than to ERROR immediately? Anyway,
0dc8ead46 hasn't changed this.

I at least admit that the comment should not recommend particular function,
and that WALRead() should call the appropriate function to close the file,
rather than always calling close().

Can we just pass the existing FD to the callback as an additional argument,
and let the callback close it?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com




--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca

Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-Apr-08, Antonin Houska wrote:

> Specifically for 2PC, isn't it better to close some OS-level FD of an
> unrelated table scan and then succeed than to ERROR immediately? Anyway,
> 0dc8ead46 hasn't changed this.

I think for full generality of the interface, we pass a "close" callback
in addition to the "open" callback.  But if we were to pass it only for
WALRead, then there would be no way to call it during XLogReaderFree.

I think the fix Andres applied is okay as far as it goes, but for the
long term we may want to change the interface even further -- maybe by
having these functions be part of the XLogReader state struct.  I have
this code paged out of my head ATM, but maybe tomorrow I can give it a
little think.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
Concretely, I propose to have a new struct like

typedef struct xlogReaderFuncs
{
    XLogPageReadCB read_page;
    XLogSegmentOpenCB open_segment;
    XLogSegmentCloseCB open_segment;
} xlogReaderFuncs;

#define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__}

and then invoke it something like

    xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
                                    XLOGREADER_FUNCS(.readpage = &read_local_xlog_page,
                                              .opensegment = &wal_segment_open),
                                              .closesegment = &wal_segment_close),
                                    NULL);

(with suitable definitions for XLogSegmentOpenCB etc) so that the
support functions are all available at the xlogreader level, instead of
"open" being buried at the read-page level.  Any additional support
functions can be added easily.

This would give xlogreader a simpler interface.

If people like this, I could make this change for pg13 and avoid
changing the API again in pg14.

Thougths?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 2pc leaks fds

От
Andres Freund
Дата:
On 2020-04-22 13:57:54 -0400, Alvaro Herrera wrote:
> Concretely, I propose to have a new struct like
> 
> typedef struct xlogReaderFuncs
> {
>     XLogPageReadCB read_page;
>     XLogSegmentOpenCB open_segment;
>     XLogSegmentCloseCB open_segment;
> } xlogReaderFuncs;
> 
> #define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__}

Not sure I quite see the point of that helper macro...

> and then invoke it something like
> 
>     xlogreader = XLogReaderAllocate(wal_segment_size, NULL,
>                                     XLOGREADER_FUNCS(.readpage = &read_local_xlog_page,
>                                               .opensegment = &wal_segment_open),
>                                               .closesegment = &wal_segment_close),
>                                     NULL);
> 
> (with suitable definitions for XLogSegmentOpenCB etc) so that the
> support functions are all available at the xlogreader level, instead of
> "open" being buried at the read-page level.  Any additional support
> functions can be added easily.
> 
> This would give xlogreader a simpler interface.

My first reaction was that this looks like it'd make it harder to read
WAL from memory. But that's not really a problem, since
opensegment/closesegment don't have to do anything.

I think reducing the levels of indirection around xlogreader would be a
good idea. The control flow currently is *really* complicated: With the
page read callback at the xlogreader level, as well as separate
callbacks set from within the page read callback and passed to
WALRead(). And even though the WALOpenSegment, WALSegmentContext are
really private to WALRead, not XLogReader as a whole, they are members
of XLogReaderState.  I think the PG13 changes made it considerably
harder to understand xlogreader / xlogreader using code.

Note that the WALOpenSegment callback currently does not have access to
XLogReaderState->private_data, which I think is a pretty significant new
restriction. Afaict it's not nicely possible anymore to have two
xlogreaders inside the the same process that read from different data
directories or other cases where opening the segment requires context
information.

> If people like this, I could make this change for pg13 and avoid
> changing the API again in pg14.

I'm in favor of doing so. Not necessarily primarily to avoid repeated
API changes, but because I don't think the v13 changes went in the quite
right direction.

ISTM that we should:
- have the three callbacks you mention above
- change WALSegmentOpen to also get the XLogReaderState
- add private state to WALOpenSegment, so it can be used even when not
  accessing data in files / when one needs more information to close the
  file.
- disambiguate between WALOpenSegment (struct describing an open
  segment) and WALSegmentOpen (callback to open a segment) (note that
  the read page callback uses a *CB naming, why not follow?)

Greetings,

Andres Freund



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-Apr-22, Andres Freund wrote:

> On 2020-04-22 13:57:54 -0400, Alvaro Herrera wrote:
> > Concretely, I propose to have a new struct like
> > 
> > typedef struct xlogReaderFuncs
> > {
> >     XLogPageReadCB read_page;
> >     XLogSegmentOpenCB open_segment;
> >     XLogSegmentCloseCB open_segment;
> > } xlogReaderFuncs;
> > 
> > #define XLOGREADER_FUNCS(...) &(xlogReaderFuncs){__VA_ARGS__}
> 
> Not sure I quite see the point of that helper macro...

Avoid the ugly cast -- same discussion we had for ARCHIVE_OPTS in
pg_dump code in commit f831d4accda0.


> ISTM that we should:
> - have the three callbacks you mention above
> - change WALSegmentOpen to also get the XLogReaderState
> - add private state to WALOpenSegment, so it can be used even when not
>   accessing data in files / when one needs more information to close the
>   file.
> - disambiguate between WALOpenSegment (struct describing an open
>   segment) and WALSegmentOpen (callback to open a segment) (note that
>   the read page callback uses a *CB naming, why not follow?)

Sounds good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-Apr-22, Andres Freund wrote:

> I'm in favor of doing so. Not necessarily primarily to avoid repeated
> API changes, but because I don't think the v13 changes went in the quite
> right direction.
> 
> ISTM that we should:
> - have the three callbacks you mention above
> - change WALSegmentOpen to also get the XLogReaderState
> - add private state to WALOpenSegment, so it can be used even when not
>   accessing data in files / when one needs more information to close the
>   file.
> - disambiguate between WALOpenSegment (struct describing an open
>   segment) and WALSegmentOpen (callback to open a segment) (note that
>   the read page callback uses a *CB naming, why not follow?)

Here's a first attempt at that.  The segment_open/close callbacks are
now given at XLogReaderAllocate time, and are passed the XLogReaderState
pointer.  I wrote a comment to explain that the page_read callback can
use WALRead() if it wishes to do so; but if it does, then segment_open
has to be provided.  segment_close is mandatory (since we call it at
XLogReaderFree).

Of the half a dozen cases that exist, three are slightly weird:

* Physical walsender does not use a xlogreader at all.  I think we could
  beat that code up so that it does.  But for the moment I just cons up
  a fake xlogreader, which only has the segment_open pointer set up, so
  that it can call WALRead.

* main xlog.c uses an xlogreader with XLogPageRead(), which does not use
  WALRead.  Therefore it does not pass open_segment.  It does not use
  xlogreader->seg.ws_file either.  Eventually we may want to beat this
  one up also.

* pg_rewind has its own page read callback, SimpleXLogPageRead, which
  does all the required opening and closing.  I don't think it'd be an
  improvement to force this to use segment_open.  Oddly enough, it calls
  itself "simple" but is unique in having the ability to read files from
  the wal archive.

All tests are passing for me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: 2pc leaks fds

От
Kyotaro Horiguchi
Дата:
At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Apr-22, Andres Freund wrote:
> 
> > I'm in favor of doing so. Not necessarily primarily to avoid repeated
> > API changes, but because I don't think the v13 changes went in the quite
> > right direction.
> > 
> > ISTM that we should:
> > - have the three callbacks you mention above
> > - change WALSegmentOpen to also get the XLogReaderState
> > - add private state to WALOpenSegment, so it can be used even when not
> >   accessing data in files / when one needs more information to close the
> >   file.
> > - disambiguate between WALOpenSegment (struct describing an open
> >   segment) and WALSegmentOpen (callback to open a segment) (note that
> >   the read page callback uses a *CB naming, why not follow?)
> 
> Here's a first attempt at that.  The segment_open/close callbacks are
> now given at XLogReaderAllocate time, and are passed the XLogReaderState
> pointer.  I wrote a comment to explain that the page_read callback can
> use WALRead() if it wishes to do so; but if it does, then segment_open
> has to be provided.  segment_close is mandatory (since we call it at
> XLogReaderFree).
> 
> Of the half a dozen cases that exist, three are slightly weird:
> 
> * Physical walsender does not use a xlogreader at all.  I think we could
>   beat that code up so that it does.  But for the moment I just cons up
>   a fake xlogreader, which only has the segment_open pointer set up, so
>   that it can call WALRead.
> 
> * main xlog.c uses an xlogreader with XLogPageRead(), which does not use
>   WALRead.  Therefore it does not pass open_segment.  It does not use
>   xlogreader->seg.ws_file either.  Eventually we may want to beat this
>   one up also.
> 
> * pg_rewind has its own page read callback, SimpleXLogPageRead, which
>   does all the required opening and closing.  I don't think it'd be an
>   improvement to force this to use segment_open.  Oddly enough, it calls
>   itself "simple" but is unique in having the ability to read files from
>   the wal archive.
> 
> All tests are passing for me.

I modestly object to such many call-back functions.  FWIW I'm writing
this with [1] in my mind.

An open-callback is bound to a read-callback. A close-callback is
bound to the way the read-callback opens a segment (or the
open-callback).  I'm afraid that only adding "cleanup" callback might
be sufficient.

[1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-Apr-24, Kyotaro Horiguchi wrote:

> At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 

> > Here's a first attempt at that.  The segment_open/close callbacks are
> > now given at XLogReaderAllocate time, and are passed the XLogReaderState
> > pointer.  I wrote a comment to explain that the page_read callback can
> > use WALRead() if it wishes to do so; but if it does, then segment_open
> > has to be provided.  segment_close is mandatory (since we call it at
> > XLogReaderFree).

> I modestly object to such many call-back functions.  FWIW I'm writing
> this with [1] in my mind.
> [1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com

Hmm.  Looking at your 0001, I think there's nothing in that patch that's
not compatible with my proposed API change.

0002 is a completely different story of course; but that patch is a
radical change of spirit for xlogreader, in the sense that it's no
longer a "reader", but rather just an interpreter of bytes from a WAL
byte sequence into WAL records; and shifts the responsibility of the
actual reading to the caller.  That's why xlogreader no longer receives
the page_read callback (and why it doesn't need the segment_open,
segment_close callbacks).

I have to admit that until today I hadn't realized that that's what your
patch series was doing.  I'm not familiar with how you intend to
implement WAL encryption on top of this, but on first blush I'm not
liking this proposed design too much.

> An open-callback is bound to a read-callback. A close-callback is
> bound to the way the read-callback opens a segment (or the
> open-callback).  I'm afraid that only adding "cleanup" callback might
> be sufficient.

Well, the complaint is that the current layering is weird, in that there
are two levels at which we pass callbacks: one is XLogReaderAllocate,
where you specify the page_read callback; and the other layer is inside
the page_read callback, if that layer uses the WALRead auxiliary
function.  The thing that my patch is doing is pass all three callbacks
at the XLogReaderAllocate level.  So when xlogreader drills down to
read_page, xlogreader already has the segment_open callback handy if it
needs it.  Conceptually, this seems more sensible.

I think a "cleanup" callback might also be sensible in general terms,
but we have a problem with the specifics -- namely that the "state" that
we need to clean up (the file descriptor of the open segment) is part of
xlogreader's state.  And we obviously cannot remove the FD from
XLogReaderState, because when we need the FD to do things with it to
obtain data from the file.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: 2pc leaks fds

От
Kyotaro Horiguchi
Дата:
At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Apr-24, Kyotaro Horiguchi wrote:
> 
> > At Thu, 23 Apr 2020 19:16:03 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> 
> > > Here's a first attempt at that.  The segment_open/close callbacks are
> > > now given at XLogReaderAllocate time, and are passed the XLogReaderState
> > > pointer.  I wrote a comment to explain that the page_read callback can
> > > use WALRead() if it wishes to do so; but if it does, then segment_open
> > > has to be provided.  segment_close is mandatory (since we call it at
> > > XLogReaderFree).
> 
> > I modestly object to such many call-back functions.  FWIW I'm writing
> > this with [1] in my mind.
> > [1] https://www.postgresql.org/message-id/20200422.101246.331162888498679491.horikyota.ntt%40gmail.com
> 
> Hmm.  Looking at your 0001, I think there's nothing in that patch that's
> not compatible with my proposed API change.
> 
> 0002 is a completely different story of course; but that patch is a
> radical change of spirit for xlogreader, in the sense that it's no
> longer a "reader", but rather just an interpreter of bytes from a WAL
> byte sequence into WAL records; and shifts the responsibility of the
> actual reading to the caller.  That's why xlogreader no longer receives
> the page_read callback (and why it doesn't need the segment_open,
> segment_close callbacks).

Sorry for the ambiguity, I didn't meant I minded that this conflicts
with my patch or I don't want this to be committed. It is easily
rebased on this patch.  What I was anxious about is that the new
callback struct might be too flexible than required. So I "mildly"
objected, and I won't be dissapointed if this patch is committed.

> I have to admit that until today I hadn't realized that that's what your
> patch series was doing.  I'm not familiar with how you intend to
> implement WAL encryption on top of this, but on first blush I'm not
> liking this proposed design too much.

Right. I might be too much in detail, but it simplifies the call
tree. Anyway that is another discussion, though:)

> > An open-callback is bound to a read-callback. A close-callback is
> > bound to the way the read-callback opens a segment (or the
> > open-callback).  I'm afraid that only adding "cleanup" callback might
> > be sufficient.
> 
> Well, the complaint is that the current layering is weird, in that there
> are two levels at which we pass callbacks: one is XLogReaderAllocate,
> where you specify the page_read callback; and the other layer is inside
> the page_read callback, if that layer uses the WALRead auxiliary
> function.  The thing that my patch is doing is pass all three callbacks
> at the XLogReaderAllocate level.  So when xlogreader drills down to
> read_page, xlogreader already has the segment_open callback handy if it
> needs it.  Conceptually, this seems more sensible.

It looks like as if the open/read/close-callbacks are generic and on
the same interface layer, but actually open-callback is dedicate to
WALRead and it is useless when the read-callback doesn't use
WALRead. What I was anxious about is that the open-callback is
uselessly exposing the secret of the read-callback.

> I think a "cleanup" callback might also be sensible in general terms,
> but we have a problem with the specifics -- namely that the "state" that
> we need to clean up (the file descriptor of the open segment) is part of
> xlogreader's state.  And we obviously cannot remove the FD from
> XLogReaderState, because when we need the FD to do things with it to
> obtain data from the file.

I meant concretely that we only have read- and cleanup- callbacks in
xlogreader state.  The caller of XLogReaderAllocate specifies the
cleanup-callback that is to be used to clean up what the
reader-callback left behind, in the same manner with this patch does.
The only reason it is not named close-callback is that it is used only
when the xlogreader-state is destroyed. So I'm fine with
read/close-callbacks.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-Apr-27, Kyotaro Horiguchi wrote:

> At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 

> Sorry for the ambiguity, I didn't meant I minded that this conflicts
> with my patch or I don't want this to be committed. It is easily
> rebased on this patch.  What I was anxious about is that the new
> callback struct might be too flexible than required. So I "mildly"
> objected, and I won't be dissapointed if this patch is committed.

... well, yeah, maybe it is too flexible.  And perhaps we could further
tweak this interface so that the file descriptor is not part of
XLogReader at all -- with such a change, it would make more sense to
worry about the "close" callback not being "close" but something like
"cleanup", as you suggest.  But right now, and thinking from the point
of view of going into postgres 13 beta shortly, it seems to me that
XLogReader is just a very leaky abstraction since both itself and its
users are aware of the fact that there is a file descriptor.

Maybe with your rework for encryption you'll want to remove the FD from
XLogReader at all, and move it elsewhere.  Or maybe not.  But it seems
to me that my suggested approach is sensible, and better than the
current situation.  (Let's keep in mind that the primary concern here is
that the callstack is way too complicated -- you ask XlogReader for
data, it calls your Read callback, that one calls WALRead passing your
openSegment callback and stuffs the FD in XLogReaderState ... a sieve it
is, the way it leaks, not an abstraction.)

> > I have to admit that until today I hadn't realized that that's what your
> > patch series was doing.  I'm not familiar with how you intend to
> > implement WAL encryption on top of this, but on first blush I'm not
> > liking this proposed design too much.
> 
> Right. I might be too much in detail, but it simplifies the call
> tree. Anyway that is another discussion, though:)

Okay.  We can discuss further changes later, of course.

> It looks like as if the open/read/close-callbacks are generic and on
> the same interface layer, but actually open-callback is dedicate to
> WALRead and it is useless when the read-callback doesn't use
> WALRead. What I was anxious about is that the open-callback is
> uselessly exposing the secret of the read-callback.

Well, I don't think we care about that.  WALRead can be thought of as
just a helper function that you may use to write your read callback.
The comments I added explain this.

> I meant concretely that we only have read- and cleanup- callbacks in
> xlogreader state.  The caller of XLogReaderAllocate specifies the
> cleanup-callback that is to be used to clean up what the
> reader-callback left behind, in the same manner with this patch does.
> The only reason it is not named close-callback is that it is used only
> when the xlogreader-state is destroyed. So I'm fine with
> read/close-callbacks.

We can revisit the current design in the future.  For example for
encryption we might decide to remove the current-open-segment FD from
XLogReaderState and then things might be different.  (I think the
current design is based a lot on historical code, rather than being
optimal.)

Since your objection isn't strong, I propose to commit same patch as
before, only rebased as conflicted with cd123234404e and this comment
prologuing WALRead:

/*
 * Helper function to ease writing of XLogRoutine->page_read callbacks.
 * If this function is used, caller must supply an open_segment callback in
 * 'state', as that is used here.
  [... rest is same as before ...]


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: 2pc leaks fds

От
Kyotaro Horiguchi
Дата:
At Thu, 7 May 2020 19:28:55 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Apr-27, Kyotaro Horiguchi wrote:
> 
> > At Fri, 24 Apr 2020 11:48:46 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> 
> > Sorry for the ambiguity, I didn't meant I minded that this conflicts
> > with my patch or I don't want this to be committed. It is easily
> > rebased on this patch.  What I was anxious about is that the new
> > callback struct might be too flexible than required. So I "mildly"
> > objected, and I won't be dissapointed if this patch is committed.
> 
> ... well, yeah, maybe it is too flexible.  And perhaps we could further
> tweak this interface so that the file descriptor is not part of
> XLogReader at all -- with such a change, it would make more sense to
> worry about the "close" callback not being "close" but something like
> "cleanup", as you suggest.  But right now, and thinking from the point
> of view of going into postgres 13 beta shortly, it seems to me that
> XLogReader is just a very leaky abstraction since both itself and its
> users are aware of the fact that there is a file descriptor.

Agreed.

> Maybe with your rework for encryption you'll want to remove the FD from
> XLogReader at all, and move it elsewhere.  Or maybe not.  But it seems
> to me that my suggested approach is sensible, and better than the
> current situation.  (Let's keep in mind that the primary concern here is
> that the callstack is way too complicated -- you ask XlogReader for
> data, it calls your Read callback, that one calls WALRead passing your
> openSegment callback and stuffs the FD in XLogReaderState ... a sieve it
> is, the way it leaks, not an abstraction.)

I agree that new callback functions is most sensible for getting into
13, of course.

> > > I have to admit that until today I hadn't realized that that's what your
> > > patch series was doing.  I'm not familiar with how you intend to
> > > implement WAL encryption on top of this, but on first blush I'm not
> > > liking this proposed design too much.
> > 
> > Right. I might be too much in detail, but it simplifies the call
> > tree. Anyway that is another discussion, though:)
> 
> Okay.  We can discuss further changes later, of course.
> 
> > It looks like as if the open/read/close-callbacks are generic and on
> > the same interface layer, but actually open-callback is dedicate to
> > WALRead and it is useless when the read-callback doesn't use
> > WALRead. What I was anxious about is that the open-callback is
> > uselessly exposing the secret of the read-callback.
> 
> Well, I don't think we care about that.  WALRead can be thought of as
> just a helper function that you may use to write your read callback.
> The comments I added explain this.

Thanks.

> > I meant concretely that we only have read- and cleanup- callbacks in
> > xlogreader state.  The caller of XLogReaderAllocate specifies the
> > cleanup-callback that is to be used to clean up what the
> > reader-callback left behind, in the same manner with this patch does.
> > The only reason it is not named close-callback is that it is used only
> > when the xlogreader-state is destroyed. So I'm fine with
> > read/close-callbacks.
> 
> We can revisit the current design in the future.  For example for
> encryption we might decide to remove the current-open-segment FD from
> XLogReaderState and then things might be different.  (I think the
> current design is based a lot on historical code, rather than being
> optimal.)
> 
> Since your objection isn't strong, I propose to commit same patch as
> before, only rebased as conflicted with cd123234404e and this comment
> prologuing WALRead:
> 
> /*
>  * Helper function to ease writing of XLogRoutine->page_read callbacks.
>  * If this function is used, caller must supply an open_segment callback in
>  * 'state', as that is used here.
>   [... rest is same as before ...]

I agree to the direction of this patch. Thanks for the explanation.
The patch looks good to me except the two points below.


+    /* XXX for xlogreader use, we'd call XLogBeginRead+XLogReadRecord here */
+    if (!WALRead(&fake_xlogreader,
+                 &output_message.data[output_message.len],

I'm not sure the point of the XXX comment, but I think WALRead here is
the right thing and we aren't going to use
XLogBeginRead+XLogReadRecord here.  So it seems to me the comment is
misleading and instead we need such a comment for fake_xlogreader like
this.

+    static XLogReaderState fake_xlogreader =
+    {
+        /* fake reader state only to let WALRead to use the callbacks */


wal_segment_close(XlogReaderState *state) is setting
state->seg.ws_file to -1.  On the other hand wal_segment_close(state,..)
doesn't update ws_file and the caller sets the returned value to
(eventually) the same field.

+            seg->ws_file = state->routine.segment_open(state, nextSegNo,
+                                                       segcxt, &tli);

If you are willing to do so, I think it is better to make the callback
functions are responsible to update the seg.ws_file and the callers
don't care.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
On 2020-May-08, Kyotaro Horiguchi wrote:

> I agree to the direction of this patch. Thanks for the explanation.
> The patch looks good to me except the two points below.

Thanks!  I pushed the patch.  I fixed the walsender commentary as you
suggested, but I'm still of the opinion that we might want to use the
XLogReader abstraction in physical walsender than work without it; if
nothing else, that would simplify WALRead's API.

I didn't change this one though:

> wal_segment_close(XlogReaderState *state) is setting
> state->seg.ws_file to -1.  On the other hand wal_segment_close(state,..)
> doesn't update ws_file and the caller sets the returned value to
> (eventually) the same field.
> 
> +            seg->ws_file = state->routine.segment_open(state, nextSegNo,
> +                                                       segcxt, &tli);
> 
> If you are willing to do so, I think it is better to make the callback
> functions are responsible to update the seg.ws_file and the callers
> don't care.

I agree that this would be a good idea, but it's more than just a
handful of lines of changes so I think we should consider it separately.
Attached as 0002.  I also realized while doing this that we can further
simplify WALRead()'s API if we're willing to bend walsender a little bit
more into the fake xlogreader thing; that's 0001.

I marked the open item closed nonetheless.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: 2pc leaks fds

От
Antonin Houska
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> On 2020-May-08, Kyotaro Horiguchi wrote:
> 
> > I agree to the direction of this patch. Thanks for the explanation.
> > The patch looks good to me except the two points below.
> 
> Thanks!  I pushed the patch.

I tried to follow the discussion but haven't had better idea. This looks
better than the previous version. Thanks!

While looking at the changes, I've noticed a small comment issue in the
XLogReaderRoutine structure definition:

    * "tli_p" is an input/output argument. XLogRead() uses it to pass the

The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This
incorrect comment was actually introduced by that commit.)

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com



Re: 2pc leaks fds

От
Alvaro Herrera
Дата:
Hi Antonin, thanks for the review.


On 2020-May-11, Antonin Houska wrote:

> While looking at the changes, I've noticed a small comment issue in the
> XLogReaderRoutine structure definition:
> 
>     * "tli_p" is an input/output argument. XLogRead() uses it to pass the
> 
> The XLogRead() function has been renamed to WALRead() in 0dc8ead4. (This
> incorrect comment was actually introduced by that commit.)

Ah.  I'll fix this, thanks for pointing it out.

(It might be that the TLI situation can be improved with some callback,
too.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services