Discussion: replication_origin and replication_origin_lsn usage on subscriber

replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
During logical decoding, we send replication_origin and
replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
we send values for these two, but they are never used on the subscriber
side.  Although we have provided a function (logicalrep_read_origin) to
read these two values, it is not used anywhere in the code.

I think this is primarily for external application usage, but it is
not very clear how they will use it.  As far as I understand, the
value of origin can be used to avoid loops in bi-directional
replication, and origin_lsn can be used to track how far the subscriber
has received changes.  I am not sure about this, and particularly about
how origin_lsn can be used in external applications.
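
To make the two suspected uses concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL code: the `Txn` type, `should_forward`, and `ProgressTracker` are hypothetical names invented for illustration, and LSNs are modelled as plain integers. It assumes origin is used to skip changes that came from the peer being sent to, and origin_lsn is used to remember how far changes from each origin have been applied.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    origin_id: int   # hypothetical: id of the node where the change was first made
    origin_lsn: int  # hypothetical: commit position of the txn on that origin node
    payload: str


def should_forward(txn: Txn, peer_origin_id: int) -> bool:
    """Loop avoidance: never send a transaction back to the node it came from."""
    return txn.origin_id != peer_origin_id


class ProgressTracker:
    """Remembers, per origin, the highest origin_lsn applied so far."""

    def __init__(self) -> None:
        self.applied: dict[int, int] = {}

    def apply(self, txn: Txn) -> bool:
        """Apply txn unless it was already applied; return True if applied."""
        last = self.applied.get(txn.origin_id, -1)
        if txn.origin_lsn <= last:
            return False  # seen before, e.g. resent after a restart
        self.applied[txn.origin_id] = txn.origin_lsn
        return True
```

Under these assumptions, an external consumer could restart from the stored position for each origin and drop anything at or below it.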

This has come up in the discussion of the "logical streaming of large
in-progress transactions" [1]. Basically, we are not sure when to send
these values during streaming, as their intended usage is not clear.

Thoughts?

[1] - https://www.postgresql.org/message-id/CAFiTN-skHvSWDHV66qpzMfnHH6AvsE2YAjvh4Kt613E8ZD8WoQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> During logical decoding, we send replication_origin and
> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> we send values for these two but never used on the subscriber side.
> Though we have provided a function (logicalrep_read_origin) to read
> these two values but that is not used in code anywhere.
>

For the purpose of decoding in-progress transactions, I think we can
send replication_origin in the first 'start' message, as it is present
with each WAL record; however, replication_origin_lsn is only logged at
commit time, so we can't send it before commit.  The
replication_origin_lsn is set by pg_replication_origin_xact_setup(),
but it is not clear how and when that function can be used.  Do we
really need replication_origin_lsn before we decode the commit record?

Note: I have added a few more people who I could see are working in a
similar area, to get some response.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Petr Jelinek
Date:
On 09/07/2020 13:10, Amit Kapila wrote:
> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> During logical decoding, we send replication_origin and
>> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
>> we send values for these two but never used on the subscriber side.
>> Though we have provided a function (logicalrep_read_origin) to read
>> these two values but that is not used in code anywhere.
>>

We don't use the origin message anywhere really because we don't support 
origin forwarding in the built-in replication yet. That part I left out 
intentionally in the original PG10 patchset as it's mostly useful for 
circular replication detection when you want to replicate both ways. 
However that's relatively useless without also having some kind of 
conflict detection which would be another huge pile of code and I 
expected we would end up not getting logical replication in PG10 at all 
if I tried to push conflict detection as well :)

> 
> For the purpose of decoding in-progress transactions, I think we can
> send replication_origin in the first 'start' message as it is present
> with each WAL record, however replication_origin_lsn is only logged at
> commit time, so can't send it before commit.  The
> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> but it is not clear how and when that function can be used.  Do we
> really need replication_origin_lsn before we decode the commit record?
> 

That's the SQL interface; the C interface does not require that, and I
don't think we need to do that. The existing apply code sets
replorigin_session_origin_lsn only when processing the commit message,
IIRC.
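
As a rough illustration of that last point, the following Python sketch models an apply loop in which the origin position is recorded only when the commit message is processed. Everything here is an assumption made for illustration: the `ApplyWorker` class, the dict-shaped messages, and the `end_lsn` field are hypothetical and do not mirror the real apply worker's data structures.

```python
class ApplyWorker:
    """Toy apply loop: origin progress moves only at commit."""

    def __init__(self) -> None:
        self.session_origin_lsn = None  # stands in for replorigin_session_origin_lsn
        self.durable_progress = 0       # progress that would survive a restart
        self.pending = []               # changes of the in-flight transaction
        self.table = []                 # committed rows

    def handle(self, msg: dict) -> None:
        kind = msg["type"]
        if kind == "begin":
            self.pending = []
        elif kind == "insert":
            # Changes are applied inside the transaction; no origin
            # position is recorded for them.
            self.pending.append(msg["row"])
        elif kind == "commit":
            # Only here is the origin position set, so the data and the
            # progress marker become visible together.
            self.session_origin_lsn = msg["end_lsn"]
            self.table.extend(self.pending)
            self.durable_progress = self.session_origin_lsn
            self.pending = []
        else:
            raise ValueError(f"unknown message type: {kind}")
```

In this model, a failure before the commit message leaves `durable_progress` untouched, so the whole transaction would be requested again after a restart.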

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 09/07/2020 13:10, Amit Kapila wrote:
> > On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> During logical decoding, we send replication_origin and
> >> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> >> we send values for these two but never used on the subscriber side.
> >> Though we have provided a function (logicalrep_read_origin) to read
> >> these two values but that is not used in code anywhere.
> >>
>
> We don't use the origin message anywhere really because we don't support
> origin forwarding in the built-in replication yet. That part I left out
> intentionally in the original PG10 patchset as it's mostly useful for
> circular replication detection when you want to replicate both ways.
> However that's relatively useless without also having some kind of
> conflict detection which would be another huge pile of code and I
> expected we would end up not getting logical replication in PG10 at all
> if I tried to push conflict detection as well :)
>

Fair enough.  However, without tests and more documentation about this
concept, it is likely that future development might break it.  It is
good that you and others who know this part well are there to respond,
but still, more documentation and tests would be preferred.

> >
> > For the purpose of decoding in-progress transactions, I think we can
> > send replication_origin in the first 'start' message as it is present
> > with each WAL record, however replication_origin_lsn is only logged at
> > commit time, so can't send it before commit.  The
> > replication_origin_lsn is set by pg_replication_origin_xact_setup()
> > but it is not clear how and when that function can be used.  Do we
> > really need replication_origin_lsn before we decode the commit record?
> >
>
> That's the SQL interface, C interface does not require that and I don't
> think we need to do that.
>

I think when you are saying SQL interface, you referred to
pg_replication_origin_xact_setup() but I am not sure which C interface
you are referring to in the above sentence?

> The existing apply code sets the
> replorigin_session_origin_lsn only when processing commit message IIRC.
>

That's correct.  However, we do send it via 'begin' callback which
won't be possible with the streaming of in-progress transactions.  Do
we need to send this origin related information (origin, origin_lsn)
while streaming of in-progress transactions?  If so, when?  As far as
I can see, the origin_id can be sent with the first 'start' message.
The origin_lsn and origin_commit can be sent with the last 'start' of
streaming commit if we want but not sure if that is of use.  If we
need to send origin_lsn earlier than that then we need to record it
with other WAL records (other than Commit WAL record).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Petr Jelinek
Date:
Hi,

On 09/07/2020 14:34, Amit Kapila wrote:
> On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>> On 09/07/2020 13:10, Amit Kapila wrote:
>>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> During logical decoding, we send replication_origin and
>>>> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
>>>> we send values for these two but never used on the subscriber side.
>>>> Though we have provided a function (logicalrep_read_origin) to read
>>>> these two values but that is not used in code anywhere.
>>>>
>>
>> We don't use the origin message anywhere really because we don't support
>> origin forwarding in the built-in replication yet. That part I left out
>> intentionally in the original PG10 patchset as it's mostly useful for
>> circular replication detection when you want to replicate both ways.
>> However that's relatively useless without also having some kind of
>> conflict detection which would be another huge pile of code and I
>> expected we would end up not getting logical replication in PG10 at all
>> if I tried to push conflict detection as well :)
>>
> 
> Fair enough.  However, without tests and more documentation about this
> concept, it is likely that future development might break it.  It is
> good that you and others who know this part well are there to respond
> but still, the more documentation and tests would be preferred.
> 

Honestly, that part didn't even need to be committed given it's unused.
The protocol supports versioning, so it could have been added at a later
time.

>>>
>>> For the purpose of decoding in-progress transactions, I think we can
>>> send replication_origin in the first 'start' message as it is present
>>> with each WAL record, however replication_origin_lsn is only logged at
>>> commit time, so can't send it before commit.  The
>>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
>>> but it is not clear how and when that function can be used.  Do we
>>> really need replication_origin_lsn before we decode the commit record?
>>>
>>
>> That's the SQL interface, C interface does not require that and I don't
>> think we need to do that.
>>
> 
> I think when you are saying SQL interface, you referred to
> pg_replication_origin_xact_setup() but I am not sure which C interface
> you are referring to in the above sentence?
> 

All the stuff pg_replication_origin_xact_setup does internally.

>> The existing apply code sets the
>> replorigin_session_origin_lsn only when processing commit message IIRC.
>>
> 
> That's correct.  However, we do send it via 'begin' callback which
> won't be possible with the streaming of in-progress transactions.  Do
> we need to send this origin related information (origin, origin_lsn)
> while streaming of in-progress transactions?  If so, when?  As far as
> I can see, the origin_id can be sent with the first 'start' message.
> The origin_lsn and origin_commit can be sent with the last 'start' of
> streaming commit if we want but not sure if that is of use.  If we
> need to send origin_lsn earlier than that then we need to record it
> with other WAL records (other than Commit WAL record).
> 

If we were to support origin forwarding, then strictly speaking we
need everything only at commit time from a correctness perspective, but
ideally origin_id would be sent with the first message, as it can be
used to filter out changes at the decoding stage rather than while we
process the commit, so having it set early improves decoding performance.
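
The performance argument can be sketched with a small Python model. This is an illustrative assumption, not the reorder buffer's actual logic: `decode`, the record dicts, and the "peak buffered changes" metric are all hypothetical. Both modes send the same transactions; the difference is how much is buffered for transactions that end up being filtered out.

```python
def decode(records, filtered_origins, filter_early):
    """Return (sent transactions, peak number of buffered changes).

    records: dicts with "kind" ("change" or "commit"), "xid", and for
    changes also "origin_id" and "data".
    """
    buffers = {}  # xid -> buffered change data
    origin = {}   # xid -> origin id seen on the transaction's first change
    sent = []
    peak = 0
    for rec in records:
        xid = rec["xid"]
        if rec["kind"] == "change":
            origin.setdefault(xid, rec["origin_id"])
            if filter_early and origin[xid] in filtered_origins:
                continue  # dropped at decoding time, never buffered
            buffers.setdefault(xid, []).append(rec["data"])
            peak = max(peak, sum(len(b) for b in buffers.values()))
        elif rec["kind"] == "commit":
            changes = buffers.pop(xid, [])
            if origin.pop(xid, None) not in filtered_origins:
                sent.append((xid, changes))
    return sent, peak
```

With the origin known from the first record, a filtered transaction costs no buffer space at all; when the decision is deferred to commit, every one of its changes is held until then.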

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> On 09/07/2020 14:34, Amit Kapila wrote:
> > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >>
> >> On 09/07/2020 13:10, Amit Kapila wrote:
> >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>
> >>>> During logical decoding, we send replication_origin and
> >>>> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> >>>> we send values for these two but never used on the subscriber side.
> >>>> Though we have provided a function (logicalrep_read_origin) to read
> >>>> these two values but that is not used in code anywhere.
> >>>>
> >>
> >> We don't use the origin message anywhere really because we don't support
> >> origin forwarding in the built-in replication yet. That part I left out
> >> intentionally in the original PG10 patchset as it's mostly useful for
> >> circular replication detection when you want to replicate both ways.
> >> However that's relatively useless without also having some kind of
> >> conflict detection which would be another huge pile of code and I
> >> expected we would end up not getting logical replication in PG10 at all
> >> if I tried to push conflict detection as well :)
> >>
> >
> > Fair enough.  However, without tests and more documentation about this
> > concept, it is likely that future development might break it.  It is
> > good that you and others who know this part well are there to respond
> > but still, the more documentation and tests would be preferred.
> >
>
> Honestly that part didn't even need to be committed given it's unused.
> Protocol supports versioning so it could have been added at later time.
>
> >>>
> >>> For the purpose of decoding in-progress transactions, I think we can
> >>> send replication_origin in the first 'start' message as it is present
> >>> with each WAL record, however replication_origin_lsn is only logged at
> >>> commit time, so can't send it before commit.  The
> >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> >>> but it is not clear how and when that function can be used.  Do we
> >>> really need replication_origin_lsn before we decode the commit record?
> >>>
> >>
> >> That's the SQL interface, C interface does not require that and I don't
> >> think we need to do that.
> >>
> >
> > I think when you are saying SQL interface, you referred to
> > pg_replication_origin_xact_setup() but I am not sure which C interface
> > you are referring to in the above sentence?
> >
>
> All the stuff pg_replication_origin_xact_setup does internally.
>
> >> The existing apply code sets the
> >> replorigin_session_origin_lsn only when processing commit message IIRC.
> >>
> >
> > That's correct.  However, we do send it via 'begin' callback which
> > won't be possible with the streaming of in-progress transactions.  Do
> > we need to send this origin related information (origin, origin_lsn)
> > while streaming of in-progress transactions?  If so, when?  As far as
> > I can see, the origin_id can be sent with the first 'start' message.
> > The origin_lsn and origin_commit can be sent with the last 'start' of
> > streaming commit if we want but not sure if that is of use.  If we
> > need to send origin_lsn earlier than that then we need to record it
> > with other WAL records (other than Commit WAL record).
> >
>
> If we were to support the origin forwarding, then strictly speaking we
> need everything only at commit time from correctness perspective,
>

Okay.  Anyway, streaming mode is optional, so in such cases we can keep it 'off'.

> but
> ideally origin_id would be best sent with first message as it can be
> used to filter out changes at decoding stage rather than while we
> process the commit so having it set early improves performance of decoding.
>

Yeah, makes sense.  So, we will just send origin_id (with first
streaming start message) and leave others.
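
A minimal Python sketch of that plan, under stated assumptions: the message names (`stream_start`, `origin`, `stream_stop`, `stream_commit`) and the tuple layout are hypothetical and are not the wire format. It shows only the ordering idea, with the origin accompanying the first stream start and nothing origin-related sent afterwards.

```python
def stream_messages(xid, origin_id, chunks):
    """Emit messages for a transaction streamed in several chunks.

    The origin is attached once, right after the first stream start;
    origin_lsn is not sent because it is unknown before commit.
    """
    out = []
    for i, chunk in enumerate(chunks):
        first = i == 0
        out.append(("stream_start", xid, first))
        if first and origin_id is not None:
            out.append(("origin", origin_id))
        out.extend(("change", xid, c) for c in chunk)
        out.append(("stream_stop",))
    out.append(("stream_commit", xid))
    return out
```

For example, `stream_messages(10, "o1", [["a"], ["b", "c"]])` yields exactly one `origin` message, placed directly after the first `stream_start`.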

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Dilip Kumar
Date:
On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >
> > Hi,
> >
> > On 09/07/2020 14:34, Amit Kapila wrote:
> > > On Thu, Jul 9, 2020 at 5:16 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> > >>
> > >> On 09/07/2020 13:10, Amit Kapila wrote:
> > >>> On Thu, Feb 6, 2020 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >>>>
> > >>>> During logical decoding, we send replication_origin and
> > >>>> replication_origin_lsn when we decode commit.  In pgoutput_begin_txn,
> > >>>> we send values for these two but never used on the subscriber side.
> > >>>> Though we have provided a function (logicalrep_read_origin) to read
> > >>>> these two values but that is not used in code anywhere.
> > >>>>
> > >>
> > >> We don't use the origin message anywhere really because we don't support
> > >> origin forwarding in the built-in replication yet. That part I left out
> > >> intentionally in the original PG10 patchset as it's mostly useful for
> > >> circular replication detection when you want to replicate both ways.
> > >> However that's relatively useless without also having some kind of
> > >> conflict detection which would be another huge pile of code and I
> > >> expected we would end up not getting logical replication in PG10 at all
> > >> if I tried to push conflict detection as well :)
> > >>
> > >
> > > Fair enough.  However, without tests and more documentation about this
> > > concept, it is likely that future development might break it.  It is
> > > good that you and others who know this part well are there to respond
> > > but still, the more documentation and tests would be preferred.
> > >
> >
> > Honestly that part didn't even need to be committed given it's unused.
> > Protocol supports versioning so it could have been added at later time.
> >
> > >>>
> > >>> For the purpose of decoding in-progress transactions, I think we can
> > >>> send replication_origin in the first 'start' message as it is present
> > >>> with each WAL record, however replication_origin_lsn is only logged at
> > >>> commit time, so can't send it before commit.  The
> > >>> replication_origin_lsn is set by pg_replication_origin_xact_setup()
> > >>> but it is not clear how and when that function can be used.  Do we
> > >>> really need replication_origin_lsn before we decode the commit record?
> > >>>
> > >>
> > >> That's the SQL interface, C interface does not require that and I don't
> > >> think we need to do that.
> > >>
> > >
> > > I think when you are saying SQL interface, you referred to
> > > pg_replication_origin_xact_setup() but I am not sure which C interface
> > > you are referring to in the above sentence?
> > >
> >
> > All the stuff pg_replication_origin_xact_setup does internally.
> >
> > >> The existing apply code sets the
> > >> replorigin_session_origin_lsn only when processing commit message IIRC.
> > >>
> > >
> > > That's correct.  However, we do send it via 'begin' callback which
> > > won't be possible with the streaming of in-progress transactions.  Do
> > > we need to send this origin related information (origin, origin_lsn)
> > > while streaming of in-progress transactions?  If so, when?  As far as
> > > I can see, the origin_id can be sent with the first 'start' message.
> > > The origin_lsn and origin_commit can be sent with the last 'start' of
> > > streaming commit if we want but not sure if that is of use.  If we
> > > need to send origin_lsn earlier than that then we need to record it
> > > with other WAL records (other than Commit WAL record).
> > >
> >
> > If we were to support the origin forwarding, then strictly speaking we
> > need everything only at commit time from correctness perspective,
> >
>
> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
>
> > but
> > ideally origin_id would be best sent with first message as it can be
> > used to filter out changes at decoding stage rather than while we
> > process the commit so having it set early improves performance of decoding.
> >
>
> Yeah, makes sense.  So, we will just send origin_id (with first
> streaming start message) and leave others.

So IIUC, currently we are sending the latest origin_id, which is set
at commit time.  So in our case, when we start streaming, we will send
the origin_id of the latest change in the current stream, right?  I
think we will always have to remember the latest origin id in the
top-level ReorderBufferTXN as well.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> > >
> > >
> > > If we were to support the origin forwarding, then strictly speaking we
> > > need everything only at commit time from correctness perspective,
> > >
> >
> > Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
> >
> > > but
> > > ideally origin_id would be best sent with first message as it can be
> > > used to filter out changes at decoding stage rather than while we
> > > process the commit so having it set early improves performance of decoding.
> > >
> >
> > Yeah, makes sense.  So, we will just send origin_id (with first
> > streaming start message) and leave others.
>
> So IIUC, currently we are sending the latest origin_id which is set
> during the commit time.  So in our case, while we start streaming we
> will send the origin_id of the latest change in the current stream
> right?
>

It has to be sent only once, with the first start message, not with
the subsequent start messages.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Dilip Kumar
Date:
On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> > > >
> > > >
> > > > If we were to support the origin forwarding, then strictly speaking we
> > > > need everything only at commit time from correctness perspective,
> > > >
> > >
> > > Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
> > >
> > > > but
> > > > ideally origin_id would be best sent with first message as it can be
> > > > used to filter out changes at decoding stage rather than while we
> > > > process the commit so having it set early improves performance of decoding.
> > > >
> > >
> > > Yeah, makes sense.  So, we will just send origin_id (with first
> > > streaming start message) and leave others.
> >
> > So IIUC, currently we are sending the latest origin_id which is set
> > during the commit time.  So in our case, while we start streaming we
> > will send the origin_id of the latest change in the current stream
> > right?
> >
>
> It has to be sent only once with the first start message not with
> consecutive start messages.

Okay, so do you mean to say that with the first start message we send
the origin_id of the latest change?  I ask because the origin id can
change during the transaction's lifetime.  Currently, we send the
origin_id of the latest WAL record, i.e. the origin id of the commit.
So I think it would be along similar lines if, with every stream_start,
we sent the origin_id of the latest change in that stream.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Tue, Jul 14, 2020 at 12:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> > > > >
> > > > >
> > > > > If we were to support the origin forwarding, then strictly speaking we
> > > > > need everything only at commit time from correctness perspective,
> > > > >
> > > >
> > > > Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
> > > >
> > > > > but
> > > > > ideally origin_id would be best sent with first message as it can be
> > > > > used to filter out changes at decoding stage rather than while we
> > > > > process the commit so having it set early improves performance of decoding.
> > > > >
> > > >
> > > > Yeah, makes sense.  So, we will just send origin_id (with first
> > > > streaming start message) and leave others.
> > >
> > > So IIUC, currently we are sending the latest origin_id which is set
> > > during the commit time.  So in our case, while we start streaming we
> > > will send the origin_id of the latest change in the current stream
> > > right?
> > >
> >
> > It has to be sent only once with the first start message not with
> > consecutive start messages.
>
> Okay,  so do you mean to say that with the first start message we send
> the origin_id of the latest change?
>

Yes.

>  because during the transaction
> lifetime, the origin id can be changed.
>

Yeah, it could be changed, but if we had to send it again after the
first message, then it would have to be sent with each message.  So, I
think it is better to just send it once per transaction, as we do now
(with the begin message).


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Petr Jelinek
Date:
Hi,

On 14/07/2020 10:29, Amit Kapila wrote:
> On Tue, Jul 14, 2020 at 12:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>>
>>>> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>
>>>>> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>>>>>>
>>>>>>
>>>>>> If we were to support the origin forwarding, then strictly speaking we
>>>>>> need everything only at commit time from correctness perspective,
>>>>>>
>>>>>
>>>>> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
>>>>>
>>>>>> but
>>>>>> ideally origin_id would be best sent with first message as it can be
>>>>>> used to filter out changes at decoding stage rather than while we
>>>>>> process the commit so having it set early improves performance of decoding.
>>>>>>
>>>>>
>>>>> Yeah, makes sense.  So, we will just send origin_id (with first
>>>>> streaming start message) and leave others.
>>>>
>>>> So IIUC, currently we are sending the latest origin_id which is set
>>>> during the commit time.  So in our case, while we start streaming we
>>>> will send the origin_id of the latest change in the current stream
>>>> right?
>>>>
>>>
>>> It has to be sent only once with the first start message not with
>>> consecutive start messages.
>>
>> Okay,  so do you mean to say that with the first start message we send
>> the origin_id of the latest change?
>>
> 
> Yes.
> 
>>   because during the transaction
>> lifetime, the origin id can be changed.
>>
> 
> Yeah, it could be changed but if we have to send again apart from with
> the first message then it should be sent with each message.  So, I
> think it is better to just send it once during the transaction as we
> do it now (send with begin message).
> 
> 

I am not sure I follow the discussion here very well, but if I
understand correctly, I'd like to clarify two things:
- the origin id does not change mid-transaction, as you can only have one per xid
- until we have the origin forwarding feature, the origin id is always the same
for a given subscription

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Dilip Kumar
Date:
On Tue, Jul 14, 2020 at 2:47 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> On 14/07/2020 10:29, Amit Kapila wrote:
> > On Tue, Jul 14, 2020 at 12:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>
> >> On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>
> >>> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>>>
> >>>> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>>
> >>>>> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>> If we were to support the origin forwarding, then strictly speaking we
> >>>>>> need everything only at commit time from correctness perspective,
> >>>>>>
> >>>>>
> >>>>> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
> >>>>>
> >>>>>> but
> >>>>>> ideally origin_id would be best sent with first message as it can be
> >>>>>> used to filter out changes at decoding stage rather than while we
> >>>>>> process the commit so having it set early improves performance of decoding.
> >>>>>>
> >>>>>
> >>>>> Yeah, makes sense.  So, we will just send origin_id (with first
> >>>>> streaming start message) and leave others.
> >>>>
> >>>> So IIUC, currently we are sending the latest origin_id which is set
> >>>> during the commit time.  So in our case, while we start streaming we
> >>>> will send the origin_id of the latest change in the current stream
> >>>> right?
> >>>>
> >>>
> >>> It has to be sent only once with the first start message not with
> >>> consecutive start messages.
> >>
> >> Okay,  so do you mean to say that with the first start message we send
> >> the origin_id of the latest change?
> >>
> >
> > Yes.
> >
> >>   because during the transaction
> >> lifetime, the origin id can be changed.
> >>
> >
> > Yeah, it could be changed but if we have to send again apart from with
> > the first message then it should be sent with each message.  So, I
> > think it is better to just send it once during the transaction as we
> > do it now (send with begin message).
> >
> >
>
> I am not sure if I can follow the discussion here very well, but if I
> understand correctly I'd like to clarify two things:
> - origin id does not change mid transaction as you can only have one per xid

Actually, I was asking which origin id we should send if someone
changes the session origin.  Currently, we send data only during the
commit, so we take the origin id from the commit WAL record and send
that.  In the example below, I insert 2 records in a transaction, and
each of them has a different origin id.

begin;
select pg_replication_origin_session_setup('o1');
insert into t values(1, 'test');
select pg_replication_origin_session_reset();
select pg_replication_origin_session_setup('o2');   -- origin ID changed
insert into t values(2, 'test');
commit;
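
The ambiguity in the transaction above can be modelled with a short Python sketch. It is a toy under explicit assumptions: the WAL is a hypothetical list of dicts, each record carries the session origin in effect when it was written, and the helper names are invented for illustration.

```python
def origin_at_commit(wal):
    """Origin reported when data is sent only at commit time:
    taken from the commit record, the last record of the transaction."""
    return wal[-1]["origin"]


def origin_at_first_stream(wal, first_chunk_len):
    """Origin reported with the first stream start: that of the latest
    change contained in the first streamed chunk."""
    return wal[first_chunk_len - 1]["origin"]


# Hypothetical WAL for the transaction above: two inserts, then commit.
wal = [
    {"kind": "insert", "origin": "o1"},
    {"kind": "insert", "origin": "o2"},
    {"kind": "commit", "origin": "o2"},
]
```

If the first streamed chunk holds only the first insert, `origin_at_first_stream(wal, 1)` reports `o1`, whereas `origin_at_commit(wal)` reports `o2`. The two approaches agree only when the session origin stays fixed for the whole transaction.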

> - until we have origin forwarding feature, the origin id is always same
> for a given subscription

ok

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Tue, Jul 14, 2020 at 2:47 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> Hi,
>
> On 14/07/2020 10:29, Amit Kapila wrote:
> > On Tue, Jul 14, 2020 at 12:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>
> >> On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>
> >>> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >>>>
> >>>> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>>>>
> >>>>> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>> If we were to support the origin forwarding, then strictly speaking we
> >>>>>> need everything only at commit time from correctness perspective,
> >>>>>>
> >>>>>
> >>>>> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
> >>>>>
> >>>>>> but
> >>>>>> ideally origin_id would be best sent with first message as it can be
> >>>>>> used to filter out changes at decoding stage rather than while we
> >>>>>> process the commit so having it set early improves performance of decoding.
> >>>>>>
> >>>>>
> >>>>> Yeah, makes sense.  So, we will just send origin_id (with first
> >>>>> streaming start message) and leave others.
> >>>>
> >>>> So IIUC, currently we are sending the latest origin_id which is set
> >>>> during the commit time.  So in our case, while we start streaming we
> >>>> will send the origin_id of the latest change in the current stream
> >>>> right?
> >>>>
> >>>
> >>> It has to be sent only once with the first start message not with
> >>> consecutive start messages.
> >>
> >> Okay,  so do you mean to say that with the first start message we send
> >> the origin_id of the latest change?
> >>
> >
> > Yes.
> >
> >>   because during the transaction
> >> lifetime, the origin id can be changed.
> >>
> >
> > Yeah, it could be changed but if we have to send again apart from with
> > the first message then it should be sent with each message.  So, I
> > think it is better to just send it once during the transaction as we
> > do it now (send with begin message).
> >
> >
>
> I am not sure if I can follow the discussion here very well, but if I
> understand correctly I'd like to clarify two things:
> - origin id does not change mid transaction as you can only have one per xid
>

As Dilip showed, I don't think we currently have any way to prevent
the origin id from changing during the transaction.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Petr Jelinek
Date:
On 14/07/2020 11:36, Dilip Kumar wrote:
> On Tue, Jul 14, 2020 at 2:47 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>>
>> Hi,
>>
>> On 14/07/2020 10:29, Amit Kapila wrote:
>>> On Tue, Jul 14, 2020 at 12:05 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>>
>>>> On Tue, Jul 14, 2020 at 11:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>
>>>>> On Tue, Jul 14, 2020 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jul 9, 2020 at 6:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>>>>
>>>>>>> On Thu, Jul 9, 2020 at 6:14 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> If we were to support the origin forwarding, then strictly speaking we
>>>>>>>> need everything only at commit time from correctness perspective,
>>>>>>>>
>>>>>>>
>>>>>>> Okay.  Anyway streaming mode is optional, so in such cases, we can keep it 'off'
>>>>>>>
>>>>>>>> but
>>>>>>>> ideally origin_id would be best sent with first message as it can be
>>>>>>>> used to filter out changes at decoding stage rather than while we
>>>>>>>> process the commit so having it set early improves performance of decoding.
>>>>>>>>
>>>>>>>
>>>>>>> Yeah, makes sense.  So, we will just send origin_id (with first
>>>>>>> streaming start message) and leave others.
>>>>>>
>>>>>> So IIUC, currently we are sending the latest origin_id which is set
>>>>>> during the commit time.  So in our case, while we start streaming we
>>>>>> will send the origin_id of the latest change in the current stream
>>>>>> right?
>>>>>>
>>>>>
>>>>> It has to be sent only once with the first start message not with
>>>>> consecutive start messages.
>>>>
>>>> Okay,  so do you mean to say that with the first start message we send
>>>> the origin_id of the latest change?
>>>>
>>>
>>> Yes.
>>>
>>>>    because during the transaction
>>>> lifetime, the origin id can be changed.
>>>>
>>>
>>> Yeah, it could be changed but if we have to send again apart from with
>>> the first message then it should be sent with each message.  So, I
>>> think it is better to just send it once during the transaction as we
>>> do it now (send with begin message).
>>>
>>>
>>
>> I am not sure if I can follow the discussion here very well, but if I
>> understand correctly I'd like to clarify two things:
>> - origin id does not change mid transaction as you can only have one per xid
> 
> Actually, I was talking about if someone changes the session origin
> then which origin id we should send?  currently, we send data only
> during the commit so we take the origin id from the commit wal and
> send the same.  In the below example, I am inserting 2 records in a
> transaction and each of them has different origin id.
> 
> begin;
> select pg_replication_origin_session_setup('o1');
> insert into t values(1, 'test');
> select pg_replication_origin_session_reset();
> select pg_replication_origin_session_setup('o2');   --> Origin ID changed
> insert into t values(2, 'test');
> commit;
> 

Commit record and commit_ts record will both include only 'o2', while 
individual DML WAL records will contain one or the other depending on 
when they were done.

The origin API is really not prepared for this situation 
(independently of streaming) because the origin lookup for all rows in 
that transaction will return 'o2', but decoding will decode whatever is 
in the DML WAL record.

One can't even use this approach for sensible filtering, as the ultimate 
fate of the whole transaction is decided by what's in the commit record: 
the filter callback only provides the origin id, not the record being 
processed, so a plugin can't differentiate. So it's hard to see how the 
above pattern could be used for anything but breaking things. Not sure 
what Andres' original intention was with allowing this.
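
To make the mismatch concrete, here is a toy Python model of the behaviour
described above. It is only a sketch under assumptions: DmlRecord, decode
and filter_by_origin are hypothetical names, not PostgreSQL APIs, and the
real decoding code is considerably more involved. The only point it
illustrates is that the callback receives an origin id and nothing else.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DmlRecord:
    origin: str            # origin stamped on this individual WAL record
    row: Tuple[int, str]   # the inserted row


def decode(dml: List[DmlRecord], commit_origin: str,
           filter_by_origin: Callable[[str], bool]) -> List[Tuple[int, str]]:
    """Toy model: the callback sees only an origin id, never the record."""
    # The commit record decides the fate of the whole transaction.
    if filter_by_origin(commit_origin):
        return []
    # Otherwise each change is checked against its own record's origin.
    return [r.row for r in dml if not filter_by_origin(r.origin)]


# The transaction from the example: row 1 under 'o1', row 2 under 'o2'.
txn = [DmlRecord("o1", (1, "test")), DmlRecord("o2", (2, "test"))]
commit_origin = "o2"  # session origin in effect at commit time

skip_o1 = decode(txn, commit_origin, lambda origin: origin == "o1")
skip_o2 = decode(txn, commit_origin, lambda origin: origin == "o2")
```

In this model, filtering 'o1' drops only the first row, while filtering
'o2' drops the whole transaction, including the row that was written
under 'o1'.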

-- 
Petr Jelinek
2ndQuadrant - PostgreSQL Solutions for the Enterprise
https://www.2ndQuadrant.com/



Re: replication_origin and replication_origin_lsn usage on subscriber

From
Amit Kapila
Date:
On Tue, Jul 14, 2020 at 3:37 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
>
> On 14/07/2020 11:36, Dilip Kumar wrote:
> > On Tue, Jul 14, 2020 at 2:47 PM Petr Jelinek <petr@2ndquadrant.com> wrote:
> >>
> >> I am not sure if I can follow the discussion here very well, but if I
> >> understand correctly I'd like to clarify two things:
> >> - origin id does not change mid transaction as you can only have one per xid
> >
> > Actually, I was talking about if someone changes the session origin
> > then which origin id we should send?  currently, we send data only
> > during the commit so we take the origin id from the commit wal and
> > send the same.  In the below example, I am inserting 2 records in a
> > transaction and each of them has different origin id.
> >
> > begin;
> > select pg_replication_origin_session_setup('o1');
> > insert into t values(1, 'test');
> > select pg_replication_origin_session_reset();
> > select pg_replication_origin_session_setup('o2');   --> Origin ID changed
> > insert into t values(2, 'test');
> > commit;
> >
>
> Commit record and commit_ts record will both include only 'o2', while
> individual DML WAL records will contain one or the other depending on
> when they were done.
>
> The origin API is really not prepared for this situation
> (independently of streaming) because the origin lookup for all rows in
> that transaction will return 'o2', but decoding will decode whatever is
> in the DML WAL record.
>
> One can't even use this approach for sensible filtering as the ultimate
> fate of whole transaction is decided by what's in commit record since
> the filter callback only provides origin id, not record being processed
> so plugin can't differentiate. So it's hard to see how the above pattern
> could be used for anything but breaking things.
>

Fair enough, I think we can proceed with the assumption that it won't
change during the transaction and send origin_id along with the very
first *start* message during the streaming of in-progress
transactions.
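
As a rough illustration of that plan, the Python sketch below emits the
origin only with the first start message of a streamed transaction. The
message names ("stream_start", "change", "stream_stop", "stream_commit")
and the stream_transaction helper are hypothetical and do not reflect the
actual pgoutput wire format; the sketch also assumes the origin does not
change during the transaction.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, Optional[str]]


def stream_transaction(origin_id: str, chunks: List[List[str]]) -> List[Message]:
    """Toy sender: one start/stop pair per chunk, origin sent only once."""
    messages: List[Message] = []
    for i, chunk in enumerate(chunks):
        # Only the very first start message carries the origin id.
        start_origin = origin_id if i == 0 else None
        messages.append(("stream_start", start_origin))
        messages.extend(("change", change) for change in chunk)
        messages.append(("stream_stop", None))
    messages.append(("stream_commit", None))
    return messages


msgs = stream_transaction("o2", [["insert 1"], ["insert 2"]])
# Origins attached to each start message, in order.
origins = [payload for kind, payload in msgs if kind == "stream_start"]
```

A receiver in this model would remember the origin from the first start
message and apply it to every later chunk of the same transaction.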

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com