Обсуждение: [Patch] add new parameter to pg_replication_origin_session_setup
Hello all, While working on our internal tools that utilise replication, we realised that a new parameter was added to the internal C function corresponding to pg_replication_origin_session_setup. However this parameter wasn't included in the user-facing API [1]. In 'src/backend/replication/logical/origin.c' at line 1359, pg_replication_origin_session_setup function calls replorigin_session_setup(origin, 0); where currently 0 is assigned to the acquired_by parameter of the replorigin_session_setup. I made this patch to the master which adds a way to control this parameter by adding a new version of the pg_replication_origin_session_setup function with user facing parameters 'text int4' in place of the current 'text' while keeping the existing variant (ensuring backwards compatibility). Could someone take a look at it? [1]: https://www.postgresql.org/docs/current/functions-admin.html#PG-REPLICATION-ORIGIN-SESSION-SETUP --- Thanks for the help, Doruk Yılmaz
Вложения
On Mon, Aug 12, 2024, at 3:43 PM, Doruk Yilmaz wrote:
Hello all,
Hi!
While working on our internal tools that utilise replication, werealised that a new parameter was added to the internal C functioncorresponding to pg_replication_origin_session_setup.However this parameter wasn't included in the user-facing API [1].
I'm curious about your use case. Is it just because the internal function has a
different signature or your tool is capable of apply logical replication changes
in parallel using the SQL API?
I made this patch to the master which adds a way to control thisparameter by adding a new version of thepg_replication_origin_session_setup function with user facingparameters 'text int4' in place of the current 'text' while keepingthe existing variant(ensuring backwards compatibility). Could someone take a look at it?
I did a quick look at your patch and have a few suggestions.
* no documentation changes. Since the function you are changing has a new
signature, this change should be reflected in the documentation.
* no need for a new internal function. The second parameter (PID) can be
optional and defaults to 0 in this case. See how we changed the
pg_create_logical_replication_slot along the years add some IN parameters like
twophase and failover in the recent versions.
* add a CF entry [1] for this patch so we don't forget it. Another advantage is
that this patch is covered by CI [2][3].
I noticed that the patch needs rebasing, so here is the rebased version. Hopefully it makes to the commitfest. Doruk Yılmaz
Вложения
On Thu, Jan 9, 2025 at 3:26 AM Euler Taveira <euler@eulerto.com> wrote: > > On Thu, Aug 15, 2024, at 5:53 PM, Doruk Yilmaz wrote: > > Hello again, > > On Tue, Aug 13, 2024 at 12:48 AM Euler Taveira <euler@eulerto.com> wrote: > > I'm curious about your use case. Is it just because the internal function has a > > different signature or your tool is capable of apply logical replication changes > > in parallel using the SQL API? > > The latter is correct, it applies logical replication changes in parallel. > Since multiple connections may commit, we need all of them to be able > to advance the replication origin. > To use replication_origin by multiple processes, one must maintain the commit order as we do internally by allowing the leader process to wait for the parallel worker to finish the commit. See comments atop replorigin_session_setup(). Now, we could expose the pid parameter as proposed by the patch after documenting the additional requirements, but I am afraid that users may directly start using the API without following the commit order principle, which can lead to incorrect replication. So, isn't it better to do something to avoid the misuse of this feature before exposing it? -- With Regards, Amit Kapila.
On Mon, Mar 3, 2025 at 6:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > To use replication_origin by multiple processes, one must maintain the > commit order as we do internally by allowing the leader process to > wait for the parallel worker to finish the commit. See comments atop > replorigin_session_setup(). Now, we could expose the pid parameter as > proposed by the patch after documenting the additional requirements, > but I am afraid that users may directly start using the API without > following the commit order principle, which can lead to incorrect > replication. So, isn't it better to do something to avoid the misuse > of this feature before exposing it? Wouldn't mentioning/describing needing to follow the commit order principle on the documentation be enough for this? It is quite an advanced feature that I don't believe person intending to use it won't start with reading documentation first. Is there any updates on the commit? I see that intended commitfest window ended. Thanks, Doruk Yılmaz
On Tue, Jul 29, 2025 at 2:43 AM Doruk Yilmaz <doruk@mixrank.com> wrote: > > On Mon, Mar 3, 2025 at 6:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > To use replication_origin by multiple processes, one must maintain the > > commit order as we do internally by allowing the leader process to > > wait for the parallel worker to finish the commit. See comments atop > > replorigin_session_setup(). Now, we could expose the pid parameter as > > proposed by the patch after documenting the additional requirements, > > but I am afraid that users may directly start using the API without > > following the commit order principle, which can lead to incorrect > > replication. So, isn't it better to do something to avoid the misuse > > of this feature before exposing it? > > Wouldn't mentioning/describing needing to follow the commit order > principle on the documentation be enough for this? > It is quite an advanced feature that I don't believe person intending > to use it won't start with reading documentation first. > That is true but I still feel there has to be some mechanism where we can catch and give an ERROR to the user, if it doesn't follow the same. For example, pg_replication_origin_advance() always allows going backwards in terms of LSN which means if one doesn't follow commit order, it can lead to breaking the replication as after restart the client can ask to start replication from some prior point. > > Is there any updates on the commit? > I think we are still under discussion about the requirements and design for this API. Can you tell us the use case? Did you also intend to use it for parallel apply, if so, can you also tell at a high level, how you are planning to manage origin? It will help us to extend the API(s) in a meaningful way. -- With Regards, Amit Kapila.
On Mon, Jul 29, 2025 at 8:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > That is true but I still feel there has to be some mechanism where we > can catch and give an ERROR to the user, if it doesn't follow the > same. For example, pg_replication_origin_advance() always allows going > backwards in terms of LSN which means if one doesn't follow commit > order, it can lead to breaking the replication as after restart the > client can ask to start replication from some prior point. If you have any ideas for safeguards or API changes, I'd be happy to help implement them or discuss them. > Can you tell us the use case? Did you also intend to use it for parallel apply, if so, can you also tell at a high > level, how you are planning to manage origin? Yes, we use it for parallel apply. We have a custom logical replication system that applies changes using multiple worker processes, each with their own database connection. Our use case requires multiple connections to be able to advance the same replication origin. We handle this by having a master process coordinate the workers, where each worker process calls pg_replication_origin_session_setup with the master's PID as the second parameter. We may send operations out of order but we always commit in order, so there's no chance of creating inconsistencies. There's the chance of deadlocks, but these can be detected. It's really similar to the existing parallel apply implementation - the main difference is that we're applying from jsonl files instead of directly from another database. Currently we use a local patch to expose the PID parameter, but having this upstream would be great. It causes a lot of headaches for us to use a patched PostgreSQL. Thanks, Doruk Yılmaz
On Wed, Jul 30, 2025 at 12:00 AM Doruk Yilmaz <doruk@mixrank.com> wrote: > > On Mon, Jul 29, 2025 at 8:13 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > That is true but I still feel there has to be some mechanism where we > > can catch and give an ERROR to the user, if it doesn't follow the > > same. For example, pg_replication_origin_advance() always allows going > > backwards in terms of LSN which means if one doesn't follow commit > > order, it can lead to breaking the replication as after restart the > > client can ask to start replication from some prior point. > If you have any ideas for safeguards or API changes, I'd be happy to > help implement them or discuss them. > > Can you tell us the use case? Did you also intend to use it for parallel apply, if so, can you also tell at a high > > level, how you are planning to manage origin? > Yes, we use it for parallel apply. We have a custom logical > replication system that applies changes using multiple worker > processes, each with their own database connection. > Our use case requires multiple connections to be able to advance the > same replication origin. > How do you advance the origin? Did you use pg_replication_origin_advance()? If so, you should be aware that it can be used for initial setup; see comment in that API code: "Can't sensibly pass a local commit to be flushed at checkpoint - this xact hasn't committed yet. This is why this function should be used to set up the initial replication state, but not for replay." I wonder if you are using pg_replication_origin_advance(), won't its current implementation has the potential to cause a problem for your usecase? I think the problem it can cause is it may miss a transaction to apply after restart because we can use remote_lsn without a corresponding transaction (local_lsn) flushed on the subscriber. This can happen because ideally we want the transaction that is not successfully flushed to be replayed after restart. In general, I was thinking of adding a restriction pg_replication_origin_advance() such that it gives an ERROR when a user tries to move remote_lsn backward unless requested explicitly. It would be good to know the opinion of others involved in the original change of maintaining commit order for parallel apply of large transactions. -- With Regards, Amit Kapila.
On Mon, Aug 11, 2025 at 9:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > How do you advance the origin? Did you use > pg_replication_origin_advance()? If so, you should be aware that it > can be used for initial setup; see comment in that API code... No, we don't use pg_replication_origin_advance(). We use pg_replication_origin_xact_setup() instead as I mentioned before. Each worker does the following: 1. Sets up its own replication-origin session with pg_replication_origin_session_setup() (using the master process PID). 2. Applies changes inside transactions. 3. Right before commit, calls pg_replication_origin_xact_setup(lsn, commit_timestamp). 4. Commits only if everything succeeded, so the origin only advances on a real commit. That way, the origin LSN moves forward only when the transaction is actually committed. If something fails or the process crashes, the origin stays at the last successful commit, and on restart we replay from the correct spot. It's safer than advancing the origin without knowing the transaction made it to disk. So the issue you described is not relevant for our implementation.
On Mon, Aug 11, 2025 at 10:41 PM Doruk Yilmaz <doruk@mixrank.com> wrote: > > On Mon, Aug 11, 2025 at 9:44 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > How do you advance the origin? Did you use > pg_replication_origin_advance()? If so, you should be aware that it > > can be used for initial setup; see comment in that API code... > > No, we don't use pg_replication_origin_advance(). We use > pg_replication_origin_xact_setup() instead as I mentioned before. > > Each worker does the following: > 1. Sets up its own replication-origin session with > pg_replication_origin_session_setup() (using the master process PID). > 2. Applies changes inside transactions. > 3. Right before commit, calls pg_replication_origin_xact_setup(lsn, > commit_timestamp). > 4. Commits only if everything succeeded, so the origin only advances > on a real commit. > > That way, the origin LSN moves forward only when the transaction is > actually committed. If something fails or the process crashes, the > origin stays at the last successful commit, and on restart we replay > from the correct spot. It's safer than advancing the origin without > knowing the transaction made it to disk. > Your use looks good to me. So, maybe we can update the docs with the dangers if the users of API doesn't follow commit order then it may lead to data inconsistency should be sufficient. Additionally, we may want to give an example as to how to use this API for parallel apply. Thoughts? -- With Regards, Amit Kapila.
> Your use looks good to me. So, maybe we can update the docs with the > dangers if the users of API doesn't follow commit order then it may > lead to data inconsistency should be sufficient. Additionally, we may > want to give an example as to how to use this API for parallel apply. That sounds reasonable. I’ve updated the patch and added more information to the documentation covering the topics you mentioned. I also added a Caution block so potential users won’t miss it. I hope this patch meets your expectations.
Вложения
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Doruk, > That sounds reasonable. I’ve updated the patch and added more > information to the documentation covering the topics you mentioned. > I also added a Caution block so potential users won’t miss it. I hope > this patch meets your expectations. Can you explain more why we must extend the SQL interface? I read your use case [1], and looks like that a new type of background worker is introduced in your system. If so, why doesn't the worker directly call C-lang interface replorigin_session_setup()? Personally considered, SQL functions are usable by unfamiliar users so that this change may be dangerous. It is better if developers can use C APIs instead. [1]: https://www.postgresql.org/message-id/CAMPB6wckvkKrXVPH5j8Ske2cVedkb-TRLdnOb5e74zYM1CynGw%40mail.gmail.com Best regards, Hayato Kuroda FUJITSU LIMITED
Dear Hayato, > Can you explain more why we must extend the SQL interface? In our system the workers aren't background workers and we don't ship a server-side extension; they're plain external processes (Python in our case) talking over standard database connections. In many deployments -especially managed Postgres- we can't load custom C code even if we wanted to. That's why we want to expose the existing pid knob via SQL: it lets ordinary client sessions participate in the same, already-implemented origin coordination without maintaining a fork or an extension. This patch doesn't invent a new capability, it just makes the internal behavior reachable from SQL. The new argument is optional and defaults to the current behavior, so nothing changes for existing users. It also keeps the feature usable from any language/runtime that coordinates parallel apply at the application layer. And I don't believe it is that dangerous or risky. The actual code we use in python is not that complex that I believe a person using replication already should be able to set it up. I don't understand why being able to achieve parallel replication is not accessible via SQL already. I am happy to do changes to the patch if you think there should be more guardrails. Thanks, Doruk Yılmaz
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Doruk, > In our system the workers aren't background workers and we don't ship > a server-side extension; they're plain external processes (Python in > our case) talking over standard database connections. In many > deployments -especially managed Postgres- we can't load custom C code > even if we wanted to. That's why we want to expose the existing pid > knob via SQL: it lets ordinary client sessions participate in the > same, already-implemented origin coordination without maintaining a > fork or an extension. So, your python process establishes two connections, for publisher (replication connection) and subscriber (normal connection). It receives changes from the publisher, constructs SQL statements from the received results, and sends to subscriber's backend, is it right? I'm not sure it is the common approach, but I see your point that you cannot install your extensions on managed postgres. Anyway, I still feel bit dangerous but OK if others can accept. Regarding the patch, I want to ask one point. ``` +CREATE OR REPLACE FUNCTION + pg_replication_origin_session_setup(node_name text, pid integer DEFAULT 0) +RETURNS void +LANGUAGE INTERNAL +STRICT VOLATILE +AS 'pg_replication_origin_session_setup'; ... { oid => '6006', descr => 'configure session to maintain replication progress tracking for the passed in origin', proname => 'pg_replication_origin_session_setup', provolatile => 'v', - proparallel => 'u', prorettype => 'void', proargtypes => 'text', + proparallel => 'u', prorettype => 'void', proargtypes => 'text int4', prosrc => 'pg_replication_origin_session_setup' }, ``` Is there a rule which attribute is clarified and others are not? For example, VOLATILE is specified on both side, STRICT is written only in the system_functions.sql, and PARALLEL UNSAFE is set on pg_proc.dat. Best regards, Hayato Kuroda FUJITSU LIMITED
Dear Hayato, > So, your python process establishes two connections, for publisher (replication connection) > and subscriber (normal connection). It receives changes from the publisher, > constructs SQL statements from the received results, and sends to subscriber's > backend, is it right? Actually, it's a bit simpler than that - there are no two connections. Our program reads changes from JSONL files rather than directly from a publisher connection. We have multiple Python processes, each with a single database connection to the subscriber, reading from these files and applying changes in parallel. > Is there a rule which attribute is clarified and others are not? > For example, VOLATILE is specified on both side, STRICT is written only in the > system_functions.sql, and PARALLEL UNSAFE is set on pg_proc.dat. In pg_proc.dat, I believe the STRICT, IMMUTABLE, and PARALLEL SAFE are the defaults (check out pg_proc.h). So in pg_proc.dat, the ones that are specified are the ones that aren't defaults, there is provolatile => 'v' (for VOLATILE) and proparallel => 'u' (for UNSAFE), but no prostrict since it's already true by default. In system_functions.sql, I went with being explicit about all the attributes for clarity as it is the code declaration. If you want, I can also make the pg_proc.dat explicit. Thanks, Doruk Yılmaz
On Wed, Sep 3, 2025 at 6:13 PM Doruk Yilmaz <doruk@mixrank.com> wrote: > > Dear Hayato, > > > So, your python process establishes two connections, for publisher (replication connection) > > and subscriber (normal connection). It receives changes from the publisher, > > constructs SQL statements from the received results, and sends to subscriber's > > backend, is it right? > > Actually, it's a bit simpler than that - there are no two connections. > Our program reads changes from JSONL files rather than directly from a > publisher connection. > We have multiple Python processes, each with a single database > connection to the subscriber, > reading from these files and applying changes in parallel. > > > Is there a rule which attribute is clarified and others are not? > > For example, VOLATILE is specified on both side, STRICT is written only in the > > system_functions.sql, and PARALLEL UNSAFE is set on pg_proc.dat. > > In pg_proc.dat, I believe the STRICT, IMMUTABLE, and PARALLEL SAFE are > the defaults (check out pg_proc.h). > So in pg_proc.dat, the ones that are specified are the ones that > aren't defaults, > there is provolatile => 'v' (for VOLATILE) and proparallel => 'u' (for > UNSAFE), but no prostrict since it's already true by default. > In system_functions.sql, I went with being explicit about all the > attributes for clarity as it is the code declaration. > Then why didn't you specified PARALLEL UNSAFE as well? BTW, yesterday a new thread started with the same requirement [1]. It uses a slightly different way to define the new function. do you have any opinion on it? [1] - https://www.postgresql.org/message-id/CAE2gYzyTSNvHY1%2BiWUwykaLETSuAZsCWyryokjP6rG46ZvRgQA%40mail.gmail.com -- With Regards, Amit Kapila.
> Then why didn't you specified PARALLEL UNSAFE as well? You are correct, I missed marking the function as PARALLEL UNSAFE. I’ve attached a revised patch with the correct annotation. > BTW, yesterday a new thread started with the same requirement [1]. It > uses a slightly different way to define the new function. do you have > any opinion on it? I don’t think introducing a separate function is a good idea. It’s effectively the same behavior, technical debt, and maintenance overhead without a clear benefit. Our patch keeps a single function with a default parameter, so it’s not a breaking change. So I believe our approach is preferable. But I would say that, the fact that another patch is proposing the same capability indicates there’s broader demand for this change.
Вложения
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Doruk, Thanks for updating the patch and sorry for being late. The new patch looks good to me. Best regards, Hayato Kuroda FUJITSU LIMITED
Dear Hayato, Thanks for the feedback on the patch, I'm glad the latest version looks good. I was wondering if there is anything else I need to do on my end, or any other process I should be aware of, for this patch to move forward? I'm happy to make any further adjustments or provide more information if needed. Thanks, Doruk Yılmaz
On Tue, Sep 16, 2025 at 10:07 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear Doruk, > > Thanks for updating the patch and sorry for being late. > The new patch looks good to me. > Can we think of writing a few tests for this newly exposed functionality? -- With Regards, Amit Kapila.
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, > Can we think of writing a few tests for this newly exposed functionality? I considered a test, please see attached files. 0001 was not changed from v6 and 0002 contained tests. Here, two sessions were opened and confirmed that they can set the same origin. BTW, while testing I found the existing issue of this function. Since the session_replication_state is set before the pid check, there is a case that session origin retains in case of failure. Here is a quick reproducer: ``` postgres=# SELECT pg_replication_origin_create('origin'); pg_replication_origin_create ------------------------------ 1 (1 row) postgres=# -- run origin_session_setup with incorrect parameter postgres=# SELECT pg_replication_origin_session_setup('origin', -1); ERROR: could not find replication state slot for replication origin with OID 1 which was acquired by -1 postgres=# -- run origin_session_setup again with correct parameter postgres=# SELECT pg_replication_origin_session_setup('origin'); ERROR: cannot setup replication origin when one is already setup ``` The issue has exist since we introduces the parallel apply, but it has not been found till now. Because parallel apply workers have not specified the invalid pid. It can be more likely to happen so it's time to fix at the same time. Idea for fix is that use local replication state and then at end assign it to process-level. 0003 implemented that. How do you feel? Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear hackers, > I considered a test, please see attached files. 0001 was not changed from v6 and > 0002 contained tests. Here, two sessions were opened and confirmed that they > can > set the same origin. After considering and verifying more, it is more efficient to test via isolation tester. Attached patchset does like that. On my env, the duration became 10x faster because it does not start the instance within the test. In the test file, two sessions s0 and s1 are launched, they set the same session origin. They insert local_lsn to a table and confirm latter insertion has larger value. One hacky point is to obtain pid for s0 from s1. Below contains my analysis. application_name is controlled by the isolation_main.c and isolationtester.c. When the isolation test works, initially isolation_main starts and launches isolaiontester process, one per spec file. In main.c, the application_name is set to "isolation/${testname}" at the starter function. Then, after isolationtester parses the spec file, it appends given name to each session. This is done at line 193. Best regards, Hayato Kuroda FUJITSU LIMITED
Вложения
On Thu, Sep 18, 2025 at 1:07 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > Dear hackers, > > > I considered a test, please see attached files. > Few comments: 1. +step "s0_compare" { + SELECT s0.lsn < s1.lsn + FROM local_lsn_store as s0, local_lsn_store as s1 + WHERE s0.session = 0 AND s1.session = 1; +} This appears to be a bit tricky to compare the values. Doing a sequential scan won't guarantee the order of rows' appearance. Can't we somehow get the two rows ordered by session_id and compare their values? 2. + else if (candidate_state->acquired_by != acquired_by) + { + if (initialized) + candidate_state->roident = InvalidRepOriginId; + elog(ERROR, "could not find replication state slot for replication origin with OID %u which was acquired by %d", node, acquired_by); + } This doesn't appear neat. Instead, how about checking this case before setting current_state as shown in attached. If we do that, we shouldn't even need new variables like current_state and initialized. Additionally, as shown in attached, it is better to make this a user-facing error by using ereport. 3. Merge all patches as I don't see the need to do any backpatch here. -- With Regards, Amit Kapila.
Вложения
RE: [Patch] add new parameter to pg_replication_origin_session_setup
От
"Hayato Kuroda (Fujitsu)"
Дата:
Dear Amit, > > Few comments: > 1. +step "s0_compare" { > + SELECT s0.lsn < s1.lsn > + FROM local_lsn_store as s0, local_lsn_store as s1 > + WHERE s0.session = 0 AND s1.session = 1; > +} > > This appears to be a bit tricky to compare the values. Doing a > sequential scan won't guarantee the order of rows' appearance. Can't > we somehow get the two rows ordered by session_id and compare their > values? I considered another way to use the CTE for session 0. How do you feel? > 2. > + else if (candidate_state->acquired_by != acquired_by) > + { > + if (initialized) > + candidate_state->roident = InvalidRepOriginId; > + > elog(ERROR, "could not find replication state slot for replication > origin with OID %u which was acquired by %d", > node, acquired_by); > + } > > This doesn't appear neat. Instead, how about checking this case before > setting current_state as shown in attached. If we do that, we > shouldn't even need new variables like current_state and initialized. Your approach cannot work when the specified origin is not used yet after the instance starts. In this case the origin has not exist in the replication_states yet and new slot is initialized. Per current understanding, two ERRORs are needed to avoid adding new variables; first one is in the loop, and second one is in session_replication_state == NULL case. Latter one indicates the case that origin is inactive but PID is specified so different error message can be set. > Additionally, as shown in attached, it is better to make this a > user-facing error by using ereport. Indeed, elog() were replaced with ereport(). > 3. Merge all patches as I don't see the need to do any backpatch here. Sure. Attached patch includes all changes. Thought? Best regards, Hayato Kuroda FUJITSU LIMITED