[HACKERS] Logical replication existing data copy
От | Petr Jelinek |
---|---|
Тема | [HACKERS] Logical replication existing data copy |
Дата | |
Msg-id | 3caaecb9-6c1e-7824-4004-97b89b449096@2ndquadrant.com обсуждение исходный текст |
Ответы |
Re: [HACKERS] Logical replication existing data copy
Re: [HACKERS] Logical replication existing data copy |
Список | pgsql-hackers |
Hi, as the logical replication patch is progressing closer to integration it seems to be good time to revisit what to do about preexisting data. This patch implements data synchronization for the logical replication. It works both for initial setup of subscription as well as for tables added later to replication when the subscription is already active. There is new catalog pg_subscription_rel which track synchronization/replication state of each table that is subscribed via publications. From the user perspective it adds some additional options/clauses to CREATE/ALTER SUBSCRIPTION namely: - CREATE SUBSCRIPTION ... [ WITH (COPY DATA | NOCOPY DATA) ] where user can specify if initial data should be copied or not, with the default COPY. - ALTER SUBSCRIPTION ... REFRESH PUBLICATION [ WITH (COPY DATA | NOCOPY DATA) ] which updates the local info about tables in the subscribed publication again with option to either copy or not copy data. - Also this WITH (COPY DATA | NOCOPY DATA) ] was added to ALTER SUBSCRIPTION ... SET PUBLICATION which automatically triggers the REFRESH PUBLICATION as well. So existing table data can be copied once subscription is created, but also new tables can be added and their data copied. This is where the REFRESH PUBLICATION comes into play. Adding table to publication does not make it automatically replicated by the subscription as the subscription does not have tracking info for that table. So to add new table user must call ALTER SUBSCRIPTION ... REFRESH PUBLICATION on subscriber otherwise the data won't be replicated. The copy proccess runs in parallel with normal replication and is visible in the pg_stat_subscription view. There is also new GUC max_subscription_sync_workers which specifies how many tables can be synchronized in parallel. Implementation-wise this adds modified apply process called sync (tablesync.c) which at the beginning creates temporary slot, copies data from it, then synces to the correct LSN with the main apply and exits (this is actually quite complex but it's explained in the tablesync.c header). Standard COPY TO/FROM is used. This is done by enhancing COPY FROM to accept data from callback function which the tablesync implements. On the publisher side, the walsender is enhanced to accept standard SQL (simple query protocol only) so that COPY can be run. This also helps getting info about table using plain SELECT. The way the SQL works is that the parser was changed to accept strings that are not valid replication protocol commands and if such string comes it's sent to exec_simple_query instead of walsender. The attached patch applies on top of logical replication patch-set v14. I will add this as separate entry to the CF. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления: