BUG #15549: DDL with NOT NULL constraint and no default value canbreak logical replication
От | PG Bug reporting form |
---|---|
Тема | BUG #15549: DDL with NOT NULL constraint and no default value canbreak logical replication |
Дата | |
Msg-id | 15549-fe713b2586ce8796@postgresql.org обсуждение исходный текст |
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15549 Logged by: Mike Lissner Email address: mlissner@michaeljaylissner.com PostgreSQL version: 10.6 Operating system: Linux Description: As discussed on the mailing list[1] I recently ran into what feels like a bug in logical replication. The migration I ran had the following SQL: ALTER TABLE "search_docketentry" ADD COLUMN "recap_sequence_number" varchar(50) DEFAULT '' NOT NULL; ALTER TABLE "search_docketentry" ALTER COLUMN "recap_sequence_number" DROP DEFAULT; The documentation says: > *Logical replication is robust when schema definitions change in a live database:* When the schema is changed on the publisher and replicated data starts arriving at the subscriber but does not fit into the table schema, replication will error until the schema is updated. ...and: > In many cases, intermittent errors can be avoided by applying additive schema changes to the subscriber first. That made me think that it didn't much matter whether I applied the migration at the subscriber or the publisher first (it said things were robust, right?), so I did them at roughly the same time. The subscriber finished first. After the migration, I had the following errors: 2018-12-09 05:59:45 UTC::@:[13373]:LOG: logical replication apply worker for subscription "replicasubscription" has started 2018-12-09 05:59:45 UTC::@:[13373]:ERROR: null value in column "recap_sequence_number" violates not-null constraint 2018-12-09 05:59:45 UTC::@:[13373]:DETAIL: Failing row contains (48064261, 2018-12-07 04:48:40.388377+00, 2018-12-07 04:48:40.388402+00, null, 576, , 4571214, null, null). 2018-12-09 05:59:45 UTC::@:[6342]:LOG: worker process: logical replication worker for subscription 18390 (PID 13373) exited with exit code 1 The problem, from what we've discussed on the mailing list seems to be that: > Columns of a table are also matched by name. A different order of columns in the target table is allowed, but the column types have to match. The target table can have additional columns not provided by the published table. Those will be filled with their default values." ...and: > "If there is no default for a column, then the default is null." So what appears to have happened is that we disallowed nulls and didn't have a default value, with the result being breakage. As I discuss in the mailing list post, this was fixed by briefly allowing nulls on the subscriber, UPDATE'ing the null values in that column to be empty strings (the Django default), and then disallowing nulls again. I guess things worked more or less how the documentation described, but: 1. I'd expect a warning in the documentation about which types of additive changes cause breakage. 2. I'd love to see some documentation about the best practices for schema changes. There's none that I know of. 3. I'd expect the breakage to be "robust" as the documentation promises. I.e., when somebody adds a column at the subscriber that has these characteristics (NOT NULL and no DEFAULT), I'd expect the *replication* to break, but I'd expect that when the publisher's schema matched again, that the replication would return to full functionality. Under no circumstances do I expect the publisher to send faulty data to the subscriber that requires manual intervention to repair. There are more details on the mailing list, and I filed an issue in our bug tracker here: https://github.com/freelawproject/courtlistener/issues/919 Thank you, Mike [1]: https://www.postgresql.org/message-id/CAMp9%3DEydz258bTK-7%3DupANs%2BPff4wYymQuG%3DuNjeupCCrbpKFQ%40mail.gmail.com
В списке pgsql-bugs по дате отправления: