RE: How can end users know the cause of LR slot sync delays?

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: How can end users know the cause of LR slot sync delays?
Дата
Msg-id OSCPR01MB149667826BC79248A20999D1BF51AA@OSCPR01MB14966.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: How can end users know the cause of LR slot sync delays?  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Ответы RE: How can end users know the cause of LR slot sync delays?
Список pgsql-hackers
Dear Shlok,

Thanks for updating the patch. Here are my comments.

01.
```
+       /* Update the slot sync reason */
+       SpinLockAcquire(&slot->mutex);
+       if (slot->slot_sync_skip_reason != skip_reason)
+               slot->slot_sync_skip_reason = skip_reason;
+       SpinLockRelease(&slot->mutex);
```

Per my understanding, spinlock is acquired when the attribute on the shared memory
is updating. Can you check other parts and follow the rukle?

02.
```
+                       SpinLockAcquire(&slot->mutex);
+                       synced = slot->data.synced;
+                       SpinLockRelease(&slot->mutex);
```

Same as 1.

03.
```
```


04.
```
+#ifdef USE_INJECTION_POINTS
+       INJECTION_POINT("slot-sync-skip", NULL);
+#endif
```

No need do do #ifdef here. INJECTION_POINT() itself checks internally.

05.
```
+# Initialize the primary cluster
+my $primary = PostgreSQL::Test::Cluster->new('publisher');
+$primary->init(
+       allows_streaming => 'logical',
+       auth_extra => [ '--create-role' => 'repl_role' ]);
```

Do we have to create repl_role? I'm not sure where it's used.

06.
```
+$primary->append_conf(
+       'postgresql.conf', qq{
+autovacuum = off
+max_prepared_transactions = 1
+});
```

Do we have to set max_prepared_transactions? PREPARE command is not used.

07.
```
+# Load the injection_points extension
+$primary->safe_psql('postgres', q(CREATE EXTENSION injection_points))
```

We must check whether the injection_points is available or not. See 047_checkpoint_physical_slot.pl.

08.
```
+# Initialize standby from primary backup
+my $standby1 = PostgreSQL::Test::Cluster->new('standby1');
+$standby1->init_from_backup(
+       $primary, $backup_name,
+       has_streaming => 1,
+       has_restoring => 1);
```

To clarify, is there a reason why we set has_restoring? Can we remove it?

09.
```
+my $connstr_1 = $primary->connstr;
```

Since this is an only connection string in the test, suffix _1 is not needed.

10.
```
+# Simulate standby connection failure by modifying pg_hba.conf
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf('pg_hba.conf',
+       qq{local   all             all                                     trust}
+);
```

What if the system does not have Unix domain socket? I'm afraid all connections
could be brocked in this case.

11.
```
+# Attempt to sync replication slots while standby is behind
+$standby1->psql('postgres', "SELECT pg_sync_replication_slots();");
```

When the command can be failed, can you set a error message from the command to
a variable? Otherwise an ERROR is output to the result file, which is suprising.

```
psql:<stdin>:1: ERROR:  skipping slot synchronization because the received slot sync LSN 0/03019E58 for slot
"slot_sync"is ahead of the standby position 0/030000D8
 
[17:23:21.495](0.384s) ok 2 - slot sync skip when standby is behind
```

12.
```
+# Restore connectivity between primary and standby
+unlink($primary->data_dir . '/pg_hba.conf');
+$primary->append_conf(
+       'pg_hba.conf',
+       qq{
+local   all             all                                     trust
+local   replication     all                                     trust
+});
```

Same as 10. Also, no need to do unlink() here.

13.
```
+# Create a new logical slot for testing injection point
+$primary->safe_psql('postgres',
+       "SELECT pg_create_logical_replication_slot('slot_sync', 'test_decoding', false, false, true)"
+);
```

Before here can you add a description what you would test?

14.
```
# Create a physical replication slot on primary for standby
$primary->psql('postgres',
    q{SELECT pg_create_physical_replication_slot('sb1_slot');});
```
Use safe_psql instead of psql().

15.
```
-       if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
+       if (slot || (slot = SearchNamedReplicationSlot(remote_slot->name, true)))
```

Is there a possibility that slot is not NULL? It is used when confirmed_flush exceeds
the latestFlushPtr, but in this case the function cannot reach here.


Best regards,
Hayato Kuroda
FUJITSU LIMITED


В списке pgsql-hackers по дате отправления: