RE: speed up a logical replica setup

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: speed up a logical replica setup
Дата
Msg-id TYCPR01MB12077756323B79042F29DDAEDF54C2@TYCPR01MB12077.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: speed up a logical replica setup  ("Euler Taveira" <euler@eulerto.com>)
Ответы RE: speed up a logical replica setup  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Re: speed up a logical replica setup  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
Dear Euler,

Here are comments for v21.

01. main
```
    /* rudimentary check for a data directory. */
...
    /* subscriber PID file. */
```

Initial char must be upper, and period is not needed.

02. check_data_directory
```
    snprintf(versionfile, MAXPGPATH, "%s/PG_VERSION", datadir);
```

You removed the version checking from PG_VERSION, but I think it is still needed.
Indeed v21 can detect the pg_ctl/pg_resetwal/pg_createsubscriber has different
verson, but this cannot ditect the started instance has the differnet version.
I.e., v20-0010 is partially needed.

03. store_pub_sub_info()
```
    SimpleStringListCell *cell;
```

This definition can be in loop variable.

04. get_standby_sysid()
```
    pfree(cf);
```

This can be pg_pfree().

05. check_subscriber
```
    /* The target server must be a standby */
    if (server_is_in_recovery(conn) == 0)
    {
        pg_log_error("The target server is not a standby");
        return false;
    }
```

What if the the function returns -1? Should we ditect (maybe the disconnection) here?

06. server_is_in_recovery
```
    ret = strcmp("t", PQgetvalue(res, 0, 0));

    PQclear(res);

    if (ret == 0)
        return 1;
    else if (ret > 0)
        return 0;
    else
        return -1;                /* should not happen */
```

But strcmp may return a negative value, right? Based on the comment atop function,
we should not do it. I think we can use ternary operator instead.

07. server_is_in_recovery

As the fisrt place, no one consider this returns -1. So can we change the bool
function and raise pg_fatal() in case of the error?

08. check_subscriber
```
    if (strcmp(PQgetvalue(res, 0, 1), "t") != 0)
    {
        pg_log_error("permission denied for function \"%s\"",
                     "pg_catalog.pg_replication_origin_advance(text, pg_lsn)");
        return false;
    }
```

I think the third argument must be 2.

09. check_subscriber
```
    pg_log_debug("subscriber: primary_slot_name: %s", primary_slot_name);
```

The output seems strange if the primary_slot_name is not set.

10. setup_publisher()
```
    PGconn       *conn;
    PGresult   *res;
```

Definitions can be in the loop.

11. create_publication()
```
    if (PQntuples(res) == 1)
    {
        /*
         * If publication name already exists and puballtables is true, let's
         * use it. A previous run of pg_createsubscriber must have created
         * this publication. Bail out.
         */
```

Hmm, but pre-existing publications may not send INSERT/UPDATE/DELETE/TRUNCATE.
They should be checked if we really want to reuse.
(I think it is OK to just raise ERROR)

12. create_publication()

Based on above, we do not have to check before creating publicatios. The publisher
can detect the duplication. I prefer it.

13. create_logical_replication_slot()
```
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            pg_log_error("could not create replication slot \"%s\" on database \"%s\": %s",
                         slot_name, dbinfo->dbname,
                         PQresultErrorMessage(res));
            return lsn;
        }
```

I know lsn is always NULL, but can we use `return NULL`?

14. setup_subscriber()
```
    PGconn       *conn;

```

This definition can be in the loop.


15.

You said in case of failure, cleanups is not needed if the process exits soon [1].
But some functions call PQfinish() then exit(1) or pg_fatal(). Should we follow?

16.

Some places refer PGresult or PGConn even after the cleanup. They must be fixed.
```
        PQclear(res);
        disconnect_database(conn);
        pg_fatal("could not get system identifier: %s",
                 PQresultErrorMessage(res));
```

I think this is a root cause why sometimes the wrong error message has output.


17.

Some places call PQerrorMessage() and other places call PQresultErrorMessage().
I think it PQerrorMessage() should be used only after the connection establishment
functions. Thought?

18. 041_pg_createsubscriber_standby.pl
```
use warnings;
```

We must set "FATAL = all";

19.
```
my $node_p;
my $node_f;
my $node_s;
my $node_c;
my $result;
my $slotname;
```

I could not find forward declarations in perl file.
The node name might be bit a consuging, but I could not find better name.

20.
```
# On node P
# - create databases
# - create test tables
# - insert a row
# - create a physical replication slot
$node_p->safe_psql(
    'postgres', q(
    CREATE DATABASE pg1;
    CREATE DATABASE pg2;
));
$node_p->safe_psql('pg1', 'CREATE TABLE tbl1 (a text)');
$node_p->safe_psql('pg1', "INSERT INTO tbl1 VALUES('first row')");
$node_p->safe_psql('pg2', 'CREATE TABLE tbl2 (a text)');
my $slotname = 'physical_slot';
$node_p->safe_psql('pg2',
    "SELECT pg_create_physical_replication_slot('$slotname')");
```

I think setting of the same node can be gathered into one place.
Also, any settings and definitions should be done just before they are used.

21.
```
$node_s->append_conf(
    'postgresql.conf', qq[
log_min_messages = debug2
primary_slot_name = '$slotname'
]);
```

I could not find a reason why we set to debug2.

22.
```
command_fails
```

command_checks_all() can check returned value and outputs.
Should we use it?

23.

Can you add headers for each testcases? E.g., 

```
# ------------------------------
# Check pg_createsubscriber fails when the target server is not a
# standby of the source.
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is not running
...
# ------------------------------
# Check pg_createsubscriber fails when the target server is a member of
# the cascading standby.
...
# ------------------------------
# Check successful dry-run 
...
# ------------------------------
# Check successful conversion
```

24.
```
# Stop node C
$node_c->teardown_node;
...
$node_p->stop;
$node_s->stop;
$node_f->stop;
```
Why you choose the teardown?

25.

The creation of subscriptions are not directory tested. @subnames contains the
name of subscriptions, but it just assumes the number of them is more than two.

Since it may be useful, I will post top-up patch on Monday, if there are no updating.

[1]: https://www.postgresql.org/message-id/89ccf72b-59f9-4317-b9fd-1e6d20a0c3b1%40app.fastmail.com
[2]:
https://www.postgresql.org/message-id/TYCPR01MB1207713BEC5C379A05D65E342F54B2%40TYCPR01MB12077.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: A proposal to provide a timeout option for CREATE_REPLICATION_SLOT/pg_create_logical_replication_slot
Следующее
От: jian he
Дата:
Сообщение: Re: POC, WIP: OR-clause support for indexes