Re: Patch: Implement failover on libpq connect level.
От | Korry Douglas |
---|---|
Тема | Re: Patch: Implement failover on libpq connect level. |
Дата | |
Msg-id | 5665F649.1040206@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Patch: Implement failover on libpq connect level. (Korry Douglas <korry.douglas@enterprisedb.com>) |
Список | pgsql-hackers |
> >> I've tried to deal with some of these problems. >> >> My patch have support for following things: >> >> 1. Check whether database instance is in the recovery/standby mode and >> try to find another one if so. >> 2. Let cluster management software to have some time to promote one of >> the standbys to master. I.e. there can be failover timeout specified to >> allow retry after some time if no working master found. >> >> Really there is room for some improvements in handling of connect >> timeout (which became much more important thing when ability to try >> next host appears). Now it is handled only by blocking-mode connect >> functions, not by async state machine. But I decided to publish patch >> without these improvements to get feedback from community. > A bit of testing on this turns up a problem. > > Consider a connection string that specifies two hosts and a read/write > connection: > > postgresql://korry@127.0.0.1:5301,127.0.0.1:5300/edb?readonly=0 > > If the first host is a healthy standby (meaning that I can connect to > it but pg_is_in_recovery() returns 't'), the state machine will never > move on to the second host. > > The problem seems to be in PQconnectPoll() in the case for > CONNECTION_AUTH_OK, specifically this code: > > /* We can release the address list now. */ > pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist); > conn->addrlist = NULL; > conn->addr_cur = NULL; > > That frees up the list of alternative host addresses. The state > machine then progresses to CONNECTION_CHECK_RO (which invokes > pg_is_in_recovery()), then CONNECTION_CHECK_RW (waiting for the > response from the server). Since we just connected to a standby > replica, pg_is_in_recovery() returns 't' and the state changes to > CONNECTION_NEEDED. The next call to try_next_address() will fail to > find a next address because we freed the list in the case for > CONNECTION_AUTH_OK. > > A related issue: when the above sequence occurs, no error message is > returned (because the case for CONNECTION_NEEDED thinks "an > appropriate error message is already set up"). > > In short, if you successfully connect to standby replica (and specify > readonly=0), the remaining hosts are ignored, even though one of those > hosts is a master. A follow-up - the conn->addrlist is also freed when the case for CONNECTION_CHECK_RW decides that conn->status != CONNECTION_OK and calls closePGConn(). -- Korry
В списке pgsql-hackers по дате отправления: