Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect
Дата
Msg-id 2943611.1602375376@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pgsql: postgres_fdw: reestablish new connection if cached one is detect  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Список pgsql-committers
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>> Therefore, the easy fix is to make libpq mark the connection as
>>> CONNECTION_BAD even in ECONNABORTED, like we do in ECONNRESET.

So in the wake of commit fe27009cb, this is what lorikeet is doing:

@@ -9028,9 +9028,7 @@
 CALL terminate_backend_and_wait('fdw_retry_check');
 SAVEPOINT s;
 SELECT 1 FROM ft1 LIMIT 1;    -- should fail
-ERROR:  server closed the connection unexpectedly
-    This probably means the server terminated abnormally
-    before or while processing the request.
+ERROR:  could not receive data from server: Software caused connection abort
 CONTEXT:  remote SQL command: SAVEPOINT s2
 COMMIT;
 -- Clean up

which is better --- the connection recovery is working --- but
obviously still not a "pass".

The only way to make this test pass as-is is to hack libpq so that
it deems ECONNABORTED to be a server crash, which frankly I do not
think is a good change.  I don't see any principled reason to think
that it means that rather than a network connectivity failure.

What I've done instead as a stopgap is to adjust the test case to be
insensitive to the exact error message text.  Maybe there's a better
way, but I can't think of anything besides having two (or more?)
expected-output files.  That would be quite unpalatable as things
stand, though perhaps we could make it tolerable by splitting this
test off into a second test script.

Meanwhile, now that I've looked at commit 32a9c0bdf, I'm not very
happy with it:

* The control flow seems rather forced.  I think it was designed
on the assumption that reindenting the existing code is forbidden.
Why not use an actual loop, instead of a goto?  I also think that
it's far from obvious that the loop isn't an infinite loop; it
took me quite a while to glom onto the fact that the test inside the
PG_CATCH avoids that by checking retry_conn.  Maybe a comment would
be enough to improve that, but I feel the control logic could stand
a rethink.

* The CATCH case is completely failing to clean up after itself.
At the minimum, there has to be a FlushErrorState() there.
I don't think it'd be terribly hard to drive this code to an
error-stack-overflow PANIC.

* As is so often true of proposed patches in which PG_CATCH is
thought to be an adequate way to catch an error, this is just
unbelievably fragile.  It will break, and badly so, if it catches
an error that is anything other than what it is expecting ...
and it's not even particularly trying to verify that the error is
what it's expecting.  It might be okay, or at least a little bit
harder to break, if it verified that the error's SQLSTATE is
ERRCODE_CONNECTION_FAILURE.

            regards, tom lane



В списке pgsql-committers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: pgsql: Band-aid new postgres_fdw test case to remove error text depende
Следующее
От: Michael Paquier
Дата:
Сообщение: pgsql: Use perfect hash for NFC and NFKC Unicode Normalization quick ch