Обсуждение: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Hi,
we're facing a strange issue with delays between "connection received" and "connection authenticated".
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04)
11.4.0,64-bit
(1 ligne)
I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"
What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5
(/etc/postgresql/15/main/pg_hba.conf:121)
The matching line is this ( I know md5 ):
host all xxxxx xx.xx.xx.0/24 md5
What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the
linematching our connection, something like this:
host replication xxxxx localhost md5
host all xxxxx xx.xx.xx.0/24 md5
We don't see it if we do it like this:
host all xxxxx xx.xx.xx.0/24 md5
host replication xxxxx localhost md5
Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn't
reproducesomething like this on the OS using dig (in a loop several hundred of times). It is also only happening from
timeto time, and not constantly.
What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does
makesense?
Many thanks in advance
Daniel
On 10/29/24 09:30, Daniel Westermann (DWE) wrote: > Delays between "connection received" and "connection authenticated" because of localhost entries in hba > > Hi, > > we're facing a strange issue with delays between "connection received" and "connection authenticated". > > # select version(); > version > ----------------------------------------------------------------------------------------------------------------------------------- > PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0,64-bit > (1 ligne) > > I know, this is not the latest minor version. > > # \! cat /etc/os-release | head -1 > PRETTY_NAME="Ubuntu 22.04.4 LTS" > > What we see in the log is this (around 4 seconds delay): > > 2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434 > 2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121) 1) host=xx.xx.xx.100 = localhost? 2) In postgresql.conf what is listen_addresses set to? 3) What are the settings in /etc/hosts? > > The matching line is this ( I know md5 ): > host all xxxxx xx.xx.xx.0/24 md5 > > What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the linematching our connection, something like this: > host replication xxxxx localhost md5 > host all xxxxx xx.xx.xx.0/24 md5 > > We don't see it if we do it like this: > host all xxxxx xx.xx.xx.0/24 md5 > host replication xxxxx localhost md5 > > Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we couldn'treproduce something like this on the OS using dig (in a loop several hundred of times). It is also only happeningfrom time to time, and not constantly. > > What we're basically looking for is a way to prove the assumption without involving PostgreSQL at all, if that does makesense? > > Many thanks in advance > Daniel > > -- Adrian Klaver adrian.klaver@aklaver.com
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> we're facing a strange issue with delays between "connection received" and "connection authenticated".
> Has anyone experienced such a behavior? It seems clear that this is somehow related to name resolution but we
couldn'treproduce something like this on the OS using dig (in a loop several hundred of times). It is also only
happeningfrom time to time, and not constantly.
FWIW, I do have vague memories of other complaints like this, although
not of whether solutions were found. It was awhile ago though.
It might be worth searching our mailing list archives.
regards, tom lane
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
> 1) host=xx.xx.xx.100 = localhost? No, it is not > 2) In postgresql.conf what is listen_addresses set to? * > 3) What are the settings in /etc/hosts? Standard Ubuntu settings: 127.0.0.1 localhost 127.0.1.1 abc cde # The following lines are desirable for IPv6 capable hosts ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters Regards Daniel
On 10/29/24 10:42 AM, Daniel Westermann (DWE) wrote: >> 1) host=xx.xx.xx.100 = localhost? > > No, it is not Is the localhost machine on the same network? Is the client connection string using IPV4 address or host name? > >> 2) In postgresql.conf what is listen_addresses set to? > > * > >> 3) What are the settings in /etc/hosts? > > Standard Ubuntu settings: > > 127.0.0.1 localhost > 127.0.1.1 abc cde > > # The following lines are desirable for IPv6 capable hosts > ::1 localhost ip6-localhost ip6-loopback > ff02::1 ip6-allnodes > ff02::2 ip6-allrouters Contents of resolv.conf? Would it be possible to crank the logging detail up? > > Regards > Daniel -- Adrian Klaver adrian.klaver@aklaver.com
missed the mailing list.
On 10/29/24 11:20 AM, Vijaykumar Jain wrote: > > missed the mailing list. Can you provide a more complete explanation for above? -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 29 Oct 2024 at 23:43, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
--On Tue, 29 Oct 2024 at 22:00, Daniel Westermann (DWE) <daniel.westermann@dbi-services.com> wrote:Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Hi,
we're facing a strange issue with delays between "connection received" and "connection authenticated".
# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 ligne)
I know, this is not the latest minor version.
# \! cat /etc/os-release | head -1
PRETTY_NAME="Ubuntu 22.04.4 LTS"
What we see in the log is this (around 4 seconds delay):
2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434
2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121)Just before we get into os and name resolution stuff..1) if the database usage is really heavy with schema churn and or heavy temp usage. can the catalog bloat result in slowless of pg_auth* objects resulting in slow authentication?a general check used to be checking the size of the catalog objects and if needed vacuum full in single user mode .... to speed up auth lookup.i tried with pg latest on ubuntu 22, but did not see any issues.So to simulate a similar issue, i understand once connected, it will look up pg_auth* objects cascading to other catalogs for resolution ...in one session:PGPASSWORD=1234 psql -h localhost -p 5432 -d postgres -U postgres -c 'select 1'in another sessionpostgres@ubuntu:~$ cat /tmp/db1/postgresql.auto.conf# Do not edit this file manually!# It will be overwritten by the ALTER SYSTEM command.trace_locks = 'on'trace_lock_table = 1260postgres=# select 'pg_authid'::regclass::oid;oid------1260(1 row)postgres=*# lock table pg_authid in access exclusive mode;LOCK TABLEpostgres=*# rollback;ROLLBACKi tried to trace which locks are held when a connection is established after auth.for example pg_authid , so i enable tracing on this object, and to mock the delay i try to lock the table in access exclusive mode ...note, this may not be the scenario, but i am trying to rule out pg related delay due to bloat or lockingpostgres@ubuntu:/tmp$ time PGPASSWORD=1234 psql -h localhost -p 5432 -d postgres -U postgres -c 'select 1'?column?----------1(1 row)real 0m7.503s - -delayuser 0m0.001ssys 0m0.005spostgres@ubuntu:/tmp$ #after rollback of pg_authidfrom the logs2024-10-29 18:04:46.411 UTC [localhost(33424)] [62661] LOG: connection received: host=localhost port=334242024-10-29 18:04:53.906 UTC [localhost(33424)] [62661] LOG: connection authenticated: identity="postgres" method=md5 (/tmp/db1/pg_hba.conf:120)what i tried here was, delay auth but after connection established till lock held, which can be due to bloated catalog objects or long locks due to temp objects churn?still it does not explain how the order of pg_hba deals with the delay ?anyways, fyi to rule out :)
sorry, sharing again.
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:
Delays between "connection received" and "connection authenticated" because of localhost entries in hba
Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.
afaik dig doesn't follow the same name resolution path as glibc.
On Wed, 30 Oct 2024 at 00:04, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:Delays between "connection received" and "connection authenticated" because of localhost entries in hbaMaybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.afaik dig doesn't follow the same name resolution path as glibc.
a connection received should mean resolution was done, right ? am i missing something?
I could try adding delay with traffic control, but then if a connection established is logged in the log, then name resolution is ok ?
or the syscall to md5 is slow , does it work faster for trust ? just saying :)
On Wed, 2024-10-30 at 00:15 +0530, Vijaykumar Jain wrote:
On Wed, 30 Oct 2024 at 00:04, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote:Delays between "connection received" and "connection authenticated" because of localhost entries in hbaMaybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.afaik dig doesn't follow the same name resolution path as glibc.a connection received should mean resolution was done, right ? am i missing something?
Well, I'm just throwing out something to check, but I meant on the server, not the client. It's trying to resolve "localhost" to check the auth. If there's a delay, that's probably where it is.
I guess I just assumed that's where you were testing dig.
On Wed, 30 Oct 2024 at 00:40, Alan Hodgson <ahodgson@lists.simkin.ca> wrote:
Well, I'm just throwing out something to check, but I meant on the server, not the client. It's trying to resolve "localhost" to check the auth. If there's a delay, that's probably where it is.I guess I just assumed that's where you were testing dig.
Yep. my mistake. i did not read it correctly. in order to discard the rule, it first has to find out by expanding the hosts in the rule, so yep, dns resolution might be the problem.
also, the 4-5s delay is eerie similar to issues in the past.
maybe disabling ipv6 completely resolves this for the time being.
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
>Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf.
It is
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
>>Maybe check that "files" is the first mechanism for hosts lookups in nsswitch.conf. >afaik dig doesn't follow the same name resolution path as glibc. I don't know. Is there a way to test the glibc way easily? > a connection received should mean resolution was done, right ? am i missing something? >Well, I'm just throwing out something to check, but I meant on the server, not the client. It's trying to resolve >"localhost"to check the auth. If there's a delay, that's probably where it is. >I guess I just assumed that's where you were testing dig. Yes Regards Daniel
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
>It might be worth searching our mailing list archives. I did that, but was not able to find anything useful related to this. Regards Daniel
On Tue, 2024-10-29 at 16:30 +0000, Daniel Westermann (DWE) wrote: > we're facing a strange issue with delays between "connection received" and "connection authenticated". > > # select version(); > version > ----------------------------------------------------------------------------------------------------------------------------------- > PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0,64-bit > (1 ligne) > > I know, this is not the latest minor version. > > # \! cat /etc/os-release | head -1 > PRETTY_NAME="Ubuntu 22.04.4 LTS" > > What we see in the log is this (around 4 seconds delay): > > 2024-05-07 15:29:50.244 CEST [369909] LOG: connection received: host=xx.xx.xx.100 port=48434 > 2024-05-07 15:29:54.518 CEST [369909] LOG: connection authenticated: identity="xxxxxx" method=md5 (/etc/postgresql/15/main/pg_hba.conf:121) > > The matching line is this ( I know md5 ): > host all xxxxx xx.xx.xx.0/24 md5 > > What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the linematching our connection, something like this: > host replication xxxxx localhost md5 > host all xxxxx xx.xx.xx.0/24 md5 I'd "strace" such a database connection to see where the time is spent. Yours, Laurenz Albe
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
>> What we've found out so far is, that this only happens if we have a localhost(or any other hostname) line before the linematching our connection, something like this: >> host replication xxxxx localhost md5 >> host all xxxxx xx.xx.xx.0/24 md5 ... which is quite hard to do if you don't know which one it will be Regards Daniel
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
Greg Sabino Mullane
Дата:
I'd echo the suggestion to strace this. You can use the pre_auth_delay setting to help facilitate that. See:
Cheers,
Greg
Greg Sabino Mullane <htamfids@gmail.com> writes:
> I'd echo the suggestion to strace this. You can use the pre_auth_delay
> setting to help facilitate that. See:
IIUC, the delays are rare and unpredictable, so that strace'ing seems
unlikely to be practical.
If rebuilding from source is feasible, you could insert monitoring
elog(LOG) calls around the name lookup (that is, around the
pg_getnameinfo_all call in src/backend/libpq/hba.c). This would
confirm or disprove the theory that the name lookup is the source
of the issue, which is about as much as we'd get out of strace.
regards, tom lane
Re: Delays between "connection received" and "connection authenticated" because of localhost entries in hba
От
"Daniel Westermann (DWE)"
Дата:
>>Greg Sabino Mullane <htamfids@gmail.com> writes: >> I'd echo the suggestion to strace this. You can use the pre_auth_delay >> setting to help facilitate that. See: >IIUC, the delays are rare and unpredictable, so that strace'ing seems >unlikely to be practical. >If rebuilding from source is feasible, you could insert monitoring >elog(LOG) calls around the name lookup (that is, around the >pg_getnameinfo_all call in src/backend/libpq/hba.c). This would >confirm or disprove the theory that the name lookup is the source >of the issue, which is about as much as we'd get out of strace. Thanks for all your tips/hints Regards Daniel