Re: [PATCH] Exponential backoff for auth_delay

Поиск
Список
Период
Сортировка
От Michael Banck
Тема Re: [PATCH] Exponential backoff for auth_delay
Дата
Msg-id 65e62440.170a0220.f6a7c.6f7f@mx.google.com
обсуждение исходный текст
Ответ на Re: [PATCH] Exponential backoff for auth_delay  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: [PATCH] Exponential backoff for auth_delay  (Robert Haas <robertmhaas@gmail.com>)
Re: [PATCH] Exponential backoff for auth_delay  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
Hi,

On Wed, Feb 21, 2024 at 10:26:11PM +0100, Tomas Vondra wrote:
> Thanks for the patch. I took a closer look at v3, so let me share some
> review comments. Please push back if you happen to disagree with some of
> it, some of this is going to be more a matter of personal preference.

Thanks! As my patch was based on a previous patch, some of decisions
were carry-overs I am not overly attached to.
 
> 1) I think it's a bit weird to have two options specifying amount of
> time, but one is in seconds and one in milliseconds. Won't that be
> unnecessarily confusing? Could we do both in milliseconds?

Alright, I changed that.
 
See below for a discussion about the GUCs in general.
 
> 2) The C code defines the GUC as auth_delay.exponential_backoff, but the
> SGML docs say <varname>auth_delay.exp_backoff</varname>.

Right, an oversight from the last version where the GUC name got changed
but I forgot to change the documentation, fixed.
 
> 3) Do we actually need the exponential_backoff GUC? Wouldn't it be
> sufficient to have the maximum value, and if it's -1 treat that as no
> backoff?
 
That is a good question, I guess that makes sense.

The next question then is: what should the default for (now)
auth_delay.max_milliseconds be in this case, -1? Or do we say that as
the default for auth_delay.milliseconds is 0 anyway, why would somebody
not want exponential backoff when they switch it on and keep it at the
current 10s/10000ms)?

I have not changed that for now, pending further input.
 
> 4) I think the SGML docs should more clearly explain that the delay is
> initialized to auth_delay.milliseconds, and reset to this value after
> successful authentication. The wording kinda implies that, but it's not
> quite clear I think.

Ok, I added some text to that end. I also added a not that
auth_delay.max_milliseconds will mean that the delay doubling never
stops.
 
> 4) I've been really confused by the change from
> 
>      if (status != STATUS_OK)
>    to
>      if (status == STATUS_ERROR)
> 
> in auth_delay_checks, until I realized those two codes are not the only
> ones, and we intentionally ignore STATUS_EOF. I think it'd be good to
> mention that in a comment, it's not quite obvious (I only realized it
> because the e-mail mentioned it).

Yeah I agree, I tried to explain that now.
 
> 5) I kinda like the custom that functions in a contrib module start with
> a clear common prefix, say auth_delay_ in this case. Matter of personal
> preference, ofc.

Ok, I changed the functions to have an auth_delay_ prefix throughout..
 
> 6) I'm a bit skeptical about some acr_array details. Firstly, why would
> 50 entries be enough? Seems like a pretty low and arbitrary number ...
> Also, what happens if someone attempts to authenticate, fails to do so,
> and then disconnects and never tries again? Or just changes IP? Seems
> like the entry will remain in the array forever, no?

Yeah, that is how v3 of this patch worked. I have changed that now, see
below.

> Seems like a way to cause a "limited" DoS - do auth failure from 50
> different hosts, to fill the table, and now everyone has to wait the
> maximum amount of time (if they fail to authenticate).

Right, though the problem would only exist on authentication failures,
so it is really rather limited.
 
> I think it'd be good to consider:
> 
> - Making the acr_array a hash table, and larger than 50 entries (I
> wonder if that should be dynamic / customizable by GUC?).

I would say a GUC should be overkill for this as this would mostly be an
implementation detail.

More generally, I think now that entries are expired (see below), this
should be less of a concern, so I have not changed this to a hash table
for now but doubled MAX_CONN_RECORDS to 100 entries.
 
> - Make sure the entries are eventually expired, based on time (for
> example after 2*max_delay?).

I went with 5*max_milliseconds - the AuthConnRecord struct now has a
last_failed_auth timestamp member; if we increase the delay for a host,
we check if any other host expired in the meantime and remove it if so.
 
> - It would be a good idea to log something when we get into the "full
> table" and start delaying everyone by max_delay_seconds. (How would
> anyone know that's what's happening, seems rather confusing.)

Right, I added a log line for that. However, I made it LOG instead of
WARNING as I don't think the client would ever see it, would he?

Attached is v4 with the above changes.


Cheers,

Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: incremental backup mishandles XLOG_DBASE_CREATE_FILE_COPY
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Add system identifier to backup manifest