Thread: delimiter inconsistency in generate-wait_event_types.pl

delimiter inconsistency in generate-wait_event_types.pl

From
Kyotaro Horiguchi
Date:
Hello,

I got bitten by an inconsistency introduced about two years ago. In
the script generate-wait_event_types.pl, the intermediate line format
is parsed using a regular expression that allows multiple tab
characters between fields. However, the fields were later extracted
using split(/\t/, ...), which assumes single-tab delimiters and fails
when fields are separated by multiple tabs. This leads to a somewhat
unclear error when processing input that should otherwise be valid
(*1):

> substr outside of string at ./generate-wait_event_types.pl line 243,
>  <$wait_event_names> line 434.
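
The mismatch can be reproduced in isolation. What follows is a minimal sketch, not the script's actual code: the sample line, the use of WaitEventLWLock as the first field, and the simplified validation pattern are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical intermediate line with two tabs between the event name
# and its description, mirroring the reproducer in (*1).
my $line = "WaitEventLWLock\tAioUringCompletion\t\t\"Waiting for IO.\"";

# A validation pattern that tolerates runs of tabs accepts the line ...
my $accepted = ($line =~ /^(\w+)\t+(\w+)\t+(".*")$/) ? 1 : 0;

# ... but split(/\t/) treats every single tab as a delimiter, so the
# run of two tabs produces an empty field where the description is
# expected, and the description moves to a fourth field.
my @fields = split(/\t/, $line);

printf "accepted=%d fields=%d third='%s'\n",
  $accepted, scalar(@fields), $fields[2];
```

An empty field like this, handed to a later substr() call, is one plausible route to the warning quoted above, although the sketch does not reproduce line 243 itself.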

Since the data was already captured via regex, using $1, $2 and $3
instead of split() avoids the inconsistency and makes the intent
clearer. A related adjustment was made elsewhere in the script to
improve consistency.
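
As a rough sketch of the capture-group approach (the helper name parse_line, the sample lines, and the simplified pattern are hypothetical and not taken from the patch), validation and extraction can share one regular expression, so they cannot disagree about what counts as a delimiter:

```perl
use strict;
use warnings;

# Hypothetical helper: validate one intermediate line and return its
# three fields from the same match. Returns an empty list on failure.
sub parse_line
{
	my ($line) = @_;
	return unless $line =~ /^(\w+)\t+(\w+)\t+(".*")$/;
	return ($1, $2, $3);
}

# One tab and several tabs between fields give the same three fields.
my @single = parse_line("WaitEventLWLock\tXactSLRU\t\"Waiting for X.\"");
my @multi =
  parse_line("WaitEventLWLock\tAioUringCompletion\t\t\"Waiting for IO.\"");

print join('|', @single), "\n";
print join('|', @multi), "\n";
```

Because the fields come from $1, $2 and $3, any change to the pattern automatically applies to the extraction as well.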

This is addressed in the attached patch.

regards.


*1:
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0be307d2ca0..ba551938ed7 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -405,7 +405,7 @@ SerialSLRU    "Waiting to access the serializable transaction conflict SLRU cache."
 SubtransSLRU    "Waiting to access the sub-transaction SLRU cache."
 XactSLRU    "Waiting to access the transaction status SLRU cache."
 ParallelVacuumDSA    "Waiting for parallel vacuum dynamic shared memory allocation."
-AioUringCompletion    "Waiting for another process to complete IO via io_uring."
+AioUringCompletion        "Waiting for another process to complete IO via io_uring."
 
 # No "ABI_compatibility" region here as WaitEventLWLock has its own C code.


Attachments

Re: delimiter inconsistency in generate-wait_event_types.pl

From
Daniel Gustafsson
Date:
> On 29 Jul 2025, at 06:56, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> I got bitten by an inconsistency introduced about two years ago. In
> the script generate-wait_event_types.pl, the intermediate line format
> is parsed using a regular expression that allows multiple tab
> characters between fields. However, the fields were later extracted
> using split(/\t/, ...), which assumes single-tab delimiters and fails
> when fields are separated by multiple tabs. This leads to a somewhat
> unclear error when processing input that should otherwise be valid

Nothing in the documentation for this explicitly states that multiple tab
characters are supported so the alternative patch could be to remove support
for \t+.  That being said, such a restriction seems artificial and I prefer
your approach.
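
For comparison, the stricter alternative could look roughly like the sketch below. It is illustrative only: the helper name parse_line_strict, the error text, and the simplified pattern are assumptions, and this is not what was proposed for commit. Replacing \t+ with \t makes the multi-tab line fail validation up front instead of breaking later.

```perl
use strict;
use warnings;

# Hypothetical strict variant: exactly one tab between fields.
sub parse_line_strict
{
	my ($line) = @_;
	die "unable to parse line: $line\n"
	  unless $line =~ /^(\w+)\t(\w+)\t(".*")$/;
	return ($1, $2, $3);
}

# A single-tab line parses normally.
my @ok = parse_line_strict("WaitEventLWLock\tXactSLRU\t\"Waiting for X.\"");

# A line with a doubled tab is rejected with an explicit message.
my $rejected = 0;
eval {
	parse_line_strict(
		"WaitEventLWLock\tAioUringCompletion\t\t\"Waiting for IO.\"");
	1;
} or $rejected = 1;

print "fields=", scalar(@ok), " rejected=$rejected\n";
```

This trades flexibility in the input file for an earlier and clearer failure, which is the restriction described above as artificial.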

> Since the data was already captured via regex, using $1, $2 and $3
> instead of split() avoids the inconsistency and makes the intent
> clearer. A related adjustment was made elsewhere in the script to
> improve consistency.

+1, using the capture groups is clearly more readable.

While looking at this I noticed that the --docs option is incorrectly referred
to as --sgml in the usage output, which is fixed in 0002.

--
Daniel Gustafsson


Attachments

Re: delimiter inconsistency in generate-wait_event_types.pl

From
Daniel Gustafsson
Date:
> On 29 Jul 2025, at 10:08, Daniel Gustafsson <daniel@yesql.se> wrote:

> While looking at this I noticed that the --docs option is incorrectly referred
> to as --sgml in the usage output, which is fixed in 0002.

I was helpfully reminded about this thread and after taking another look at it
I went ahead and pushed it.

--
Daniel Gustafsson