Discussion: Permute underscore separated components of columns before fuzzy matching

Permute underscore separated components of columns before fuzzy matching

From
Arne Roland
Date:
Hello,

we have the great fuzzy string match, which comes up with suggestions when a column name contains a typo.

Since underscores are the de facto standard for separating words, it would make sense to also generate suggestions when the order of the words gets mixed up. Example: if the user types timstamp_entry instead of entry_timestamp, the suggestion shows up.

The attached patch does that for up to three segments separated by underscores. The permutation of two segments is treated the same way a wrongly typed char would be.

The permutation is skipped if the typed column name contains more than 6 underscores, to prevent a meaningful slowdown (measured on my development machine) if the user types too many underscores. In terms of the number of underscores m and the lengths of the individual strings n_att and n_col, the trivial upper bound is O(n_att * n_col * m^2). Considering that strings with a lot of underscores are more likely to be long as well, I simply decided to add the limit. I still wonder a bit whether it should be disabled entirely (as this patch does) or only the swap-three-sections part, as the rest would be bounded by O(n_att * n_col * m). But the utility of only swapping two sections seems a bit dubious to me if I have 7 or more of them.
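As an illustration of the two-segment case described above, here is a minimal sketch in C. It is not taken from the patch: the function names `rotate_at_underscore` and `matches_after_rotation` are hypothetical, the 64-byte buffer merely mirrors PostgreSQL's NAMEDATALEN, and plain `strcmp()` stands in for the fuzzy (Levenshtein) comparison the real code path would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: rotate `name` around the
 * underscore at byte offset `pos`, so that "A_B" becomes "B_A".
 * Returns false if `pos` is not an underscore or `out` is too small.
 */
bool
rotate_at_underscore(const char *name, size_t pos, char *out, size_t outsize)
{
    size_t      len = strlen(name);
    size_t      taillen;

    if (pos >= len || name[pos] != '_' || outsize < len + 1)
        return false;

    taillen = len - pos - 1;                /* length of the part after '_' */
    memcpy(out, name + pos + 1, taillen);   /* copy B to the front */
    out[taillen] = '_';
    memcpy(out + taillen + 1, name, pos);   /* copy A to the back */
    out[len] = '\0';
    return true;
}

/*
 * Try every underscore as a rotation point and report whether any rotation
 * of `typed` equals `actual`.  This loop runs once per underscore, which is
 * the linear-in-m part of the bound discussed above.
 */
bool
matches_after_rotation(const char *typed, const char *actual)
{
    char        buf[64];        /* assumption: names fit in NAMEDATALEN */
    size_t      len = strlen(typed);

    for (size_t i = 0; i < len; i++)
    {
        if (typed[i] == '_' &&
            rotate_at_underscore(typed, i, buf, sizeof(buf)) &&
            strcmp(buf, actual) == 0)
            return true;
    }
    return false;
}
```

Note that rotation alone cannot turn wait_type_event into wait_event_type; that needs a swap of two out of three segments, which is why the three-segment case is handled separately.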

To me this patch seems simple (if string handling in C can be called that) and self-contained. Despite my calculations above, it resides in a non-performance-critical piece of code. I think of it as a quality-of-life thing.
Let me know what you think. Thank you!

Regards
Arne

Attachments

Re: Permute underscore separated components of columns before fuzzy matching

From
Mikhail Gribkov
Date:
Hello Arne,

The goal of supporting word-switching hints sounds interesting, and I've tried to apply your patch.
The patch applied smoothly to the latest master and check-world reported no problems. However, I ran into problems when testing the new functionality.

I tried to simply mix words in pg_stat_activity.wait_event_type:

postgres=# select wait_type_event from pg_stat_activity ;
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
2023-07-06 14:12:35.968 MSK [1480] ERROR:  column "wait_type_event" does not exist at character 8
2023-07-06 14:12:35.968 MSK [1480] HINT:  Perhaps you meant to reference the column "pg_stat_activity.wait_event_type".
2023-07-06 14:12:35.968 MSK [1480] STATEMENT:  select wait_type_event from pg_stat_activity ;
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
WARNING:  detected write past chunk end in MessageContext 0x559d668aaf30
ERROR:  column "wait_type_event" does not exist
LINE 1: select wait_type_event from pg_stat_activity ;
               ^
HINT:  Perhaps you meant to reference the column "pg_stat_activity.wait_event_type".
postgres=#

So the desired hint is really there, but together with lots of warnings. These certainly should not be encountered.

And no, this is not some kind of side problem brought in by some other commit. The same query on a plain master branch runs without these warnings:

postgres=# select wait_type_event from pg_stat_activity ;
2023-07-06 14:10:17.171 MSK [22431] ERROR:  column "wait_type_event" does not exist at character 8
2023-07-06 14:10:17.171 MSK [22431] STATEMENT:  select wait_type_event from pg_stat_activity ;
ERROR:  column "wait_type_event" does not exist
LINE 1: select wait_type_event from pg_stat_activity ;
--
 best regards,
    Mikhail A. Gribkov

e-mail: youzhick@gmail.com
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick

Re: Permute underscore separated components of columns before fuzzy matching

From
Mikhail Gribkov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, failed
Documentation:            tested, failed

The new status of this patch is: Waiting on Author

Re: Permute underscore separated components of columns before fuzzy matching

From
Arne Roland
Date:
Hello Mikhail,

I'm sorry. Please try the attached patch instead.

Thank you for having a look!

Regards
Arne

Attachments

Re: Permute underscore separated components of columns before fuzzy matching

From
Mikhail Gribkov
Date:
Hello Arne,

yep, now the warnings are gone. And I must thank you for the fun time I had here testing your patch :) I even tried some weird combinations like this:
postgres=# create table t("_ __ ___" int);
CREATE TABLE
postgres=# select "__ _ ___" from t;
ERROR:  column "__ _ ___" does not exist
LINE 1: select "__ _ ___" from t;
               ^
HINT:  Perhaps you meant to reference the column "t._ __ ___".
postgres=# select "___ __ _" from t;
ERROR:  column "___ __ _" does not exist
LINE 1: select "___ __ _" from t;
               ^
HINT:  Perhaps you meant to reference the column "t._ __ ___".
postgres=#

... and it still worked fine.
Honestly I'm not entirely sure fixing only two switched words is worth the effort, but the declared goal is clearly achieved. 

I think the patch is good to go, although you need to fix the code formatting. At least the char* definition and the opening "{" brackets are conspicuous. There may be more: it is worth running the pgindent tool.

And it would be much more convenient to work with your patch if each new version of the file had a unique name (maybe with suffixes like "_v2", "_v3", etc.)


Re: Permute underscore separated components of columns before fuzzy matching

From
Tom Lane
Date:
Mikhail Gribkov <youzhick@gmail.com> writes:
> Honestly I'm not entirely sure fixing only two switched words is worth the
> effort, but the declared goal is clearly achieved.

> I think the patch is good to go, although you need to fix code formatting.

I took a brief look at this.  I concur that we shouldn't need to be
hugely concerned about the speed of this code path.  However, we *do*
need to be concerned about its maintainability, and I think the patch
falls down badly there: it adds a chunk of very opaque and essentially
undocumented code, that people will need to reverse-engineer anytime
they are studying this function.  That could be alleviated perhaps
with more work on comments, but I have to wonder whether it's worth
carrying this logic at all.  It's a rather strange behavior to add,
and I wonder if many users will want it.

One thing that struck me is that no care is being taken for adjacent
underscores (that is, "foo__bar" and similar cases).  It seems
unlikely that treating the zero-length substring between the
underscores as a word to permute is helpful; moreover, it adds
an edge case that the string-moving logic could easily get wrong.
I wonder if the code should treat any number of consecutive
underscores as a single separator.  (Somewhat related: I think it
will behave oddly when the first or last character is '_', since the
outer loop ignores those positions.)

> And it would be much more convenient to work with your patch if every next
> version file will have a unique name (maybe something like "_v2", "_v3"
> etc. suffixes)

Please.  It's very confusing when there are multiple identically-named
patches in a thread.

            regards, tom lane



Re: Permute underscore separated components of columns before fuzzy matching

From
Arne Roland
Date:
Hi!

Mikhail Gribkov <youzhick(at)gmail(dot)com> writes:

 > > Honestly I'm not entirely sure fixing only two switched words is worth the
 > > effort, but the declared goal is clearly achieved.
 >
 > > I think the patch is good to go, although you need to fix code formatting.
 >
 > I took a brief look at this.  I concur that we shouldn't need to be
 > hugely concerned about the speed of this code path.  However, we *do*
 > need to be concerned about its maintainability, and I think the patch
 > falls down badly there: it adds a chunk of very opaque and essentially
 > undocumented code, that people will need to reverse-engineer anytime
 > they are studying this function.  That could be alleviated perhaps
 > with more work on comments, but I have to wonder whether it's worth
 > carrying this logic at all.  It's a rather strange behavior to add,
 > and I wonder if many users will want it.

I encounter this problem all the time. I don't know whether my clients are representative, but I see the problem all the time when developers show me their code base.
It's an issue for column names and table names alike. I have personally spent hours watching developers try various permutations.
They rarely request this feature. Usually they are too embarrassed about not knowing their object names to request anything in that state.
But I want the database that I support to be gentle and helpful to the user under these circumstances.

Regarding complexity: I think the permutation matrix is the thing that is easiest to get wrong. I had an off-by-one bug when I first wrote it down.
I tried to explain the conceptual approach better with a longer comment than before.

                 /*
                  * Only consider mirroring permutations, since the three simple rotations
                  * are already (or will be for a later underscore_current) covered above.
                  *
                  * The entries of the permutation matrix tell us where we should copy the
                  * three segments to. The zeroth dimension iterates over the permutations,
                  * while the first dimension iterates over the three segments that are
                  * permuted.
                  * Considering the string A_B_C, the three segments are:
                  * - before the initial underscore sections (A)
                  * - between the underscore sections (B)
                  * - after the later underscore sections (C)
                  */
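
To make the comment above concrete, here is a hedged sketch of how such a permutation table could drive the copying. It is an illustration only, not the code from the patch: `mirror_perms` and `permute_three` are hypothetical names, and each row here lists which source segment goes into each output slot. Because the three mirroring permutations are their own inverses, reading the rows as "where each segment is copied to" gives the same result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical table, not copied from the patch: row p lists which source
 * segment (0 = A, 1 = B, 2 = C) is written to output slots 0, 1 and 2.
 * These are the three mirroring permutations; B_C_A and C_A_B are rotations
 * around a single underscore and are assumed to be handled elsewhere.
 */
static const int mirror_perms[3][3] = {
    {1, 0, 2},                  /* B_A_C */
    {0, 2, 1},                  /* A_C_B */
    {2, 1, 0}                   /* C_B_A */
};

/*
 * Write permutation `p` of the three segments of "A_B_C" into `out`.
 * `u1` and `u2` are the byte offsets of the two underscores in `name`.
 * Returns false on invalid arguments or if `out` is too small.
 */
bool
permute_three(const char *name, size_t u1, size_t u2, int p,
              char *out, size_t outsize)
{
    size_t      len = strlen(name);
    size_t      seg_start[3];
    size_t      seg_len[3];
    size_t      pos = 0;

    if (p < 0 || p > 2 || u1 >= u2 || u2 >= len || outsize < len + 1 ||
        name[u1] != '_' || name[u2] != '_')
        return false;

    seg_start[0] = 0;
    seg_len[0] = u1;                /* A: before the first underscore */
    seg_start[1] = u1 + 1;
    seg_len[1] = u2 - u1 - 1;       /* B: between the underscores */
    seg_start[2] = u2 + 1;
    seg_len[2] = len - u2 - 1;      /* C: after the second underscore */

    for (int slot = 0; slot < 3; slot++)
    {
        int         src = mirror_perms[p][slot];

        memcpy(out + pos, name + seg_start[src], seg_len[src]);
        pos += seg_len[src];
        if (slot < 2)
            out[pos++] = '_';       /* re-insert the separator */
    }
    out[pos] = '\0';
    return true;
}
```

With this sketch, row 1 (A_C_B) applied to wait_type_event with underscores at offsets 4 and 9 yields wait_event_type, the case from the earlier test in this thread.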

If anything is still unclear, I'd appreciate feedback about what is 
still confusing.
I can't promise to be helpful if something breaks. But I had 
practically forgotten how I did it, and I still found it easy to extend 
as described below. It would have been embarrassing otherwise. This 
gives me hope that others will be able to do the same.
I certainly want the code to be simple, with no need to reverse-engineer 
anything. Please let me know if there are any difficult-to-understand 
bits left.
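
To make the comment above more concrete, here is a minimal standalone sketch of the idea. It is not the code from the patch: the names mirror_perm and permute_three are hypothetical, and I assume the three mirroring permutations are the three pairwise swaps of A, B and C. Each matrix entry gives the destination slot of a segment, as the comment describes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Illustration only, not taken from the patch.
 * mirror_perm[p][i] is the destination slot of segment i under permutation p.
 * Assumption: the "mirroring" permutations are the three pairwise swaps.
 */
static const int mirror_perm[3][3] = {
    {1, 0, 2},                  /* A_B_C -> B_A_C */
    {0, 2, 1},                  /* A_B_C -> A_C_B */
    {2, 1, 0},                  /* A_B_C -> C_B_A */
};

/*
 * Split name at the underscores at offsets u1 < u2 and write permutation p
 * of the three segments into out.  Returns 0 on success, -1 on bad input
 * or if out is too small.
 */
static int
permute_three(const char *name, size_t u1, size_t u2, int p,
              char *out, size_t outsz)
{
    size_t      n = strlen(name);
    const char *start[3];
    size_t      len[3];
    int         src_for_slot[3];
    size_t      pos = 0;
    int         i;

    if (p < 0 || p > 2 || u1 >= u2 || u2 >= n)
        return -1;
    if (name[u1] != '_' || name[u2] != '_' || outsz < n + 1)
        return -1;

    start[0] = name;
    len[0] = u1;
    start[1] = name + u1 + 1;
    len[1] = u2 - u1 - 1;
    start[2] = name + u2 + 1;
    len[2] = n - u2 - 1;

    /* Invert the matrix row: which segment ends up in each output slot? */
    for (i = 0; i < 3; i++)
        src_for_slot[mirror_perm[p][i]] = i;

    /* Copy the segments in slot order, re-inserting the two underscores. */
    for (i = 0; i < 3; i++)
    {
        int         s = src_for_slot[i];

        memcpy(out + pos, start[s], len[s]);
        pos += len[s];
        if (i < 2)
            out[pos++] = '_';
    }
    out[pos] = '\0';
    return 0;
}
```

For example, with name "entry_time_stamp" and the underscores at offsets 5 and 10, permutation 0 writes "time_entry_stamp" into out. The output always has the same length as the input, so a buffer of strlen(name) + 1 bytes suffices.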

 > One thing that struck me is that no care is being taken for adjacent
 > underscores (that is, "foo__bar" and similar cases).  It seems
 > unlikely that treating the zero-length substring between the
 > underscores as a word to permute is helpful; moreover, it adds
 > an edge case that the string-moving logic could easily get wrong.
 > I wonder if the code should treat any number of consecutive
 > underscores as a single separator.  (Somewhat related: I think it
 > will behave oddly when the first or last character is '_', since the
 > outer loop ignores those positions.)

I wasn't sure how there could be any potential future bug with copying 
zero-length strings, i.e. doing nothing. And I still don't see that.

There is one point I agree with: doing this seems rarely helpful. I 
changed the code so that it treats sections as delimited by an arbitrary 
number of underscores.
So it never permutes zero-length strings in the middle. I also added 
functionality to skip the zero-length cases if we encounter them 
at the end of the string.
So afaict there should be no zero-length swaps left. Please let me know 
whether this is more to your liking.
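
As a rough illustration of the behaviour described here (again a standalone sketch, not the patch code; Segment and split_on_underscore_runs are hypothetical names), a splitter that treats any run of underscores as one separator and never yields an empty segment could look like this:

```c
#include <stddef.h>

/* Hypothetical type for this sketch: offset and length of one segment. */
typedef struct Segment
{
    size_t      start;
    size_t      len;
} Segment;

/*
 * Collect the non-empty segments of name, treating any run of consecutive
 * underscores as a single separator.  Leading and trailing underscores
 * produce no segment.  At most max_segs entries are stored in segs; the
 * return value is the total number of segments found.
 */
static size_t
split_on_underscore_runs(const char *name, Segment *segs, size_t max_segs)
{
    size_t      count = 0;
    size_t      i = 0;

    while (name[i] != '\0')
    {
        size_t      start;

        /* Skip a whole run of separators. */
        while (name[i] == '_')
            i++;
        if (name[i] == '\0')
            break;

        /* Consume one non-empty segment. */
        start = i;
        while (name[i] != '\0' && name[i] != '_')
            i++;

        if (count < max_segs)
        {
            segs[count].start = start;
            segs[count].len = i - start;
        }
        count++;
    }
    return count;
}
```

With this, "foo__bar" yields exactly two segments (foo and bar), and "_foo_" yields one, so there is no zero-length segment to swap in either case.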

I also replaced the hard limit on underscores with more nuanced limits 
on the number of permutations to try before giving up.
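
To sketch what a permutation budget could look like: the following is only an illustration under assumptions, not the limits the patch actually uses. The budget value, the names, and the counting (one two-segment swap per separator, three mirroring permutations per pair of separators) are all assumed here.

```c
#include <stddef.h>

/* Assumed budget, chosen only for illustration; not a value from the patch. */
#define SKETCH_MAX_PERMUTATIONS 64

/* Assumption: one two-segment swap candidate per separator. */
static size_t
sketch_two_segment_candidates(size_t nsep)
{
    return nsep;
}

/* Assumption: three mirroring permutations per unordered separator pair. */
static size_t
sketch_three_segment_candidates(size_t nsep)
{
    if (nsep < 2)
        return 0;
    return 3 * (nsep * (nsep - 1) / 2);
}

/*
 * Decide how much work to do for a name with nsep separators:
 * 0 = try no permutations, 1 = only two-segment swaps, 2 = both stages.
 */
static int
sketch_permutation_stages(size_t nsep, size_t budget)
{
    size_t      two = sketch_two_segment_candidates(nsep);
    size_t      three = sketch_three_segment_candidates(nsep);

    if (two > budget)
        return 0;
    if (three > budget - two)
        return 1;
    return 2;
}
```

The point of such a scheme is that the cheap O(m) stage can stay enabled for names with many separators, while only the O(m^2) stage is dropped once it would exceed the budget.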

 > > And it would be much more convenient to work with your patch if every next
 > > version file will have a unique name (maybe something like "_v2", "_v3"
 > > etc. suffixes)
 >
 >
 > Please.  It's very confusing when there are multiple identically-named
 > patches in a thread.

Sorry, I started doing this because I had confused cfbot in the past about 
whether the patches should be applied on top of each other or not.

For me the cf-bot logic is a bit opaque there. But you are right, 
confusing patch readers is definitely worse. I'll try to do that. I hope 
the attached format is better.


One question about pgindent: I struggled a bit with getting the right 
version of bsd_indent. I found versions labeled 2.2.1 and 2.1.1, but 
apparently we work with 2.1.2. Where can I get that?

Regards
Arne

Attachments

Re: Permute underscore separated components of columns before fuzzy matching

From
Peter Smith
Date:
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
like there were CFbot test failures last time it was run [2]. Please
have a look and post an updated version if necessary.

======
[1] https://commitfest.postgresql.org/46/4282/
[2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4282

Kind Regards,
Peter Smith.



Re: Permute underscore separated components of columns before fuzzy matching

From
Arne Roland
Date:
Thank you for bringing that to my attention. Is there a way to subscribe 
to cf-bot failures?

Apparently I confused myself with my naming. I attached a patch that 
fixes the bug (at least in my cassert test-world run).

Regards
Arne

On 2024-01-22 06:38, Peter Smith wrote:
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> like there were CFbot test failures last time it was run [2]. Please
> have a look and post an updated version if necessary.
>
> ======
> [1] https://commitfest.postgresql.org/46/4282/
> [2] https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4282
>
> Kind Regards,
> Peter Smith.
Attachments

Re: Permute underscore separated components of columns before fuzzy matching

From
Tom Lane
Date:
Arne Roland <arne.roland@malkut.net> writes:
> Thank you for bringing that to my attention. Is there a way to subscribe 
> to cf-bot failures?

I don't know of any push notification support in cfbot, but you
can bookmark the page with your own active patches, and check it
periodically:

http://commitfest.cputube.org/arne-roland.html

(For others, click on your own name in the main cfbot page's entry for
one of your patches to find out how it spelled your name for this
purpose.)

            regards, tom lane



Re: Permute underscore separated components of columns before fuzzy matching

From
Arne Roland
Date:
Thank you! I wasn't aware of the filter per person. It was quite simple 
to integrate a web scraper into my custom push system.

Regarding the patch: I have now run the 2.1.1 version of pg_bsd_indent. I 
hope that suffices. I removed the matrix declaration to make it C90 
compliant. I attached the result.

Regards
Arne

On 2024-01-22 19:22, Tom Lane wrote:
> Arne Roland <arne.roland@malkut.net> writes:
>> Thank you for bringing that to my attention. Is there a way to subscribe
>> to cf-bot failures?
> I don't know of any push notification support in cfbot, but you
> can bookmark the page with your own active patches, and check it
> periodically:
>
> http://commitfest.cputube.org/arne-roland.html
>
> (For others, click on your own name in the main cfbot page's entry for
> one of your patches to find out how it spelled your name for this
> purpose.)
>
>             regards, tom lane
Attachments

Re: Permute underscore separated components of columns before fuzzy matching

From
"Andrey M. Borodin"
Date:

> On 23 Jan 2024, at 09:42, Arne Roland <arne.roland@malkut.net> wrote:
>
> <0001-fuzzy_underscore_permutation_v5.patch>

Mikhail, there’s a new patch version. May I ask you to review it?


Best regards, Andrey Borodin.