PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
| От | Rahman Duran |
|---|---|
| Тема | PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance |
| Дата | |
| Msg-id | CA+jeBrejfchM60epUo1J3M8mMAQOcCWotpDP9QZYgEL=j0b=pg@mail.gmail.com обсуждение исходный текст |
| Ответы |
Re: PostgreSQL 18.1 non deterministic collation "LIKE %abc%" performance
|
| Список | pgsql-general |
Hi,
After the release of the PostgreSQL 18 version, I am trying non determisinstic collation with LIKE pattern matching support. I am mostly searching with "LIKE %search_term%" on about 10 text columns. As I use wildcard prefix and suffix, I can't use btree index anyways. So I decided to try non deterministic collation support so I can simplify application code. I am testing this on a table with ~60K rows. With this row count and search pattern, non deterministic collation seems at least 10 times slower than LOWER LIKE and ILIKE. Tested collations are the same ICU tr-TR collation with standard and custom based on same ICU tr-TR:
40960 | test3e | 2200 | 10 | i | f | -1 | | | tr-TR-u-ks-level2 | | 153.128.46
Version:
etukimlik_admin=# select version();
version
--------------------------------------------------------------------------------------------------------------------
PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
(1 row)
--------------------------------------------------------------------------------------------------------------------
PostgreSQL 18.1 (Debian 18.1-1.pgdg13+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
(1 row)
Table structure with ICU deterministic collation:
etukimlik_admin=# \d "AccountActivityLogEntries"
Table "public.AccountActivityLogEntries"
Column | Type | Collation | Nullable | Default
--------------------------+--------------------------+-------------+----------+--------------------------------------------------------------------------
Id | integer | | not null | generated by default as identity
Email | character varying(128) | tr-TR-x-icu | not null |
LdapUid | character varying(512) | tr-TR-x-icu | |
Identity | character varying(24) | tr-TR-x-icu | |
StudentId | text[] | | |
EmployeeId | character varying(24) | tr-TR-x-icu | |
Name | character varying(1024) | tr-TR-x-icu | |
Surname | character varying(1024) | tr-TR-x-icu | |
AccountType | text[] | | not null |
Description | character varying(4096) | tr-TR-x-icu | not null |
ActivityId | character varying(128) | tr-TR-x-icu | not null |
LogTypeCode | character varying(10) | tr-TR-x-icu | not null |
LogType | character varying(64) | tr-TR-x-icu | not null |
LogSubType | character varying(64) | tr-TR-x-icu | not null |
ActivityTime | timestamp with time zone | | |
ActivitySource | character varying(128) | tr-TR-x-icu | not null |
ChangedAttribute | character varying(128) | tr-TR-x-icu | |
ChangedAttributeOldValue | character varying(4096) | tr-TR-x-icu | |
ChangedAttributeNewValue | character varying(4096) | tr-TR-x-icu | |
DisplayName | character varying(2049) | tr-TR-x-icu | | generated always as (((("Name"::text || ' '::text) || "Surname"::text)))
Indexes:
"PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")
Table "public.AccountActivityLogEntries"
Column | Type | Collation | Nullable | Default
--------------------------+--------------------------+-------------+----------+--------------------------------------------------------------------------
Id | integer | | not null | generated by default as identity
Email | character varying(128) | tr-TR-x-icu | not null |
LdapUid | character varying(512) | tr-TR-x-icu | |
Identity | character varying(24) | tr-TR-x-icu | |
StudentId | text[] | | |
EmployeeId | character varying(24) | tr-TR-x-icu | |
Name | character varying(1024) | tr-TR-x-icu | |
Surname | character varying(1024) | tr-TR-x-icu | |
AccountType | text[] | | not null |
Description | character varying(4096) | tr-TR-x-icu | not null |
ActivityId | character varying(128) | tr-TR-x-icu | not null |
LogTypeCode | character varying(10) | tr-TR-x-icu | not null |
LogType | character varying(64) | tr-TR-x-icu | not null |
LogSubType | character varying(64) | tr-TR-x-icu | not null |
ActivityTime | timestamp with time zone | | |
ActivitySource | character varying(128) | tr-TR-x-icu | not null |
ChangedAttribute | character varying(128) | tr-TR-x-icu | |
ChangedAttributeOldValue | character varying(4096) | tr-TR-x-icu | |
ChangedAttributeNewValue | character varying(4096) | tr-TR-x-icu | |
DisplayName | character varying(2049) | tr-TR-x-icu | | generated always as (((("Name"::text || ' '::text) || "Surname"::text)))
Indexes:
"PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")
LOWER LIKE query test:
etukimlik_admin=# explain analyze select * FROM "AccountActivityLogEntries" where lower("Email") like '%duran%' or lower("LdapUid") like '%duran%' or lower("Identity") like '%duran%' or lower("Name") like '%duran%' or lower("Surname") like '%duran%' or lower("DisplayName") like '%duran%' or lower("Description") like '%duran%' or lower("EmployeeId") like '%duran%' or exists ( select from unnest("StudentId") AS value WHERE lower(value) LIKE '%duran%') or lower("ActivityId") like '%duran%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..16387.18 rows=31210 width=5251) (actual time=1.952..282.090 rows=102.00 loops=1)
Filter: ((lower(("Email")::text) ~~ '%duran%'::text) OR (lower(("LdapUid")::text) ~~ '%duran%'::text) OR (lower(("Identity")::text) ~~ '%duran%'::text) OR (lower(("Name")::text) ~~ '%duran%'::text) OR (lower(("Surname")::text) ~~ '%duran%'::text) OR (lower((((("Name")::text || ' '::text) || ("Surname")::text))::text) ~~ '%duran%'::text) OR (lower(("Description")::text) ~~ '%duran%'::text) OR (lower(("EmployeeId")::text) ~~ '%duran%'::text) OR EXISTS(SubPlan 1) OR (lower(("ActivityId")::text) ~~ '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.15 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (lower(value) ~~ '%duran%'::text)
Rows Removed by Filter: 1
Planning Time: 0.165 ms
Execution Time: 282.143 ms
(10 rows)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..16387.18 rows=31210 width=5251) (actual time=1.952..282.090 rows=102.00 loops=1)
Filter: ((lower(("Email")::text) ~~ '%duran%'::text) OR (lower(("LdapUid")::text) ~~ '%duran%'::text) OR (lower(("Identity")::text) ~~ '%duran%'::text) OR (lower(("Name")::text) ~~ '%duran%'::text) OR (lower(("Surname")::text) ~~ '%duran%'::text) OR (lower((((("Name")::text || ' '::text) || ("Surname")::text))::text) ~~ '%duran%'::text) OR (lower(("Description")::text) ~~ '%duran%'::text) OR (lower(("EmployeeId")::text) ~~ '%duran%'::text) OR EXISTS(SubPlan 1) OR (lower(("ActivityId")::text) ~~ '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.15 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (lower(value) ~~ '%duran%'::text)
Rows Removed by Filter: 1
Planning Time: 0.165 ms
Execution Time: 282.143 ms
(10 rows)
ILIKE query test:
explain analyze select * FROM "AccountActivityLogEntries" where "Email" ilike '%duran%' or "LdapUid" ilike '%duran%' or "Identity" ilike '%duran%' or "Name" ilike '%duran%' or "Surname" ilike '%duran%' or "DisplayName" ilike '%duran%' or "Description" ilike '%duran%' or "EmployeeId" ilike '%duran%' or exists ( select from unnest("StudentId") AS value WHERE value iLIKE '%duran%') or "ActivityId" ilike '%duran%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..13464.03 rows=31210 width=5251) (actual time=3.782..417.573 rows=102.00 loops=1)
Filter: ((("Email")::text ~~* '%duran%'::text) OR (("LdapUid")::text ~~* '%duran%'::text) OR (("Identity")::text ~~* '%duran%'::text) OR (("Name")::text ~~* '%duran%'::text) OR (("Surname")::text ~~* '%duran%'::text) OR ((((("Name")::text || ' '::text) || ("Surname")::text))::text ~~* '%duran%'::text) OR (("Description")::text ~~* '%duran%'::text) OR (("EmployeeId")::text ~~* '%duran%'::text) OR EXISTS(SubPlan 1) OR (("ActivityId")::text ~~* '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.13 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (value ~~* '%duran%'::text)
Rows Removed by Filter: 1
Planning Time: 0.160 ms
Execution Time: 417.652 ms
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..13464.03 rows=31210 width=5251) (actual time=3.782..417.573 rows=102.00 loops=1)
Filter: ((("Email")::text ~~* '%duran%'::text) OR (("LdapUid")::text ~~* '%duran%'::text) OR (("Identity")::text ~~* '%duran%'::text) OR (("Name")::text ~~* '%duran%'::text) OR (("Surname")::text ~~* '%duran%'::text) OR ((((("Name")::text || ' '::text) || ("Surname")::text))::text ~~* '%duran%'::text) OR (("Description")::text ~~* '%duran%'::text) OR (("EmployeeId")::text ~~* '%duran%'::text) OR EXISTS(SubPlan 1) OR (("ActivityId")::text ~~* '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.13 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (value ~~* '%duran%'::text)
Rows Removed by Filter: 1
Planning Time: 0.160 ms
Execution Time: 417.652 ms
Table structure with ICU non deterministic collation:
etukimlik_admin=# \d "AccountActivityLogEntries"
Table "public.AccountActivityLogEntries"
Column | Type | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+--------------------------------------------------------------------------
Id | integer | | not null | generated by default as identity
Email | character varying(128) | test3e | not null |
LdapUid | character varying(512) | test3e | |
Identity | character varying(24) | test3e | |
StudentId | text[] | | |
EmployeeId | character varying(24) | test3e | |
Name | character varying(1024) | test3e | |
Surname | character varying(1024) | test3e | |
AccountType | text[] | | not null |
Description | character varying(4096) | test3e | not null |
ActivityId | character varying(128) | test3e | not null |
LogTypeCode | character varying(10) | test3e | not null |
LogType | character varying(64) | test3e | not null |
LogSubType | character varying(64) | test3e | not null |
ActivityTime | timestamp with time zone | | |
ActivitySource | character varying(128) | test3e | not null |
ChangedAttribute | character varying(128) | test3e | |
ChangedAttributeOldValue | character varying(4096) | test3e | |
ChangedAttributeNewValue | character varying(4096) | test3e | |
DisplayName | character varying(2049) | test3e | | generated always as (((("Name"::text || ' '::text) || "Surname"::text)))
Indexes:
"PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")
Table "public.AccountActivityLogEntries"
Column | Type | Collation | Nullable | Default
--------------------------+--------------------------+-----------+----------+--------------------------------------------------------------------------
Id | integer | | not null | generated by default as identity
Email | character varying(128) | test3e | not null |
LdapUid | character varying(512) | test3e | |
Identity | character varying(24) | test3e | |
StudentId | text[] | | |
EmployeeId | character varying(24) | test3e | |
Name | character varying(1024) | test3e | |
Surname | character varying(1024) | test3e | |
AccountType | text[] | | not null |
Description | character varying(4096) | test3e | not null |
ActivityId | character varying(128) | test3e | not null |
LogTypeCode | character varying(10) | test3e | not null |
LogType | character varying(64) | test3e | not null |
LogSubType | character varying(64) | test3e | not null |
ActivityTime | timestamp with time zone | | |
ActivitySource | character varying(128) | test3e | not null |
ChangedAttribute | character varying(128) | test3e | |
ChangedAttributeOldValue | character varying(4096) | test3e | |
ChangedAttributeNewValue | character varying(4096) | test3e | |
DisplayName | character varying(2049) | test3e | | generated always as (((("Name"::text || ' '::text) || "Surname"::text)))
Indexes:
"PK_AccountActivityLogEntries" PRIMARY KEY, btree ("Id")
LIKE query test:
etukimlik_admin=# explain analyze select * FROM "AccountActivityLogEntries" where "Email" like '%duran%' or "LdapUid" like '%duran%' or "Identity" like '%duran%' or "Name" like '%duran%' or "Surname" like '%duran%' or "DisplayName" like '%duran%' or "Description" like '%duran%' or "EmployeeId" like '%duran%' or exists ( select from unnest("StudentId") AS value WHERE value LIKE '%duran%') or "ActivityId" like '%duran%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..13464.03 rows=31210 width=5251) (actual time=18.466..2338.605 rows=102.00 loops=1)
Filter: ((("Email")::text ~~ '%duran%'::text) OR (("LdapUid")::text ~~ '%duran%'::text) OR (("Identity")::text ~~ '%duran%'::text) OR (("Name")::text ~~ '%duran%'::text) OR (("Surname")::text ~~ '%duran%'::text) OR ((((("Name")::text || ' '::text) || ("Surname")::text))::text ~~ '%duran%'::text) OR (("Description")::text ~~ '%duran%'::text) OR (("EmployeeId")::text ~~ '%duran%'::text) OR EXISTS(SubPlan 1) OR (("ActivityId")::text ~~ '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.13 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (value ~~ '%duran%'::text)
Rows Removed by Filter: 1
Planning:
Buffers: shared hit=4
Planning Time: 0.176 ms
Execution Time: 2338.670 ms
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on "AccountActivityLogEntries" (cost=0.00..13464.03 rows=31210 width=5251) (actual time=18.466..2338.605 rows=102.00 loops=1)
Filter: ((("Email")::text ~~ '%duran%'::text) OR (("LdapUid")::text ~~ '%duran%'::text) OR (("Identity")::text ~~ '%duran%'::text) OR (("Name")::text ~~ '%duran%'::text) OR (("Surname")::text ~~ '%duran%'::text) OR ((((("Name")::text || ' '::text) || ("Surname")::text))::text ~~ '%duran%'::text) OR (("Description")::text ~~ '%duran%'::text) OR (("EmployeeId")::text ~~ '%duran%'::text) OR EXISTS(SubPlan 1) OR (("ActivityId")::text ~~ '%duran%'::text))
Rows Removed by Filter: 61438
Buffers: shared hit=2922
SubPlan 1
-> Function Scan on unnest value (cost=0.00..0.13 rows=1 width=0) (actual time=0.001..0.001 rows=0.00 loops=61438)
Filter: (value ~~ '%duran%'::text)
Rows Removed by Filter: 1
Planning:
Buffers: shared hit=4
Planning Time: 0.176 ms
Execution Time: 2338.670 ms
As you can see, setting deterministic false to the same ICU collation and querying with "LIKE %search_term%" is way slower than using lower() function. So why is it this slow? Is there any technical documentation that explains this behaviour?
Regards,
Rahman Duran
В списке pgsql-general по дате отправления: