Обсуждение: [MASSMAIL]psql: Greatly speed up "\d tablename" when not using regexes

Поиск
Список
Период
Сортировка

[MASSMAIL]psql: Greatly speed up "\d tablename" when not using regexes

От
Jelte Fennema-Nio
Дата:
Running "\d tablename" from psql could take multiple seconds when
running on a system with 100k+ tables. The reason for this was that
a sequence scan on pg_class takes place, due to regex matching being
used.

Regex matching is obviously unnecessary when we're looking for an exact
match. This checks for this (common) case and starts using plain
equality in that case.

Вложения

Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Greg Sabino Mullane
Дата:
Patch looks good to me. Great idea overall, that forced regex has always bugged me.

+       char       *regexChars = "|*+?()[]{}.^$\\";

One super minor optimization is that we technically do not need to scan for ')' and ']'. If they appear without their partner, the query will fail anyway. :)
Cheers,
Greg

Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Tom Lane
Дата:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> Running "\d tablename" from psql could take multiple seconds when
> running on a system with 100k+ tables. The reason for this was that
> a sequence scan on pg_class takes place, due to regex matching being
> used.

> Regex matching is obviously unnecessary when we're looking for an exact
> match. This checks for this (common) case and starts using plain
> equality in that case.

Really?  ISTM this argument is ignoring an optimization the backend
has understood for a long time.

regression=# explain select * from pg_class where relname ~ '^foo$';
                                         QUERY PLAN

--------------------------------------------------------------------------------
-------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.30 rows=
1 width=739)
   Index Cond: (relname = 'foo'::text)
   Filter: (relname ~ '^foo$'::text)
(3 rows)

            regards, tom lane



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Kirill Reshke
Дата:
Hi


> Regex matching is obviously unnecessary when we're looking for an exact
> match. This checks for this (common) case and starts using plain
> equality in that case.

+1

> + appendPQExpBuffer(buf, "(%s OPERATOR(pg_catalog.=) ", namevar);
> + appendStringLiteralConn(buf, &namebuf.data[2], conn);
> + appendPQExpBuffer(buf, "\n        OR %s OPERATOR(pg_catalog.=) ",
> +  altnamevar);
> + appendStringLiteralConn(buf, &namebuf.data[2], conn);
> + appendPQExpBufferStr(buf, ")\n");

Do we need to force Collaction here like in other branches?
if (PQserverVersion(conn) >= 120000)
   appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Jelte Fennema-Nio
Дата:
On Wed, 10 Apr 2024 at 20:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Really?  ISTM this argument is ignoring an optimization the backend
> has understood for a long time.

Interesting. I didn't know about that optimization. I can't check
right now, but probably the COLLATE breaks that optimization.



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Jelte Fennema-Nio
Дата:
On Wed, 10 Apr 2024 at 20:21, Kirill Reshke <reshkekirill@gmail.com> wrote:
> Do we need to force Collaction here like in other branches?
> if (PQserverVersion(conn) >= 120000)
>    appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");

According to the commit and codecomment that introduced the COLLATE,
it was specifically added for correct regex matching (e.g. \w). So I
don't think it's necessary, and I'm pretty sure adding it will cause
the index scan not to be used anymore.



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Kirill Reshke
Дата:

On Wed, 10 Apr 2024, 23:37 Jelte Fennema-Nio, <postgres@jeltef.nl> wrote:
On Wed, 10 Apr 2024 at 20:21, Kirill Reshke <reshkekirill@gmail.com> wrote:
> Do we need to force Collaction here like in other branches?
> if (PQserverVersion(conn) >= 120000)
>    appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");

According to the commit and codecomment that introduced the COLLATE,
it was specifically added for correct regex matching (e.g. \w). So I
don't think it's necessary, and I'm pretty sure adding it will cause
the index scan not to be used anymore.

Ok, thanks for the clarification. If all of this is actually true, and patch is really does speedup, maybe we need to state this in the comments?

Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Tom Lane
Дата:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> On Wed, 10 Apr 2024 at 20:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Really?  ISTM this argument is ignoring an optimization the backend
>> has understood for a long time.

> Interesting. I didn't know about that optimization. I can't check
> right now, but probably the COLLATE breaks that optimization.

Not for me.

# explain select * from pg_class where relname ~ '^(foo)$' collate "en_US";
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.27..8.29 rows=1 width=263)
   Index Cond: (relname = 'foo'::text)
   Filter: (relname ~ '^(foo)$'::text COLLATE "en_US")
(3 rows)

Also, using -E:

# \d foo
/******** QUERY *********/
SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(foo)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
/************************/

# explain SELECT c.oid,
  n.nspname,
  c.relname
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname OPERATOR(pg_catalog.~) '^(foo)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 2, 3;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Sort  (cost=9.42..9.42 rows=1 width=132)
   Sort Key: n.nspname, c.relname
   ->  Nested Loop Left Join  (cost=0.27..9.41 rows=1 width=132)
         Join Filter: (n.oid = c.relnamespace)
         ->  Index Scan using pg_class_relname_nsp_index on pg_class c  (cost=0.27..8.32 rows=1 width=72)
               Index Cond: (relname = 'foo'::text)
               Filter: ((relname ~ '^(foo)$'::text) AND pg_table_is_visible(oid))
         ->  Seq Scan on pg_namespace n  (cost=0.00..1.04 rows=4 width=68)
(8 rows)


There may be an argument for psql to do what you suggest,
but so far it seems like duplicative complication.

If there's a case you can demonstrate where "\d foo" doesn't optimize
into an indexscan, we should look into exactly why that's happening,
because I think the cause must be more subtle than this.

            regards, tom lane



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Jelte Fennema-Nio
Дата:
On Wed, 10 Apr 2024 at 22:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There may be an argument for psql to do what you suggest,
> but so far it seems like duplicative complication.
>
> If there's a case you can demonstrate where "\d foo" doesn't optimize
> into an indexscan, we should look into exactly why that's happening,
> because I think the cause must be more subtle than this.

Hmm, okay so I took a closer look and you're completely right: It's
quite a lot more subtle than I initially thought. The query from "\d
foo" is fast as long as you don't have Citus installed. It turns out
that Citus breaks this regex index search optimization somehow by
adding "NOT relation_is_a_known_shard(c.oid)" to the securityQuals of
the rangeTableEntry for pg_class in its planner hook. Citus does this
to filter out the underlying shards of a table for every query on
pg_class. The reason is that these underlying shards cluttered the
output of \d and PgAdmin etc. Users also tended to get confused by
them, sometimes badly enough to remove them (and thus requiring
restore from backup).

We have a GUC to turn this filtering off for advanced users:
SET citus.show_shards_for_app_name_prefixes = '*';

If you set that the index is used and the query is fast again. Just
like what is happening for you. Not using the regex search also worked
as a way to trigger an index scan.

I'll think/research a bit tomorrow and try some stuff out to see if
this is fixable in Citus. That would definitely be preferable to me as
it would solve this issue on all Postgres/psql versions that citus
supports.

If I cannot think of a way to address this in Citus, would it be
possible to still consider to merge this patch (assuming comments
explaining that Citus is the reason)? Because this planner issue that
Citus its behaviour introduces is fixed by the change I proposed in my
Patch too (I'm not yet sure how exactly).



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Tom Lane
Дата:
Jelte Fennema-Nio <postgres@jeltef.nl> writes:
> On Wed, 10 Apr 2024 at 22:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If there's a case you can demonstrate where "\d foo" doesn't optimize
>> into an indexscan, we should look into exactly why that's happening,
>> because I think the cause must be more subtle than this.

> Hmm, okay so I took a closer look and you're completely right: It's
> quite a lot more subtle than I initially thought. The query from "\d
> foo" is fast as long as you don't have Citus installed. It turns out
> that Citus breaks this regex index search optimization somehow by
> adding "NOT relation_is_a_known_shard(c.oid)" to the securityQuals of
> the rangeTableEntry for pg_class in its planner hook. Citus does this
> to filter out the underlying shards of a table for every query on
> pg_class.

Huh.  Okay, but then why does it work for a simple comparison?

Actually, I bet I know why: texteq is marked leakproof which lets
it drop below a security qual, while regex matches aren't leakproof.

Is it really necessary for Citus' filter to be a security qual rather
than a plain ol' filter condition?

The direction I'd be inclined to think about if this can't be fixed
inside Citus is whether we can generate the derived indexqual
condition despite the regex being backed up behind a security qual.
That is, as long as the derived condition is leakproof, there's no
reason not to let it go before the security qual.  We're probably
failing to consider generating derived quals for anything that isn't
qualified to become an indexqual, and this example shows that that's
leaving money on the table.

            regards, tom lane



Re: psql: Greatly speed up "\d tablename" when not using regexes

От
Jelte Fennema-Nio
Дата:
On Wed, 10 Apr 2024 at 23:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is it really necessary for Citus' filter to be a security qual rather
> than a plain ol' filter condition?

No, it's not. I think using security quals simply required the least
amount of code (and it worked just fine if you didn't have lots of
tables). I created a PR for Citus to address this issue[1] by changing
to a normal filter condition. Thanks a lot for pointing me in the
right direction to fix this.

> That is, as long as the derived condition is leakproof, there's no
> reason not to let it go before the security qual.  We're probably
> failing to consider generating derived quals for anything that isn't
> qualified to become an indexqual, and this example shows that that's
> leaving money on the table.

I think even though my immediate is fixed, I think this would be a
good improvement anyway.

[1]: https://github.com/citusdata/citus/pull/7577