Обсуждение: type cache cleanup improvements

Поиск
Список
Период
Сортировка

type cache cleanup improvements

От
Teodor Sigaev
Дата:
Hi!

I'd like to suggest two independent patches to improve performance of type cache 
cleanup. I found a case where type cache cleanup was a reason for low 
performance. In short, customer makes 100 thousand temporary tables in one 
transaction.

1 mapRelType.patch
   It just adds a local map between relation and its type as it was suggested in 
comment above TypeCacheRelCallback(). Unfortunately, using syscache here was 
impossible because this call back could be called outside transaction and it 
makes impossible catalog lookups.

2 hash_seq_init_with_hash_value.patch
  TypeCacheTypCallback() loop over type hash  to find entry with given hash 
value. Here there are two problems: 1) there isn't  interface to dynahash to 
search entry with given hash value and 2) hash value calculation algorithm is 
differ from system cache. But coming hashvalue is came from system cache. Patch 
is addressed to both issues. It suggests hash_seq_init_with_hash_value() call 
which inits hash sequential scan over the single bucket which could contain 
entry with given hash value, and hash_seq_search() will iterate only over such 
entries. Anf patch changes hash algorithm to match syscache. Actually, patch 
makes small refactoring of dynahash, it makes common function hash_do_lookup() 
which does initial lookup in hash.

Some artificial performance test is in attachment, command to test is 'time psql 
< custom_types_and_array.sql', here I show only last rollback time and total 
execution time:
1) master 92d2ab7554f92b841ea71bcc72eaa8ab11aae662
Time: 33353,288 ms (00:33,353)
psql < custom_types_and_array.sql  0,82s user 0,71s system 1% cpu 1:28,36 total

2) mapRelType.patch
Time: 7455,581 ms (00:07,456)
psql < custom_types_and_array.sql  1,39s user 1,19s system 6% cpu 41,220 total

3) hash_seq_init_with_hash_value.patch
Time: 24975,886 ms (00:24,976)
psql < custom_types_and_array.sql  1,33s user 1,25s system 3% cpu 1:19,77 total

4) both
Time: 89,446 ms
psql < custom_types_and_array.sql  0,72s user 0,52s system 10% cpu 12,137 total


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
Вложения

Re: type cache cleanup improvements

От
Aleksander Alekseev
Дата:
Hi Teodor,

> I'd like to suggest two independent patches to improve performance of type cache
> cleanup. I found a case where type cache cleanup was a reason for low
> performance. In short, customer makes 100 thousand temporary tables in one
> transaction.
>
> 1 mapRelType.patch
>    It just adds a local map between relation and its type as it was suggested in
> comment above TypeCacheRelCallback(). Unfortunately, using syscache here was
> impossible because this call back could be called outside transaction and it
> makes impossible catalog lookups.
>
> 2 hash_seq_init_with_hash_value.patch
>   TypeCacheTypCallback() loop over type hash  to find entry with given hash
> value. Here there are two problems: 1) there isn't  interface to dynahash to
> search entry with given hash value and 2) hash value calculation algorithm is
> differ from system cache. But coming hashvalue is came from system cache. Patch
> is addressed to both issues. It suggests hash_seq_init_with_hash_value() call
> which inits hash sequential scan over the single bucket which could contain
> entry with given hash value, and hash_seq_search() will iterate only over such
> entries. Anf patch changes hash algorithm to match syscache. Actually, patch
> makes small refactoring of dynahash, it makes common function hash_do_lookup()
> which does initial lookup in hash.
>
> Some artificial performance test is in attachment, command to test is 'time psql
> < custom_types_and_array.sql', here I show only last rollback time and total
> execution time:
> 1) master 92d2ab7554f92b841ea71bcc72eaa8ab11aae662
> Time: 33353,288 ms (00:33,353)
> psql < custom_types_and_array.sql  0,82s user 0,71s system 1% cpu 1:28,36 total
>
> 2) mapRelType.patch
> Time: 7455,581 ms (00:07,456)
> psql < custom_types_and_array.sql  1,39s user 1,19s system 6% cpu 41,220 total
>
> 3) hash_seq_init_with_hash_value.patch
> Time: 24975,886 ms (00:24,976)
> psql < custom_types_and_array.sql  1,33s user 1,25s system 3% cpu 1:19,77 total
>
> 4) both
> Time: 89,446 ms
> psql < custom_types_and_array.sql  0,72s user 0,52s system 10% cpu 12,137 total

These changes look very promising. Unfortunately the proposed patches
conflict with each other regardless the order of applying:

```
error: patch failed: src/backend/utils/cache/typcache.c:356
error: src/backend/utils/cache/typcache.c: patch does not apply
```

So it's difficult to confirm case 4, not to mention the fact that we
are unable to test the patches on cfbot.

Could you please rebase the patches against the recent master branch
(in any order) and submit the result of `git format-patch` ?

-- 
Best regards,
Aleksander Alekseev



Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
Hi!

Thank you for interesting in it!

> These changes look very promising. Unfortunately the proposed patches
> conflict with each other regardless the order of applying:
> 
> ```
> error: patch failed: src/backend/utils/cache/typcache.c:356
> error: src/backend/utils/cache/typcache.c: patch does not apply
> ```
Try increase -F option of patch.

Anyway, union of both patches in attachment


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
Вложения

Re: type cache cleanup improvements

От
Aleksander Alekseev
Дата:
Hi,

> Thank you for interesting in it!
>
> > These changes look very promising. Unfortunately the proposed patches
> > conflict with each other regardless the order of applying:
> >
> > ```
> > error: patch failed: src/backend/utils/cache/typcache.c:356
> > error: src/backend/utils/cache/typcache.c: patch does not apply
> > ```
> Try increase -F option of patch.
>
> Anyway, union of both patches in attachment

Thanks for the quick update.

I tested the patch on an Intel MacBook. A release build was used with
my typical configuration, TWIMC see single-install-meson.sh [1]. The
speedup I got on the provided benchmark is about 150 times. cfbot
seems to be happy with the patch.

I would like to tweak the patch a little bit - change some comments,
add some Asserts, etc. Don't you mind?

[1]: https://github.com/afiskon/pgscripts/

-- 
Best regards,
Aleksander Alekseev



Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:

> I would like to tweak the patch a little bit - change some comments,
> add some Asserts, etc. Don't you mind?
You are welcome!
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/



Re: type cache cleanup improvements

От
Aleksander Alekseev
Дата:
Hi,

> > I would like to tweak the patch a little bit - change some comments,
> > add some Asserts, etc. Don't you mind?
> You are welcome!

Thanks. PFA the updated patch with some tweaks by me. I added the
commit message as well.

One thing that I couldn't immediately figure out is why 0 hash value
is treated as a magic invalid value in TypeCacheTypCallback():

```
-       hash_seq_init(&status, TypeCacheHash);
+       if (hashvalue == 0)
+               hash_seq_init(&status, TypeCacheHash);
+       else
+               hash_seq_init_with_hash_value(&status, TypeCacheHash,
hashvalue);
```

Is there anything that prevents the actual hash value from being zero?
I don't think so, but maybe I missed something.

If zero is indeed an invalid hash value I would like to reference the
corresponding code. If zero is a correct hash value we should either
change this by adding something like `if(!hash) hash++` or use an
additional boolean argument here.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: type cache cleanup improvements

От
Tom Lane
Дата:
Aleksander Alekseev <aleksander@timescale.com> writes:
> One thing that I couldn't immediately figure out is why 0 hash value
> is treated as a magic invalid value in TypeCacheTypCallback():

I've not read this patch, but IIRC in some places we have a convention
that hash value zero is passed for an sinval reset event (that is,
"flush all cache entries").

            regards, tom lane



Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
Yep, exacly. One time from 2^32 we reset whole cache instead of one (or several) 
entry with hash value = 0.

On 08.03.2024 18:35, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
>> One thing that I couldn't immediately figure out is why 0 hash value
>> is treated as a magic invalid value in TypeCacheTypCallback():
> 
> I've not read this patch, but IIRC in some places we have a convention
> that hash value zero is passed for an sinval reset event (that is,
> "flush all cache entries").
> 
>             regards, tom lane
> 
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/



Re: type cache cleanup improvements

От
Aleksander Alekseev
Дата:
Hi,

> Yep, exacly. One time from 2^32 we reset whole cache instead of one (or several)
> entry with hash value = 0.

Got it. Here is an updated patch where I added a corresponding comment.

Now the patch LGTM. I'm going to change its status to RfC unless
anyone wants to review it too.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
> Got it. Here is an updated patch where I added a corresponding comment.
Thank you!

Playing around I found one more place which could easily modified with 
hash_seq_init_with_hash_value() call.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
Вложения

Re: type cache cleanup improvements

От
Michael Paquier
Дата:
On Tue, Mar 12, 2024 at 06:55:41PM +0300, Teodor Sigaev wrote:
> Playing around I found one more place which could easily modified with
> hash_seq_init_with_hash_value() call.

I think that this patch should be split for clarity, as there are a
few things that are independently useful.  I guess something like
that:
- Introduction of hash_initial_lookup(), that simplifies 3 places of
dynahash.c where the same code is used.  The routine should be
inlined.
- The split in hash_seq_search to force a different type of search is
weird, complicating the dynahash interface by hiding what seems like a
search mode.  Rather than hasHashvalue that's hidden in the middle of
HASH_SEQ_STATUS, could it be better to have an entirely different API
for the search?  That should be a patch on its own, as well.
- The typcache changes.
--
Michael

Вложения

Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
> I think that this patch should be split for clarity, as there are a
> few things that are independently useful.  I guess something like
> that:
Done, all patches should be applied consequentially.

 > - The typcache changes.
01-map_rel_to_type.v5.patch adds map relation to its type

> - Introduction of hash_initial_lookup(), that simplifies 3 places of
> dynahash.c where the same code is used.  The routine should be
> inlined.
> - The split in hash_seq_search to force a different type of search is
> weird, complicating the dynahash interface by hiding what seems like a
> search mode.  Rather than hasHashvalue that's hidden in the middle of
> HASH_SEQ_STATUS, could it be better to have an entirely different API
> for the search?  That should be a patch on its own, as well.

02-hash_seq_init_with_hash_value.v5.patch - introduces a 
hash_seq_init_with_hash_value() method. hash_initial_lookup() is marked as 
inline, but I suppose, modern compilers are smart enough to inline it automatically.

Using separate interface for scanning hash with hash value will make scan code 
more ugly in case when we need to use special value of hash value as it is done 
in cache's scans. Look, instead of this simplified code:
    if (hashvalue == 0)
        hash_seq_init(&status, TypeCacheHash);
    else
        hash_seq_init_with_hash_value(&status, TypeCacheHash, hashvalue);
    while ((typentry = hash_seq_search(&status)) != NULL) {
       ...
    }
we will need to code something like that:
    if (hashvalue == 0)
    {
        hash_seq_init(&status, TypeCacheHash);

        while ((typentry = hash_seq_search(&status)) != NULL) {
               ...
        }
    }
    else
    {
        hash_seq_init_with_hash_value(&status, TypeCacheHash, hashvalue);
        while ((typentry = hash_seq_search_with_hash_value(&status)) != NULL) {
               ...
        }
    }
Or I didn't understand you.

I thought about integrate check inside existing loop in  hash_seq_search() :
+ rerun:
     if ((curElem = status->curEntry) != NULL)
     {
         /* Continuing scan of curBucket... */
         status->curEntry = curElem->link;
         if (status->curEntry == NULL)   /* end of this bucket */
+    {
+        if (status->hasHashvalue)
+        hash_seq_term(status);        
    
             ++status->curBucket;
+    }
+    else if (status->hasHashvalue && status->hashvalue !=
+                curElem->hashvalue)
+        goto rerun;
         return (void *) ELEMENTKEY(curElem);
     }

But for me it looks weird and adds some checks which will takes some CPU time.


03-att_with_hash_value.v5.patch - adds usage of previous patch.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
Вложения

Re: type cache cleanup improvements

От
Michael Paquier
Дата:
On Wed, Mar 13, 2024 at 04:40:38PM +0300, Teodor Sigaev wrote:
> Done, all patches should be applied consequentially.

One thing that first pops out to me is that we can do the refactor of
hash_initial_lookup() as an independent piece, without the extra paths
introduced.  But rather than returning the bucket hash and have the
bucket number as an in/out argument of hash_initial_lookup(), there is
an argument for reversing them: hash_search_with_hash_value() does not
care about the bucket number.

> 02-hash_seq_init_with_hash_value.v5.patch - introduces a
> hash_seq_init_with_hash_value() method. hash_initial_lookup() is marked as
> inline, but I suppose, modern compilers are smart enough to inline it
> automatically.

Likely so, though that does not hurt to show the intention to the
reader.

So I would like to suggest the attached patch for this first piece.
What do you think?

It may also be an idea to use `git format-patch` when generating a
series of patches.  That makes for easier reviews.
--
Michael

Вложения

Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
> One thing that first pops out to me is that we can do the refactor of
> hash_initial_lookup() as an independent piece, without the extra paths
> introduced.  But rather than returning the bucket hash and have the
> bucket number as an in/out argument of hash_initial_lookup(), there is
> an argument for reversing them: hash_search_with_hash_value() does not
> care about the bucket number.
Ok, no problem

> 
>> 02-hash_seq_init_with_hash_value.v5.patch - introduces a
>> hash_seq_init_with_hash_value() method. hash_initial_lookup() is marked as
>> inline, but I suppose, modern compilers are smart enough to inline it
>> automatically.
> 
> Likely so, though that does not hurt to show the intention to the
> reader.
Agree

> 
> So I would like to suggest the attached patch for this first piece.
> What do you think?
I have not any objections

> 
> It may also be an idea to use `git format-patch` when generating a
> series of patches.  That makes for easier reviews.
Thanks, will try

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/



Re: type cache cleanup improvements

От
Michael Paquier
Дата:
On Thu, Mar 14, 2024 at 04:27:43PM +0300, Teodor Sigaev wrote:
>> So I would like to suggest the attached patch for this first piece.
>> What do you think?
>
> I have not any objections

Okay, I've applied this piece for now.  Not sure I'll have much room
to look at the rest.
--
Michael

Вложения

Re: type cache cleanup improvements

От
Teodor Sigaev
Дата:
> Okay, I've applied this piece for now.  Not sure I'll have much room
> to look at the rest.

Thank you very much!

Rest of patches, rebased.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
Вложения

Re: type cache cleanup improvements

От
Жарков Роман
Дата:
> Rest of patches, rebased.

Hello,
I read and tested only the first patch so far. Creation of temp
tables and rollback in your script work 3-4 times faster with
0001-type-cache.patch on my windows laptop.

In the patch I found a copy of the comment "If it's domain over
composite, reset flags...". Can we move the reset flags operation
and its comment into the invalidateCompositeTypeCacheEntry()
function? This simplify the TypeCacheRelCallback() func, but
adds two more IF statements when we need to clean up a cache
entry for a specific relation. (diff attached).
--
Roman Zharkov
Вложения

Re: type cache cleanup improvements

От
Andrei Lepikhov
Дата:
On 3/15/24 17:57, Teodor Sigaev wrote:
>> Okay, I've applied this piece for now.  Not sure I'll have much room
>> to look at the rest.
> 
> Thank you very much!
I have spent some time reviewing this feature. I think we can discuss 
and apply it step-by-step. So, the 0001-* patch is at this moment.
The feature addresses the issue of TypCache being bloated by intensive 
usage of non-standard types and domains. It adds significant overhead 
during relcache invalidation by thoroughly scanning this hash table.
IMO, this feature will be handy soon, as we already see some patches 
where TypCache is intensively used for storing composite types—for 
example, look into solutions proposed in [1].
One of my main concerns with this feature is the possibility of lost 
entries, which could be mistakenly used by relations with the same oid 
in the future. This seems particularly possible in cases with multiple 
temporary tables. The author has attempted to address this by replacing 
the typrelid and type_id fields in the mapRelType on each call of 
lookup_type_cache. However, I believe we could further improve this by 
removing the entry from mapRelType on invalidation, thus avoiding this 
potential issue.
While reviewing the patch, I made some minor changes (see attachment) 
that you're free to adopt or reject. However, it's crucial that the 
patch includes a detailed explanation, not just a single sentence, to 
ensure everyone understands the changes.
Upon closer inspection, I noticed that the current implementation only 
invalidates the cache entry. While this is acceptable for standard 
types, it may not be sufficient to maintain numerous custom types (as in 
the example in the initial letter) or in cases where whole-row vars are 
heavily used. In such scenarios, removing the entry and reducing the 
hash table's size might be more efficient.
In toto, the 0001-* patch looks good, and I would be glad to see it in 
the core.

[1] 
https://www.postgresql.org/message-id/flat/CAKcux6ktu-8tefLWtQuuZBYFaZA83vUzuRd7c1YHC-yEWyYFpg%40mail.gmail.com

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения