Обсуждение: "type with xxxx does not exist" when doing ExecMemoize()

Поиск
Список
Период
Сортировка

"type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:

Hi,

I met Memoize node failed When I used sqlancer test postgres.
database0=# explain select t0.c0 from t0 join t5 on t0.c0 = (t5.c0 - t5.c0);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=0.17..21.20 rows=4 width=32)
   ->  Seq Scan on t5  (cost=0.00..1.04 rows=4 width=14)
   ->  Memoize  (cost=0.17..6.18 rows=1 width=32)
         Cache Key: (t5.c0 - t5.c0)
         Cache Mode: logical
         ->  Index Only Scan using t0_c0_key on t0  (cost=0.15..6.17 rows=1 width=32)
               Index Cond: (c0 = (t5.c0 - t5.c0))
(7 rows)

database0=# select t0.c0 from t0 join t5 on t0.c0 = (t5.c0 - t5.c0);
ERROR:  type with OID 2139062143 does not exist

How to repeat:
The attached database0.log (created by sqlancer) included statements to repeat this issue.
Firstly, create database test;
then;
psql postgres
\i   /xxx/database0.log

I analyzed aboved issue this weekend. And I found that 
After called ResetExprContext() in MemoizeHash_hash(), the data in mstate->probeslot was corrputed.

in prepare_probe_slot: the data as below:
(gdb) p *(DatumGetRangeTypeP(pslot->tts_values[0]))
$1 = {vl_len_ = 36, rangetypid = 3904}
after called ResetExprContext() in MemoizeHash_hash:
(gdb) p *(DatumGetRangeTypeP(pslot->tts_values[0]))
$3 = {vl_len_ = 264, rangetypid = 2139062143}

I think in prepare_probe_slot(), should called datumCopy as the attached patch does.

Any thoughts? Thanks.
--
Tender Wang
OpenPie:  https://en.openpie.com/
Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 25/2/2024 20:32, Tender Wang wrote:
> I think in prepare_probe_slot(), should called datumCopy as the attached 
> patch does.
> 
> Any thoughts? Thanks.
Thanks for the report.
I think it is better to invent a Runtime Memory Context; likewise, it is 
already designed in IndexScan and derivatives. Here, you just allocate 
the value in some upper memory context.
Also, I'm curious why such a trivial error hasn't been found for a long time

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 26/2/2024 09:52, Andrei Lepikhov wrote:
> On 25/2/2024 20:32, Tender Wang wrote:
>> I think in prepare_probe_slot(), should called datumCopy as the 
>> attached patch does.
>>
>> Any thoughts? Thanks.
> Thanks for the report.
> I think it is better to invent a Runtime Memory Context; likewise, it is 
> already designed in IndexScan and derivatives. Here, you just allocate 
> the value in some upper memory context.
> Also, I'm curious why such a trivial error hasn't been found for a long 
> time
Hmmm. I see the problem (test.sql in attachment for reproduction and 
results). We only detect it by the number of Hits:
   Cache Key: t1.x, (t1.t)::numeric
   Cache Mode: logical
   Hits: 0  Misses: 30  Evictions: 0  Overflows: 0  Memory Usage: 8kB

We see no hits in logical mode and 100 hits in binary mode. We see 15 
hits for both logical and binary mode if parameters are integer numbers 
- no problems with resetting expression context.

Your patch resolves the issue for logical mode - I see 15 hits for 
integer and complex keys. But I still see 100 hits in binary mode. Maybe 
we still have a problem?

What's more, why the Memoize node doesn't see the problem at all?

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 26/2/2024 12:44, Tender Wang wrote:
> 
> 
> Andrei Lepikhov <a.lepikhov@postgrespro.ru 
> <mailto:a.lepikhov@postgrespro.ru>> 于2024年2月26日周一 10:57写道:
> 
>     On 25/2/2024 20:32, Tender Wang wrote:
>      > I think in prepare_probe_slot(), should called datumCopy as the
>     attached
>      > patch does.
>      >
>      > Any thoughts? Thanks.
>     Thanks for the report.
>     I think it is better to invent a Runtime Memory Context; likewise,
>     it is
>     already designed in IndexScan and derivatives. Here, you just allocate
>     the value in some upper memory context.
>     Also, I'm curious why such a trivial error hasn't been found for a
>     long time
> 
> 
> Make sense. I found MemoizeState already has a MemoryContext, so I used it.
> I update the patch.
This approach is better for me. In the next version of this patch, I 
included a test case. I am still unsure about the context chosen and the 
stability of the test case. Richard, you recently fixed some Memoize 
issues, could you look at this problem and patch?

-- 
regards,
Andrei Lepikhov
Postgres Professional

Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Richard Guo
Дата:

On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 26/2/2024 12:44, Tender Wang wrote:
> Make sense. I found MemoizeState already has a MemoryContext, so I used it.
> I update the patch.
This approach is better for me. In the next version of this patch, I
included a test case. I am still unsure about the context chosen and the
stability of the test case. Richard, you recently fixed some Memoize
issues, could you look at this problem and patch?

I looked at this issue a bit.  It seems to me what happens is that at
first the memory areas referenced by probeslot->tts_values[] are
allocated in the per tuple context (see prepare_probe_slot).  And then
in MemoizeHash_hash, after we've calculated the hashkey, we will reset
the per tuple context.  However, later in MemoizeHash_equal, we still
need to reference the values in probeslot->tts_values[], which have been
cleared.

Actually the context would always be reset in MemoizeHash_equal, for
both binary and logical mode.  So I kind of wonder if it's necessary to
reset the context in MemoizeHash_hash.

The ResetExprContext call in MemoizeHash_hash was introduced in
0b053e78b to fix a memory leak issue.

commit 0b053e78b5990cd01e7169ee5bd2bb8e4045deea
Author: David Rowley <drowley@postgresql.org>
Date:   Thu Oct 5 20:30:47 2023 +1300

    Fix memory leak in Memoize code

It seems to me that switching to the per-tuple memory context is
sufficient to fix the memory leak.  Calling ResetExprContext in
MemoizeHash_hash each time seems too aggressive.

I tried to remove the ResetExprContext call in MemoizeHash_hash and did
not see the memory leak with the repro query in [1].

diff --git a/src/backend/executor/nodeMemoize.c b/src/backend/executor/nodeMemoize.c
index 18870f10e1..f2f025520d 100644
--- a/src/backend/executor/nodeMemoize.c
+++ b/src/backend/executor/nodeMemoize.c
@@ -207,7 +207,6 @@ MemoizeHash_hash(struct memoize_hash *tb, const MemoizeKey *key)
                }
        }

-       ResetExprContext(econtext);
        MemoryContextSwitchTo(oldcontext);
        return murmurhash32(hashkey);
 }

Looping in David to have a look.

[1] https://www.postgresql.org/message-id/flat/83281eed63c74e4f940317186372abfd%40cft.ru

Thanks
Richard

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 26/2/2024 18:34, Richard Guo wrote:
> 
> On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov 
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> 
>     On 26/2/2024 12:44, Tender Wang wrote:
>      > Make sense. I found MemoizeState already has a MemoryContext, so
>     I used it.
>      > I update the patch.
>     This approach is better for me. In the next version of this patch, I
>     included a test case. I am still unsure about the context chosen and
>     the
>     stability of the test case. Richard, you recently fixed some Memoize
>     issues, could you look at this problem and patch?
> 
> 
> I looked at this issue a bit.  It seems to me what happens is that at
> first the memory areas referenced by probeslot->tts_values[] are
> allocated in the per tuple context (see prepare_probe_slot).  And then
> in MemoizeHash_hash, after we've calculated the hashkey, we will reset
> the per tuple context.  However, later in MemoizeHash_equal, we still
> need to reference the values in probeslot->tts_values[], which have been
> cleared.
Agree
> 
> Actually the context would always be reset in MemoizeHash_equal, for
> both binary and logical mode.  So I kind of wonder if it's necessary to
> reset the context in MemoizeHash_hash.
I can only provide one thought against this solution: what if we have a 
lot of unique hash values, maybe all of them? In that case, we still 
have a kind of 'leak' David fixed by the commit 0b053e78b5.
Also, I have a segfault report of one client. As I see, it was caused by 
too long text column in the table slot. As I see, key value, stored in 
the Memoize hash table, was corrupted, and the most plain reason is this 
bug. Should we add a test on this bug, and what do you think about the 
one proposed in v3?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:
The attached patch is a new version based on v3(not including Andrei's the test case). There is no need to call datumCopy when
isnull is true. 

I have not added a new runtime memoryContext so far. Continue to use mstate->tableContext, I'm not sure the memory used of probeslot will affect mstate->mem_limit. 
Maybe adding a new memoryContext is better. I think I should spend a little time to learn nodeMemoize.c more deeply.

Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年2月26日周一 20:29写道:
On 26/2/2024 18:34, Richard Guo wrote:
>
> On Mon, Feb 26, 2024 at 3:54 PM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
>
>     On 26/2/2024 12:44, Tender Wang wrote:
>      > Make sense. I found MemoizeState already has a MemoryContext, so
>     I used it.
>      > I update the patch.
>     This approach is better for me. In the next version of this patch, I
>     included a test case. I am still unsure about the context chosen and
>     the
>     stability of the test case. Richard, you recently fixed some Memoize
>     issues, could you look at this problem and patch?
>
>
> I looked at this issue a bit.  It seems to me what happens is that at
> first the memory areas referenced by probeslot->tts_values[] are
> allocated in the per tuple context (see prepare_probe_slot).  And then
> in MemoizeHash_hash, after we've calculated the hashkey, we will reset
> the per tuple context.  However, later in MemoizeHash_equal, we still
> need to reference the values in probeslot->tts_values[], which have been
> cleared.
Agree
>
> Actually the context would always be reset in MemoizeHash_equal, for
> both binary and logical mode.  So I kind of wonder if it's necessary to
> reset the context in MemoizeHash_hash.
I can only provide one thought against this solution: what if we have a
lot of unique hash values, maybe all of them? In that case, we still
have a kind of 'leak' David fixed by the commit 0b053e78b5.
Also, I have a segfault report of one client. As I see, it was caused by
too long text column in the table slot. As I see, key value, stored in
the Memoize hash table, was corrupted, and the most plain reason is this
bug. Should we add a test on this bug, and what do you think about the
one proposed in v3?

--
regards,
Andrei Lepikhov
Postgres Professional



--
Tender Wang
OpenPie:  https://en.openpie.com/
Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 28/2/2024 13:53, Tender Wang wrote:
> The attached patch is a new version based on v3(not including Andrei's 
> the test case). There is no need to call datumCopy when
> isnull is true.
> 
> I have not added a new runtime memoryContext so far. Continue to use 
> mstate->tableContext, I'm not sure the memory used of probeslot will 
> affect mstate->mem_limit.
> Maybe adding a new memoryContext is better. I think I should spend a 
> little time to learn nodeMemoize.c more deeply.
I am curious about your reasons to stay with tableContext. In terms of 
memory allocation, Richard's approach looks better.
Also, You don't need to initialize tts_values[i] at all if tts_isnull[i] 
set to true.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:
I read Memoize code and how other node use ResetExprContext() recently.

The comments about per_tuple_memory said that :

 * ecxt_per_tuple_memory is a short-term context for expression results.
 *  As the name suggests, it will typically be reset once per tuple,
 *  before we begin to evaluate expressions for that tuple.  Each
 *  ExprContext normally has its very own per-tuple memory context.

So ResetExprContext() should called once per tuple, but not in Hash and Equal function just as Richard said before.
In ExecResult() and ExecProjectSet(), they call ResetExprContext() once when enter these functions. 
So I think ExecMemoize() can do the same way. 

The attached patch includes below modifications:
1. 
When I read the code in nodeMemoize.c, I found a typos: outer should be inner, 
if I don't misunderstand the intend of Memoize.

2.
I found that almost executor node call CHECK_FOR_INTERRUPTS(), so I add it.
Is it right to add it for ExecMemoize()?

3.
I remove ResetExprContext() from Hash and Equal funciton. And I call it when enter
ExecMemoize() just like ExecPrejectSet() does.
ExecQualAndReset() is replaed with ExecQual().

4.
This patch doesn't include test case.  I use the Andrei's test case, but I don't repeat the aboved issue.
I may need to spend some more time to think about how to repeat this issue easily.

So, what do you think about the one proposed in v5? @Andrei Lepikhov  @Richard Guo @David Rowley .
I don't want to continue to do work based on v3 patch. As Andrei Lepikhov said, using mstate->tableContext for probeslot
is not good.  v5 looks more simple.
Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:
Hi,

When I think about how to add a test case for v5 version patch, and I want to test if v5 version patch has memory leak.
This thread [1] provided a way how to repeat the memory leak, so I used it to test v5 patch. I didn't found memory leak on
v5 patch.

But I found other interesting issue. When changed whereClause in [1],  the query reported below error:

"ERROR could not find memoization table entry"

the query:
EXPLAIN analyze
select sum(q.id_table1)
from (
SELECT t2.*
FROM table1 t1
JOIN table2 t2
ON (t2.id_table1 + t2.id_table1) = t1.id) q;

But on v5 patch, it didn't report error.

I guess it is the same reason that data in probeslot was reset in Hash function.

I debug the above query, and get this:
before
(gdb) p *(DatumGetNumeric(mstate->probeslot->tts_values[0]))
$1 = {vl_len_ = 48, choice = {n_header = 32770, n_long = {n_sign_dscale = 32770, n_weight = 60, n_data = 0x564632ebd708}, n_short = {n_header = 32770, n_data = 0x564632ebd706}}}
after
(gdb) p *(DatumGetNumeric(mstate->probeslot->tts_values[0]))
$2 = {vl_len_ = 264, choice = {n_header = 32639, n_long = {n_sign_dscale = 32639, n_weight = 32639, n_data = 0x564632ebd6a8}, n_short = {n_header = 32639, n_data = 0x564632ebd6a6}}}

So after call ResetExprContext() in Hash function, the data in probeslot is corrupted.  It is not sure what error will happen when executing on corrupted data.

During debug, I learned that numeric_add doesn't have type check like rangetype, so aboved query will not report "type with xxx does not exist".

And I realize that  the test case added by Andrei Lepikhov  in v3 is right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks a lot.

Now I think the v6 version patch seems to be complete now.




--
Tender Wang
OpenPie:  https://en.openpie.com/
Вложения

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 1/3/2024 14:18, Tender Wang wrote:
> During debug, I learned that numeric_add doesn't have type check like 
> rangetype, so aboved query will not report "type with xxx does not exist".
> 
> And I realize that  the test case added by Andrei Lepikhov  in v3 is 
> right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks a lot.
> 
> Now I think the v6 version patch seems to be complete now.
I've passed through the patch, and it looks okay. Although I am afraid 
of the same problems that future changes can cause and how to detect 
them, it works correctly.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:


Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年3月5日周二 17:36写道:
On 1/3/2024 14:18, Tender Wang wrote:
> During debug, I learned that numeric_add doesn't have type check like
> rangetype, so aboved query will not report "type with xxx does not exist".
>
> And I realize that  the test case added by Andrei Lepikhov  in v3 is
> right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks a lot.
>
> Now I think the v6 version patch seems to be complete now.
I've passed through the patch, and it looks okay. Although I am afraid
of the same problems that future changes can cause and how to detect
them, it works correctly.

Thanks for reviewing it, and I add it to commitfest 2024-07.


--
Tender Wang
OpenPie:  https://en.openpie.com/

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Andrei Lepikhov
Дата:
On 6/3/2024 10:10, Tender Wang wrote:
> 
> 
> Andrei Lepikhov <a.lepikhov@postgrespro.ru 
> <mailto:a.lepikhov@postgrespro.ru>> 于2024年3月5日周二 17:36写道:
> 
>     On 1/3/2024 14:18, Tender Wang wrote:
>      > During debug, I learned that numeric_add doesn't have type check
>     like
>      > rangetype, so aboved query will not report "type with xxx does
>     not exist".
>      >
>      > And I realize that  the test case added by Andrei Lepikhov  in v3 is
>      > right. So in v6 patch I add Andrei Lepikhov's test case.  Thanks
>     a lot.
>      >
>      > Now I think the v6 version patch seems to be complete now.
>     I've passed through the patch, and it looks okay. Although I am afraid
>     of the same problems that future changes can cause and how to detect
>     them, it works correctly.
> 
> 
> Thanks for reviewing it, and I add it to commitfest 2024-07.
I think, it is a bug. Should it be fixed (and back-patched) earlier?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:


Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年3月6日周三 11:37写道:
I think, it is a bug. Should it be fixed (and back-patched) earlier?

Agreed.   Need David to review it as he knows this area best.


--
Tender Wang
OpenPie:  https://en.openpie.com/

Re: "type with xxxx does not exist" when doing ExecMemoize()

От
David Rowley
Дата:
On Thu, 7 Mar 2024 at 15:24, Tender Wang <tndrwang@gmail.com> wrote:
>
> Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年3月6日周三 11:37写道:
>> I think, it is a bug. Should it be fixed (and back-patched) earlier?
>
> Agreed.   Need David to review it as he knows this area best.

This is on my list of things to do. Just not at the top yet.

David



Re: "type with xxxx does not exist" when doing ExecMemoize()

От
David Rowley
Дата:
On Thu, 7 Mar 2024 at 22:50, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 7 Mar 2024 at 15:24, Tender Wang <tndrwang@gmail.com> wrote:
> >
> > Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年3月6日周三 11:37写道:
> >> I think, it is a bug. Should it be fixed (and back-patched) earlier?
> >
> > Agreed.   Need David to review it as he knows this area best.
>
> This is on my list of things to do. Just not at the top yet.

I've gone over this patch and I'm happy with the changes to
nodeMemoize.c.  The thing I did change was the newly added test.  The
problem there was the test was passing for me with and without the
code fix.  I ended up changing the test so the cache hits and misses
are reported.  That required moving the test to above where the
work_mem is set to 64KB so we can be certain the values will all be
cached and the cache hits are predictable.

My other changes were just cosmetic.

Thanks for working on this fix.  I've pushed the patch.

David



Re: "type with xxxx does not exist" when doing ExecMemoize()

От
Tender Wang
Дата:


David Rowley <dgrowleyml@gmail.com> 于2024年3月11日周一 13:25写道:
On Thu, 7 Mar 2024 at 22:50, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 7 Mar 2024 at 15:24, Tender Wang <tndrwang@gmail.com> wrote:
> >
> > Andrei Lepikhov <a.lepikhov@postgrespro.ru> 于2024年3月6日周三 11:37写道:
> >> I think, it is a bug. Should it be fixed (and back-patched) earlier?
> >
> > Agreed.   Need David to review it as he knows this area best.
>
> This is on my list of things to do. Just not at the top yet.

I've gone over this patch and I'm happy with the changes to
nodeMemoize.c.  The thing I did change was the newly added test.  The
problem there was the test was passing for me with and without the
code fix.  I ended up changing the test so the cache hits and misses
are reported.  That required moving the test to above where the
work_mem is set to 64KB so we can be certain the values will all be
cached and the cache hits are predictable.

My other changes were just cosmetic.

Thanks for working on this fix.  I've pushed the patch.

David

Thanks for pushing the patch.
--
Tender Wang
OpenPie:  https://en.openpie.com/