Обсуждение: Avoid detoast overhead when possible

Поиск
Список
Период
Сортировка

Avoid detoast overhead when possible

От
zhihuifan1213@163.com
Дата:
Currently our code can do lazily detoast by design, for example:

SELECT toast_col FROM t; 
SELECT toast_col FROM t ORDER BY b;
SELECT toast_col FROM t join t2 using(c); 

it is only detoast at {type}_out function. The benefits includes:
1. The life time of detoast datum is pretty short which is good for
   general memory usage.
2. In the order by / hash case, the less memory usage can let the
   work_mem hold more tuples so it is good for performance aspect.

Recently I run into a user case like this:

create table b(big jsonb);
...
select big->'1', big->'2', big->'3', big->'5', big->'10' from b;

In the above query, we can see the 'big' datum is detoasted 5 times, and
if the toast value is huge, it causes a pretty bad performance. jsonb
will be a common case to access the toast value multi times, but it
is possible for other data type as well. for example:

SELECT f1(big_toast_col), f2(big_toast_col) FROM t;

I attached a POC patch which eagerly detoast the datum during
EEOP_INNER/OUTER/SCAN_VAR step and store the detoast value back to the
original slot->tts_values, so the later call of slot->tts_values[n] will
use the detoast value automatically. With the attached setup.sql and
the patch, the performance is easy to reduced to 310ms from 1600ms.

select big->'1', big->'2', big->'3', big->'5', big->'10' from b;  
                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on b (actual time=1.731..1577.911 rows=1001 loops=1)
 Planning Time: 0.099 ms
 Execution Time: 1578.411 ms
(3 rows) 

set jit to off;

select big->'1', big->'2', big->'3', big->'5', big->'10' from b;  
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on b (actual time=0.417..309.937 rows=1001 loops=1)
 Planning Time: 0.097 ms
 Execution Time: 310.255 m

(I used 'jit=off' to turn on this feature just because I'm still not
ready for JIT code.)

However this patch just throws away almost all the benefits of toast, so
how can we draw a line between should vs should not do this code path?
IMO, we should only run the 'eagerly detoast' when we know that we will
have a FuncCall against the toast_col on the current plan node. I think
this information can be get from Qual and TargetList. If so, we can set
the slot->detoast_attrs accordingly.

if we code like this: 

SELECT f1(toast_col) FROM t join t2 using(c);

We only apply the code path on the join plan node, so even the join method
is hash / sort merge, the benefit of toast is still there.

'SELECT f1(toast_col) FROM t;' will apply this code path, but nothing
gain and nothing lost.  Applying this code path only when the toast
datum is accessed 1+ times needs some extra run-time effort. I don't
implement this so far, I'd like to see if I miss some obvious points.
Any feedback is welcome.



-- 
Best Regards
Andy Fan

Вложения

Re: Avoid detoast overhead when possible

От
Nikita Malakhov
Дата:
Hi!

There's a view from the other angle - detoast just attributes that are needed
(partial detoast), with optimized storage mechanics for JSONb. I'm preparing
a patch for it, so maybe the best results could be acquired by combining these
two techniques.

What do you think?

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Avoid detoast overhead when possible

От
zhihuifan1213@163.com
Дата:

Nikita Malakhov <hukutoc@gmail.com> writes:

Hi!
>
> There's a view from the other angle - detoast just attributes that are needed
> (partial detoast), with optimized storage mechanics for JSONb.

Very glad to know that,  looking forward your design & patch!

-- 
Best Regards
Andy Fan




Re: Avoid detoast overhead when possible

От
Matthias van de Meent
Дата:
On Mon, 4 Dec 2023 at 07:56, <zhihuifan1213@163.com> wrote:
> 'SELECT f1(toast_col) FROM t;' will apply this code path, but nothing
> gain and nothing lost.  Applying this code path only when the toast
> datum is accessed 1+ times needs some extra run-time effort. I don't
> implement this so far, I'd like to see if I miss some obvious points.
> Any feedback is welcome.

This does add some measurable memory overhead to query execution where
the produced derivative of the large toasted field is small (e.g. 1MB
toast value -> 2x BIGINT), and when the toasted value is deep in the
query tree (e.g. 3 nested loops deep). It would also add overhead when
we write results to disk, such as spilling merge sorts, hash join
spills, or CTE materializations.

Could you find a way to reduce this memory and IO usage when the value
is not going to be used immediately? Using the toast pointer at such
points surely will be cheaper than storing the full value again and
again.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Avoid detoast overhead when possible

От
zhihuifan1213@163.com
Дата:
Hi,

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> On Mon, 4 Dec 2023 at 07:56, <zhihuifan1213@163.com> wrote:

> ..It would also add overhead when
> we write results to disk, such as spilling merge sorts, hash join
> spills, or CTE materializations.
>
> Could you find a way to reduce this memory and IO usage when the value
> is not going to be used immediately? Using the toast pointer at such
> points surely will be cheaper than storing the full value again and
> again.

I'm not sure I understand you correctly, I think the issue you raised
here is covered by the below design (not implemented in the patch).

"
However this patch just throws away almost all the benefits of toast, so
how can we draw a line between should vs should not do this code path?
IMO, we should only run the 'eagerly detoast' when we know that we will
have a FuncCall against the toast_col on **the current plan node**. I
think this information can be get from Qual and TargetList. If so, we
can set the slot->detoast_attrs accordingly.
"

Let's see an example of this:

SELECT f(t1.toastable_col) FROM t1 join t2 using(c);

Suppose it is using hash join and t1 should be hashed.  With the above
design, we will NOT detoast toastable_col at the scan of t1 or hash t1
since there is no one "funcall" access it in either SeqScan of t1 or
hash (t1). But when we do the projection on the joinrel, the detoast
would happen.

I'm still working on how to know if a toast_col will be detoast for a
given PlanState. If there is no design error, I think I can work out a
version tomorrow.

-- 
Best Regards
Andy Fan




Re: Avoid detoast overhead when possible

От
Matthias van de Meent
Дата:
On Mon, 4 Dec 2023 at 14:23, <zhihuifan1213@163.com> wrote:
>
>
> Hi,
>
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>
> > On Mon, 4 Dec 2023 at 07:56, <zhihuifan1213@163.com> wrote:
>
> > ..It would also add overhead when
> > we write results to disk, such as spilling merge sorts, hash join
> > spills, or CTE materializations.
> >
> > Could you find a way to reduce this memory and IO usage when the value
> > is not going to be used immediately? Using the toast pointer at such
> > points surely will be cheaper than storing the full value again and
> > again.
>
> I'm not sure I understand you correctly, I think the issue you raised
> here is covered by the below design (not implemented in the patch).
>
> "
> However this patch just throws away almost all the benefits of toast, so
> how can we draw a line between should vs should not do this code path?
> IMO, we should only run the 'eagerly detoast' when we know that we will
> have a FuncCall against the toast_col on **the current plan node**. I
> think this information can be get from Qual and TargetList. If so, we
> can set the slot->detoast_attrs accordingly.
> "
>
> Let's see an example of this:
>
> SELECT f(t1.toastable_col) FROM t1 join t2 using(c);
>
> Suppose it is using hash join and t1 should be hashed.  With the above
> design, we will NOT detoast toastable_col at the scan of t1 or hash t1
> since there is no one "funcall" access it in either SeqScan of t1 or
> hash (t1). But when we do the projection on the joinrel, the detoast
> would happen.

I assume that you detoast the column only once, and not in a separate
per-node context? This would indicate to me that a query like the
following would detoast toastable_col and never "retoast" it.

SELECT toastable_col FROM t1
WHERE f(t1.toastable_col)
ORDER BY nonindexed;

or the equivalent in current PG catalogs:

SELECT ev_class
FROM pg_rewrite
WHERE octet_length(ev_action) > 1
ORDER BY ev_class;

whose plan is

 Sort
   Sort Key: ev_class
   ->  Seq Scan on pg_rewrite
         Filter: (octet_length((ev_action)::text) > 1)

This would first apply the condition (because sort-then-filter is
generally more expensive than filter-then-sort), and  thus permanently
detoast the column, which is thus detoasted when it is fed into the
sort, which made the sort much more expensive than without the
aggressive detoasting.

Or do I still misunderstand something here?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Avoid detoast overhead when possible

От
zhihuifan1213@163.com
Дата:
Hi,

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> SELECT toastable_col FROM t1
> WHERE f(t1.toastable_col)
> ORDER BY nonindexed;

Thanks for this example! it's true that the current design requires more
memory to sort since toastable_col is detoasted at the scan stage and it
is output to the sort node. It should be avoided.

> SELECT ev_class
> FROM pg_rewrite
> WHERE octet_length(ev_action) > 1
> ORDER BY ev_class;

This one is different I think, since the ev_action (the toastable_col) is
*NOT* output to sort node, so no extra memory is required IIUC. 

 * CP_SMALL_TLIST specifies that a narrower tlist is preferred.  This is
 * passed down by parent nodes such as Sort and Hash, which will have to
 * store the returned tuples.

We can also verify this by

explain (costs off, verbose) SELECT ev_class
FROM pg_rewrite
WHERE octet_length(ev_action) > 1
ORDER BY ev_class;
                            QUERY PLAN                            
------------------------------------------------------------------
 Sort
   Output: ev_class
   Sort Key: pg_rewrite.ev_class
   ->  Seq Scan on pg_catalog.pg_rewrite
         Output: ev_class
         Filter: (octet_length((pg_rewrite.ev_action)::text) > 1)
(6 rows)

Only ev_class is output to Sort node.

So if we want to make sure there is performance regression for all the
existing queries in any case, we can add 1 more restriction into the
saved-detoast-value logic. It must be (NOT under CP_SMALL_TLIST) OR (the
toastable_col is not in the output list). It can be a planner decision.

If we code like this, the result will be we need to dotoast N times
for toastable_col in qual for the below query.

SELECT toastable_col FROM t
WHERE f1(toastable_col)
AND f2(toastable_col)
..
AND fn(toastable_col)
ORDER BY any-target-entry;

However

SELECT
  f1(toastable_col),
  f2(toastable_col),
  ..
  fn(toastable_col)
FROM t
ORDER BY any-target-entry;

the current path still works for it.

This one is my favorite one so far. Another option is saving the
detoast-value in some other memory or existing-slot-in-place for
different sistuation, that would requires more expr expression changes
and planner changes. I just checked all the queries in my hand, the
current design can cover all of them. 

-- 
Best Regards
Andy Fan




Re: Avoid detoast overhead when possible

От
Nikita Malakhov
Дата:
Hi,

Hmmm, I've checked this patch and can't see performance difference on a large
(20000 key-value pairs) json, using toasted json column several times makes no
difference between current implementation on master (like queries mentioned above).

Maybe I'm doing something wrong?

On Tue, Dec 5, 2023 at 4:16 AM <zhihuifan1213@163.com> wrote:

Hi,

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> SELECT toastable_col FROM t1
> WHERE f(t1.toastable_col)
> ORDER BY nonindexed;

Thanks for this example! it's true that the current design requires more
memory to sort since toastable_col is detoasted at the scan stage and it
is output to the sort node. It should be avoided.

> SELECT ev_class
> FROM pg_rewrite
> WHERE octet_length(ev_action) > 1
> ORDER BY ev_class;

This one is different I think, since the ev_action (the toastable_col) is
*NOT* output to sort node, so no extra memory is required IIUC.

 * CP_SMALL_TLIST specifies that a narrower tlist is preferred.  This is
 * passed down by parent nodes such as Sort and Hash, which will have to
 * store the returned tuples.

We can also verify this by

explain (costs off, verbose) SELECT ev_class
FROM pg_rewrite
WHERE octet_length(ev_action) > 1
ORDER BY ev_class;
                            QUERY PLAN                           
------------------------------------------------------------------
 Sort
   Output: ev_class
   Sort Key: pg_rewrite.ev_class
   ->  Seq Scan on pg_catalog.pg_rewrite
         Output: ev_class
         Filter: (octet_length((pg_rewrite.ev_action)::text) > 1)
(6 rows)

Only ev_class is output to Sort node.

So if we want to make sure there is performance regression for all the
existing queries in any case, we can add 1 more restriction into the
saved-detoast-value logic. It must be (NOT under CP_SMALL_TLIST) OR (the
toastable_col is not in the output list). It can be a planner decision.

If we code like this, the result will be we need to dotoast N times
for toastable_col in qual for the below query.

SELECT toastable_col FROM t
WHERE f1(toastable_col)
AND f2(toastable_col)
..
AND fn(toastable_col)
ORDER BY any-target-entry;

However

SELECT
  f1(toastable_col),
  f2(toastable_col),
  ..
  fn(toastable_col)
FROM t
ORDER BY any-target-entry;

the current path still works for it.

This one is my favorite one so far. Another option is saving the
detoast-value in some other memory or existing-slot-in-place for
different sistuation, that would requires more expr expression changes
and planner changes. I just checked all the queries in my hand, the
current design can cover all of them.

--
Best Regards
Andy Fan





--
Regards,

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Avoid detoast overhead when possible

От
zhihuifan1213@163.com
Дата:
Nikita Malakhov <hukutoc@gmail.com> writes:

> Hi,
>
> Hmmm, I've checked this patch and can't see performance difference on a large
> (20000 key-value pairs) json, using toasted json column several times makes no
> difference between current implementation on master (like queries mentioned above).
>
> Maybe I'm doing something wrong?

Could you try something like below? (set jit to off to turn on this
feature). Or could you tell me the steps you used?  I also attached the
setup.sql at the begining of this thread.

select big->'1', big->'2', big->'3', big->'5', big->'10' from b;  
                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on b (actual time=1.731..1577.911 rows=1001 loops=1)
 Planning Time: 0.099 ms
 Execution Time: 1578.411 ms
(3 rows) 

set jit to off;

select big->'1', big->'2', big->'3', big->'5', big->'10' from b;  
                          QUERY PLAN                          
--------------------------------------------------------------
 Seq Scan on b (actual time=0.417..309.937 rows=1001 loops=1)
 Planning Time: 0.097 ms
 Execution Time: 310.255 m

(I used 'jit=off' to turn on this feature just because I'm still not
ready for JIT code.)

-- 
Best Regards
Andy Fan




Re: Avoid detoast overhead when possible

От
Nikita Malakhov
Дата:
Hi,

With your setup (table created with setup.sql):
postgres@postgres=# explain analyze select big->'1', big->'2', big->'3', big->'5', big->'10' from b;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..29.52 rows=1001 width=160) (actual time=0.656..359.964 rows=1001 loops=1)
 Planning Time: 0.042 ms
 Execution Time: 360.177 ms
(3 rows)

Time: 361.054 ms
postgres@postgres=# explain analyze select big->'1' from b;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..19.51 rows=1001 width=32) (actual time=0.170..63.996 rows=1001 loops=1)
 Planning Time: 0.042 ms
 Execution Time: 64.063 ms
(3 rows)

Time: 64.626 ms

Without patch, the same table and queries:
postgres@postgres=# explain analyze select big->'1', big->'2', big->'3', big->'5', big->'10' from b;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..29.52 rows=1001 width=160) (actual time=0.665..326.399 rows=1001 loops=1)
 Planning Time: 0.035 ms
 Execution Time: 326.508 ms
(3 rows)

Time: 327.132 ms
postgres@postgres=# explain analyze select big->'1' from b;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on b  (cost=0.00..19.51 rows=1001 width=32) (actual time=0.159..62.807 rows=1001 loops=1)
 Planning Time: 0.033 ms
 Execution Time: 62.879 ms
(3 rows)

Time: 63.504 ms

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company

Re: Avoid detoast overhead when possible

От
Andy Fan
Дата:
Hi

Nikita Malakhov <hukutoc@gmail.com> writes:
>
> With your setup (table created with setup.sql):

You need to "set jit to off" to turn on this feature, as I state in [1]
[2]. 

[1] https://www.postgresql.org/message-id/87ttoyihgm.fsf%40163.com
[2] https://www.postgresql.org/message-id/877cltvxgt.fsf%40163.com

-- 
Best Regards
Andy Fan