Avoid detoast overhead when possible
От | zhihuifan1213@163.com |
---|---|
Тема | Avoid detoast overhead when possible |
Дата | |
Msg-id | 87ttoyihgm.fsf@163.com обсуждение исходный текст |
Ответы |
Re: Avoid detoast overhead when possible
Re: Avoid detoast overhead when possible |
Список | pgsql-hackers |
Currently our code can do lazily detoast by design, for example: SELECT toast_col FROM t; SELECT toast_col FROM t ORDER BY b; SELECT toast_col FROM t join t2 using(c); it is only detoast at {type}_out function. The benefits includes: 1. The life time of detoast datum is pretty short which is good for general memory usage. 2. In the order by / hash case, the less memory usage can let the work_mem hold more tuples so it is good for performance aspect. Recently I run into a user case like this: create table b(big jsonb); ... select big->'1', big->'2', big->'3', big->'5', big->'10' from b; In the above query, we can see the 'big' datum is detoasted 5 times, and if the toast value is huge, it causes a pretty bad performance. jsonb will be a common case to access the toast value multi times, but it is possible for other data type as well. for example: SELECT f1(big_toast_col), f2(big_toast_col) FROM t; I attached a POC patch which eagerly detoast the datum during EEOP_INNER/OUTER/SCAN_VAR step and store the detoast value back to the original slot->tts_values, so the later call of slot->tts_values[n] will use the detoast value automatically. With the attached setup.sql and the patch, the performance is easy to reduced to 310ms from 1600ms. select big->'1', big->'2', big->'3', big->'5', big->'10' from b; QUERY PLAN --------------------------------------------------------------- Seq Scan on b (actual time=1.731..1577.911 rows=1001 loops=1) Planning Time: 0.099 ms Execution Time: 1578.411 ms (3 rows) set jit to off; select big->'1', big->'2', big->'3', big->'5', big->'10' from b; QUERY PLAN -------------------------------------------------------------- Seq Scan on b (actual time=0.417..309.937 rows=1001 loops=1) Planning Time: 0.097 ms Execution Time: 310.255 m (I used 'jit=off' to turn on this feature just because I'm still not ready for JIT code.) However this patch just throws away almost all the benefits of toast, so how can we draw a line between should vs should not do this code path? IMO, we should only run the 'eagerly detoast' when we know that we will have a FuncCall against the toast_col on the current plan node. I think this information can be get from Qual and TargetList. If so, we can set the slot->detoast_attrs accordingly. if we code like this: SELECT f1(toast_col) FROM t join t2 using(c); We only apply the code path on the join plan node, so even the join method is hash / sort merge, the benefit of toast is still there. 'SELECT f1(toast_col) FROM t;' will apply this code path, but nothing gain and nothing lost. Applying this code path only when the toast datum is accessed 1+ times needs some extra run-time effort. I don't implement this so far, I'd like to see if I miss some obvious points. Any feedback is welcome. -- Best Regards Andy Fan
Вложения
В списке pgsql-hackers по дате отправления: