Partitioning performance: cache stringToNode() of pg_constraint.ccbin
От | Noah Misch |
---|---|
Тема | Partitioning performance: cache stringToNode() of pg_constraint.ccbin |
Дата | |
Msg-id | 20130603190727.GA360354@tornado.leadboat.com обсуждение исходный текст |
Ответы |
Re: Partitioning performance: cache stringToNode() of pg_constraint.ccbin
Re: Partitioning performance: cache stringToNode() of pg_constraint.ccbin |
Список | pgsql-hackers |
A colleague, Korry Douglas, observed a table partitioning scenario where deserializing pg_constraint.ccbin is a hot spot. The following test case, a simplification of a typical partitioning setup, spends 28% of its time in stringToNode() and callees thereof: \timing on \set n 600000 BEGIN; CREATE TABLE bench_check_constr_parent (c int); CREATE TABLE bench_check_constr_child ( CHECK (c > 0 AND c <= 100000000) ) INHERITS (bench_check_constr_parent); CREATE FUNCTION trig() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN INSERT INTO bench_check_constr_child VALUES (NEW.*); RETURN NULL; END $$; CREATE TRIGGER redir BEFORE INSERT ON bench_check_constr_parent FOR EACH ROW EXECUTE PROCEDURE trig(); -- Main benchmark INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n); TRUNCATE bench_check_constr_parent; INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n); TRUNCATE bench_check_constr_parent; INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n); TRUNCATE bench_check_constr_parent; -- Compare direct insert performance @ 10x volume INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10); TRUNCATE bench_check_constr_parent; INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10); TRUNCATE bench_check_constr_parent; INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10); TRUNCATE bench_check_constr_parent; ROLLBACK; The executor caches each CHECK constraint in ResultRelInfo as a planned expression. That cache is highly effectively for long-running statements, but the trivial INSERTs effectively work without a cache. Korry devised this patch to cache the stringToNode() form of the constraint in the relcache. It improves the benchmark's partitioned scenario by 33%: -- Timings (seconds) -- master, INSERT parent: 14.2, 14.4, 14.4 patched, INSERT parent: 9.6, 9.7, 9.7 master, INSERT*10 child: 9.9, 9.9, 10.2 patched, INSERT*10 child: 10.0, 10.2, 10.2 There's still not much to like about that tenfold overhead from use of the partition routing trigger, but this patch makes a nice cut into that overhead without doing anything aggressive. The profile no longer shows low-hanging fruit; running an entire SQL statement per row piles on the runtime from a wide range of sources. For anyone curious, I've attached output from "perf report -s parent -g graph,1,caller" with the patch applied; I suggest browsing under "less -S". Some call sites need to modify the node tree, so the patch has them do copyObject(). I ran a microbenchmark of copyObject() on the cached node tree vs. redoing stringToNode(), and copyObject() still won by a factor of four. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: