Speeding up parts of the planner using a binary search tree structurefor nodes
От | David Rowley |
---|---|
Тема | Speeding up parts of the planner using a binary search tree structurefor nodes |
Дата | |
Msg-id | CAApHDvrEXcadNYAAdq6RO0eKZUG6rRHXJGAbpzj8y432gCD9bA@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Speeding up parts of the planner using a binary search treestructure for nodes
Re: Speeding up parts of the planner using a binary search treestructure for nodes |
Список | pgsql-hackers |
Hi, In [1] I mentioned that I'd like to look into coding a data structure to allow Node types to be looked up more efficiently than what List allows. There are many places in the planner where we must determine if a given Node is in some List of Nodes. This is an O(n) operation. We could reduce these to as low as O(log n) with a binary search tree. A few places in the code that does this which can become a problem when the lists being searched are large: get_eclass_for_sort_expr() tlist_member() For the former, it's quite easy to see this come up in the profiles by querying a partitioned table with many partitions. e.g create table listp (a int) partition by list(a); select 'create table listp'||x::Text || ' partition of listp for values in('||x::text||');' from generate_Series(1,10000)x; \gexec create index on listp(a); explain select * from listp order by a; There are a few places that this new data structure could be used. My primary interest at the moment is EquivalenceClass.ec_members. There are perhaps other interesting places, such as planner targetlists, but obviously, any changes would need to be proved worthwhile before they're made. Per what I mentioned in [1], I'd like this structure to be a binary search tree, (either a red/black or AVL binary search tree compacted into an array.) So instead of having ->left ->right pointers, we have left and right offsets back into the array. This allows very fast and cache-friendly unordered traversals of the tree, as it's an array whenever we want it to be and a tree when we need to perform a lookup of a specific value. Because I'm not using a true tree, i.e pointers to elements, then it does not appear like I can use rbtree.c The first step to make this work is to modify equalfuncs.c to add an equalCompare() function which will return an int to indicate the sort order of the node against another node. I've drafted a patch to implement this and attached it here. I've done this in 2 phases, 0001 creates and implements datatype specific macros for the comparisons. Doing this allows us to optimise the bool/char field comparison macros in 0002 so we can do a simple subtransaction rather than 2 comparisons. The 0002 patch modifies each comparison macro and changes all the equal functions to return int rather than bool. The external comparison function is equalCompare(). equal() just calls equalCompare() and checks the value returned is 0. With both patches, there is an increase in size of about 17% for the object file for equalfuncs: equalfuncs.o Master: 56768 bytes Patched: 66912 bytes If I don't use the macros specifically optimised for bool and char, then the size is 68240 bytes. So I think doing 0001 is worth it. This does break the API for ExtensibleNodeMethods as it's no longer enough to just have nodeEqual(). In the 0002 patch I've changed this to nodeCompare(). Extension authors who are using this will need to alter their code to implement a proper comparison function. For now, I'm mostly just interested in getting feedback about making this change to equalfuncs.c. Does anyone object to it? David [1] https://www.postgresql.org/message-id/CAKJS1f8v-fUG8YpaAGj309ZuALo3aEk7f6cqMHr_AVwz1fKXug%40mail.gmail.com
Вложения
В списке pgsql-hackers по дате отправления: