Proposal for Improving Concurrent Index Creation Performance
From | Sergey Sargsyan |
---|---|
Subject | Proposal for Improving Concurrent Index Creation Performance |
Date | |
Msg-id | CAMAof6_FY0MrNJOuBrqvQqJKiwskFvjRtgpVHf-D7A=KvTtYXg@mail.gmail.com |
List | pgsql-hackers |
Hi PostgreSQL Hackers,
I've been exploring how concurrent index creation works and noticed a potential area for performance improvement, especially for large indexes. Currently, the process involves multiple stages: creating the index (initially invalid and not ready), building the index, validating the index by checking that every heap tuple is included, and finally swapping the old index with the new one.
The validation stage, where we sort the index entries by TID and compare them to the heap, is currently single-threaded and can become a bottleneck for large indexes.
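To make the bottleneck concrete, here is a rough standalone sketch of the kind of TID-sort-and-compare pass I mean (only an illustration of the idea, not the actual validate_index() code; the struct and helper names are made up):

/*
 * Standalone model of the validation step: sort the TIDs already present
 * in the new index, then walk the heap TIDs (which a physical heap scan
 * yields in TID order) and report every tuple the index is missing, which
 * the real code would then insert.  All names here are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct TidSketch
{
    unsigned int   block;   /* heap block number */
    unsigned short offset;  /* line pointer within the block */
} TidSketch;

static int
tid_cmp(const void *a, const void *b)
{
    const TidSketch *ta = a;
    const TidSketch *tb = b;

    if (ta->block != tb->block)
        return (ta->block < tb->block) ? -1 : 1;
    if (ta->offset != tb->offset)
        return (ta->offset < tb->offset) ? -1 : 1;
    return 0;
}

static void
validate_sketch(TidSketch *index_tids, size_t n_index,
                const TidSketch *heap_tids, size_t n_heap)
{
    size_t  i = 0;

    /* single-threaded sort of everything the index already contains */
    qsort(index_tids, n_index, sizeof(TidSketch), tid_cmp);

    /* merge-style walk over the heap against the sorted index TIDs */
    for (size_t h = 0; h < n_heap; h++)
    {
        while (i < n_index && tid_cmp(&index_tids[i], &heap_tids[h]) < 0)
            i++;

        if (i >= n_index || tid_cmp(&index_tids[i], &heap_tids[h]) != 0)
            printf("missing from index, would insert (%u,%u)\n",
                   heap_tids[h].block, (unsigned) heap_tids[h].offset);
    }
}

int
main(void)
{
    TidSketch index_tids[] = {{0, 2}, {1, 1}, {0, 1}};
    TidSketch heap_tids[]  = {{0, 1}, {0, 2}, {0, 3}, {1, 1}, {1, 2}};

    validate_sketch(index_tids, 3, heap_tids, 5);
    return 0;
}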
To address this, I propose a modification: create the index empty but already marked ready. That way, while the index is being built, new transactions start adding their tuples to it immediately. The build stage would then proceed much as before, except that instead of building the index from scratch out of the scanned tuples, it would merge those tuples into the index that concurrent transactions are already maintaining.
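As a toy illustration of where the flags would differ (heavily simplified; the real sequence spans several transactions, waits, and snapshots), this is how I picture the pg_index flag progression changing:

/*
 * Toy model of the pg_index flag progression.  Today CREATE INDEX
 * CONCURRENTLY creates the index with indisready = false and
 * indisvalid = false, sets indisready only after the initial build,
 * and sets indisvalid only after validation.  The proposal is to mark
 * the index ready at creation time so that concurrent transactions
 * maintain it from the very start.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct IndexFlags
{
    bool indisready;    /* other sessions must insert into the index */
    bool indisvalid;    /* planner may use the index for queries */
} IndexFlags;

static void
show(const char *phase, IndexFlags f)
{
    printf("%-36s indisready=%d indisvalid=%d\n",
           phase, (int) f.indisready, (int) f.indisvalid);
}

int
main(void)
{
    IndexFlags cur = {false, false};
    IndexFlags prop = {true, false};    /* proposal: ready from creation */

    printf("current CREATE INDEX CONCURRENTLY:\n");
    show("  create (invalid, not ready)", cur);
    show("  initial build from heap snapshot", cur);
    cur.indisready = true;
    show("  after build: set ready", cur);
    show("  validation pass (TID sort/compare)", cur);
    cur.indisvalid = true;
    show("  after validation: set valid", cur);

    printf("\nproposed variant:\n");
    show("  create (empty, but ready)", prop);
    show("  build merges heap tuples into index", prop);
    prop.indisvalid = true;
    show("  after merge: set valid", prop);

    return 0;
}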
A simpler approach could be to keep everything as it is now, but create one "temporary" index (empty, but ready) alongside the index itself.
During the build stage we would build the main index as before, and then, instead of the current validation stage, simply iterate over the "temp" index and move all of its tuples into the main index.
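Here is a standalone sketch of that final move step (again only an illustration with made-up names; TIDs are flattened to single integers for brevity). One detail it highlights: a tuple inserted after the temp index became ready may also have been picked up by the build's snapshot, so the move has to skip entries the main index already contains:

/*
 * Model of replacing the validation pass with "drain the temp index into
 * the main index".  The main index was built from the heap snapshot; the
 * temp index holds whatever concurrent transactions inserted meanwhile.
 * Duplicates are possible, so each entry is checked before insertion.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* binary search: is 'tid' already present in the sorted main index? */
static bool
main_index_contains(const uint64_t *main_tids, size_t n, uint64_t tid)
{
    size_t lo = 0, hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (main_tids[mid] < tid)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n && main_tids[lo] == tid;
}

/* walk the temp index and insert anything the main index lacks */
static void
merge_temp_into_main(const uint64_t *main_tids, size_t n_main,
                     const uint64_t *temp_tids, size_t n_temp)
{
    for (size_t i = 0; i < n_temp; i++)
    {
        if (main_index_contains(main_tids, n_main, temp_tids[i]))
            printf("skip, already present: %llu\n",
                   (unsigned long long) temp_tids[i]);
        else
            printf("insert into main index: %llu\n",
                   (unsigned long long) temp_tids[i]);
    }
}

int
main(void)
{
    /* entries captured by the snapshot build, sorted by TID */
    uint64_t main_tids[] = {101, 102, 103, 210};
    /* entries concurrent transactions put into the temp index */
    uint64_t temp_tids[] = {103, 211, 212};

    merge_temp_into_main(main_tids, 4, temp_tids, 3);
    return 0;
}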
I am mostly interested in this improvement for btree indexes, but I expect it should work for other index types as well.
I'm curious if there are any potential pitfalls or reasons this approach might not work as expected. I'd appreciate any feedback or insights from the community on this idea.
Thank you!
Best regards,
Sergey Sargsian