Patch proposal - parameter to limit amount of FPW because of hint bits per second
From | Michail Nikolaev |
---|---|
Subject | Patch proposal - parameter to limit amount of FPW because of hint bits per second |
Date | |
Msg-id | CANtu0ohFKu6ZKMBO+C+-7yQPxC-71Lo3YP+xhaeNyWj4MH1niQ@mail.gmail.com |
Responses |
Re: Patch proposal - parameter to limit amount of FPW because of hint bits per second
|
List | pgsql-hackers |
Hello, hackers.

We have a production cluster with 10 hot standby servers. Each server has 48 cores and a 762Mbs network link. We have experienced several temporary outages caused by long transactions and hint bits.

For example, we create a new large index, which can sometimes take as long as a day. There are also some tables with frequently updated indexes (HOT is not used for such tables). Naturally, after a while we see higher CPU usage because of the huge number of "dead" tuples in the index and heap, but everything still works.

The real problems start once the long-lived transaction finally finishes. Subsequent index and heap scans begin marking millions of records with the LP_DEAD flag, which generates a huge number of FPW records in WAL. It is impossible to transfer that volume over the network quickly (or even to write it to disk), so the primary server becomes unavailable, and the whole system with it. The graph in the attachment shows primary resource usage during a real downtime incident.

So I have been thinking about a way to avoid such downtime. What about a patch that adds a parameter to limit the number of FPWs caused by LP_DEAD bits per second? Setting LP_DEAD can always be skipped and left for a later time. Such a parameter would make it possible to spread the additional WAL traffic over time, capped at some number of Mbit/s.

Does this look worth implementing?

Thanks,
Michail.
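To make the proposed limit concrete, here is a minimal sketch of one way such a cap could work: a token bucket that budgets the bytes of full-page images per second and tells the caller to skip setting LP_DEAD once the budget is spent. This is an illustration only, not code from any patch. The names `HintFpwLimiter`, `hint_fpw_limiter_init`, `hint_fpw_limiter_allow` and `HINT_FPW_PAGE_BYTES` are hypothetical, each hint is assumed to cost one full 8 kB page image, and the caller is assumed to supply a monotonic timestamp in microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cost of one full-page image: the default 8 kB block size. */
#define HINT_FPW_PAGE_BYTES 8192.0

/* Hypothetical limiter state; not an existing PostgreSQL structure. */
typedef struct HintFpwLimiter
{
    double  rate_bytes_per_sec; /* refill rate: the proposed parameter */
    double  burst_bytes;        /* maximum budget that can accumulate */
    double  tokens;             /* bytes of budget currently available */
    int64_t last_refill_us;     /* time of the last refill, microseconds */
} HintFpwLimiter;

/* Start with a full budget at time now_us. */
static void
hint_fpw_limiter_init(HintFpwLimiter *lim, double rate_bytes_per_sec,
                      double burst_bytes, int64_t now_us)
{
    lim->rate_bytes_per_sec = rate_bytes_per_sec;
    lim->burst_bytes = burst_bytes;
    lim->tokens = burst_bytes;
    lim->last_refill_us = now_us;
}

/*
 * Returns true if the budget covers one more full-page image, and
 * charges for it. Returns false if the caller should skip setting the
 * hint for now and leave it for a later scan.
 */
static bool
hint_fpw_limiter_allow(HintFpwLimiter *lim, int64_t now_us)
{
    int64_t elapsed_us = now_us - lim->last_refill_us;

    if (elapsed_us > 0)
    {
        /* Refill in proportion to elapsed time, capped at the burst size. */
        lim->tokens += (double) elapsed_us * lim->rate_bytes_per_sec / 1e6;
        if (lim->tokens > lim->burst_bytes)
            lim->tokens = lim->burst_bytes;
        lim->last_refill_us = now_us;
    }

    if (lim->tokens < HINT_FPW_PAGE_BYTES)
        return false;

    lim->tokens -= HINT_FPW_PAGE_BYTES;
    return true;
}
```

With a rate of 10 pages per second and a burst of 2 pages, two hints are allowed immediately, the third is skipped, and one more is allowed after 100 ms. A real implementation would also need to decide where the state lives (per backend or shared) and to charge only for hints that actually produce a full-page image; the sketch leaves both open.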