[HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
From | Andres Freund |
---|---|
Subject | [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns |
Date | |
Msg-id | 20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de |
Replies |
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns
Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with a lot of columns |
List | pgsql-hackers |
Hi,

When running workloads that include fast queries with a lot of columns, SendRowDescriptionMessage() and the routines it calls become a bottleneck. Besides syscache lookups (see [1] and [2]), a major cost is the manipulation of the StringInfo buffer and the version-specific branches in the per-attribute loop. As so often, the performance differential of this patch gets bigger when the other performance patches are applied.

The issues in SendRowDescriptionMessage() are the following:

1) All the pq_sendint calls, and also pq_sendstring, are expensive. The amount of computation done for a single 2- or 4-byte addition to the StringInfo (particularly in enlargeStringInfo()) is problematic, as are the reallocations themselves.

I addressed this by adding pq_send*_pre() wrappers, implemented as inline functions, that require the memory to be pre-allocated. Combining that with a single enlargeStringInfo() call in SendRowDescriptionMessage() that pre-allocates the maximum required memory yields a pretty good speedup.

I'm not yet super sure about the implementation. For one, I'm not sure this shouldn't instead be stringinfo.h functions, with very, very tiny pqformat.h wrappers. Conversely, I think it'd make a lot of sense for the pqformat integer functions to get rid of the continually maintained trailing null byte; I was hoping the compiler could optimize that away, but alas, no luck. As soon as a single integer is sent, you can't rely on null-terminated strings anyway.

2) It creates a new StringInfo in every iteration. That causes noticeable memory-management overhead and resets the buffer size every time. When the description is relatively large, that leads to a number of reallocations for every SendRowDescriptionMessage() call.

My solution here was to create a persistent StringInfo for SendRowDescriptionMessage() that never gets deallocated (just reset).
That, in combination with new versions of pq_beginmessage()/pq_endmessage() that keep the buffer alive, yields a nice speedup.

Currently I'm using a static variable to allocate a string buffer for the function. It'd probably be better to manage that outside of a single function. I'm also wondering why we're allocating a good number of StringInfos in various places rather than reusing them. Most of them can't be entered recursively, and even if that's a concern, we could have one reusable buffer per portal or such.

3) The v2/v3 branches in the attribute loop are noticeable (others too, but well...). I solved that by splitting the v2 and v3 per-attribute loops into separate functions. IMO that's also a good chunk more readable.

Comments?

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/CA+Tgmobj72E_tG6w98H0oUbCCUmoC4uRmjocYPbnWC2RxYACeg@mail.gmail.com
[2] http://archives.postgresql.org/message-id/20170914061207.zxotvyopetm7lrrp%40alap3.anarazel.de
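The combination of points 1) and 2) can be sketched in plain C. This is a minimal, self-contained sketch under stated assumptions, not the patch itself: `Buf`, `buf_enlarge()`, the `put_*_pre()` helpers, `ColumnDesc` and `build_row_description()` are hypothetical stand-ins for StringInfo, enlargeStringInfo(), the proposed pq_send*_pre() wrappers and SendRowDescriptionMessage(). It assumes the v3 RowDescription field layout (name, table OID, attribute number, type OID, type length, type modifier, format code) and leaves out encoding conversion and the message type/length header.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for StringInfoData. Note: no trailing
 * NUL is maintained after integer writes. */
typedef struct
{
    char *data;
    int   len;
    int   maxlen;
} Buf;

/* Ensure at least `needed` more bytes fit; the only place that reallocs. */
static void
buf_enlarge(Buf *b, int needed)
{
    int newlen;

    if (b->len + needed <= b->maxlen)
        return;
    newlen = b->maxlen > 0 ? b->maxlen : 64;
    while (newlen < b->len + needed)
        newlen *= 2;
    b->data = realloc(b->data, newlen);
    if (b->data == NULL)
        abort();
    b->maxlen = newlen;
}

/* Unchecked appends (hypothetical): the caller must have reserved space
 * with buf_enlarge(). Integers are written in network byte order. */
static inline void
put_int16_pre(Buf *b, uint16_t v)
{
    unsigned char *p = (unsigned char *) b->data + b->len;

    p[0] = (unsigned char) (v >> 8);
    p[1] = (unsigned char) v;
    b->len += 2;
}

static inline void
put_int32_pre(Buf *b, uint32_t v)
{
    unsigned char *p = (unsigned char *) b->data + b->len;

    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
    b->len += 4;
}

static inline void
put_string_pre(Buf *b, const char *s)
{
    size_t n = strlen(s) + 1;   /* protocol strings include their NUL */

    memcpy(b->data + b->len, s, n);
    b->len += (int) n;
}

/* Hypothetical per-column description, mirroring the v3 field layout. */
typedef struct
{
    const char *name;
    uint32_t    table_oid;
    int16_t     attnum;
    uint32_t    type_oid;
    int16_t     typlen;
    int32_t     typmod;
    int16_t     format;
} ColumnDesc;

#define FIELD_FIXED_BYTES 18    /* 4 + 2 + 4 + 2 + 4 + 2 */

static Buf row_desc_buf;        /* persistent: reset per call, never freed */

static const Buf *
build_row_description(const ColumnDesc *cols, int ncols)
{
    int needed = 2;             /* int16 field count */
    int i;

    row_desc_buf.len = 0;       /* reset, but keep the allocation */
    for (i = 0; i < ncols; i++)
        needed += (int) strlen(cols[i].name) + 1 + FIELD_FIXED_BYTES;
    buf_enlarge(&row_desc_buf, needed);   /* one reservation up front */

    put_int16_pre(&row_desc_buf, (uint16_t) ncols);
    for (i = 0; i < ncols; i++)
    {
        put_string_pre(&row_desc_buf, cols[i].name);
        put_int32_pre(&row_desc_buf, cols[i].table_oid);
        put_int16_pre(&row_desc_buf, (uint16_t) cols[i].attnum);
        put_int32_pre(&row_desc_buf, cols[i].type_oid);
        put_int16_pre(&row_desc_buf, (uint16_t) cols[i].typlen);
        put_int32_pre(&row_desc_buf, (uint32_t) cols[i].typmod);
        put_int16_pre(&row_desc_buf, (uint16_t) cols[i].format);
    }
    return &row_desc_buf;
}
```

For two columns named `id` and `name`, the body is 2 + (3 + 18) + (5 + 18) = 46 bytes and the first call allocates 64 bytes; a second call for a smaller description only resets `len` and performs no reallocation. Unlike the patch as described, which reserves an upper bound, the sketch computes the exact size, at the cost of an extra strlen() per column.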
Attachments