Re: Flushing large data immediately in pqcomm

Поиск
Список
Период
Сортировка
От Melih Mutlu
Тема Re: Flushing large data immediately in pqcomm
Дата
Msg-id CAGPVpCSX8bTF61ZL9jOgh1AaY3bgsWnQ6J7WmJK4TV0f2LPnJQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Flushing large data immediately in pqcomm  (Andres Freund <andres@anarazel.de>)
Ответы Re: Flushing large data immediately in pqcomm  (Robert Haas <robertmhaas@gmail.com>)
Re: Flushing large data immediately in pqcomm  (Jelte Fennema-Nio <postgres@jeltef.nl>)
Re: Flushing large data immediately in pqcomm  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
Hi hackers,

I did some experiments with this patch, after previous discussions. This probably does not answer all questions, but would be happy to do more if needed.

First, I updated the patch according to what suggested here [1]. PSA  v2.
I tweaked the master branch a bit to not allow any buffering. I compared HEAD, this patch and no buffering at all.
I also added a simple GUC to control PqSendBufferSize, this change only allows to modify the buffer size and should not have any impact on performance.

I again ran the COPY TO STDOUT command and timed it. AFAIU COPY sends data row by row, and I tried running the command under different scenarios with different # of rows and row sizes. You can find the test script attached (see test.sh).
All timings are in ms.

1- row size = 100 bytes, # of rows = 1000000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │ 1036       │ 998  │ 940  │ 910  │ 894  │ 874  │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch     │ 1107       │ 1032 │ 980  │ 957  │ 917  │ 909  │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 6230       │ 6125 │ 6282 │ 6279 │ 6255 │ 6221 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘


2-  row size = half of the rows are 1KB and rest is 10KB , # of rows = 1000000
┌───────────┬────────────┬───────┬───────┬───────┬───────┬───────┐
│           │ 1400 bytes │ 2KB   │ 4KB   │ 8KB   │ 16KB  │ 32KB  │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ HEAD      │ 25197      │ 23414 │ 20612 │ 19206 │ 18334 │ 18033 │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ patch     │ 19843      │ 19889 │ 19798 │ 19129 │ 18578 │ 18260 │
├───────────┼────────────┼───────┼───────┼───────┼───────┼───────┤
│ no buffer │ 23752      │ 23565 │ 23602 │ 23622 │ 23541 │ 23599 │
└───────────┴────────────┴───────┴───────┴───────┴───────┴───────┘

3-  row size = half of the rows are 1KB and rest is 1MB , # of rows = 1000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │ 3137       │ 2937 │ 2687 │ 2551 │ 2456 │ 2465 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch     │ 2399       │ 2390 │ 2402 │ 2415 │ 2417 │ 2422 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 2417       │ 2414 │ 2429 │ 2418 │ 2435 │ 2404 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘

3-  row size = all rows are 1MB , # of rows = 1000
┌───────────┬────────────┬──────┬──────┬──────┬──────┬──────┐
│           │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ HEAD      │ 6113       │ 5764 │ 5281 │ 5009 │ 4885 │ 4872 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ patch     │ 4759       │ 4754 │ 4754 │ 4758 │ 4782 │ 4805 │
├───────────┼────────────┼──────┼──────┼──────┼──────┼──────┤
│ no buffer │ 4756       │ 4774 │ 4793 │ 4766 │ 4770 │ 4774 │
└───────────┴────────────┴──────┴──────┴──────┴──────┴──────┘


Some quick observations:
1- Even though I expect both the patch and HEAD behave similarly in case of small data (case 1: 100 bytes), the patch runs slightly slower than HEAD.
2- In cases where the data does not fit into the buffer, the patch starts performing better than HEAD. For example, in case 2, patch seems faster until the buffer size exceeds the data length. When the buffer size is set to something larger than 10KB (16KB/32KB in this case), there is again a slight performance loss with the patch as in case 1.
3- With large row sizes (i.e. sizes that do not fit into the buffer) not buffering at all starts performing better than HEAD. Similarly the patch performs better too as it stops buffering if data does not fit into the buffer.



[1]


Thanks,
--
Melih Mutlu
Microsoft
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: UUID v7
Следующее
От: Dagfinn Ilmari Mannsåker
Дата:
Сообщение: Re: Using the %m printf format more