Re: Parallel copy
From | Ants Aasma |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CANwKhkPmM18UYpOt_AEB4JC6fa0dfA1PfgiQyNzeNUxEpG=XUw@mail.gmail.com |
In reply to | Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Parallel copy |
List | pgsql-hackers |
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
> This is something similar to what I had also in mind for this idea. I
> had thought of handing over complete chunk (64K or whatever we
> decide). The one thing that slightly bothers me is that we will add
> some additional overhead of copying to and from shared memory which
> was earlier from local process memory. And, the tokenization (finding
> line boundaries) would be serial. I think that tokenization should be
> a small part of the overall work we do during the copy operation, but
> will do some measurements to ascertain the same.

I don't think any extra copying is needed. The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared
memory.

For serial performance of tokenization into lines, I really think a SIMD
based approach will be fast enough for quite some time. I hacked up the
code in the simdcsv project to only tokenize on line endings and it was
able to tokenize a CSV file with short lines at 8+ GB/s. There are going
to be many other bottlenecks before this one starts limiting. Patch
attached if you'd like to try that out.

Regards,
Ants Aasma
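
For readers who want a feel for what tokenizing only on line endings with SIMD looks like, here is a minimal sketch. It is not the attached patch and not code from simdcsv. The function name find_line_ends() and its interface are hypothetical. It assumes x86 SSE2 plus the GCC/Clang builtin __builtin_ctz(), and falls back to a plain byte loop elsewhere. It deliberately ignores CSV quoting and escaping, so it only illustrates the scan itself and makes no claim about throughput.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (illustration only): store the offset of every '\n'
 * in buf[0..len) into out[], up to max_out entries, and return how many
 * offsets were stored. Quoting and escaping are NOT handled.
 */
static size_t
find_line_ends(const char *buf, size_t len, size_t *out, size_t max_out)
{
    size_t i = 0;
    size_t n = 0;

    assert(out != NULL || max_out == 0);

#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');

    /* Compare 16 bytes at a time; each set bit in mask marks a newline. */
    for (; i + 16 <= len; i += 16)
    {
        __m128i  chunk = _mm_loadu_si128((const __m128i *) (buf + i));
        unsigned mask = (unsigned) _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));

        while (mask != 0)
        {
            if (n == max_out)
                return n;
            out[n++] = i + (size_t) __builtin_ctz(mask);
            mask &= mask - 1;   /* clear the lowest set bit */
        }
    }
#endif

    /* Scalar tail, and the whole scan when SSE2 is not available. */
    for (; i < len; i++)
    {
        if (buf[i] == '\n')
        {
            if (n == max_out)
                return n;
            out[n++] = i;
        }
    }
    return n;
}
```

In the design discussed above, a single reader process could run a scan like this over each chunk it places in shared memory and hand the resulting line offsets to the workers, which then parse the lines in place. How chunks and offsets would actually be shared is left open here.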