Re: Parallel copy
From | Ants Aasma |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CANwKhkOu7dWj66gC-N4B5SaLWW7=mLGVbfitquoO7pjtEJRWLg@mail.gmail.com |
In reply to | Re: Parallel copy (David Fetter <david@fetter.org>) |
Responses | Re: Parallel copy |
List | pgsql-hackers |
On Thu, 20 Feb 2020 at 18:43, David Fetter <david@fetter.org> wrote:
>
> On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
> > I think the wc2 is showing that maybe instead of parallelizing the
> > parsing, we might instead try using a different tokenizer/parser and
> > make the implementation more efficient instead of just throwing more
> > CPUs on it.
>
> That was what I had in mind.
>
> > I don't know if our code is similar to what wc does, maybe parsing
> > csv is more complicated than what wc does.
>
> CSV parsing differs from wc in that there are more states in the state
> machine, but I don't see anything fundamentally different.

The trouble with a state-machine-based approach is that the state
transitions form a dependency chain: each byte's state depends on the
state computed for the previous byte. This means that at best the
processing rate will be 4-5 cycles per byte (the L1 latency to fetch
the next state).

I whipped together a quick prototype that uses SIMD and bitmap
manipulations to do the equivalent of CopyReadLineText() in csv mode,
including quote and escape handling. It runs at 0.25-0.5 cycles per
byte.

Regards,
Ants Aasma
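For readers who want to see what "bitmap manipulations" can look like here, below is a minimal sketch in portable C. It is not the prototype mentioned in the message, and it makes several assumptions: the quote character is `"`, the escape character is the same as the quote (so an escaped quote is written `""`), the line terminator is `\n`, and the names `byte_mask`, `prefix_xor` and `find_line_end` are hypothetical. The per-block bitmaps are built with a plain loop to keep the sketch portable; a SIMD version would presumably build them with a byte compare followed by a movemask. The key point is that the "inside quotes" state for 64 bytes is computed with a handful of shifts and XORs instead of one dependent table lookup per byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set where block[i] == c (len <= 64).
 * Assumption: a SIMD build would replace this loop with compare + movemask. */
static uint64_t
byte_mask(const unsigned char *block, size_t len, unsigned char c)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < len; i++)
        if (block[i] == c)
            mask |= UINT64_C(1) << i;
    return mask;
}

/* Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
 * Applied to a quote bitmap, it marks the bytes that lie inside quotes. */
static uint64_t
prefix_xor(uint64_t x)
{
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

/* Hypothetical helper: offset of the first '\n' that is not inside a
 * quoted field, or -1 if the buffer holds no complete line. */
static long
find_line_end(const char *buf, size_t len)
{
    uint64_t carry = 0;     /* all ones if the previous block ended inside quotes */

    for (size_t off = 0; off < len; off += 64)
    {
        size_t      n = (len - off < 64) ? len - off : 64;
        const unsigned char *block = (const unsigned char *) buf + off;
        uint64_t    quotes = byte_mask(block, n, '"');
        uint64_t    newlines = byte_mask(block, n, '\n');
        uint64_t    inside = prefix_xor(quotes) ^ carry;
        uint64_t    ends = newlines & ~inside;

        if (ends != 0)
        {
            size_t      bit = 0;

            /* Lowest set bit; a count-trailing-zeros instruction would do this in one step. */
            while (((ends >> bit) & 1) == 0)
                bit++;
            return (long) (off + bit);
        }
        /* Propagate the quote state at the end of this block to the next one. */
        carry = (inside >> 63) ? ~UINT64_C(0) : 0;
    }
    return -1;
}
```

Because a doubled quote toggles the quote state twice, `""` inside a quoted field cancels out without special handling; a distinct escape character would need an extra bitmap step that this sketch does not attempt. As a usage example, for the input `a,"x<newline>y",b<newline>next` the function returns 9, the offset of the second newline, since the first one sits inside the quoted field.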