Speed up COPY FROM text/CSV parsing using SIMD

From	Shinya Kato
Subject	Speed up COPY FROM text/CSV parsing using SIMD
Date	August 7, 04:48:30
Msg-id	CAOzEurSW8cNr6TPKsjrstnPfhf4QyQqB4tnPXGGe8N4e_v7Jig@mail.gmail.com
List	pgsql-hackers

Hi hackers,

I have implemented SIMD optimization for the COPY FROM (FORMAT {csv,
text}) command and observed approximately a 5% performance
improvement. Please see the detailed test results below.

Idea
====
The current text/CSV parser processes input byte-by-byte, checking
whether each byte is a special character (\n, \r, quote, escape) or a
regular character, and transitions states in a state machine. This
sequential processing is inefficient and likely causes frequent branch
mispredictions due to the many if statements.
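
For readers who have not looked at the existing code, the byte-by-byte approach can be pictured with the following simplified sketch. It is only loosely modeled on the CSV quote handling in the current parser; the function name find_record_end and its signature are hypothetical, and the real code also handles buffer refills, encodings, and the text format.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified scan (not the actual parser): return the
 * index of the first \n or \r that ends a CSV record in buf[0..len), or len
 * if no record end is found. Line breaks inside a quoted field are skipped.
 */
static size_t
find_record_end(const char *buf, size_t len, char quote, char escape)
{
    bool        in_quote = false;
    bool        last_was_esc = false;

    /* If quote and escape are the same char, the quote acts as a toggle. */
    if (escape == quote)
        escape = '\0';

    for (size_t i = 0; i < len; i++)
    {
        char        c = buf[i];

        /* Every byte goes through the same chain of data-dependent tests. */
        if (in_quote && c == escape)
            last_was_esc = !last_was_esc;
        if (c == quote && !last_was_esc)
            in_quote = !in_quote;
        if (c != escape)
            last_was_esc = false;

        if ((c == '\n' || c == '\r') && !in_quote)
            return i;
    }
    return len;
}
```

Most bytes in a typical file are regular characters, yet each one still passes through all of these branches, which is the cost the patch tries to avoid.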

I thought this problem could be addressed by leveraging SIMD and
vectorized operations for faster processing.

Implementation Overview
=======================
1. Create a vector of special characters (e.g., Vector8 nl =
vector8_broadcast('\n');).
2. Load the input buffer into a Vector8 variable called chunk.
3. Perform vectorized operations between chunk and the special
character vectors to check if the buffer contains any special
characters.
4-1. If no special characters are found, advance the input_buf_ptr by
sizeof(Vector8).
4-2. If special characters are found, advance the input_buf_ptr as far
as possible, then fall back to the original text/CSV parser for
byte-by-byte processing.
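
The steps above can be sketched as a standalone function. This is not the patch itself: it uses raw SSE2 intrinsics instead of the Vector8 helpers so that it compiles outside the PostgreSQL tree, it falls back to a scalar loop when SSE2 is unavailable, and the name skip_plain_bytes and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper, not from the patch: return how many leading bytes of
 * buf[0..len) can be skipped before the first special character
 * (\n, \r, quote, or escape). The caller would resume byte-by-byte parsing
 * at the returned offset.
 */
static size_t
skip_plain_bytes(const char *buf, size_t len, char quote, char escape)
{
    size_t      i = 0;

    assert(buf != NULL || len == 0);

#if defined(__SSE2__)
    /* Step 1: one vector per special character. */
    const __m128i nl = _mm_set1_epi8('\n');
    const __m128i cr = _mm_set1_epi8('\r');
    const __m128i qt = _mm_set1_epi8(quote);
    const __m128i es = _mm_set1_epi8(escape);

    while (len - i >= sizeof(__m128i))
    {
        /* Step 2: load the next 16 input bytes (unaligned load). */
        __m128i     chunk = _mm_loadu_si128((const __m128i *) (buf + i));

        /* Step 3: compare against all special characters at once. */
        __m128i     hit = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(chunk, nl), _mm_cmpeq_epi8(chunk, cr)),
            _mm_or_si128(_mm_cmpeq_epi8(chunk, qt), _mm_cmpeq_epi8(chunk, es)));
        unsigned    mask = (unsigned) _mm_movemask_epi8(hit);

        if (mask == 0)
        {
            /* Step 4-1: no special character, skip the whole chunk. */
            i += sizeof(__m128i);
            continue;
        }

        /* Step 4-2: advance to the first special byte, then stop. */
        while ((mask & 1) == 0)
        {
            mask >>= 1;
            i++;
        }
        return i;
    }
#endif

    /* Tail shorter than one vector (or non-SSE2 build): scalar scan. */
    while (i < len)
    {
        char        c = buf[i];

        if (c == '\n' || c == '\r' || c == quote || c == escape)
            break;
        i++;
    }
    return i;
}
```

For example, with quote and escape both set to '"', a 22-byte buffer whose first special character is a quote at offset 20 is handled as one full-vector skip of 16 bytes followed by a short scalar scan of the tail.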

Test
====
I tested the performance by measuring the time it takes to load a CSV
file created using the attached SQL script with the following COPY
command:
=# COPY t FROM '/tmp/t.csv' (FORMAT csv);

Environment
-----------
OS: Rocky Linux 9.6
CPU: Intel Core i7-10710U (6 Cores / 12 Threads, 1.1 GHz Base / 4.7
GHz Boost, AVX2 & FMA supported)

Time
----
master: 02:44.943
patch applied: 02:36.878 (about 5% faster)

Perf
----
The call graphs for both runs are attached. The share of time spent in
CopyReadLineText is:
master: 12.15%
patch applied: 8.04%

Thoughts?
I would appreciate feedback on the implementation and any suggestions
for further improvement.

-- 
Best regards,
Shinya Kato
NTT OSS Center
