Re: Speed up COPY FROM text/CSV parsing using SIMD
| From | Andrew Dunstan |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | 5d81fbbb-7609-4445-9bc4-8af211fb7674@dunslane.net |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Nathan Bossart <nathandbossart@gmail.com>) |
| List | pgsql-hackers |

On 2025-10-22 We 3:24 PM, Nathan Bossart wrote:
> On Wed, Oct 22, 2025 at 03:33:37PM +0300, Nazir Bilal Yavuz wrote:
>> On Tue, 21 Oct 2025 at 21:40, Nathan Bossart <nathandbossart@gmail.com> wrote:
>>> I wonder if we could mitigate the regression further by spacing out the
>>> checks a bit more. It could be worth comparing a variety of values to
>>> identify what works best with the test data.
>> Do you mean that instead of doubling the SIMD sleep, we should
>> multiply it by 3 (or another factor)? Or are you referring to
>> increasing the maximum sleep from 1024? Or possibly both?
> I'm not sure of the precise details, but the main thrust of my suggestion
> is to assume that whatever sampling you do to determine whether to use SIMD
> is good for a larger chunk of data. That is, if you are sampling 1K lines
> and then using the result to choose whether to use SIMD for the next 100K
> lines, we could instead bump the latter number to 1M lines (or something).
> That way we minimize the regression for relatively uniform data sets while
> retaining some ability to adapt in case things change halfway through a
> large table.
>

I'd be ok with numbers like this, although I suspect the number of cases
where we see shape shifts like this in the middle of a data set would be
vanishingly small.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
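
For readers following the thread, the "sample a little, then trust the result for a long stretch" idea quoted above can be sketched as below. This is only an illustration and not code from the patch: the class name `SimdChooser`, the byte set `SPECIAL_BYTES`, the ratio cutoff, and the choice to stay on the scalar path while sampling are all assumptions made for the example. Only the window sizes (1K lines sampled, 1M lines committed) come from the numbers mentioned in the discussion.

```python
# Hypothetical set of bytes that would interrupt a SIMD scan (delimiter, quote, escape).
SPECIAL_BYTES = frozenset(b',"\\')


class SimdChooser:
    """Alternate between a short sampling window and a long committed window."""

    def __init__(self, sample_lines=1_000, commit_lines=1_000_000,
                 max_special_ratio=0.05):
        self.sample_lines = sample_lines            # lines inspected per sample
        self.commit_lines = commit_lines            # lines that reuse the decision
        self.max_special_ratio = max_special_ratio  # assumed cutoff, not from the patch
        self.use_simd = False
        self._sampling = True
        self._lines_left = sample_lines
        self._bytes_seen = 0
        self._special_seen = 0

    def observe(self, line: bytes) -> bool:
        """Record one line; return whether this line would take the SIMD path."""
        # Assumption: lines in the sampling window are parsed with the scalar path.
        used_simd = (not self._sampling) and self.use_simd
        if self._sampling:
            self._bytes_seen += len(line)
            self._special_seen += sum(1 for b in line if b in SPECIAL_BYTES)
        self._lines_left -= 1
        if self._lines_left == 0:
            if self._sampling:
                # Sampling window finished: decide once, then commit to the decision.
                ratio = self._special_seen / max(self._bytes_seen, 1)
                self.use_simd = ratio <= self.max_special_ratio
                self._sampling = False
                self._lines_left = self.commit_lines
            else:
                # Committed window finished: start a fresh sample so the choice
                # can adapt if the data changes shape partway through.
                self._sampling = True
                self._bytes_seen = 0
                self._special_seen = 0
                self._lines_left = self.sample_lines
        return used_simd
```

In this sketch, raising `commit_lines` from 100K to 1M reduces how often a uniform data set pays for re-sampling, at the cost of reacting more slowly if the data does change shape mid-file.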