Re: Parallel copy
From | Amit Kapila |
---|---|
Subject | Re: Parallel copy |
Date | |
Msg-id | CAA4eK1+FDd=yH=YdvzCJxRCZjFRP-5iV73B83=1uSnwxaO2STw@mail.gmail.com |
In response to | Re: Parallel copy (vignesh C <vignesh21@gmail.com>) |
List | pgsql-hackers |
On Thu, Aug 27, 2020 at 4:56 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> > >
> > > > I have attached new set of patches with the fixes.
> > > > Thoughts?
> > >
> > > Hi Vignesh,
> > >
> > > I don't really have any further comments on the code, but would like
> > > to share some results of some Parallel Copy performance tests I ran
> > > (attached).
> > >
> > > The tests loaded a 5GB CSV data file into a 100 column table (of
> > > different data types). The following were varied as part of the test:
> > > - Number of workers (1 – 10)
> > > - No indexes / 4-indexes
> > > - Default settings / increased resources (shared_buffers,work_mem, etc.)
> > >
> > > (I did not do any partition-related tests as I believe those type of
> > > tests were previously performed)
> > >
> > > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> > >
> > > I observed the following trends:
> > > - For the data file size used, Parallel Copy achieved best performance
> > > using about 9 – 10 workers. Larger data files may benefit from using
> > > more workers. However, I couldn’t really see any better performance,
> > > for example, from using 16 workers on a 10GB CSV data file compared to
> > > using 8 workers. Results may also vary depending on machine
> > > characteristics.
> > > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > > of cases (I did question if allowing 1 worker was useful in my patch
> > > review).
> >
> > I think the reason is that for 1 worker case there is not much
> > parallelization as a leader doesn't perform the actual load work.
> > Vignesh, can you please once see if the results are reproducible at
> > your end, if so, we can once compare the perf profiles to see why in
> > some cases we get improvement and in other cases not. Based on that we
> > can decide whether to allow the 1 worker case or not.
> >
>
> I will spend some time on this and update.
>

Thanks.

--
With Regards,
Amit Kapila.