Re: questions on (parallel) COPY and when to REINDEX
От | Andy Colson |
---|---|
Тема | Re: questions on (parallel) COPY and when to REINDEX |
Дата | |
Msg-id | 4A76367D.5060802@squeakycode.net обсуждение исходный текст |
Ответ на | Re: questions on (parallel) COPY and when to REINDEX (Janet Jacobsen <jsjacobsen@lbl.gov>) |
Ответы |
Re: questions on (parallel) COPY and when to REINDEX
|
Список | pgsql-general |
>> On 1 Aug 2009, at 23:24, Janet Jacobsen wrote: >>> My questions are: >>> (2) Should I REINDEX these two tables daily after the pipeline >>> completes? Is this what other people do in practice? >> >> No need, but as people are querying as soon as data is arriving, an >> analyse of the table you just copied to should help performance - the >> query plans will be derived from more accurate table statistics that way. > > The files that are being copied into the table(s) are between 200 to 1,000 > each, but there are hundreds of these small files every day. Would you > recommend running analyze after every COPY? Just once, at the end. Assuming you are only deleting 100k records and re-adding/updating another 100k in a batch. That'snot so many records it'll through the stats out of whack. (If you were dropping/updating 15M in a batch you mightwanna analyze in the middle) >> >> You wrote that your pipeline runs for a period of 4 hours and the >> table is about 15M rows now. What is taking up all that time? I >> understand why you'd want to parallelise that process, but where do >> you expect the gains? > > > We're processing images, and the data processing and analysis takes > up most of the time, but the images can be processed/analyzed in parallel. > We've been doing all of the data loading at the end - one COPY at a > time. Originally that made sense because the researchers wanted to > check the images before loading the data/analysis results into the db. Ah! Images! When you are doing the COPY are you escaping the data? You cant "just" copy a binary file. -Andy
В списке pgsql-general по дате отправления: