Re: Implement UNLOGGED clause for COPY FROM
От | Fujii Masao |
---|---|
Тема | Re: Implement UNLOGGED clause for COPY FROM |
Дата | |
Msg-id | 86ea09f8-a5ee-a998-84c8-e4e0c9425edb@oss.nttdata.com обсуждение исходный текст |
Ответ на | RE: Implement UNLOGGED clause for COPY FROM ("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>) |
Список | pgsql-hackers |
On 2020/07/09 15:17, osumi.takamichi@fujitsu.com wrote: > Fujii-san > > Thank you for your interest in this idea. > >> This feature can work safely with wal_level=replica or logical? >> Or it can work only with wal_level=minimal? >> If yes, what is the main difference >> between this method and wal_skip_threshold? > I'm thinking this feature can be used > when you set any parameters of wal_level. > Besides that, data loading into a table *with some data* > should be allowed. This means I don't want to limit > the usage of this feature only for initial load > for empty table or under condition of 'minimal' wal_level in other words. > > Let me explain more detail of the background. > > I got a report that one of my customers says that > multiple COPY from multiple sessions have a bottleneck to write WAL. > Her use case was DWH system using postgres mainly to load dozens of GB (or more) log data > from multiple data sources to execute night batch processing everyday. > > Her scenario included both initial load to empty table > and load to table that already has records. Yes, I understand this use case. > > In passing, she also used our company's product of parallel loader, > to load data with dozens of, nearly 100, BGWs at the same time. > Through investigation of iostat, > they found the same problem that CPU worked for WAL write intensively. > > This could happen after the implementation > of Parallel copy that is really hotly discussed and reviewed in the mailing lists now. > So I thought it's good to discuss this in advance. > >>> 4. Execute the data loading, bypassing WAL generation for data. >>> 5. Sync the data to disk by performing checkpoint. >> >> What happens if the server crashes before #5? Since no WAL for data-loading can >> be replayed, the target table should be truncated? > My answer for this is just to load that COPY data again. > It's because the application itself knows what kind of data was loaded > from the command. When the server crashes before #5, some table and index pages that #4 loaded the data into might have been partially synced to the disk because of bgwriter or shared buffer replacement. So ISTM that the target table needs to be truncated to empty during recovery and users need to load whole the data into the table again. Is my understanding right? If yes, isn't this the same feature as that UNLOGGED table provides? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
В списке pgsql-hackers по дате отправления: