Re: [HACKERS] sort on huge table
От | Bruce Momjian |
---|---|
Тема | Re: [HACKERS] sort on huge table |
Дата | |
Msg-id | 199911011800.NAA20652@candle.pha.pa.us обсуждение исходный текст |
Ответ на | Re: [HACKERS] sort on huge table (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: [HACKERS] sort on huge table
|
Список | pgsql-hackers |
> Next question is what to do about it. I don't suppose we have any way > of turning off the OS' read-ahead algorithm :-(. We could forget about > this space-recycling improvement and go back to separate temp files. > The objection to that, of course, is that while sorting might be faster, > it doesn't matter how fast the algorithm is if you don't have the disk > space to execute it. Look what I found. I downloaded Linux kernel source for 2.2.0, and started looking for the word 'ahead' in the file system files. I found that read-ahead seems to be controlled by f_reada, and look where I found it being turned off? Seems like any seek turns off read-ahead on Linux. When you do a read or write, it seems to be turned on again. Once you read/write, the next read/write will do read-ahead, assuming you don't do any lseek() before the second read/write(). Seems like the algorithm in psort now is rarely having read-ahead on Linux, while other OS's check to see if the read-ahead was eventually used, and control read-ahead that way. read-head also seems be off on the first read from a file. --------------------------------------------------------------------------- /** linux/fs/ext2/file.c ... /** Make sure the offset never goes beyond the 32-bit mark..*/ static long long ext2_file_lseek(struct file *file,long long offset,int origin) {struct inode *inode = file->f_dentry->d_inode; switch (origin) { case 2: offset += inode->i_size; break; case 1: offset += file->f_pos;}if (((unsignedlong long) offset >> 32) != 0) { #if BITS_PER_LONG < 64 return -EINVAL; #else if (offset > ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(inode->i_sb)]) return -EINVAL; #endif} if (offset != file->f_pos) { file->f_pos = offset; file->f_reada = 0; file->f_version = ++event;}returnoffset; } -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
В списке pgsql-hackers по дате отправления: