Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes

From Антон Степаненко
Subject Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
Msg-id 56961308340142@web152.yandex.ru
In response to Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-bugs

17.06.2011, 21:24, "Merlin Moncure" <mmoncure@gmail.com>:
> 2011/6/17 Антон Степаненко <zlobnynigga@yandex.ru>:
>
>>  17.06.2011, 20:19, "Merlin Moncure" <mmoncure@gmail.com>:
>>>  On Fri, Jun 17, 2011 at 10:56 AM, Kevin Grittner
>>>  <Kevin.Grittner@wicourts.gov> wrote:
>>>>>   I still do not believe that this is hardware problem.
>>>>   How would an application cause a bus error?
>>>  unaligned memory access on risc maybe?  what's this running on?
>>>
>>>  merlin
>>  *****:~$ cat /proc/cpuinfo
>>  processor       : 0
>>  ....
>>  processor       : 23
>>  vendor_id       : GenuineIntel
>>  cpu family      : 6
>>  model           : 44
>>  model name      : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
>
> hm, I'm wondering if this
> (http://us.generation-nt.com/bug-626451-linux-image-mremap-returns-useless-pages-moving-anonymous-shared-mmap-access-causes-sigbus-help-203302832.html)
> has anything to do with your problem.
>
> merlin

Thank you very much, very interesting link. I've compiled it under my Ubuntu Lucid, and it really does cause sigbus. But when
compiled under CentOS (2.6.18 kernel) it does the same, so I am not sure that this is a bug.
And even if it is, why does it occur only when buffers are set to 12GB and filled?
I've read some of the postgresql sources, e.g. /src/backend/storage/smgr/md.c:
void
mdread(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
       char *buffer)
{
..
    if (nbytes != BLCKSZ)
    {
        if (nbytes < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read block %u in file \"%s\": %m",
                            blocknum, FilePathName(v->mdfd_vfd))));

        /*
         * Short read: we are at or past EOF, or we read a partial block at
         * EOF.  Normally this is an error; upper levels should never try to
         * read a nonexistent block.  However, if zero_damaged_pages is ON or
         * we are InRecovery, we should instead return zeroes without
         * complaining.  This allows, for example, the case of trying to
         * update a block that was later truncated away.
         */
        if (zero_damaged_pages || InRecovery)
            MemSet(buffer, 0, BLCKSZ);
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read block %u in file \"%s\": read only %d of %d bytes",
                            blocknum, FilePathName(v->mdfd_vfd),
                            nbytes, BLCKSZ)));
    }
}

This is the only place reporting errors like 'could not read block in file'.
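
To make the "short read" branch above concrete, here is a minimal sketch, not PostgreSQL code. It assumes a POSIX system; read_block and short_read_demo are hypothetical names invented for illustration, and BLCKSZ is set to the default 8192. The sketch builds a file of one full block plus 160 bytes and asks for block 1. read() does not fail in that case: it returns a non-negative count smaller than BLCKSZ, which is the case the second ereport handles.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192             /* default PostgreSQL block size */

/* Hypothetical helper: read block number blocknum from an open file. */
static ssize_t
read_block(int fd, unsigned blocknum, char *buffer)
{
    assert(buffer != NULL);
    return pread(fd, buffer, BLCKSZ, (off_t) blocknum * BLCKSZ);
}

/*
 * Create a temporary file holding one full block plus 160 bytes, then
 * read block 1.  Returns the byte count of that read, or -1 on setup failure.
 */
static ssize_t
short_read_demo(void)
{
    char    path[] = "/tmp/shortreadXXXXXX";
    char    block[BLCKSZ];
    char    buffer[BLCKSZ];
    ssize_t nbytes;
    int     fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    memset(block, 'x', sizeof(block));
    if (write(fd, block, BLCKSZ) != BLCKSZ || write(fd, block, 160) != 160)
    {
        close(fd);
        return -1;
    }

    /* Block 1 starts at offset 8192; only 160 bytes exist beyond it. */
    nbytes = read_block(fd, 1, buffer);
    close(fd);
    return nbytes;
}
```

Calling short_read_demo() goes through a partial block at EOF, so the result is a plain short count and errno is not involved, unlike the nbytes < 0 branch.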
Then I looked at /src/backend/storage/file/fd.c:
int
FileRead(File file, char *buffer, int amount)
{
..
retry:
    returnCode = read(VfdCache[file].fd, buffer, amount);

    if (returnCode >= 0)
        VfdCache[file].seekPos += returnCode;
    else
    {
        /*
         * Windows may run out of kernel buffers and return "Insufficient
         * system resources" error.  Wait a bit and retry to solve it.
         *
         * It is rumored that EINTR is also possible on some Unix filesystems,
         * in which case immediate retry is indicated.
         */
#ifdef WIN32
        ...
#endif
        /* OK to retry if interrupted */
        if (errno == EINTR)
            goto retry;

        /* Trouble, so assume we don't know the file position anymore */
        VfdCache[file].seekPos = FileUnknownPos;
    }

    return returnCode;
}

First, the comment starting with 'It is rumored' looks suspicious =) But I am not a kernel developer, I am not even a C++
developer, so I trust the authors.
I've read 'man read' and 'man 7 signal', and they say that syscalls can be interrupted by some signals, including
sigbus, but that afterwards they should return to normal behaviour.
"the call will be automatically restarted after the signal handler returns if the SA_RESTART flag was used; otherwise
the call will fail with the error EINTR" - from man 7 signal
So as far as I understand, even if postgresql gets signal 7 it should get EINTR and retry immediately. What I
am trying to say is that I do not know why I am getting sigbus, but no matter where it comes from, according to the sources
postgresql should just try to read one more time, and one more, and so on until the read succeeds. But I'm not quite sure
what happens first: the sigbus or the 'could not read block' error.
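
The EINTR behaviour quoted from man 7 signal can be tried out with a small sketch, assuming a POSIX system. Everything here (on_alarm, read_retry, eintr_demo, the pipe and the 100 ms timer) is hypothetical scaffolding, not taken from fd.c; only the retry-on-EINTR loop mirrors what FileRead does. It uses SIGALRM with a handler installed without SA_RESTART, so it only shows what happens when a caught signal interrupts a blocked read(). It makes no claim about where the sigbus in this thread comes from.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;        /* write end of the pipe, used by the handler */

/* Handler: make one byte available so that the retried read() can succeed. */
static void
on_alarm(int signo)
{
    int     saved_errno = errno;
    ssize_t rc = write(wake_fd, "x", 1);   /* write() is async-signal-safe */

    (void) rc;
    (void) signo;
    errno = saved_errno;
}

/* Same idea as the retry in FileRead: repeat read() while it fails with EINTR. */
static ssize_t
read_retry(int fd, char *buffer, size_t amount, int *interruptions)
{
    ssize_t rc;

    while ((rc = read(fd, buffer, amount)) < 0 && errno == EINTR)
        (*interruptions)++;
    return rc;
}

/* Returns how many times read() failed with EINTR, or -1 on setup failure. */
static int
eintr_demo(void)
{
    int              fds[2];
    int              interruptions = 0;
    char             c;
    struct sigaction sa;
    struct itimerval timer;

    if (pipe(fds) != 0)
        return -1;
    wake_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;       /* fire once after 100 ms */
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    /* Blocks on the empty pipe until the signal arrives. */
    if (read_retry(fds[0], &c, 1, &interruptions) != 1)
        return -1;

    close(fds[0]);
    close(fds[1]);
    return interruptions;
}
```

In this sketch the first read() is interrupted while blocked on the empty pipe and fails with EINTR, and the retry then picks up the byte written by the handler. With SA_RESTART set in sa_flags, the kernel would restart the call itself and the loop would normally see no EINTR at all.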

