Re: Reviewing freeze map code

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Reviewing freeze map code
Дата
Msg-id 20160610175911.rbbjywcuygzohn56@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Reviewing freeze map code  (Andres Freund <andres@anarazel.de>)
Ответы Re: Reviewing freeze map code  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2016-06-09 23:39:24 -0700, Andres Freund wrote:
> On 2016-06-10 11:58:26 +0530, Amit Kapila wrote:
> > I have tried in multiple ways by running pgbench with read-write tests, but
> > could not see any such behaviour.
> 
> It took over an hour of pgbench on a fast laptop till I saw it.
> 
> 
> > I have tried by even crashing and
> > restarting the server and then again running pgbench.  Do you see these
> > records on master or slave?
> 
> Master, but with an existing standby. So it could be related to
> hot_standby_feedback or such.

I just managed to trigger it again.


#1  0x00007fa1a73778da in __GI_abort () at abort.c:89
#2  0x00007f9f1395e59c in record_corrupt_item (items=items@entry=0x2137be0, tid=0x7f9fb8681c0c)   at
/home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:612
#3  0x00007f9f1395ead5 in collect_corrupt_items (relid=relid@entry=29449, all_visible=all_visible@entry=0 '\000',
all_frozen=all_frozen@entry=1'\001')   at /home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:572
 
#4  0x00007f9f1395f476 in pg_check_frozen (fcinfo=0x7ffe5343a200) at
/home/andres/src/postgresql/contrib/pg_visibility/pg_visibility.c:292
#5  0x00000000005fdbec in ExecMakeTableFunctionResult (funcexpr=0x2168630, econtext=0x2168320, argContext=<optimized
out>,expectedDesc=0x2168ef0,    randomAccess=0 '\000') at
/home/andres/src/postgresql/src/backend/executor/execQual.c:2211
#6  0x0000000000616992 in FunctionNext (node=node@entry=0x2168210) at
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:94
#7  0x00000000005ffdcb in ExecScanFetch (recheckMtd=0x6166f0 <FunctionRecheck>, accessMtd=0x616700 <FunctionNext>,
node=0x2168210)  at /home/andres/src/postgresql/src/backend/executor/execScan.c:95
 
#8  ExecScan (node=node@entry=0x2168210, accessMtd=accessMtd@entry=0x616700 <FunctionNext>,
recheckMtd=recheckMtd@entry=0x6166f0<FunctionRecheck>)   at
/home/andres/src/postgresql/src/backend/executor/execScan.c:145
#9  0x00000000006169e4 in ExecFunctionScan (node=node@entry=0x2168210) at
/home/andres/src/postgresql/src/backend/executor/nodeFunctionscan.c:268

the error happened just after I restarted a standby, so it's not
unlikely to be related to hot_standby_feedback.


(gdb) p *tuple.t_data
$5 = {t_choice = {t_heap = {t_xmin = 9105470, t_xmax = 26049273, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum =
{datum_len_= 9105470,      datum_typmod = 26049273, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 1, bi_lo =
19765},ip_posid = 3}, t_infomask2 = 4, t_infomask = 770,  t_hoff = 24 '\030', t_bits = 0x7f9fb8681c17 ""}
 

Infomask is:
#define HEAP_XMIN_COMMITTED        0x0100    /* t_xmin committed */
#define HEAP_XMIN_INVALID        0x0200    /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_HASVARWIDTH        0x0002    /* has variable-width attribute(s) */

This indeed looks borked.  Such a tuple should never survive    if (check_frozen && !VM_ALL_FROZEN(rel, blkno,
&vmbuffer))       check_frozen = false;
 
especially not when
(gdb) p PageIsAllVisible(page)
$3 = 4

(fwiw, checking PD_ALL_VISIBLE in those functions sounds like a good plan)


I've got another earlier case (that I somehow missed seeing), below
check_visible:

(gdb) p *tuple->t_data 
$2 = {t_choice = {t_heap = {t_xmin = 13616549, t_xmax = 25210801, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum =
{datum_len_= 13616549,      datum_typmod = 25210801, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo =
52320},ip_posid = 67}, t_infomask2 = 32772, t_infomask = 8962,  t_hoff = 24 '\030', t_bits = 0x7f9fda2f8717 ""}
 

infomask is:
#define HEAP_UPDATED                    0x2000  /* this is UPDATEd version of row */
#define HEAP_XMIN_COMMITTED        0x0100    /* t_xmin committed */
#define HEAP_XMIN_INVALID        0x0200    /* t_xmin invalid/aborted */
#define HEAP_HASVARWIDTH        0x0002    /* has variable-width attribute(s) */
infomask2 is:
#define HEAP_ONLY_TUPLE                 0x8000  /* this is heap-only tuple */

I'll run again, with a debugger attached, maybe I can get some more
information.


Regards,

Andres



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: parallel.c is not marked as test covered
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: parallel workers and client encoding