On Fri, Jun 10, 2016 at 8:27 AM, Andres Freund <andres@anarazel.de> wrote:
On June 9, 2016 7:46:06 PM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote:
>On Fri, Jun 10, 2016 at 8:08 AM, Andres Freund <andres@anarazel.de>
>wrote:
>
>> On 2016-06-09 19:33:52 -0700, Andres Freund wrote:
>> > I played with it for a while, and besides
>> > finding intentionally caused corruption, it didn't flag anything
>> > (besides crashing on a standby, as in 2)).
>>
>> Ugh. Just seconds after I sent that email:
>>
>>        oid        |    t_ctid
>> ------------------+--------------
>>  pgbench_accounts | (889641,33)
>>  pgbench_accounts | (893854,56)
>>  pgbench_accounts | (924226,13)
>>  pgbench_accounts | (1073457,51)
>>  pgbench_accounts | (1084904,16)
>>  pgbench_accounts | (1111996,26)
>> (6 rows)
>>
>>  oid | t_ctid
>> -----+--------
>> (0 rows)
>>
>>        oid        |    t_ctid
>> ------------------+--------------
>>  pgbench_accounts | (739198,13)
>>  pgbench_accounts | (887254,11)
>>  pgbench_accounts | (1050391,6)
>>  pgbench_accounts | (1158640,46)
>>  pgbench_accounts | (1238067,18)
>>  pgbench_accounts | (1273282,22)
>>  pgbench_accounts | (1355816,54)
>>  pgbench_accounts | (1361880,33)
>> (8 rows)
>>
>
>Is this the output of pg_check_visible() or pg_check_frozen()?
Unfortunately, I don't know. I was running a union of both and didn't really expect to hit an issue... I guess I'll put a PANIC in the relevant places and check whether I can reproduce it.
I have tried multiple ways of running pgbench with read-write tests, but could not reproduce this behaviour. I have even tried crashing and restarting the server and then running pgbench again. Do you see these records on the master or on the slave?
While looking at the code in this area, I observed that during replay of delete records (heap_xlog_delete), we first clear the VM bit and only then update the page. So we don't hold the buffer lock while updating the VM, whereas the patch (collect_corrupt_items()) relies on the fact that clearing a VM bit requires holding the buffer lock. Can that cause a problem?
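To make the question concrete, here is a toy model, not PostgreSQL source: all step names are hypothetical. It enumerates every interleaving of one writer and one checker that respects a single buffer lock. It assumes the normal (primary-side) path clears the VM bit while holding the buffer lock, the replay path clears it before taking the lock as described above, and a hypothetical checker that holds the buffer lock while it reads the VM bit, then the tuple, then the VM bit again.

```python
# Toy model only: hypothetical step names, not PostgreSQL source code.
# Writer steps are one of: "lock", "unlock", "clear_vm", "delete_tuple".
PRIMARY_ORDER = ("lock", "clear_vm", "delete_tuple", "unlock")  # VM cleared under the buffer lock
REPLAY_ORDER = ("clear_vm", "lock", "delete_tuple", "unlock")   # VM cleared before taking the lock

# Hypothetical checker: holds the buffer lock while reading the VM bit,
# the tuple state, and the VM bit a second time.
CHECKER = ("lock", "read_vm", "read_tuple", "read_vm", "unlock")


def observations(writer):
    """Return every (vm_before, tuple_deleted, vm_after) triple the checker
    can see, over all interleavings that respect the buffer lock."""
    results = set()

    def run(wi, ci, vm_set, deleted, holder, seen):
        if ci == len(CHECKER):
            results.add(tuple(seen))
            return
        # Option 1: let the writer take its next step (if it has one left).
        if wi < len(writer):
            op = writer[wi]
            if op == "lock" and holder is None:
                run(wi + 1, ci, vm_set, deleted, "writer", seen)
            elif op == "unlock":
                run(wi + 1, ci, vm_set, deleted, None, seen)
            elif op == "clear_vm":
                run(wi + 1, ci, False, deleted, holder, seen)
            elif op == "delete_tuple":
                run(wi + 1, ci, vm_set, True, holder, seen)
        # Option 2: let the checker take its next step.
        op = CHECKER[ci]
        if op == "lock" and holder is None:
            run(wi, ci + 1, vm_set, deleted, "checker", seen)
        elif op == "unlock":
            run(wi, ci + 1, vm_set, deleted, None, seen)
        elif op == "read_vm":
            run(wi, ci + 1, vm_set, deleted, holder, seen + [vm_set])
        elif op == "read_tuple":
            run(wi, ci + 1, vm_set, deleted, holder, seen + [deleted])

    # Initial state: page all-visible, tuple not yet deleted, lock free.
    run(0, 0, True, False, None, [])
    return results


for name, order in (("primary", PRIMARY_ORDER), ("replay", REPLAY_ORDER)):
    print(name, sorted(observations(order)))
```

In this model only the replay ordering can produce (True, False, False): the VM bit gets cleared while the checker holds the buffer lock, which is exactly the assumption in question. The model does not show whether collect_corrupt_items() would actually misreport anything; that depends on how the real function interprets these states.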