Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes
From | Justin Pryzby |
---|---|
Subject | Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes |
Date | |
Msg-id | 20180830215711.GW23024@telsasoft.com |
In response to | Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes
|
List | pgsql-hackers |
On Thu, Aug 30, 2018 at 05:30:30PM -0400, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Wed, Aug 29, 2018 at 11:35:51AM -0400, Tom Lane wrote:
> >> As far as we can tell, that bug is a dozen years old, so it's not clear
> >> why you find that you can reproduce it only in 10.5. But there might be
> >> some subtle timing change accounting for that.
>
> > It seems to me there's one root problem occurring in (at least) two slightly
> > different ways. The issue/symptom that I've been seeing occurs in 10.5 but not
> > 10.4, and specifically at commit 2ce64ca, but not before.
>
> Yeah, as you probably saw in the other thread, we later realized that
> 2ce64ca created an additional pathway for ScanPgRelation to recurse;
> a pathway that's evidently easier to hit than the pre-existing ones.
> I note that both of your stack traces display ScanPgRelation recursion,
> so I'm feeling pretty confident that what you're seeing is the same
> thing.
>
> But, as Andres says, it'd be great if you could confirm whether the
> draft patches fix it for you.

I tested with relcache-rebuild.diff, which hasn't broken in 15min, so I'm
confident that doesn't hit the additional recursive pathway, but I have to wait
a while and see if autovacuum survives, too.

I tried to apply fix-missed-inval-msg-accepts-1.patch on top of PG10.5, but the
patch didn't apply, so I can test HEAD after the first patch soaks a while.

Just curious, is there really any difficulty in reproducing this? Once I
realized this was a continuing issue and started to suspect pg10.5, it takes
just about nothing to reproduce anywhere I've tried. I just tested 5 servers,
and only one took more than a handful of seconds to fail. I gave up waiting for
a 6th server, because I found it was waiting on a pre-existing lock.

[pryzbyj@database ~]$ while :; do for a in pg_class_oid_index pg_class_relname_nsp_index pg_class_tblspc_relfilenode_index;do psql ts -qc "REINDEX INDEX $a"; done; done&
[pryzbyj@database ~]$ a=0; time while psql ts -qc ''; do a=$((1+a)); done ; echo "$a"
psql: FATAL: could not read block 0 in file "base/16400/313581263": read only 0 of 8192 bytes

real 0m1.772s
user 0m0.076s
sys 0m0.116s
47

Justin