Re: Can someone verify CVS tip on Win32?
From | Reini Urban |
---|---|
Subject | Re: Can someone verify CVS tip on Win32? |
Date | |
Msg-id | 419C90ED.70706@x-ray.at |
In reply to | Re: Can someone verify CVS tip on Win32? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers-win32 |
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>>Tom Lane wrote:
>>
>>>Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>>>problem. How many databases have you created in the installation that
>>>the contrib installcheck is running against?
>
>
>>Just what make installcheck / make contrib installcheck runs.
>
> OK. I still haven't been able to reproduce it, but the place where it
> is failing is consistent with my theory, which is:
>
> 1. CREATE DATABASE creates a pg_database row for "regression" that is
> the last or nearly last row that will fit into block 0 of pg_database.
> It then flushes this block to disk to ensure that new backends can see
> the row in GetRawDatabaseInfo.
>
> 2. pg_regress.sh then does several ALTER DATABASE operations. These
> will mark the original row dead and make a new row. At the end of this,
> I hypothesize that the live copy of the "regression" row is in
> pg_database block 1, not block 0. And it's not been flushed to disk,
> because ALTER DATABASE fails to do that.
>
> 3. (Here's the hard-to-reproduce part.) Assume that something causes
> block 0, but not block 1, of pg_database to be flushed from shared
> buffers to disk.
>
> 4. Now, an incoming backend will see the original pg_database row for
> "regression" as committed dead, so it'll ignore it. It can't see the
> live row because that's not been flushed to disk; it's only in shared
> buffers. Ergo, GetRawDatabaseInfo fails.
>
> The problem goes away as soon as a checkpoint happens, but it's still
> possible for the regression tests to fail this way.
>
> A reasonable theory about step 3 is that the bgwriter chooses to write
> out block 0 at just the right time. This would happen infrequently
> enough to explain why we've not seen this reported before.
>
> This theory explains why the failure consistently happens at the same
> place in the test sequence, and why that place is machine-architecture
> dependent: it can only happen when a certain number of pg_database rows
> have been created and deleted, and the magic number depends on the
> machine MAXALIGN value because that affects the size of the rows.
>
> The fix of course is that ALTER DATABASE must flush pg_database to disk,
> just as RENAME does.

This also explains my strange regression problems on cygwin.
Thanks for the change. Everything looks much easier now.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
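For readers following the thread, the four steps in Tom's theory can be sketched as a toy model. This is only an illustration under simplifying assumptions, not PostgreSQL code: `ToyCatalog`, `Row`, `lookup_from_disk` and `run_scenario` are hypothetical names, "shared buffers" and "disk" are plain dictionaries, and a new backend is assumed to read only the on-disk copy, as the quoted text says GetRawDatabaseInfo does.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    dead: bool = False


class ToyCatalog:
    """Hypothetical two-block model of pg_database (not real internals)."""

    def __init__(self):
        self.shared = {0: [], 1: []}  # block number -> rows in shared buffers
        self.disk = {0: [], 1: []}    # block number -> rows as last flushed

    def flush(self, block):
        # Copy the in-memory block to disk as independent Row objects.
        self.disk[block] = [Row(r.name, r.dead) for r in self.shared[block]]

    def create_database(self, name):
        # Step 1: the new row lands in block 0, which is flushed at once.
        self.shared[0].append(Row(name))
        self.flush(0)

    def alter_database(self, name, flush_after=False):
        # Step 2: mark the old row dead and put the live copy in block 1.
        for rows in self.shared.values():
            for r in rows:
                if r.name == name:
                    r.dead = True
        self.shared[1].append(Row(name))
        if flush_after:  # the proposed fix: flush after ALTER DATABASE
            self.flush(0)
            self.flush(1)

    def lookup_from_disk(self, name):
        # A new backend in this model sees only what is on disk.
        return any(
            r.name == name and not r.dead
            for rows in self.disk.values()
            for r in rows
        )


def run_scenario(fix):
    cat = ToyCatalog()
    cat.create_database("regression")
    cat.alter_database("regression", flush_after=fix)
    seen = [cat.lookup_from_disk("regression")]
    cat.flush(0)  # step 3: only block 0 is written out (e.g. by the bgwriter)
    seen.append(cat.lookup_from_disk("regression"))  # step 4
    cat.flush(1)  # a checkpoint finally writes block 1 as well
    seen.append(cat.lookup_from_disk("regression"))
    return seen
```

In this model `run_scenario(fix=False)` returns `[True, False, True]`: the lookup fails only in the window between block 0 and block 1 reaching disk, which matches the "goes away as soon as a checkpoint happens" observation. With `fix=True` every lookup succeeds.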