Thread: pg_upgrade broken by xlog numbering

pg_upgrade broken by xlog numbering

From: "Kevin Grittner"
Date:
On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
build:

+ pg_upgrade -d
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
-B
/home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok
Checking cluster versions                                   ok
Some required control information is missing;  cannot find:
  first log file ID after reset
  first log file segment after reset

Cannot continue without required control information, terminating
Failure, exiting




Re: pg_upgrade broken by xlog numbering

From: Robert Haas
Date:
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Some required control information is missing;  cannot find:
>  first log file ID after reset
>  first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

On MacOS X, on latest sources, initdb fails:

creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in
/Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... FATAL:  control file contains invalid data
child process exited with exit code 1
initdb: data directory
"/Users/rhaas/pgsql/src/test/regress/./tmp_check/data" not removed at
user's request

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade broken by xlog numbering

From: Thom Brown
Date:
On 25 June 2012 13:11, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:
>
> + pg_upgrade -d
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data.old -D
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/data -b
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> -B
> /home/kevin/pg/master/contrib/pg_upgrade/tmp_check/install//home/kevin/pg/master/Debug/bin
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> Checking cluster versions                                   ok
> Some required control information is missing;  cannot find:
>  first log file ID after reset
>  first log file segment after reset
>
> Cannot continue without required control information, terminating
> Failure, exiting

I get precisely the same on 64-bit Linux.

--
Thom


Re: pg_upgrade broken by xlog numbering

From: Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On MacOS X, on latest sources, initdb fails:

> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in
> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
> initializing pg_authid ... ok
> initializing dependencies ... ok
> creating system views ... ok
> loading system objects' descriptions ... ok
> creating collations ... ok
> creating conversions ... ok
> creating dictionaries ... FATAL:  control file contains invalid data
> child process exited with exit code 1

Same for me.  It's crashing here:
    if (ControlFile->state < DB_SHUTDOWNED ||
        ControlFile->state > DB_IN_PRODUCTION ||
        !XRecOffIsValid(ControlFile->checkPoint))
        ereport(FATAL,
                (errmsg("control file contains invalid data")));

state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?

(I suppose the reason this is only failing on some machines is
platform-specific variations in xlog entry size, but it's still a bit
distressing that this got committed in such a broken state.)
        regards, tom lane


Re: pg_upgrade broken by xlog numbering

From: Robert Haas
Date:
On Mon, Jun 25, 2012 at 11:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On MacOS X, on latest sources, initdb fails:
>
>> creating directory /Users/rhaas/pgsql/src/test/regress/./tmp_check/data ... ok
>> creating subdirectories ... ok
>> selecting default max_connections ... 100
>> selecting default shared_buffers ... 32MB
>> creating configuration files ... ok
>> creating template1 database in
>> /Users/rhaas/pgsql/src/test/regress/./tmp_check/data/base/1 ... ok
>> initializing pg_authid ... ok
>> initializing dependencies ... ok
>> creating system views ... ok
>> loading system objects' descriptions ... ok
>> creating collations ... ok
>> creating conversions ... ok
>> creating dictionaries ... FATAL:  control file contains invalid data
>> child process exited with exit code 1
>
> Same for me.  It's crashing here:
>
>    if (ControlFile->state < DB_SHUTDOWNED ||
>        ControlFile->state > DB_IN_PRODUCTION ||
>        !XRecOffIsValid(ControlFile->checkPoint))
>        ereport(FATAL,
>                (errmsg("control file contains invalid data")));
>
> state == DB_SHUTDOWNED, so the problem is with the XRecOffIsValid test.
> ControlFile->checkPoint == 19972072 (0x130BFE8), what's wrong with that?
>
> (I suppose the reason this is only failing on some machines is
> platform-specific variations in xlog entry size, but it's still a bit
> distressing that this got committed in such a broken state.)

I'm guessing that the problem is as follows: in the old code, the
XLogRecord header could not be split, so any offset that was closer to
the end of the page than SizeOfXLogRecord was a sure sign of trouble.
But commit 061e7efb1b4c5b8a5d02122b7780531b8d5bf23d relaxed that
restriction, so now it IS legal for the checkpoint record to be where
it is.  But it seems that XRecOffIsValid() didn't get the memo.
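
To make the arithmetic concrete, here is a minimal sketch of the two rules being contrasted. It is not the actual macro from the PostgreSQL sources: the function names are hypothetical, and the page size and both header sizes are assumed values chosen for illustration, since the real ones depend on platform and build options. The "relaxed" variant is only one plausible reading of what the check would look like once a record header is allowed to cross a page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only; not taken from the thread. */
#define SKETCH_XLOG_BLCKSZ      8192u   /* assumed WAL page size */
#define SKETCH_SHORT_PHD_SIZE   24u     /* assumed short page header size */
#define SKETCH_RECORD_HDR_SIZE  32u     /* assumed record header size */

/*
 * Strict rule (hypothetical name): the record must start past the page
 * header AND leave room for a whole record header before the page ends.
 */
static bool
sketch_offset_valid_strict(uint32_t xrecoff)
{
    uint32_t page_off = xrecoff % SKETCH_XLOG_BLCKSZ;

    return page_off >= SKETCH_SHORT_PHD_SIZE &&
           (SKETCH_XLOG_BLCKSZ - page_off) >= SKETCH_RECORD_HDR_SIZE;
}

/*
 * Relaxed rule (hypothetical name): the record header may be split across
 * pages, so only the "past the page header" condition remains.
 */
static bool
sketch_offset_valid_relaxed(uint32_t xrecoff)
{
    return xrecoff % SKETCH_XLOG_BLCKSZ >= SKETCH_SHORT_PHD_SIZE;
}
```

With these assumed sizes, the checkpoint location from Tom's message (19972072, 0x130BFE8) lands 8168 bytes into its page, which leaves only 24 bytes before the page boundary. The strict rule rejects it and the relaxed rule accepts it, which matches the behaviour described above.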

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_upgrade broken by xlog numbering

From: Robert Haas
Date:
On Mon, Jun 25, 2012 at 8:11 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> On HEAD at the moment, `make check-world` is failing on a 32-bit Linux
> build:

This appears to be because of the following hunk from commit
dfda6ebaec6763090fb78b458a979b558c50b39b:

@@ -558,10 +536,10 @@ PrintControlValues(bool guessed)
        snprintf(sysident_str, sizeof(sysident_str), UINT64_FORMAT,
                         ControlFile.system_identifier);
 
-       printf(_("First log file ID after reset:        %u\n"),
-                  newXlogId);
-       printf(_("First log file segment after reset:   %u\n"),
-                  newXlogSeg);
+       XLogFileName(fname, ControlFile.checkPointCopy.ThisTimeLineID, newXlogSe
+
+       printf(_("First log segment after reset:        %s\n"),
+                  fname);
        printf(_("pg_control version number:            %u\n"),
                   ControlFile.pg_control_version);
        printf(_("Catalog version number:               %u\n"),

Evidently, Heikki failed to realize that pg_upgrade gets the control
data information by parsing the output of pg_controldata.
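
As an illustration of why a reworded label breaks this kind of consumer, here is a minimal sketch of label-based parsing. It is not pg_upgrade's actual code: the helper name and the sample output strings are hypothetical, and the numeric and file-name values in the samples are made up. Only the two label texts are taken from the hunk above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Made-up sample output in the old and new formats (values are invented). */
static const char sketch_old_output[] =
    "First log file ID after reset:        0\n"
    "First log file segment after reset:   2\n";
static const char sketch_new_output[] =
    "First log segment after reset:        000000010000000000000002\n";

/*
 * Hypothetical helper: find the line that starts with "label" and parse the
 * unsigned number that follows it.  Returns false if no line carries the
 * label or if no number follows it.
 */
static bool
sketch_find_labeled_uint(const char *output, const char *label,
                         unsigned int *value)
{
    size_t      label_len = strlen(label);
    const char *line = output;

    while (line != NULL && *line != '\0')
    {
        if (strncmp(line, label, label_len) == 0)
        {
            char         *end;
            unsigned long v = strtoul(line + label_len, &end, 10);

            if (end == line + label_len)
                return false;   /* label found, but no digits after it */
            *value = (unsigned int) v;
            return true;
        }
        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return false;               /* label never appeared */
}
```

Called with the label "First log file ID after reset:", the helper succeeds on the old-format sample and returns false on the new-format one. That is the same kind of miss as the "cannot find" failure in Kevin's report.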

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company