Thread: Misleading/inaccurate error message from pg_basebackup

Misleading/inaccurate error message from pg_basebackup

From
c@osss.net
Date:
pg_basebackup can throw an error which is inaccurate and misleading:

$ mkdir /var/lib/postgresql/14
$ ls -ld /var/lib/postgresql/14
drwxr-x--- 2 postgres postgres 6 Jan 22 16:08 /var/lib/postgresql/14
$ pg_basebackup --version
Error: /var/lib/postgresql/14/main is not accessible; please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)

Reading that error suggests that I need to make the directory /var/lib/postgresql/14 world-readable, but doing so does
not fix the problem, as the actual problem is that the main subdirectory does not exist:

$ chmod a+rx /var/lib/postgresql/14
$ ls -ld /var/lib/postgresql/14
drwxr-xr-x 2 postgres postgres 6 Jan 22 16:08 /var/lib/postgresql/14
$ pg_basebackup --version
Error: /var/lib/postgresql/14/main is not accessible; please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)

In fact it does not need to be world readable at all - the subdirectory just needs to be created:

$ chmod o-rx /var/lib/postgresql/14
$ mkdir /var/lib/postgresql/14/main
$ ls -ld /var/lib/postgresql/14{,/main}
drwxr-x--- 3 postgres postgres 19 Jan 22 16:14 /var/lib/postgresql/14
drwxr-x--- 2 postgres postgres  6 Jan 22 16:14 /var/lib/postgresql/14/main
$ pg_basebackup --version
pg_basebackup (PostgreSQL) 14.10 (Ubuntu 14.10-1.pgdg20.04+1)

This check is being done in cases where it's unnecessary, as it shouldn't matter at all when running a simple --version
or --help, anyways.

--
Regards,
Casey


Re: Misleading/inaccurate error message from pg_basebackup

From
Daniel Gustafsson
Date:
> On 22 Jan 2024, at 17:47, c@osss.net wrote:

> pg_basebackup can throw an error which is inaccurate and misleading:

> $ pg_basebackup --version
> Error: /var/lib/postgresql/14/main is not accessible; please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)
>
> In fact it does not need to be world readable at all - the subdirectory just needs to be created:

> This check is being done in cases where it's unnecessary, as it shouldn't matter at all when running a simple
> --version or --help, anyways.

There is no such check before --version or --help, and no such check at all in
pg_basebackup.  Whatever is raising that error probably isn't postgres or
pg_basebackup.  When pg_basebackup logs an error it looks like this:

pg_basebackup: error: /var/lib/postgresql/14/main is not accessible

Something else is doing this on your system.

--
Daniel Gustafsson




Re: Misleading/inaccurate error message from pg_basebackup

From
Casey
Date:
Hmm, this must be a problem with the Debian packaging then.

root@prime-or1-pg-truth-2:~# file /usr/bin/pg_basebackup 

/usr/bin/pg_basebackup: symbolic link to ../share/postgresql-common/pg_wrapper
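
The same question (what does the name in PATH actually point to?) can be answered for any command with a small helper. This is only a sketch: the function name resolve_cmd is made up for illustration, and it assumes a readlink that supports -f, as GNU coreutils does.

```shell
# Print the final target of a command found in PATH, following symlinks.
# resolve_cmd is a hypothetical helper name, not part of any package.
resolve_cmd() {
    cmd_path=$(command -v "$1") || {
        echo "$1: not found in PATH" >&2
        return 1
    }
    readlink -f "$cmd_path"
}

# On the system described above this would be expected to print the path
# of the pg_wrapper script rather than a binary under /usr/lib/postgresql.
resolve_cmd pg_basebackup || true
```

If the printed path is not under the versioned bin directory, the message being debugged may come from the wrapper and not from the PostgreSQL tool itself.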


Do you know the correct place to file an appropriate report for the Debian package maintainers?

Thanks,
-- 
Casey

> On Jan 23, 2024, at 5:27 AM, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 22 Jan 2024, at 17:47, c@osss.net wrote:
>>
>> pg_basebackup can throw an error which is inaccurate and misleading:
>>
>> $ pg_basebackup --version
>> Error: /var/lib/postgresql/14/main is not accessible; please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)
>>
>> In fact it does not need to be world readable at all - the subdirectory just needs to be created:
>>
>> This check is being done in cases where it's unnecessary, as it shouldn't matter at all when running a simple --version or --help, anyways.
>
> There is no such check before --version or --help, and no such check at all in
> pg_basebackup.  Whatever is raising that error probably isn't postgres or
> pg_basebackup.  When pg_basebackup logs an error it looks like this:
>
> pg_basebackup: error: /var/lib/postgresql/14/main is not accessible
>
> Something else is doing this on your system.
>
> --
> Daniel Gustafsson


Re: Misleading/inaccurate error message from pg_basebackup

From
Michael Paquier
Date:
On Tue, Jan 23, 2024 at 12:37:57PM -0500, Casey wrote:
> Hmm, this must be a problem with the Debian packaging then.
>
> root@prime-or1-pg-truth-2:~# file /usr/bin/pg_basebackup
> /usr/bin/pg_basebackup: symbolic link to ../share/postgresql-common/pg_wrapper
>
> Do you know the correct place to file an appropriate report for the Debian package maintainers?

Christoph Berg would be the correct person.  I am adding him in CC for
comments.
--
Michael

Re: Misleading/inaccurate error message from pg_basebackup

From
Christoph Berg
Date:
Re: c@osss.net
> pg_basebackup can throw an error which is inaccurate and misleading:
> 
> $ mkdir /var/lib/postgresql/14

Hi Casey,

what previous error/message led you to create that directory? The
data directory should already exist.

What does `pg_lsclusters` report?

Where is the actual data directory of that cluster?

Christoph



Re: Misleading/inaccurate error message from pg_basebackup

From
Christoph Berg
Date:
Re: Casey Shobe
> Below is pasted my initial message, which gives more context and detail.  Let me know if anything is still unclear
> after this.  The context is that I use Patroni to run a multi-node cluster, and WAL-G creates a hidden directory within
> the wal directory which I did not initially notice when I otherwise emptied it before reinitializing a node after
> replacing the disk for the data volume.  This led to a fair bit of time wasted looking for the wrong problem:

I did reply to your initial message and all the questions are still
open.

Christoph



Re: Misleading/inaccurate error message from pg_basebackup

From
Christoph Berg
Date:
Re: Casey
> I thought that I addressed your inquiries as best as I was able.  Can you please clarify any remaining questions?

What did you do to make you believe that you had to "mkdir" in the
first place?

Also, please keep it on the list.

> > On Jan 24, 2024, at 6:48 AM, Christoph Berg <myon@debian.org> wrote:
> > 
> > Re: Casey Shobe
> >> Below is pasted my initial message, which gives more context and detail.  Let me know if anything is still unclear
> >> after this.  The context is that I use Patroni to run a multi-node cluster, and WAL-G creates a hidden directory within
> >> the wal directory which I did not initially notice when I otherwise emptied it before reinitializing a node after
> >> replacing the disk for the data volume.  This led to a fair bit of time wasted looking for the wrong problem:
> > 
> > I did reply to your initial message and all the questions are still
> > open.
> > 
> > Christoph

Christoph



Re: Misleading/inaccurate error message from pg_basebackup

From
Casey
Date:
I didn't believe I had to mkdir; those were just test cases to illustrate the problem in isolation.  I had been trying
to reinitialize a node after replacing disks used for the data volume, using Patroni.  When that failed due to a
pg_basebackup error, it removed the data directory.

To be clear, I'm mounting separate volumes at:
/var/lib/postgresql
/var/lib/postgresql/wal

The data and wal directories are a couple levels under those:
/var/lib/postgresql/14/main
/var/lib/postgresql/wal/14/main

So when Patroni ran into a pg_basebackup error, it removed /var/lib/postgresql/14/main and /var/lib/postgresql/14.  It
also does not log the specific generated pg_basebackup command.  As I couldn't tell why it was erroring, I tried to
recreate that command myself based on the configuration and defaults.  When I did that, I didn't think about specifying
the path for it as /usr/lib/postgresql/14/bin as I ought to have, but just relied on what was in my path, which turned
out to be the wrapper script.
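
Calling the tool by its full path, as described above, could look like the following sketch. The function name run_pg_tool is hypothetical; the directory is the one named in this message and may differ on other installations.

```shell
# Run a client tool from an explicit bin directory so that nothing found
# earlier in PATH (such as a wrapper script) gets in the way.
# run_pg_tool is a hypothetical helper name used only for illustration.
run_pg_tool() {
    bindir=$1
    tool=$2
    shift 2
    if [ -x "$bindir/$tool" ]; then
        "$bindir/$tool" "$@"
    else
        echo "$tool not found in $bindir" >&2
        return 1
    fi
}

# Directory taken from the message above.
run_pg_tool /usr/lib/postgresql/14/bin pg_basebackup --version || true
```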

The actual problem turned out to be that I thought that I'd cleared out all the contents of the wal directory, but I'd
inadvertently left a hidden file sitting in there.  Anyways, during the process of debugging this, I didn't have the
database running, and didn't have the data directory existing.  I wanted to look at pg_basebackup --help, and that would
not work, throwing the error about the data directory not existing.  I should have focused on the first part of the
error message, that /var/lib/postgresql/14/main was not accessible, but instead I got distracted by the second part,
telling me to fix the directory permissions on /var/lib/postgresql/14 by making it world-readable.  Well, it didn't actually
need to be world-readable, and we don't want it to be world-readable.  Regardless, I tried making it world-readable, and
was confused as to why pg_basebackup threw the same error message.  Once I created the /main subdirectory, ignoring the
complaint about world-readability, I was able to get a different error that pointed me to the actual problem:
pg_basebackup: error: directory "/var/lib/postgresql/wal/14/main" exists but is not empty
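
A plain ls does not show dotfiles, which is how a leftover hidden entry can go unnoticed. The sketch below checks whether a directory is truly empty, hidden entries included; dir_is_empty is a made-up helper name, and the path is the WAL directory from this message.

```shell
# Succeed only if the directory exists and has no entries at all,
# including names that start with a dot.
# dir_is_empty is a hypothetical helper name used only for illustration.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

WAL_DIR=/var/lib/postgresql/wal/14/main
if [ -d "$WAL_DIR" ] && ! dir_is_empty "$WAL_DIR"; then
    # Show what is still in there, dotfiles included.
    ls -lA "$WAL_DIR"
fi
```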

The point is that the error I ran into when the data directory (/main) did not exist under /var/lib/postgresql/14 is
incorrect, and led me to being confused and wasting some time wondering what was wrong rather than getting to the actual
problem. "please fix the directory permissions (/var/lib/postgresql/14/ should be world readable)" is misleading as
there was no need to follow that instruction and it distracted from the more relevant and correct message printed just
before it ("/var/lib/postgresql/14/main is not accessible").  Furthermore, pg_basebackup --help should ideally work
regardless of that, as does the upstream binary.
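
To illustrate the distinction being asked for, here is a rough sketch of a check that reports a missing directory separately from a permissions problem. It is not the actual pg_wrapper code, which is not shown in this thread; the function name describe_datadir and the message wording are invented for the example.

```shell
# Hypothetical check, NOT the real wrapper logic: tell a missing data
# directory apart from one that exists but cannot be read or entered.
describe_datadir() {
    if [ ! -e "$1" ]; then
        # Note: -e also fails if a parent directory cannot be searched.
        echo "$1 does not exist"
    elif [ ! -d "$1" ]; then
        echo "$1 is not a directory"
    elif [ ! -r "$1" ] || [ ! -x "$1" ]; then
        echo "$1 is not accessible; check its permissions"
    else
        echo "$1 is accessible"
    fi
}

describe_datadir /var/lib/postgresql/14/main
```

With a check along these lines, the missing /main subdirectory in the test case above would be reported as missing, without any advice about making the parent world-readable.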

Hope this helps,
--
Casey

> On Jan 29, 2024, at 11:40 AM, Christoph Berg <myon@debian.org> wrote:
>
> Re: Casey
>> I thought that I addressed your inquiries as best as I was able.  Can you please clarify any remaining questions?
>
> What did you do to make you believe that you had to "mkdir" in the
> first place?
>
> Also, please keep it on the list.
>
>>> On Jan 24, 2024, at 6:48 AM, Christoph Berg <myon@debian.org> wrote:
>>>
>>> Re: Casey Shobe
>>>> Below is pasted my initial message, which gives more context and detail.  Let me know if anything is still unclear
>>>> after this.  The context is that I use Patroni to run a multi-node cluster, and WAL-G creates a hidden directory within
>>>> the wal directory which I did not initially notice when I otherwise emptied it before reinitializing a node after
>>>> replacing the disk for the data volume.  This led to a fair bit of time wasted looking for the wrong problem:
>>>
>>> I did reply to your initial message and all the questions are still
>>> open.
>>>
>>> Christoph
>
> Christoph