Re: The ability of postgres to determine loss of files of the main fork

Поиск
Список
Период
Сортировка
От Frits Hoogland
Тема Re: The ability of postgres to determine loss of files of the main fork
Дата
Msg-id A2681026-3AE4-4F3E-9BD7-5F78DF2D1E34@gmail.com
обсуждение исходный текст
Ответ на Re: The ability of postgres to determine loss of files of the main fork  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Список pgsql-hackers
Thank you for looking into this Jakub, Aleksander, Michael, Tom,

There are more scenario's where files might be missing: how about a backup somehow missing files?
But also deliberate sabotage, you can hide data in this way, and it would be hard to detect, and even
harder to understand what happened, why it happened and how it happened. I dare to say quite ideal
if you want a name to be removed from a database a covert way, and know something about the
database and have access, or somehow can manipulate in the data directory.

Without having the exact sourcecode files and functions handy, the way I think using the segments
works in postgres, is that the database, because of the absence of any indicator of size, will scan 
the relfilenode, and when it reaches the set limit (1GB), it will try to open the next segment. If that 
segment doesn't exist, the database assumes it's the end of the segment. That is all. That is why there
is no error: it's exactly alike when there truly is no more data, and there is no metadata to understand
there is supposed to be more data.

And therefore I talked about the primary key: that contains "indirect" data to be able to detect if a
table has a missing segment. But I think really what is needed is a max page number to indicate
the known size.



Frits Hoogland




On 1 Oct 2025, at 13:20, Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:

On Wed, Oct 1, 2025 at 9:02 AM Michael Banck <mbanck@gmx.net> wrote:

Hi,

wow, this is one of the most terrifying threads I've ever seen...

Same.

On Tue, Sep 30, 2025 at 12:41:29PM -0400, Tom Lane wrote:
Aleksander Alekseev <aleksander@tigerdata.com> writes:
Therefore, I would like to request an enhancement: add an option to
verify_heapam() that causes the primary key index to be scanned and makes
sure that all line pointers in the index point to existing tuples.

... IMO there is little value in adding a check for the existence of
the segments for a single table. And the *real* check will not differ
much from something like SELECT * FROM my_table, or from making a
complete backup of the database.

As Frits mentioned, neither of those actions will really notice if a
table has been truncated via loss of a segment.

Is there a valid case for a missing segment? If not, couldn't this be
caught somewhere in the storage manager?


I've took a look on PG17 and in _mfd_openseg() there's if fd < 0
return NULL after open(), but out of it's callers only _mdfd_getseg()
seems to be alerting on that NULL. To me this seems like a bug,
because i've seen way too many times people and software deleting
files randomly. Even simple crashes (with e2fsck, xfs_repair) could
put orphaned inodes into /lost+found. IMHO all files should be opened
at least on startup to check integrity, because the non-zero return
code (during such SELECT) for openat(2) seems o be coming out of
RelationGetNumberOfBlocksInFork()->table_block_relation_size()->smgrnblocks()->mdnblocks()->_mdfd_openseg().
Now if the 1st seg file would be missing we would complain in
mdopenfork(). mdnblocks() says even "all active segments of the
relation are opened...", but even that apparently is not true.

The bigger context seems to be be that 049469e7e7cfe0c69 (2015) could
be culprit here as well, as it is stated there that mdnblocks() could
earlier create zero-length files back in day and it removed that
ereport(ERROR) when unable to access that file.

Another idea (than this being a bug) is that Thomas had a large
relation patchset back in [1], but I wouldn't be a fan of us operating
on 31-32TB files ;)

-J.

[1] - https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com

В списке pgsql-hackers по дате отправления: