Re: Add pg_walinspect function with block info columns

Поиск
Список
Период
Сортировка
От Bharath Rupireddy
Тема Re: Add pg_walinspect function with block info columns
Дата
Msg-id CALj2ACXR_SWazf5UVTBPa_32rwD2=M4Z1j6qt8k7QND2BEpuVA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Add pg_walinspect function with block info columns  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Add pg_walinspect function with block info columns  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Wed, Mar 8, 2023 at 12:58 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Mar 07, 2023 at 03:56:22PM +0530, Bharath Rupireddy wrote:
> > That would be a lot better. Not just the test, but also the
> > documentation can have it. Simple way to generate such a record (both
> > block data and FPI) is to just change the wal_level to logical in
> > walinspect.conf [1], see code around REGBUF_KEEP_DATA and
> > RelationIsLogicallyLogged in heapam.c
>
> I don't agree that we need to go down to wal_level=logical for this.
> The important part is to check that the non-NULL and NULL paths for
> the block data and FPI data are both taken, making 4 paths to check.
> So we need two tests at minimum, which would be either:
> - One SQL generating no FPI with no block data and a second generating
> a FPI with block data.  v2 was doing that but did not cover the first
> case.
> - One SQL generating a FPI with no block data and a second generating
> no FPI with block data.
>
> So let's just geenrate a heap record on an UPDATE, for example, like
> in the version attached.

Yup, that should work too because block data gets logged [1].

> > 4. I think we need to free raw_data, raw_page and flags as we loop
> > over multiple blocks (XLR_MAX_BLOCK_ID) and will leak memory which can
> > be a problem if we have many blocks assocated with a single WAL
> > record.
> >             flags = (Datum *) palloc0(sizeof(Datum) * bitcnt);
> > Also, we will leak all CStringGetTextDatum memory in the block_id for loop.
> > Another way is to use and reset temp memory context in the for loop
> > over block_ids. I prefer this approach over multiple pfree()s in
> > block_id for loop.
>
> I disagree, this was on purpose in the last version.  This version
> finishes by calling AllocSetContextCreate() and MemoryContextDelete()
> once per *record*, which will not be free, and we are arguing about
> resetting the memory context after scanning up to XLR_MAX_BLOCK_ID
> blocks, or 32 blocks which would go up to 32kB per page in the worst
> case.  That's not going to matter in a large scan for each record, but
> the extra AllocSet*() calls could.  And we basically do the same thing
> on HEAD.

It's not just 32kB per page right? 32*8KB on HEAD (no block data,
flags and CStringGetTextDatum there). With the patch, the number of
pallocs for each block_id = 6 CStringGetTextDatum + BLCKSZ (8KB) +
flags (5*size of ptr) + block data_len. In the worst case, all
XLR_MAX_BLOCK_ID can have both FPIs and block data. Furthermore,
imagine if someone initialized their cluster with a higher BLCKSZ (>=
8KB), then the memory leak happens noticeably on a lower-end system.

I understand that performance is critical here but we need to ensure
memory is used wisely. Therefore, I'd still vote to free at least the
major contributors here, that is, pfree(raw_data);, pfree(raw_page);
and pfree(flags); right after they are done using. I'm sure pfree()s
don't hurt more than resetting memory context for every block_id.

> Any comments?

I think we need to output block data length (blk->data_len) similar to
fpilen to save users from figuring out how to get the length of a
bytea column. This will also keep block data in sync with FPI info.

[1]
needs_backup = (page_lsn <= RedoRecPtr);

(gdb) p page_lsn
$2 = 21581544
(gdb) p RedoRecPtr
$3 = 21484808
(gdb) p needs_backup
$4 = false
(gdb)
(gdb) bt
#0  XLogRecordAssemble (rmid=10 '\n', info=64 '@',
RedoRecPtr=21484808, doPageWrites=true, fpw_lsn=0x7ffde118d640,
    num_fpi=0x7ffde118d634, topxid_included=0x7ffde118d633) at xloginsert.c:582
#1  0x00005598cd9c3ef7 in XLogInsert (rmid=10 '\n', info=64 '@') at
xloginsert.c:497
#2  0x00005598cd930452 in log_heap_update (reln=0x7f4a4c7cd808,
oldbuf=136, newbuf=136, oldtup=0x7ffde118d820,
    newtup=0x5598d00cb098, old_key_tuple=0x0,
all_visible_cleared=false, new_all_visible_cleared=false)
    at heapam.c:8473
#3  0x00005598cd92876e in heap_update (relation=0x7f4a4c7cd808,
otid=0x7ffde118dab2, newtup=0x5598d00cb098, cid=0,
    crosscheck=0x0, wait=true, tmfd=0x7ffde118db60,
lockmode=0x7ffde118da74) at heapam.c:3741

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Drouvot, Bertrand"
Дата:
Сообщение: Re: Minimal logical decoding on standbys
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: HOT chain validation in verify_heapam()