Обсуждение: pgsql: Remove the restriction that the relmap must be 512 bytes.
Remove the restriction that the relmap must be 512 bytes. Instead of relying on the ability to atomically overwrite the entire relmap file in one shot, write a new one and durably rename it into place. Removing the struct padding and the calculation showing why the map is exactly 512 bytes, and change the maximum number of entries to a nearby round number. Patch by me, reviewed by Andres Freund and Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoZq5%3DLWDK7kHaUbmWXxcaTuw_QwafgG9dr-BaPym_U8WQ%40mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-ttOXLX75k_WzRo9ar=VvxFhrHi+rJxns997F+yvkm==A@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/d8cd0c6c95c0120168df93aae095df4e0682a08a Modified Files -------------- doc/src/sgml/monitoring.sgml | 4 +- src/backend/utils/activity/wait_event.c | 4 +- src/backend/utils/cache/relmapper.c | 94 +++++++++++++++++++-------------- src/include/utils/wait_event.h | 2 +- 4 files changed, 58 insertions(+), 46 deletions(-)
Hi Robert, On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote: > Remove the restriction that the relmap must be 512 bytes. > > Instead of relying on the ability to atomically overwrite the > entire relmap file in one shot, write a new one and durably > rename it into place. Removing the struct padding and the > calculation showing why the map is exactly 512 bytes, and change > the maximum number of entries to a nearby round number. > > Patch by me, reviewed by Andres Freund and Dilip Kumar. > > Discussion: http://postgr.es/m/CA+TgmoZq5%3DLWDK7kHaUbmWXxcaTuw_QwafgG9dr-BaPym_U8WQ%40mail.gmail.com > Discussion: http://postgr.es/m/CAFiTN-ttOXLX75k_WzRo9ar=VvxFhrHi+rJxns997F+yvkm==A@mail.gmail.com The CI on Windows is blowing up here and there after something that looks to come from this commit, as of this backtrace: 00000000`007fe300 00000001`405c62dd postgres!errfinish( char * filename = 0x00000001`40bf1513 "fd.c", int lineno = 0n756, char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b [c:\cirrus\src\backend\utils\error\elog.c @ 683] 00000000`007fe360 00000001`4081647b postgres!durable_rename( char * oldfile = 0x00000000`007fe430 "base/16384/pg_filenode.map.tmp", char * newfile = 0x00000000`007fe830 "base/16384/pg_filenode.map", int elevel = 0n21)+0x22d [c:\cirrus\src\backend\storage\file\fd.c @ 753] 00000000`007fe3b0 00000001`408166c9 postgres!write_relmap_file( struct RelMapFile * newmap = 0x00000000`007fecb0, bool write_wal = true, bool send_sinval = true, bool preserve_files = true, unsigned int dbid = 0x4000, unsigned int tsid = 0x67f, char * dbpath = 0x00000000`0090b1c0 "base/16384")+0x38b [c:\cirrus\src\backend\utils\cache\relmapper.c @ 971] Here is one of them, kicked by the CF bot, but I have seen similar crashes with some of my own things (see the txt file in crashlog, in a manual VACUUM): https://cirrus-ci.com/task/5240408958566400 Thanks, -- Michael
Вложения
On Wed, Jul 27, 2022 at 4:35 PM Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote: > > Remove the restriction that the relmap must be 512 bytes. > The CI on Windows is blowing up here and there after something that > looks to come from this commit, as of this backtrace: > 00000000`007fe300 00000001`405c62dd postgres!errfinish( > char * filename = 0x00000001`40bf1513 "fd.c", > int lineno = 0n756, > char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b > [c:\cirrus\src\backend\utils\error\elog.c @ 683] And here's what the error looks like: 2022-07-26 19:38:04.321 GMT [8020][client backend] [pg_regress/vacuum][8/349:4527] PANIC: could not rename file "global/pg_filenode.map.tmp" to "global/pg_filenode.map": Permission denied Someone else still has the old file open, so we can't rename the new one to its name? On Windows that should have gone through pgrename() in dirmod.c, which would retry 100 times with a 100ms sleep between. Since every backend reads that file (I added an elog() and saw it 2289 times during make check), I guess you can run out of luck. /me thinks
On Wed, Jul 27, 2022 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Jul 27, 2022 at 4:35 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jul 26, 2022 at 07:10:22PM +0000, Robert Haas wrote: > > > Remove the restriction that the relmap must be 512 bytes. > > > The CI on Windows is blowing up here and there after something that > > looks to come from this commit, as of this backtrace: > > 00000000`007fe300 00000001`405c62dd postgres!errfinish( > > char * filename = 0x00000001`40bf1513 "fd.c", > > int lineno = 0n756, > > char * funcname = 0x00000001`40bf14e0 "durable_rename")+0x41b > > [c:\cirrus\src\backend\utils\error\elog.c @ 683] > > And here's what the error looks like: > > 2022-07-26 19:38:04.321 GMT [8020][client backend] > [pg_regress/vacuum][8/349:4527] PANIC: could not rename file > "global/pg_filenode.map.tmp" to "global/pg_filenode.map": Permission > denied > > Someone else still has the old file open, so we can't rename the new > one to its name? On Windows that should have gone through pgrename() > in dirmod.c, which would retry 100 times with a 100ms sleep between. > Since every backend reads that file (I added an elog() and saw it 2289 > times during make check), I guess you can run out of luck. > > /me thinks Maybe we just have to rearrange the locking slightly? Something like the attached.
Вложения
On Wed, Jul 27, 2022 at 6:06 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Jul 27, 2022 at 5:01 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Someone else still has the old file open, so we can't rename the new > > one to its name? On Windows that should have gone through pgrename() > > in dirmod.c, which would retry 100 times with a 100ms sleep between. > > Since every backend reads that file (I added an elog() and saw it 2289 > > times during make check), I guess you can run out of luck. > > > > /me thinks > > Maybe we just have to rearrange the locking slightly? Something like > the attached. Erm, let me try that again, this time with the CloseTransientFile() also under the lock, so that we never have a file handle without a lock.
Вложения
On Wed, Jul 27, 2022 at 06:21:07PM +1200, Thomas Munro wrote: > Erm, let me try that again, this time with the CloseTransientFile() > also under the lock, so that we never have a file handle without a > lock. Right. The whole write_relmap_file() already happens while taking RelationMappingLock, so that seems like a good idea for consistency at the end (even if I remember that there is a patch floating around to improve the concurrency of pgrename, which may become an easier move now that we require Windows 10). I have tested three runs and that was working here even if the issue is sporadic, so more runs may be better to have more confidence. -- Michael
Вложения
On Wed, Jul 27, 2022 at 6:25 AM Michael Paquier <michael@paquier.xyz> wrote: > Right. The whole write_relmap_file() already happens while taking > RelationMappingLock, so that seems like a good idea for consistency at > the end (even if I remember that there is a patch floating around to > improve the concurrency of pgrename, which may become an easier move > now that we require Windows 10). > > I have tested three runs and that was working here even if the > issue is sporadic, so more runs may be better to have more > confidence. OK, I committed Thomas's patch, after taking the liberty of adding an explanatory comment. -- Robert Haas EDB: http://www.enterprisedb.com