Initdb-time block size specification
От | David Christensen |
---|---|
Тема | Initdb-time block size specification |
Дата | |
Msg-id | CAOxo6XKx7DyDgBkWwPfnGSXQYNLpNrSWtYnK6-1u+QHUwRa1Gg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Initdb-time block size specification
(Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Initdb-time block size specification (Andres Freund <andres@anarazel.de>) Re: Initdb-time block size specification (David Christensen <david.christensen@crunchydata.com>) |
Список | pgsql-hackers |
Hi, As discussed somewhat at PgCon, enclosed is v1 of a patch to provide variable block sizes; basically instead of BLCKSZ being a compile-time constant, a single set of binaries can support all of the block sizes Pg can support, using the value stored in pg_control as the basis. (Possible future plans would be to make this something even more dynamic, such as configured per tablespace, but this is out of scope; this just sets up the infrastructure for this.) Whereas we had traditionally used BLCKSZ to indicate the compile-time selected block size, this commit adjusted things so the cluster block size can be selected at initdb time. In order to code for this, we introduce a few new defines: - CLUSTER_BLOCK_SIZE is the blocksize for this cluster itself. This is not valid until BlockSizeInit() has been called in the given backend, which we do as early as possible by parsing the ControlFile and using the blcksz field. - MIN_BLOCK_SIZE and MAX_BLOCK_SIZE are the limits for the selectable block size. It is required that CLUSTER_BLOCK_SIZE is a power of 2 between these two constants. - DEFAULT_BLOCK_SIZE is the moral equivalent of BLCKSZ; it is the built-in default value. This is used in a few places that just needed a buffer of an arbitrary size, but the dynamic value CLUSTER_BLOCK_SIZE should almost always be used instead. - CLUSTER_RELSEG_SIZE is used instead of RELSEG_SIZE, since internally we are storing the segment size in terms of number of blocks. RELSEG_SIZE is still kept, but is used in terms of the number of blocks of DEFAULT_BLOCK_SIZE; CLUSTER_RELSEG_SIZE scales appropriately (and is the only thing used internally) to keep the same target total segment size regardless of block size. This patch uses a precalculated table to store the block size itself, as well as additional derived values that have traditionally been compile-time constants (example: MaxHeapTuplesPerPage). The traditional macro names are kept so code that doesn't care about it should not need to change, however the definition of these has changed (see the CalcXXX() routines in blocksize.h for details). A new function, BlockSizeInit() populates the appropriate values based on the target block size. This should be called as early as possible in any code that utilizes block sizes. This patch adds this in the appropriate place on the handful of src/bin/ programs that used BLCKSZ, so this caveat mainly impacts new code. Code which had previously used BLCKZ should likely be able to get away with changing these instances to CLUSTER_BLOCK_SIZE, unless you're using a structure allocated on the stack. In these cases, the compiler will complain about dynamic structure. The solution is to utilize an expression with MAX_BLOCK_SIZE instead of BLCKSZ, ensuring enough stack space is allocated for the maximum size. This also does require using CLUSTER_BLOCK_SIZE or an expression based on it when actually using this structure, so in practice more stack space may be allocated then used in principal; as long as there is plenty of stack this should have no specific impacts on code. Initial (basic) performance testing shows only minor changes with the pgbench -S benchmark, though this is obviously an area that will need considerable testing/verification across multiple workloads. Thanks, David
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Drouvot, Bertrand"Дата:
Сообщение: Re: ProcessStartupPacket(): database_name and user_name truncation
Следующее
От: Bruce MomjianДата:
Сообщение: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?