Обсуждение: Use-after-free in expand_partitioned_rtentry
Hello everyone, there seems to be a case of use-after-free in the function expand_partitioned_rtentry (src/backend/optimizer/util/inherit.c). In the NULL-check introduced to handle concurrently detached and dropped partitions (see [1]), the partition gets removed from the set of live partitions using bms_del_member but the returned Bitmapset is only assigned to relinfo->live_parts and not to the local variable live_parts being used in the while condition in Line 381. However, if the partition actually is the last one in the set, bms_del_member performs a pfree on the Bitmapset and returns NULL. relinfo->live_parts is set to NULL but the local variable live_parts still points to the old address. Therefore, it becomes a dangling pointer, leading to a use-after-free when accessed by bms_next_member. By slightly modifying the steps proposed by Kuntal Ghosh in [2], the use-after-free can be evoked and the program crashes: SETUP 1. ./configure --enable-debug --enable-depend --enable-cassert CFLAGS=-O0 2. make -j; make install -j; initdb -D ../primary; pg_ctl -D ../primary -l logfile start 3. alter system set plan_cache_mode to 'force_generic_plan' ; select pg_reload_conf(); 4. create table p( a int,b int) partition by range(a);create table p1 partition of p for values from (0) to (1);create table p2 partition of p for values from (1) to (2); insert into p(a,b) values (1,0); REPRODUCTION Session 1: 1. Attach GDB and put a breakpoint at ATExecDetachPartition Session 2: 1. SQL:prepare p1 as select * from p where a = 1; 2. Attach GDB and put a breakpoint at ProcessUtility() and inherit.c:394 Session 1: 1. alter table p detach partition p2 concurrently; 2. The session will be stalled at ATExecDetachPartition. Continue stepping next till CommitTransactionCommand(); Session 2: 1. SQL:execute p1; 2. The session will be stalled at ProcessUtility(). Before that, it takes the snapshot. Session 1: 1. Continue till DetachPartitionFinalize. Session 2: 1. Continue to inherit.c:394 (right before try_table_open is called on p2). Session 1: 1. Run to completion. 2. SQL: drop table p2 Session 2: 1. Continue. It will try to open p2, returning NULL, the Bitmapset is freed and the use-after-free via live_parts in the while condition crashes the program This has been tested on the current master branch. The patch attached seems to solve the problem by removing the local variable and relying on relinfo->live_parts only. Going through the steps described above with the patched version works fine. Looking forward to your feedback, Regards, Bernd Reiß [1] https://github.com/postgres/postgres/commit/52f3de874bbeaf16b3ae2efb505c1117be8355cc [2] https://www.postgresql.org/message-id/18559-b48286d2eacd9a4e%40postgresql.org
Вложения
On Fri, 29 Aug 2025 at 23:16, Bernd Reiß <bd_reiss@gmx.at> wrote: > there seems to be a case of use-after-free in the function > expand_partitioned_rtentry (src/backend/optimizer/util/inherit.c). In > the NULL-check introduced to handle concurrently detached and dropped > partitions (see [1]), the partition gets removed from the set of live > partitions using bms_del_member but the returned Bitmapset is only > assigned to relinfo->live_parts and not to the local variable live_parts > being used in the while condition in Line 381. However, if the partition > actually is the last one in the set, bms_del_member performs a pfree on > the Bitmapset and returns NULL. relinfo->live_parts is set to NULL but > the local variable live_parts still points to the old address. > Therefore, it becomes a dangling pointer, leading to a use-after-free > when accessed by bms_next_member. Yeah. Agreed. I did suspect this code might have predated 00b41463c (from 2023), and might have been ok when it was written, but that's not the case as it was only added in 52f3de874 (in 2024). Your fix looks good to me. I do prefer getting rid of the variable rather than adding the additional assignment as it reduces the chance of future omissions. David
Thanks for the quick response and the review. This is admittedly a pretty remote edge case, but still, better safe than sorry. Bernd On 8/29/25 1:29 PM, David Rowley wrote: > On Fri, 29 Aug 2025 at 23:16, Bernd Reiß <bd_reiss@gmx.at> wrote: >> there seems to be a case of use-after-free in the function >> expand_partitioned_rtentry (src/backend/optimizer/util/inherit.c). In >> the NULL-check introduced to handle concurrently detached and dropped >> partitions (see [1]), the partition gets removed from the set of live >> partitions using bms_del_member but the returned Bitmapset is only >> assigned to relinfo->live_parts and not to the local variable live_parts >> being used in the while condition in Line 381. However, if the partition >> actually is the last one in the set, bms_del_member performs a pfree on >> the Bitmapset and returns NULL. relinfo->live_parts is set to NULL but >> the local variable live_parts still points to the old address. >> Therefore, it becomes a dangling pointer, leading to a use-after-free >> when accessed by bms_next_member. > Yeah. Agreed. > > I did suspect this code might have predated 00b41463c (from 2023), and > might have been ok when it was written, but that's not the case as it > was only added in 52f3de874 (in 2024). > > Your fix looks good to me. I do prefer getting rid of the variable > rather than adding the additional assignment as it reduces the chance > of future omissions. > > David
On Fri, 29 Aug 2025 at 23:45, Bernd Reiß <bd_reiss@gmx.at> wrote: > Thanks for the quick response and the review. Thanks for the report, investigation and patch. I've pushed and backpatched this to 15. v14 doesn't have the RelOptInfo.live_parts field, so it didn't suffer from the issue. Technically, 15 isn't broken either as the bms_del_member() function in that version wouldn't pfree the set. I decided to patch 15 anyway to keep the code the same and to avoid assuming it's ok to ignore the return value of bms_del_member(). > This is admittedly a pretty remote edge case, but still, better safe > than sorry. Did you find it through code analysis or from a crash? It would just have been a matter of time before someone hit this. David
Glad I could be of help. I found this through code analysis. I've been working on a custom PG checker, adapting the Clang Static Checker for my bachelor thesis. Always nice to see, when academic work has real world benefits :) Bernd On 8/29/25 3:02 PM, David Rowley wrote: > On Fri, 29 Aug 2025 at 23:45, Bernd Reiß <bd_reiss@gmx.at> wrote: >> Thanks for the quick response and the review. > Thanks for the report, investigation and patch. > > I've pushed and backpatched this to 15. v14 doesn't have the > RelOptInfo.live_parts field, so it didn't suffer from the issue. > Technically, 15 isn't broken either as the bms_del_member() function > in that version wouldn't pfree the set. I decided to patch 15 anyway > to keep the code the same and to avoid assuming it's ok to ignore the > return value of bms_del_member(). > >> This is admittedly a pretty remote edge case, but still, better safe >> than sorry. > Did you find it through code analysis or from a crash? > > It would just have been a matter of time before someone hit this. > > David > >