Обсуждение: vacuumdb --missing-stats-only and permission issue
Hi, While following the discussion on vacuumdb --missing-stats-only [1], I noticed that this option queries pg_statistic and pg_statistic_ext_data. As a result, non-superusers like pg_maintain cannot use it because by default they lack permission to access those catalogs. I'm not sure whether --missing-stats-only was intended to work for non-superusers, but if so, this restriction is inconvenient. Would it make sense to use the views pg_stats and pg_stats_ext instead? Since the catalogs are only consulted to check whether statistics exist, the views should be sufficient. Thought? Regards, [1] https://postgr.es/m/20250820104226.8ba51e43164cd590b863ce41@sraoss.co.jp -- Fujii Masao
On Wed, Aug 20, 2025 at 11:06 PM Fujii Masao <masao.fujii@gmail.com> wrote:
I'm not sure whether --missing-stats-only was intended to work for
non-superusers, but if so, this restriction is inconvenient. Would it
make sense to use the views pg_stats and pg_stats_ext instead?
Since the catalogs are only consulted to check whether statistics exist,
the views should be sufficient. Thought?
It seems like it should be possible.
Initially, I was afraid that index statistics don't make it into pg_stats, but experimentation shows that they do as far back as I looked (10-stable).
A similar issue might happen with tables themselves, but if the user doesn't have permission to see the stats for that table, they weren't going to get far trying to vacuum the table.
Any issue with expressions columns in an index seems to be resolved as well, as the generated pg_attribute.attname is carried forward. I suppose there might be a collision with old stats and renamed columns, but that's a rare case, and it would still mean that the table had stats, just not for the new column.
The main problem would be if has_column_privilege() works on indexes for non-owners all the way back, if it doesn't, we're stuck.
One issue that may come up is that because pg_stats and pg_stats_ext are security barrier views, the planner is prone to hash joins (and thus full scans) of pg_stats, ignoring otherwise indexes that are ideal for single-row lookups like you get with EXISTS and NOT EXISTS clauses. See comment about "pg_class_relname_nsp_index" and associated query block in pg_dump.c for an example of what we had to do to make the planner avoid a full-scan.
Assuming that I'm not missing something, the fix seems straightforward. I'll set about coding it up tomorrow if nobody has done so by then.
On Thu, Aug 21, 2025 at 03:19:40AM -0400, Corey Huinker wrote: > Assuming that I'm not missing something, the fix seems straightforward. > I'll set about coding it up tomorrow if nobody has done so by then. I've added an open item for this. -- nathan
On Thu, Aug 21, 2025 at 9:44 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Aug 21, 2025 at 03:19:40AM -0400, Corey Huinker wrote:
> Assuming that I'm not missing something, the fix seems straightforward.
> I'll set about coding it up tomorrow if nobody has done so by then.
I've added an open item for this.
Here's the query changes, no regression test just yet.
Вложения
On Thu, Aug 21, 2025 at 12:52:17PM -0400, Corey Huinker wrote: > Here's the query changes, no regression test just yet. I think there's a problem with the privilege checks for pg_stats (and friends) versus ANALYZE. pg_stats checks for SELECT privileges on the column, while ANALYZE checks for MAINTAIN privileges. If a role lacks SELECT on the columns, it will always try to analyze the table. If a role lacks MAINTAIN on the table, it will fail if it tries to analyze the table. create role myrole login; create table no_stats_select as select generate_series(1, 10); create table no_stats_maintain as select generate_series(1, 10); create table stats_select as select generate_series(1, 10); create table stats_maintain as select generate_series(1, 10); grant maintain on no_stats_maintain to myrole; grant maintain on stats_maintain to myrole; grant select on no_stats_select to myrole; grant select on stats_select to myrole; analyze stats_select, stats_maintain; $ vacuumdb ... --missing-stats-only -Z -t stats_select vacuumdb: vacuuming database "postgres" $ vacuumdb ... --missing-stats-only -Z -t stats_maintain vacuumdb: vacuuming database "postgres" INFO: analyzing "public.stats_maintain" ... $ vacuumdb ... --missing-stats-only -Z -t no_stats_select vacuumdb: vacuuming database "postgres" WARNING: permission denied to analyze "no_stats_select", skipping it $ vacuumdb ... --missing-stats-only -Z -t no_stats_maintain vacuumdb: vacuuming database "postgres" INFO: analyzing "public.no_stats_maintain" ... Perhaps we should also filter for only columns for which the user has SELECT and MAINTAIN. Another option is just to look for SELECT and let the attempt to ANALYZE it fail (since it just produces a WARNING). Whatever we choose should probably be noted in the vacuumdb docs. -- nathan
On Thu, Aug 21, 2025 at 12:59:52PM -0500, Nathan Bossart wrote: > I think there's a problem with the privilege checks for pg_stats (and > friends) versus ANALYZE. pg_stats checks for SELECT privileges on the > column, while ANALYZE checks for MAINTAIN privileges. If a role lacks > SELECT on the columns, it will always try to analyze the table. If a role > lacks MAINTAIN on the table, it will fail if it tries to analyze the table. Unfortunately, pg_stats_ext is also different. The data for that view is restricted to table owners (or roles that inherit privileges of the table owner). -- nathan
Unfortunately, pg_stats_ext is also different. The data for that view is
restricted to table owners (or roles that inherit privileges of the table
owner).
Ok, I took the RLS and permissions quals from pg_stats and pg_stats_ext and put them in the corresponding EXISTs tests. The queries could be written a bit more succinctly (ex. we only need to do the RLS checks once) but putting them in each EXISTS clause drives home the point that we're duplicating the filters in pg_stats/pg_stats_ext.
So now, we avoid probing attributes for stats that we know were going to be filtered out, likewise for extended statistics objects.
The queries don't consult pg_stats_exprs because that's just exposing different stats from the same pg_statistic_ext_data row, and we already know that the row is there or not.
Вложения
On Thu, Aug 21, 2025 at 07:27:23PM -0400, Corey Huinker wrote: > Ok, I took the RLS and permissions quals from pg_stats and pg_stats_ext > and put them in the corresponding EXISTs tests. The queries could be > written a bit more succinctly (ex. we only need to do the RLS checks once) > but putting them in each EXISTS clause drives home the point that we're > duplicating the filters in pg_stats/pg_stats_ext. Looks reasonable to me. It's unfortunate that we have to be so restrictive, but since this is primarily intended for administrators to use post-upgrade, I think it's okay for v18. For v19, perhaps we should provide a view or function that returns whether stats are missing. That could require MAINTAIN on the relation to match what's needed for ANALYZE. We'll also need documentation and test updates, of course. For the docs, I'd try to borrow some of the wording from pg_stats and pg_stats_ext: This view allows access only to rows of pg_statistic that correspond to tables the user has permission to read... This view allows access only to rows of pg_statistic_ext and pg_statistic_ext_data that correspond to tables the user owns... -- nathan
On Fri, Aug 22, 2025 at 09:01:36AM -0500, Nathan Bossart wrote: > We'll also need documentation and test updates, of course. Here is an attempt at the docs/tests. I also fixed a couple of small issues in the query. -- nathan
Вложения
On Fri, Aug 22, 2025 at 3:34 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Aug 22, 2025 at 09:01:36AM -0500, Nathan Bossart wrote:
> We'll also need documentation and test updates, of course.
Here is an attempt at the docs/tests. I also fixed a couple of small
issues in the query.
--
nathan
Good catch on the query change. I like the docs change.
Tests look pretty straightforward. Was expecting one of them to test on a WARNING or not, but that might not be reliable enough to capture.
+1
On Sat, Aug 23, 2025 at 8:15 AM Corey Huinker <corey.huinker@gmail.com> wrote: > > On Fri, Aug 22, 2025 at 3:34 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >> >> On Fri, Aug 22, 2025 at 09:01:36AM -0500, Nathan Bossart wrote: >> > We'll also need documentation and test updates, of course. >> >> Here is an attempt at the docs/tests. I also fixed a couple of small >> issues in the query. >> >> -- >> nathan > > > Good catch on the query change. I like the docs change. > > Tests look pretty straightforward. Was expecting one of them to test on a WARNING or not, but that might not be reliableenough to capture. > > +1 Thanks for working on this! I tested by creating many tables with make installcheck and running vacuumdb --missing-stats-only on the regression database. Without the patch, the query to find tables to analyze took about 60 ms, but with the patch it took 18 seconds. That seems too slow, so probably we'll need to tune the query? Regards, -- Fujii Masao
On Sat, Aug 23, 2025 at 10:59:43AM +0900, Fujii Masao wrote: > I tested by creating many tables with make installcheck and running > vacuumdb --missing-stats-only on the regression database. > Without the patch, the query to find tables to analyze took about 60 ms, > but with the patch it took 18 seconds. That seems too slow, > so probably we'll need to tune the query? Hm. Maybe we should just document that the option requires SELECT privileges on pg_statistic and pg_statistic_ext_data (which are restricted to superusers by default). I suspect we have relatively limited opportunities for tuning the query, and I'd like to avoid invasive changes to v18 at this point. -- nathan
On Sat, Aug 23, 2025 at 12:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Sat, Aug 23, 2025 at 10:59:43AM +0900, Fujii Masao wrote: > > I tested by creating many tables with make installcheck and running > > vacuumdb --missing-stats-only on the regression database. > > Without the patch, the query to find tables to analyze took about 60 ms, > > but with the patch it took 18 seconds. That seems too slow, > > so probably we'll need to tune the query? > > Hm. Maybe we should just document that the option requires SELECT > privileges on pg_statistic and pg_statistic_ext_data (which are restricted > to superusers by default). I suspect we have relatively limited > opportunities for tuning the query, and I'd like to avoid invasive changes > to v18 at this point. Yeah, adding a note about the permissions required for --missing-stats-only, leaving the query unchanged in v18, and revisiting the issue in v19 seems reasonable given the limited time before the v18 release. Regards, -- Fujii Masao
On Fri, Aug 22, 2025 at 11:20 PM Fujii Masao <masao.fujii@gmail.com> wrote:
On Sat, Aug 23, 2025 at 12:00 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Sat, Aug 23, 2025 at 10:59:43AM +0900, Fujii Masao wrote:
> > I tested by creating many tables with make installcheck and running
> > vacuumdb --missing-stats-only on the regression database.
> > Without the patch, the query to find tables to analyze took about 60 ms,
> > but with the patch it took 18 seconds. That seems too slow,
> > so probably we'll need to tune the query?
Sounds exactly like the issue we encountered with queries that used pg_stats in pg_dump.
>
> Hm. Maybe we should just document that the option requires SELECT
> privileges on pg_statistic and pg_statistic_ext_data (which are restricted
> to superusers by default). I suspect we have relatively limited
> opportunities for tuning the query, and I'd like to avoid invasive changes
> to v18 at this point.
Yeah, adding a note about the permissions required for --missing-stats-only,
leaving the query unchanged in v18, and revisiting the issue in v19 seems
reasonable given the limited time before the v18 release.
Rather than resorting to the redundant where-clause trick that we did in pg_dump.
Looking to v19, I think the problem could be solved if we exposed starelid in pg_stats and stxrelid in pg_stats_ext and joined on those. pg_dump would benefit from that as well, eventually.
Looking to v19, I think the problem could be solved if we exposed starelid in pg_stats and stxrelid in pg_stats_ext and joined on those. pg_dump would benefit from that as well, eventually.
On Sat, Aug 23, 2025 at 05:32:30AM -0400, Corey Huinker wrote: > On Fri, Aug 22, 2025 at 11:20 PM Fujii Masao <masao.fujii@gmail.com> wrote: >> On Sat, Aug 23, 2025 at 12:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote: >>> Hm. Maybe we should just document that the option requires SELECT >>> privileges on pg_statistic and pg_statistic_ext_data (which are restricted >>> to superusers by default). I suspect we have relatively limited >>> opportunities for tuning the query, and I'd like to avoid invasive changes >>> to v18 at this point. >> >> Yeah, adding a note about the permissions required for --missing-stats-only, >> leaving the query unchanged in v18, and revisiting the issue in v19 seems >> reasonable given the limited time before the v18 release. > > Rather than resorting to the redundant where-clause trick that we did in > pg_dump. Here's a patch for the documentation update. -- nathan
Вложения
On Sat, 23 Aug 2025 09:36:07 -0500 Nathan Bossart <nathandbossart@gmail.com> wrote: > On Sat, Aug 23, 2025 at 05:32:30AM -0400, Corey Huinker wrote: > > On Fri, Aug 22, 2025 at 11:20 PM Fujii Masao <masao.fujii@gmail.com> wrote: > >> On Sat, Aug 23, 2025 at 12:00 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > >>> Hm. Maybe we should just document that the option requires SELECT > >>> privileges on pg_statistic and pg_statistic_ext_data (which are restricted > >>> to superusers by default). I suspect we have relatively limited > >>> opportunities for tuning the query, and I'd like to avoid invasive changes > >>> to v18 at this point. > >> > >> Yeah, adding a note about the permissions required for --missing-stats-only, > >> leaving the query unchanged in v18, and revisiting the issue in v19 seems > >> reasonable given the limited time before the v18 release. > > > > Rather than resorting to the redundant where-clause trick that we did in > > pg_dump. > > Here's a patch for the documentation update. The documentation fix looks good to me. However, it’s not very user-friendly that, when the user lacks the required privileges, an error from the internal query is raised. Instead, how about checking whether the user has the necessary privileges and printing an appropriate message if any privilege is missing? I've added the 0002 patch to address this; the 0001 is the same as the previous. The check is performed per database, so it may be a bit slower when an enormous number of databases are processed, but the overhead should be relatively small compared with the other queries executed together. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
Вложения
On Mon, Aug 25, 2025 at 10:30:32AM +0900, Yugo Nagata wrote: > The documentation fix looks good to me. However, it’s not very user-friendly that, > when the user lacks the required privileges, an error from the internal query is > raised. Instead, how about checking whether the user has the necessary privileges > and printing an appropriate message if any privilege is missing? Thanks for taking a look. I appreciate your suggestion, but I'm finding myself leaning -1 for a couple of reasons: * There's no precedent for this in vacuumdb. In fact, if the user specifies a nonexistent table, we return the error from the internal query, and if the user lacks privileges to VACUUM or ANALYZE a table, we let the command emit a WARNING and skip it. Intentionally or not, we seem to let the server do most of the work when it comes to this sort of thing. vacuumdb is just responsible for building and sending the commands. * I'm a little worried that warning about SELECT privileges on these catalogs will encourage people to grant privileges on them, which we probably don't want them to do. My proposed documentation note already carries some risk of this, but it at least specifically calls out that superusers should have the necessary privileges. * While probably rare, checking the privileges beforehand introduces race conditions that would cause the internal query error, anyway. I think there are already a few such race conditions already (e.g., if the table is dropped between the catalog query and the VACUUM or ANALYZE command), which AFAICT we just live with, but I'm wary of pressing our luck too much in this area. I'd be grateful for others' thoughts on this. I'm hoping to resolve this open item within the next couple of days, given 18rc1 is planned for next week. -- nathan
On Mon, 25 Aug 2025 09:29:15 -0500 Nathan Bossart <nathandbossart@gmail.com> wrote: > On Mon, Aug 25, 2025 at 10:30:32AM +0900, Yugo Nagata wrote: > > The documentation fix looks good to me. However, it’s not very user-friendly that, > > when the user lacks the required privileges, an error from the internal query is > > raised. Instead, how about checking whether the user has the necessary privileges > > and printing an appropriate message if any privilege is missing? > > Thanks for taking a look. I appreciate your suggestion, but I'm finding > myself leaning -1 for a couple of reasons: > > * There's no precedent for this in vacuumdb. In fact, if the user > specifies a nonexistent table, we return the error from the internal > query, and if the user lacks privileges to VACUUM or ANALYZE a table, we > let the command emit a WARNING and skip it. Intentionally or not, we > seem to let the server do most of the work when it comes to this sort of > thing. vacuumdb is just responsible for building and sending the > commands. > > * I'm a little worried that warning about SELECT privileges on these > catalogs will encourage people to grant privileges on them, which we > probably don't want them to do. My proposed documentation note already > carries some risk of this, but it at least specifically calls out that > superusers should have the necessary privileges. > > * While probably rare, checking the privileges beforehand introduces race > conditions that would cause the internal query error, anyway. I think > there are already a few such race conditions already (e.g., if the table > is dropped between the catalog query and the VACUUM or ANALYZE command), > which AFAICT we just live with, but I'm wary of pressing our luck too > much in this area. Thank you for the clarification. I understand your points now, so I'll withdraw my proposal. Regards, Yugo Nagata -- Yugo Nagata <nagata@sraoss.co.jp>
On Tue, Aug 26, 2025 at 12:24:27AM +0900, Yugo Nagata wrote: > Thank you for the clarification. I understand your points now, > so I'll withdraw my proposal. Okay. I'll plan on committing the documentation update in the next 24-48 hours, provided no additional feedback materializes. -- nathan
On Tue, Aug 26, 2025 at 1:20 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Aug 26, 2025 at 12:24:27AM +0900, Yugo Nagata wrote: > > Thank you for the clarification. I understand your points now, > > so I'll withdraw my proposal. > > Okay. I'll plan on committing the documentation update in the next 24-48 > hours, provided no additional feedback materializes. The patch looks good to me. Thanks! Regards, -- Fujii Masao
On Tue, Aug 26, 2025 at 09:00:53PM +0900, Fujii Masao wrote: > On Tue, Aug 26, 2025 at 1:20 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> Okay. I'll plan on committing the documentation update in the next 24-48 >> hours, provided no additional feedback materializes. > > The patch looks good to me. Thanks! Committed. -- nathan