Re: Analyze on table creation?
От | James Coleman |
---|---|
Тема | Re: Analyze on table creation? |
Дата | |
Msg-id | CAAaqYe9OT9+QMa=ip9PRHwtoSA5NrDgPuZRqm+eGdjOvmUMvUw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Analyze on table creation? (Pavel Stehule <pavel.stehule@gmail.com>) |
Ответы |
Re: Analyze on table creation?
(Pavel Stehule <pavel.stehule@gmail.com>)
|
Список | pgsql-hackers |
cc'ing Tom because I'm curious if he's willing to provide some greater context on the commit in question. On Mon, Jun 26, 2023 at 2:16 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > po 26. 6. 2023 v 19:48 odesílatel James Coleman <jtc331@gmail.com> napsal: >> >> On Mon, Jun 26, 2023 at 1:45 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: >> > >> > >> > >> > po 26. 6. 2023 v 19:43 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal: >> >> >> >> Hi >> >> >> >> po 26. 6. 2023 v 19:41 odesílatel James Coleman <jtc331@gmail.com> napsal: >> >>> >> >>> Hello, >> >>> >> >>> Have we ever discussed running an analyze immediately after creating a table? >> >>> >> >>> Consider the following: >> >>> >> >>> create table stats(i int, t text not null); >> >>> explain select * from stats; >> >>> Seq Scan on stats (cost=0.00..22.70 rows=1270 width=36 >> >>> analyze stats; >> >>> explain select * from stats; >> >>> Seq Scan on stats (cost=0.00..0.00 rows=1 width=36) >> >>> >> >>> Combined with rapidly increasing error margin on row estimates when >> >>> adding joins means that a query joining to a bunch of empty tables >> >>> when a database first starts up can result in some pretty wild plan >> >>> costs. >> >>> >> >>> This feels like a simple idea to me, and so I assume people have >> >>> considered it before. If so, I'd like to understand why the conclusion >> >>> was not to do it, or, alternatively if it's a lack of tuits. >> >> >> >> >> >> I like this. On the second hand, described behaviour is designed for ensuring of back compatibility. >> > >> > >> > if you break this back compatibility, then the immediate ANALYZE is not necessary >> >> I don't follow what backwards compatibility you're referencing. Could >> you expand on that? > > > Originally, until the table had minimally one row, the PostgreSQL calculated with 10 pages. It was fixed (changed) in PostgreSQL14. > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3d351d916b20534f973eda760cde17d96545d4c4 > From that commit message: > Historically, we've considered the state with relpages and reltuples > both zero as indicating that we do not know the table's tuple density. > This is problematic because it's impossible to distinguish "never yet > vacuumed" from "vacuumed and seen to be empty". In particular, a user > cannot use VACUUM or ANALYZE to override the planner's normal heuristic > that an empty table should not be believed to be empty because it is > probably about to get populated. That heuristic is a good safety > measure, so I don't care to abandon it, but there should be a way to > override it if the table is indeed intended to stay empty. So that implicitly provides our reasoning for not analyzing up-front on table creation. I haven't thought about this too deeply yet, but it seems plausible to me that the dangers of overestimating row count here (at minimum in queries like I described with lots of joins) are higher than the dangers of underestimating, which we would do if we believed the table was empty. One critical question would be how fast we can assume the table will be auto-analyzed (i.e., how fast would the underestimate be corrected. Regards, James Coleman
В списке pgsql-hackers по дате отправления:
Следующее
От: Pavel LuzanovДата:
Сообщение: Re: psql: Add role's membership options to the \du+ command