Re: Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
От | 萧鸿骏 |
---|---|
Тема | Re: Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost |
Дата | |
Msg-id | 4ea619cb.2e6.196d825321b.Coremail.23031212454@stu.xidian.edu.cn обсуждение исходный текст |
Ответ на | Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost (Dilip Kumar <dilipbalaut@gmail.com>) |
Список | pgsql-bugs |
Thank you very much! I will consider these in my future work. 2025-05-16 14:15:48 "Dilip Kumar" <dilipbalaut@gmail.com> 写道: > On Thu, May 15, 2025 at 6:42 PM 萧鸿骏 <23031212454@stu.xidian.edu.cn> wrote: > > > > Thank you very much for your reply! May I assume that we need to manually adjust the cost based on the actual situationof the system and testing computer? > > > > I have been conducting research on SQL performance issues in PG recently, and many of the problems I am currently studyingare related to it. I would like to consult with you. If the default cost value of the optimizer is not set reliably,resulting in the selection of a poor execution plan and significant performance differences, is this considereda direction for optimizer performance optimization? I think if we stand from the user's perspective, this may bedifficult to detect, leading to significant performance losses. > > > > Multiple factors influence the planner parameters seq_page_cost and > random_page_cost. PostgreSQL sets these to conservative default values > that aim to strike a balance, not assuming either extreme. On one end, > you might have a system where most data is on a slow spinning disk, > while on the other, you may have a machine with large amounts of RAM > such that almost all data is cached and disk I/O is rarely needed. > > These cost parameters directly influence the planner’s choice between > access paths, for example, index scan vs sequential scan: An index > scan involves random I/O, since it needs to jump around the heap to > fetch matching rows by TID. A sequential scan reads the table > linearly, which is generally faster on disk due to fewer seeks. > > Now, suppose your WHERE clause filters out 50% of the rows. The > planner might estimate that an index scan would involve a high cost > due to frequent random page reads, especially since the default > random_page_cost is 4 times higher than seq_page_cost. As a result, it > may choose a sequential scan as the cheaper plan. > > However, if most of your data is already in RAM, there is no > meaningful difference between random and sequential page reads; both > are fast. In such a case, the planner’s assumptions (based on default > cost values) can lead to a suboptimal plan, not because of a bug, but > because it's working with inaccurate cost estimates relative to your > hardware. > > So while the defaults work well for many systems, if you’re noticing > suboptimal plans, especially on machines with a lot of RAM or fast > SSDs, it's worth tuning these parameters. For systems that mostly > serve static data and have high cache hit ratios, reducing both > seq_page_cost and random_page_cost (and possibly making them equal) > may help the planner make better decisions. > > This is just my opinion, and others may think differently. > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com
В списке pgsql-bugs по дате отправления: