Re: Disable parallel query by default
From | Scott Mead
Subject | Re: Disable parallel query by default
Date |
Msg-id | 58034f15-5ca5-4bc3-8750-76f44b0e77dc@app.fastmail.com
In response to | Re: Disable parallel query by default (Greg Sabino Mullane <htamfids@gmail.com>)
Responses | Re: Disable parallel query by default
List | pgsql-hackers
On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> On Tue, May 13, 2025 at 4:37 PM Scott Mead <scott@meads.us> wrote:
>> I'll open by proposing that we prevent the planner from automatically selecting parallel plans by default
>
> That seems a pretty heavy hammer, when we have things like parallel_setup_cost that should be tweaked first.

I agree it's a big hammer, and I thought through parallel_setup_cost quite a bit myself. The problem with parallel_setup_cost is that it doesn't actually represent the overhead of setting up parallel query for a busy system. It does define the cost of setup for a *single* parallel session, but it cannot accurately express the cost of CPU and other overhead associated with the second, third, fourth, etc... query that is executed in parallel. The expense to the operating system is a function of the _rate_ of parallel query executions being issued. Without new infrastructure, there's no way to define something that will give me a true representation of the cost of issuing a query with parallelism.

>> The recommendation that I give to users is pretty straightforward: "Disable automatic parallel query, enable it for queries where you find substantial savings and can control the rate of execution." I always tell users that if they're using parallel query for anything that should execute in less than 5 minutes, they're probably pushing on the wrong tuning strategy, as the load induced by the parallel query infrastructure is likely going to negate the savings that they're getting.
>
> Five minutes?! That's not been my experience. Not claiming parallelism is perfect yet, but there are plenty of parallel performance savings under the five minute mark.

Absolutely, I've seen 1 second queries go to 200 ms with parallelism of 2. The problem isn't about making that query faster in isolation; the problem is that every single one of those means a new connection.

If you have a connection pool from your application and you issue 60 queries per minute, each of which can go from 1 second to 200 ms, that means that you are making 120 connections per minute back to the DB. As we know, connection establishment is brutal....

Executing a pgbench with 15 clients for 10 seconds using "SELECT 1;" as the workload gives me 315,449 tps. If I add the -C flag (connect / disconnect for each transaction), I get 684 TPS. The overhead of a connection goes beyond the specific query being optimized; it has far-reaching impact even outside of the postgres processes on the machine.

$ pgbench -f select1.sql --no-vacuum -c 15 -T 10
pgbench (16.4)
transaction type: select1.sql
scaling factor: 1
query mode: simple
number of clients: 15
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 3146966
number of failed transactions: 0 (0.000%)
latency average = 0.048 ms
initial connection time = 28.854 ms
tps = 315449.609763 (without initial connection time)

$ pgbench -f select1.sql --no-vacuum -c 15 -T 10 -C
pgbench (16.4)
transaction type: select1.sql
scaling factor: 1
query mode: simple
number of clients: 15
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 6848
number of failed transactions: 0 (0.000%)
latency average = 21.910 ms
average connection time = 1.455 ms
tps = 684.615907 (including reconnection times)

> Cheers,
> Greg
>
> --
> Crunchy Data - https://www.crunchydata.com
> Enterprise Postgres Software Products & Tech Support

--
Scott Mead
Amazon Web Services
scott@meads.us
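
P.S. For anyone who wants to try the "disable by default, opt in per query" approach I describe above, here is a minimal sketch using only the stock GUCs (the table name is just a placeholder):

-- Cluster-wide: keep the planner from picking parallel plans on its own.
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();

-- Per query: opt back in for a known-expensive statement only,
-- scoped to the enclosing transaction with SET LOCAL.
BEGIN;
SET LOCAL max_parallel_workers_per_gather = 4;
SELECT count(*) FROM big_table;  -- placeholder for a long-running analytic query
COMMIT;

The same opt-in can also be scoped to a session with plain SET, or to a specific role with ALTER ROLE ... SET.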