Re: Re: fix cost subqueryscan wrong parallel cost

From	bucoo@sohu.com
Subject	Re: Re: fix cost subqueryscan wrong parallel cost
Date	April 20, 2022, 14:00:46
Msg-id	2022042022004640700325@sohu.com
In reply to	fix cost subqueryscan wrong parallel cost ("bucoo@sohu.com" <bucoo@sohu.com>)
Replies	Re: Re: fix cost subqueryscan wrong parallel cost
List	pgsql-hackers

> Sure, but that doesn't make the patch correct. The patch proposes
> that, when parallelism in use, a subquery scan will produce fewer rows
> than when parallelism is not in use, and that's 100% false. Compare
> this with the case of a parallel sequential scan. If a table contains
> 1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
> return 1000 rows. But if we scan it with a Parallel Seq Scan using
> say 4 workers, the number of rows returned in each worker will be
> substantially less than 1000, because 1000 is now the *total* number
> of rows to be returned across *all* processes, and what we need is the
> number of rows returned in *each* process.

For now, the function cost_subqueryscan always uses the *total* row count,
even for a parallel path. Like this:

Gather (rows=30000)
  Workers Planned: 2
  -> Subquery Scan (rows=30000) -- *total* rows, should equal the subpath's rows
       -> Parallel Seq Scan (rows=10000)

Maybe the code:

    /* Mark the path with the correct row estimate */
    if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = baserel->rows;

should change to:

    /* Mark the path with the correct row estimate */
    if (path->path.parallel_workers > 0)
        path->path.rows = path->subpath->rows;
    else if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = baserel->rows;
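
To make the branch order easier to check outside the planner, here is a minimal standalone sketch of the proposed selection. The Sketch* structs and sketch_subqueryscan_rows() are hypothetical stand-ins that keep only the fields used above; they are not the real PostgreSQL definitions, and the numbers in the note below are the ones from the example plan.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for the planner structs.
 * The real structs have many more fields; only the ones touched
 * by the proposed change are kept here.
 */
typedef struct SketchPath
{
    double      rows;               /* estimated rows for this path */
    int         parallel_workers;   /* 0 means not a parallel path */
} SketchPath;

typedef struct SketchParamInfo
{
    double      ppi_rows;           /* rows for a parameterized path */
} SketchParamInfo;

typedef struct SketchRel
{
    double      rows;               /* total rows for the relation */
} SketchRel;

typedef struct SketchSubqueryScanPath
{
    SketchPath  path;
    SketchPath *subpath;            /* the path being scanned */
} SketchSubqueryScanPath;

/*
 * Row estimate following the branch order proposed in this mail:
 * a parallel path takes the per-process estimate of its subpath,
 * otherwise the existing param_info / baserel logic applies.
 */
double
sketch_subqueryscan_rows(const SketchSubqueryScanPath *path,
                         const SketchRel *baserel,
                         const SketchParamInfo *param_info)
{
    if (path->path.parallel_workers > 0)
    {
        assert(path->subpath != NULL);
        return path->subpath->rows;
    }
    else if (param_info)
        return param_info->ppi_rows;
    else
        return baserel->rows;
}
```

With the example plan (subpath rows=10000, baserel rows=30000, 2 workers), the sketch gives 10000 for the parallel Subquery Scan and 30000 for the non-parallel one.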

bucoo@sohu.com
