Re: Re: fix cost subqueryscan wrong parallel cost

From bucoo@sohu.com
Subject Re: Re: fix cost subqueryscan wrong parallel cost
Date
Msg-id 2022042022004640700325@sohu.com
In reply to fix cost subqueryscan wrong parallel cost  ("bucoo@sohu.com" <bucoo@sohu.com>)
Responses Re: Re: fix cost subqueryscan wrong parallel cost  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> Sure, but that doesn't make the patch correct. The patch proposes
> that, when parallelism in use, a subquery scan will produce fewer rows
> than when parallelism is not in use, and that's 100% false. Compare
> this with the case of a parallel sequential scan. If a table contains
> 1000 rows, and we scan it with a regular Seq Scan, the Seq Scan will
> return 1000 rows.  But if we scan it with a Parallel Seq Scan using
> say 4 workers, the number of rows returned in each worker will be
> substantially less than 1000, because 1000 is now the *total* number
> of rows to be returned across *all* processes, and what we need is the
> number of rows returned in *each* process.

For now, the function cost_subqueryscan always uses the *total* row count,
even for a parallel path, like this:

Gather (rows=30000)
  Workers Planned: 2
  ->  Subquery Scan  (rows=30000) -- *total* rows; should equal the subpath's rows
        ->  Parallel Seq Scan  (rows=10000)
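
The per-process estimate of a parallel scan comes from dividing the total row count by a parallel divisor. The sketch below is modeled on get_parallel_divisor() in costsize.c and assumes the leader participates in the scan. The function names and the standalone form are hypothetical, and the planner's rounding of the result (clamp_row_est) is left out. Under this formula 2 workers give a divisor of 2.4, so the 30000/10000 figures in the plan above read as a simplified three-way split.

```c
#include <assert.h>

/*
 * Hypothetical standalone sketch modeled on get_parallel_divisor() in
 * costsize.c. Assumes parallel_leader_participation is enabled.
 */
static double
sketch_parallel_divisor(int parallel_workers)
{
    double      divisor = parallel_workers;
    double      leader_contribution = 1.0 - (0.3 * parallel_workers);

    assert(parallel_workers > 0);

    /* The leader helps less as the number of workers grows. */
    if (leader_contribution > 0)
        divisor += leader_contribution;

    return divisor;
}

/*
 * Rows expected in *each* process, given the *total* rows across all
 * processes. The real planner also clamps and rounds this value; that
 * step is omitted here.
 */
static double
sketch_per_worker_rows(double total_rows, int parallel_workers)
{
    return total_rows / sketch_parallel_divisor(parallel_workers);
}
```

With 4 workers the leader contribution drops to zero, so a 1000-row table is estimated at 250 rows per process.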

Maybe this code:

/* Mark the path with the correct row estimate */
if (param_info)
    path->path.rows = param_info->ppi_rows;
else
    path->path.rows = baserel->rows;

should be changed to:

/* Mark the path with the correct row estimate */
if (path->path.parallel_workers > 0)
    path->path.rows = path->subpath->rows;
else if (param_info)
    path->path.rows = param_info->ppi_rows;
else
    path->path.rows = baserel->rows;
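
To make the effect of the proposed change concrete, here is a minimal self-contained sketch of the same branch logic. The Sketch* structs and sketch_mark_rows() are hypothetical stand-ins that keep only the fields used above; they are not the planner's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Path: only the fields used here. */
typedef struct SketchPath
{
    double      rows;
    int         parallel_workers;
} SketchPath;

/* Hypothetical stand-in for ParamPathInfo. */
typedef struct SketchParamPathInfo
{
    double      ppi_rows;
} SketchParamPathInfo;

/* Hypothetical stand-in for SubqueryScanPath. */
typedef struct SketchSubqueryScanPath
{
    SketchPath  path;
    SketchPath *subpath;
} SketchSubqueryScanPath;

/*
 * Mirror of the proposed row-estimate selection: a parallel subquery
 * scan takes the per-process estimate of its subpath, otherwise the
 * existing parameterized or base-relation estimate is used.
 */
static void
sketch_mark_rows(SketchSubqueryScanPath *path,
                 double baserel_rows,
                 const SketchParamPathInfo *param_info)
{
    assert(path != NULL && path->subpath != NULL);

    if (path->path.parallel_workers > 0)
        path->path.rows = path->subpath->rows;
    else if (param_info)
        path->path.rows = param_info->ppi_rows;
    else
        path->path.rows = baserel_rows;
}
```

For the plan shown earlier (2 workers, subpath rows=10000, baserel rows=30000), this marks the Subquery Scan with 10000 rows instead of 30000.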


bucoo@sohu.com
