Re: Add index scan progress to pg_stat_progress_vacuum
От | Imseih (AWS), Sami |
---|---|
Тема | Re: Add index scan progress to pg_stat_progress_vacuum |
Дата | |
Msg-id | 78C9A1C5-8593-47F8-8717-42C554A724E9@amazon.com обсуждение исходный текст |
Ответ на | Re: Add index scan progress to pg_stat_progress_vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: Add index scan progress to pg_stat_progress_vacuum
Re: Add index scan progress to pg_stat_progress_vacuum |
Список | pgsql-hackers |
> I'm still unsure the current design of 0001 patch is better than other > approaches we’ve discussed. Even users who don't use parallel vacuum > are forced to allocate shared memory for index vacuum progress, with > GetMaxBackends() entries from the beginning. Also, it’s likely to > extend the progress tracking feature for other parallel operations in > the future but I think the current design is not extensible. If we > want to do that, we will end up creating similar things for each of > them or re-creating index vacuum progress tracking feature while > creating a common infra. It might not be a problem as of now but I'm > concerned that introducing a feature that is not extensible and forces > users to allocate additional shmem might be a blocker in the future. > Looking at the precedent example, When we introduce the progress > tracking feature, we implemented it in an extensible way. On the other > hand, others in this thread seem to agree with this approach, so I'd > like to leave it to committers. Thanks for the review! I think you make strong arguments as to why we need to take a different approach now than later. Flaws with current patch set: 1. GetMaxBackends() is a really heavy-handed overallocation of a shared memory serving a very specific purpose. 2. Going with the approach of a vacuum specific hash breaks the design of progress which is meant to be extensible. 3. Even if we go with this current approach as an interim solution, it will be a real pain in the future. With that said, v7 introduces the new infrastructure. 0001 includes the new infrastructure and 0002 takes advantage of this. This approach is the following: 1. Introduces a new API called pgstat_progress_update_param_parallel along with some others support functions. This new infrastructureis in backend_progress.c 2. There is still a shared memory involved, but the size is capped to " max_worker_processes" which is the max to how manyparallel workers can be doing work at any given time. The shared memory hash includes a st_progress_param array justlike the Backend Status array. typedef struct ProgressParallelEntry { pid_t leader_pid; int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; } ProgressParallelEntry; 3. The progress update function is "pgstat_progress_update_param_parallel" and will aggregate totals reported for a specificprogress parameter For example , it can be called lie below. In the case below, PROGRESS_VACUUM_INDEXES_COMPLETED is incremented by 1 in theshared memory entry shared by the workers and leader. case PARALLEL_INDVAC_STATUS_NEED_BULKDELETE: istat_res = vac_bulkdel_one_index(&ivinfo, istat, pvs->dead_items); pgstat_progress_update_param_parallel(pvs->shared->leader_pid, PROGRESS_VACUUM_INDEXES_COMPLETED,1); <<----- break; 4. pg_stat_get_progress_info will call a function called pgstat_progress_set_parallel which will set the parameter valueto the total from the shared memory hash. I believe this approach gives proper infrastructure for future use-cases of workers reporting progress -and- does not dothe heavy-handed shared memory allocation. -- Sami Imseih Amazon Web Services
Вложения
В списке pgsql-hackers по дате отправления: