Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
| From | Thomas Munro |
|---|---|
| Subject | Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use |
| Date | |
| Msg-id | CA+hUKGKpQJCWcgyy3QTC9vdn6uKAR_8r__A-MMm2GYfj45caag@mail.gmail.com |
| List | pgsql-bugs |
On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 15641
> Logged by: Hans Buschmann
> Email address: buschmann@nidsa.net
> PostgreSQL version: 11.2
> Operating system: Windows Server 2019 Standard
> Description:
>
> I recently moved a production system from PG 10.7 to 11.2 on a different
> Server.
>
> The configuration settings where mostly taken from the old system and
> enhanced by new features of PG 11.
>
> pg_prewarm was used for a long time (with no specific configuration).
>
> Now I have added Huge page support for Windows in the OS and verified it
> with vmmap tool from Sysinternals to be active.
> (the shared buffers are locked in memory: Lock_WS is set).
>
> When pg_prewarm.autoprewarm is set to on (using the default after initial
> database import via pg_restore), the autoprewarm worker process
> terminates immediately and generates a huge number of logfile entries
> like:
>
> CPS PRD 2019-02-17 16:11:53 CET 00000 11:> LOG: background worker
> "autoprewarm worker" (PID 3996) exited with exit code 1
> CPS PRD 2019-02-17 16:11:53 CET 55000 1:> ERROR: could not map dynamic
> shared memory segment

Hmm. It's not clear to me how using large pages for the main
PostgreSQL shared memory region could have any impact on autoprewarm's
entirely separate DSM segment. I wonder if other DSM use cases are
impacted. Does parallel query work? For example, the following
produces a parallel query that uses a few DSM segments:

create table foo as select generate_series(1, 1000000)::int i;
analyze foo;
explain analyze select count(*) from foo f1 join foo f2 using (i);

Looking at the place where that error occurs, it seems like it simply
failed to find the handle, as if it didn't exist at all at the time
dsm_attach() was called. I'm not entirely sure how that could happen
just because you turned on huge pages.
Is it possible that there is a race where apw_load_buffers() manages
to detach before the worker attached, and the timing changes? At a
glance, that shouldn't happen because apw_start_database_worker()
waits for the worker to exit before returning.

I think we'll need one of our Windows-enabled hackers to take a look.

PS Sorry for breaking the thread. I wish our archives app had a
"[re]send me this email" button, for people who subscribed after the
message was sent...

--
Thomas Munro
https://enterprisedb.com
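
The detach-before-attach race hypothesized above can be illustrated outside PostgreSQL. The following is only a sketch of the general failure mode, not of PostgreSQL's DSM code: it uses Python's `multiprocessing.shared_memory` as a stand-in, the segment name plays the role of the DSM handle, and the function name `attach_after_creator_released` is made up for this example. It assumes that once the creating side has released a named segment, a late attach by name finds nothing.

```python
from multiprocessing import shared_memory


def attach_after_creator_released() -> bool:
    """Return True if attaching by name fails once the creator has let go."""
    # Creator side: make a small named segment and remember its name.
    # The name stands in for the handle passed to a worker.
    seg = shared_memory.SharedMemory(create=True, size=4096)
    name = seg.name

    # Creator releases the segment before any worker has attached.
    seg.close()
    seg.unlink()  # removes the name on POSIX systems

    # Worker side: a late attach by name, analogous to attaching
    # with a handle that no longer refers to anything.
    try:
        late = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return True  # the segment was already gone
    late.close()
    return False


if __name__ == "__main__":
    print("late attach failed:", attach_after_creator_released())
```

If the creator instead stays attached until the worker has exited, which is what waiting in apw_start_database_worker() is meant to guarantee, the late attach in this sketch would succeed.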