Re: [parallel query] random server crash while running tpc-h query on power2
From | Amit Kapila |
---|---|
Subject | Re: [parallel query] random server crash while running tpc-h query on power2 |
Date | |
Msg-id | CAA4eK1LNxOnsuRS1my0PpbW=mqKu5wWXEfxg6xn3_CMWghOckw@mail.gmail.com |
In response to | [parallel query] random server crash while running tpc-h query on power2 (Rushabh Lathia <rushabh.lathia@gmail.com>) |
Responses | Re: [parallel query] random server crash while running tpc-h query on power2 |
List | pgsql-hackers |
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Hi All,
>
> Recently while running tpc-h queries on postgresql master branch, I am noticed
> random server crash. Most of the time server crash coming while turn tpch query
> number 3 - (but its very random).
>
>
> Here its clear that work_instrument is either corrupted or Un-inililized that is the
> reason its ending up with server crash.
>
> With bit more debugging and looked at git history I found that issue started coming
> with commit af33039317ddc4a0e38a02e2255c2bf453115fd2. gather_readnext() calls
> ExecShutdownGatherWorkers() when nreaders == 0. ExecShutdownGatherWorkers()
> calls ExecParallelFinish() which collects the instrumentation before marking
> ParallelExecutorInfo to finish. ExecParallelRetrieveInstrumentation() do the
> allocation of planstate->worker_instrument.
>
> With commit af33039317 now we calling the gather_readnext() with per-tuple context,
> but with nreader == 0 with ExecShutdownGatherWorkers() we end up with allocation
> of planstate->worker_instrument into per-tuple context - which is wrong.
>
> Now fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from the gather_readnext() and let
> ExecEndGather() do that things.
>

I don't think we can wait until ExecEndGather() to collect the statistics, as we
need them before that for the explain path. However, we do call
ExecShutdownNode() from ExecutePlan() when there are no more tuples, which can
take care of shutting down the Gather node. I think the advantage of calling it
in gather_readnext() is that it allows resources to be released early and
populates the instrumentation/statistics as early as possible.

> But with this change, gather_readread() and
> gather_getnext() depend on planstate->reader structure to continue reading tuple.
> Now either we can change those condition to be depend on planstate->nreaders or
> just pfree(planstate->reader) into gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
>
> Attaching patch, which fix the issue with approach 1).
>

AFAICS, your patch seems to be the right fix for this issue, unless we need the
instrumentation information during execution (other than for explain) for some
purpose.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
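
For readers following the thread, the lifetime problem in the quoted report (an allocation that must outlive the current tuple ending up in the per-tuple context) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it does not use PostgreSQL's memory context API, and every name in it (ToyContext, ToyPlanState, toy_alloc, toy_reset, toy_retrieve_instrumentation, stats_survive_reset) is hypothetical. It shows why the crash happens, not how the attached patch fixes it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy stand-in for a memory context: a fixed-size bump arena. */
typedef struct ToyContext {
    unsigned char buf[256];
    size_t used;
} ToyContext;

/* Hand out `size` bytes from the arena. */
static void *toy_alloc(ToyContext *ctx, size_t size)
{
    assert(ctx->used + size <= sizeof(ctx->buf));
    void *p = ctx->buf + ctx->used;
    ctx->used += size;
    return p;
}

/* Release everything in the arena, overwriting it so stale reads are visible. */
static void toy_reset(ToyContext *ctx)
{
    memset(ctx->buf, 0x7F, sizeof(ctx->buf));
    ctx->used = 0;
}

/* Hypothetical stand-in for the plan state holding worker instrumentation. */
typedef struct ToyPlanState {
    void *worker_instrument;
} ToyPlanState;

/* Models the instrumentation retrieval step: it allocates in whatever
 * context happens to be current when it is called. */
static void toy_retrieve_instrumentation(ToyPlanState *ps, ToyContext *current,
                                         long ntuples)
{
    ps->worker_instrument = toy_alloc(current, sizeof(long));
    memcpy(ps->worker_instrument, &ntuples, sizeof(long));
}

/* Returns 1 if the saved statistics are still intact after the per-tuple
 * context is reset, 0 if they were clobbered. */
int stats_survive_reset(int use_per_query_context)
{
    ToyContext per_query = {{0}, 0};
    ToyContext per_tuple = {{0}, 0};
    ToyPlanState ps = {NULL};
    long seen;

    toy_retrieve_instrumentation(
        &ps, use_per_query_context ? &per_query : &per_tuple, 42L);

    /* The executor moves on to the next tuple and resets per-tuple memory. */
    toy_reset(&per_tuple);

    /* Later, something like the explain path reads the statistics. */
    memcpy(&seen, ps.worker_instrument, sizeof(long));
    return seen == 42L;
}
```

In this model, stats_survive_reset(0) corresponds to the reported situation, where the saved pointer refers to memory that has already been recycled, while stats_survive_reset(1) corresponds to the allocation happening while a longer-lived context is current.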