Re: initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready)

From Craig Ringer
Subject Re: initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready)
Date
Msg-id CAMsr+YHfBbgKSHMx6BtndrfUhbX3d-PfBnEbn5eCO0VEANoReA@mail.gmail.com
In reply to Re: initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready) (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready) (Michael Paquier <michael.paquier@gmail.com>)
Re: initdb issue on 64-bit Windows - (Was: [pgsql-packagers] PG 9.6beta2 tarballs are ready) (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On 24 June 2016 at 10:28, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Jun 24, 2016 at 11:21 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >   * Launch a VS x86 command prompt
> >   * devenv /debugexe bin\initdb.exe -D test
> >   * Set a breakpoint in initdb.c:3557 and initdb.c:3307
> >   * Run
> >   * When it traps at get_restricted_token(), manually move the execution
> > pointer over the setup of the restricted execution token by dragging &
> > dropping the yellow instruction pointer arrow. Yes, really. Or, y'know,
> > comment it out and rebuild, but I was working with a supplied binary.
> >   * Continue until next breakpoint
> >   * Launch process explorer and find the pid of the postgres child process
> >   * Debug->attach to process, attach to the child postgres. This doesn't
> > detach the parent, VS does multiprocess debugging.
> >   * Continue execution
> >   * vs will trap on the child when it crashes
>
> Do you think a crash dump could have been created by creating
> crashdumps/ in PGDATA as part of initdb before this query is run?


The answer is "yes" btw. Add "crashdumps" to the static array of directories created by initdb and it works great.
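
For anyone wanting to repeat this, the change amounts to one extra entry in a list of directory names. The following is only a minimal sketch of that idea, not the code from initdb.c: the array name `sketch_subdirs`, its abbreviated contents, and both helper functions are hypothetical and exist purely for illustration. As I understand it, the Windows crash dump handler only writes a dump when the crashdumps directory already exists under the data directory, which is why creating it up front helps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative, abbreviated list. The array name and the choice of entries
 * are assumptions for this sketch, not copied from initdb.c.
 */
static const char *const sketch_subdirs[] = {
    "global",
    "base",
    "crashdumps"    /* the one-line addition described above */
};

#define SKETCH_NSUBDIRS (sizeof(sketch_subdirs) / sizeof(sketch_subdirs[0]))

/* Return 1 if name is in the list, 0 otherwise. */
static int
sketch_has_subdir(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < SKETCH_NSUBDIRS; i++)
        if (strcmp(sketch_subdirs[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Build "<pgdata>/<subdir>" for entry idx into buf.
 * Return 0 on success, -1 on a bad index or if buf is too small.
 */
static int
sketch_subdir_path(const char *pgdata, size_t idx, char *buf, size_t buflen)
{
    int n;

    if (idx >= SKETCH_NSUBDIRS)
        return -1;
    n = snprintf(buf, buflen, "%s/%s", pgdata, sketch_subdirs[idx]);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

With `-D test` as in the debugging steps, the last entry would resolve to `test/crashdumps`.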

Sigh. It'd be less annoying if I hadn't written most of the original patch.

For convenience I also commented out the check_root call in src/backend/main.c and the get_restricted_token(progname) call in initdb.c, so I could run it easily under an admin account where I can also install tools etc without hassle. Not recommended on a non-throwaway machine of course.

The generated crashdump shows the same crash in the same location.

I have absolutely no idea why it's trying to access memory at what looks like (uint64)(-1), though. Nothing in the auto vars list:

+ &restrictlist 0x000000000043f7b0 {0x0000000009e32600 {type=T_List (656) length=1 head=0x0000000009e325e0 {data={ptr_value=...} ...} ...}} List * *
+ inner_rel 0x0000000009e7ad68 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009e30520 {...} ...} RelOptInfo *
+ inner_rel->relids 0x0000000009e30520 {nwords=658 words=0x0000000009e30524 {...} } Bitmapset *
+ outer_rel 0x00000001401dec98 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
+ outer_rel->relids 0xe808498b48d78b48 {nwords=??? words=0xe808498b48d78b4c {...} } Bitmapset *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009e7abd0 {nwords=1 words=0x0000000009e7abd4 {...} } ...} SpecialJoinInfo *

or locals:

+ inner_rel 0x0000000009e7ad68 {type=T_EquivalenceClass (537) reloptkind=RELOPT_BASEREL (0) relids=0x0000000009e30520 {...} ...} RelOptInfo *
inner_rows 270.00000000000000 double
+ outer_rel 0x00000001401dec98 {postgres.exe!build_joinrel_tlist(PlannerInfo * root, RelOptInfo * joinrel, RelOptInfo * input_rel), Line 646} {...} RelOptInfo *
outer_rows 2.653351978175e-314#DEN double
+ restrictlist 0x0000000009e32600 {type=T_List (656) length=1 head=0x0000000009e325e0 {data={ptr_value=0x0000000009e31788 ...} ...} ...} List *
+ root 0x0000000009e7b3f8 {type=1 parse=0x0000000000504ad0 {type=T_AllocSetContext (601) commandType=CMD_UNKNOWN (0) ...} ...} PlannerInfo *
+ sjinfo 0x000000000043f870 {type=T_SpecialJoinInfo (543) min_lefthand=0x0000000009e7abd0 {nwords=1 words=0x0000000009e7abd4 {...} } ...} SpecialJoinInfo *

seems to fit. Though outer_rel->relids is a pretty weird address - 0xe808498b48d78b48? Really?
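
One possible reading of that address, offered as a guess rather than a diagnosis: the debugger shows outer_rel itself pointing into postgres.exe code (build_joinrel_tlist), so outer_rel->relids would simply be eight bytes of whatever sits there, read as a pointer. The sketch below splits the value into bytes in little-endian (x86-64 memory) order; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the 8 bytes of value into out[] in little-endian memory order. */
static void
sketch_u64_to_le_bytes(uint64_t value, unsigned char out[8])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char) ((value >> (8 * i)) & 0xFF);
}
```

For 0xe808498b48d78b48 this gives 48 8b d7 48 8b 49 08 e8. Those look like x86-64 instruction bytes (0x48 REX.W prefixes, 0x8b mov opcodes, 0xe8 as a call opcode), which would suggest the debugger's idea of where outer_rel lives is unreliable in this optimized frame, not that the planner built a corrupt RelOptInfo.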

I'd point DrMemory at it, but unfortunately it only supports 32-bit applications so far. I don't have access to any of the commercial tools like Purify. Maybe someone at EDB can help out with that, if you guys do?

Register states are:

RAX = 000000000043F7B0 RBX = 0000000009E32218 RCX = 0000000009E78510 RDX = 0000000009E7ABD0
RSI = 0000000009E78510 RDI = 0000000009E32218 R8  = 0000000009E7B3F8 R9  = 0000000009E7B1E8
R10 = 0000000009E7A9C0 R11 = 0000000000000001 R12 = 0000000009E32200 R13 = 0000000000000000
R14 = 0000000009E7B1E8 R15 = 0000000000000000 RIP = 00000001401A59D1 RSP = 000000000043F6E0
RBP = 0000000009E7A9C0 EFL = 00010202

and the exact crash site is

fkselec = get_foreign_key_join_selectivity(root,
  outer_rel->relids,
  inner_rel->relids,
  sjinfo,
  &restrictlist);
00000001401A59AB  mov         r8,qword ptr [r8+8]  
00000001401A59AF  mov         rdx,qword ptr [rdx+8]  
00000001401A59B3  movaps      xmmword ptr [rax-28h],xmm6  
00000001401A59B7  movaps      xmmword ptr [rax-38h],xmm7  
00000001401A59BB  movaps      xmmword ptr [rax-48h],xmm8  
00000001401A59C0  movaps      xmmword ptr [rax-58h],xmm9  
00000001401A59C5  lea         rax,[rax+38h]  
00000001401A59C9  movaps      xmm7,xmm3  
00000001401A59CC  mov         qword ptr [rsp+20h],rax  
00000001401A59D1  movaps      xmmword ptr [rax-68h],xmm10     <---- here
00000001401A59D6  mov         qword ptr [rax-48h],r14  
00000001401A59DA  mov         r14,qword ptr [sjinfo]  
00000001401A59E2  mov         ebp,dword ptr [r14+28h]  
00000001401A59E6  mov         qword ptr [rax-50h],r15  
00000001401A59EA  mov         r9,r14  
00000001401A59ED  mov         r15,rcx  
00000001401A59F0  call        get_foreign_key_join_selectivity (01401A5C30h)  

with

XMM3 000000000000000040A5720000000000
RAX 000000000043F7B0
XMM7 000000000000000040A5720000000000
RSP 000000000043F6E0
XMM10 00000000000000000000000000000000
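
To make the XMM values a bit more readable, here is a minimal sketch that reinterprets a 64-bit pattern as an IEEE-754 double. It assumes the low 64 bits of XMM3 and XMM7 hold a scalar double (the dump prints the high half first), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double without aliasing problems. */
static double
sketch_bits_to_double(uint64_t bits)
{
    double d;

    assert(sizeof(d) == sizeof(bits));
    memcpy(&d, &bits, sizeof(d));
    return d;
}
```

Under that assumption, 0x40A5720000000000 decodes to 2745.0, so both registers hold an ordinary finite value.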


I'm about 100% ignorant of x64 asm, but hopefully someone can interpret this usefully. I can tell it's doing an SSE "Move Aligned Packed Single-Precision Floating-Point Values" (in Intel operand order the destination comes first, so this one stores xmm10 into memory at rax-68h), but that's about it.

rax-68h is 0x000000000043F748. The memory at that location is

00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0 bf 00 00 00 00 00 00 00 00 c0 a9 e7 09 00 00 00 00 f8 b3 e7 09 00 00
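
One thing that can be checked with plain arithmetic: movaps requires its memory operand to be 16-byte aligned and raises a general-protection fault otherwise. The sketch below, with hypothetical helper names, recomputes the operand address from the RAX value in the register dump and tests its alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the faulting operand, i.e. [rax-68h]. */
static uint64_t
sketch_movaps_operand(uint64_t rax)
{
    return rax - 0x68;
}

/* Return 1 if addr is a multiple of alignment (a power of two), else 0. */
static int
sketch_is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

With RAX = 0x43F7B0 the operand is 0x43F748, which is 8-byte but not 16-byte aligned. The earlier movaps stores in the listing ran before `lea rax,[rax+38h]`, so they used RAX = 0x43F778 and landed on aligned addresses such as 0x43F750. A general-protection fault is commonly reported on Windows as an access violation at address 0xFFFFFFFFFFFFFFFF, which would be consistent with the (uint64)(-1) above. The -68h displacement also continues the -28h, -38h, -48h, -58h sequence of the earlier stores, so it may have been meant relative to the old RAX value; that last part is speculation.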


So there you go, a whole bunch of data and I, at least, am still none the wiser.




--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
