Thread: Problem on AIX with current

Problem on AIX with current

From
Tatsuo Ishii
Date:
Per Tom's request (1000 concurrent backends), I tried current on IBM
AIX 5L and found that make check hangs:

parallel group (13 tests): float4 oid varchar

pgbench hangs too if more than 4 or so concurrent backends are
involved. Unfortunately gdb does not work well on AIX, so I'm stuck.
Maybe the new locking code?

BTW PostgreSQL 7.1.3 works fine.
--
Tatsuo Ishii


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> Per Tom's request (1000 concurrent backends), I tried current on IBM
> AIX 5L and found that make check hangs:
> 
> parallel group (13 tests): float4 oid varchar
> 
> pgbench hangs too if more than 4 or so concurrent backends are
> involved.

I once had hangs during make check on AIX 4, but after make distclean
and rebuild I was never able to reproduce them.

Can you read the man page for cs(3)? AIX 4 says it is not recommended and
suggests using compare_and_swap; maybe AIX 5 has more to say?

> Unfortunately gdb does not work well on AIX, so I'm stuck.
> Maybe the new locking code?

Use dbx (and ddd)?

I don't have access to AIX 5.

Andreas


Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> > Per Tom's request (1000 concurrent backends), I tried current on IBM
> > AIX 5L and found that make check hangs:
> > 
> > parallel group (13 tests): float4 oid varchar
> > 
> > pgbench hangs too if more than 4 or so concurrent backends are
> > involved.
> 
> I once had hangs during make check on AIX 4, but after make distclean
> and rebuild I was never able to reproduce them.

I think I did make distclean.

> Can you read the man page for cs(3)? AIX 4 says it is not recommended and
> suggests using compare_and_swap; maybe AIX 5 has more to say?
   Note: The cs subroutine is only provided to support binary
   compatibility with AIX Version 3 applications. When writing new
   applications, it is not recommended to use this subroutine; it may cause
   reduced performance in the future. Applications should use the
   compare_and_swap (compare_and_swap Subroutine) subroutine, unless they
   need to use unaligned memory locations.
 

Seems the same as AIX 4?

> > Unfortunately gdb does not work well on AIX, so I'm stuck.
> > Maybe the new locking code?
> 
> Use dbx (and ddd) ?

Here is a stack trace using dbx.

semop(??, ??, ??) at 0xd02be73c
IpcSemaphoreLock(??, ??, ??), line 425 in "ipc.c"
LWLockAcquire(??, ??), line 270 in "lwlock.c"
LockAcquire(??, ??, ??, ??, ??), line 482 in "lock.c"
LockRelation(??, ??), line 153 in "lmgr.c"
heap_openr(??, ??), line 512 in "heapam.c"
scan_pg_rel_ind(??, ??), line 380 in "relcache.c"
ScanPgRelation(??, ??), line 307 in "relcache.c"
IndexedAccessMethodInitialize(??, ??, ??), line 994 in "relcache.c"
RelationNameGetRelation(??), line 1484 in "relcache.c"
heap_openr(??, ??), line 502 in "heapam.c"
setTargetTable(??, ??, ??, ??), line 136 in "parse_clause.c"
transformUpdateStmt(??, ??), line 2416 in "analyze.c"
transformStmt(??, ??), line 228 in "analyze.c"
parse_analyze(??, ??), line 92 in "analyze.c"
pg_analyze_and_rewrite(??), line 428 in "postgres.c"
unnamed block $b1877, line 740 in "postgres.c"
unnamed block $b1876, line 740 in "postgres.c"
unnamed block $b1872, line 740 in "postgres.c"
pg_exec_query_string(??, ??, ??), line 740 in "postgres.c"
PostgresMain(??, ??, ??, ??, ??), line 1943 in "postgres.c"
DoBackend(??), line 2104 in "postmaster.c"
BackendStartup(??), line 1837 in "postmaster.c"
unnamed block $b1665, line 917 in "postmaster.c"
ServerLoop(), line 917 in "postmaster.c"
PostmasterMain(??, ??), line 712 in "postmaster.c"
main(argc = 0, argv = (nil)), line 178 in "main.c"


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
... cs(3)
> > Seems the same as AIX 4?

Yes, identical.

> 
> Hmm, does anyone want to produce new s_lock code for AIX that uses
> compare_and_swap?  But I'm not sure that's the problem here.

I did once, but performance was worse, so I discarded it.
Since AIX 5 still has it, I see no reason to change it.

Still, testing it on AIX 5 might reveal that compare_and_swap 
is now faster, Tatsuo ?

Andreas


Re: Problem on AIX with current

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Can you read the man page for cs(3)? AIX 4 says it is not recommended and
>> suggests using compare_and_swap; maybe AIX 5 has more to say?

>     Note: The cs subroutine is only provided to support binary
>     compatibility with AIX Version 3 applications. When writing new
>     applications, it is not recommended to use this subroutine; it may cause
>     reduced performance in the future. Applications should use the
>     compare_and_swap (compare_and_swap Subroutine) subroutine, unless they
>     need to use unaligned memory locations.

> Seems the same as AIX 4?

Hmm, does anyone want to produce new s_lock code for AIX that uses
compare_and_swap?  But I'm not sure that's the problem here.

> Here is a stack trace using dbx.

> semop(??, ??, ??) at 0xd02be73c
> IpcSemaphoreLock(??, ??, ??), line 425 in "ipc.c"
> LWLockAcquire(??, ??), line 270 in "lwlock.c"
> LockAcquire(??, ??, ??, ??, ??), line 482 in "lock.c"

This process is waiting to acquire the LockMgr lock.  You need to look
at the rest of the processes and try to figure out who's got the lock.
        regards, tom lane
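
For readers unfamiliar with the idea raised above, here is a minimal sketch of a spinlock built on a compare-and-swap primitive. It is not the PostgreSQL s_lock code and does not call the AIX compare_and_swap subroutine, which is only available on AIX. As an assumption made for portability, C11's atomic_compare_exchange_strong stands in for it, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for this sketch: 0 = free, 1 = held. */
typedef struct
{
    atomic_int word;
} sketch_slock;

static void sketch_slock_init(sketch_slock *lock)
{
    atomic_init(&lock->word, 0);
}

/*
 * Try once to take the lock. The compare-and-swap succeeds only if the
 * word is still 0, in which case it is atomically set to 1.
 */
static bool sketch_slock_try(sketch_slock *lock)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&lock->word, &expected, 1);
}

/* Spin until the lock is obtained. Real code would back off or sleep. */
static void sketch_slock_acquire(sketch_slock *lock)
{
    while (!sketch_slock_try(lock))
        ;
}

/* Release the lock; it must currently be held. */
static void sketch_slock_release(sketch_slock *lock)
{
    assert(atomic_load(&lock->word) == 1);
    atomic_store(&lock->word, 0);
}
```

A caller would run sketch_slock_init once, then bracket each critical section with sketch_slock_acquire and sketch_slock_release. Whether such a lock is faster or slower than the existing cs()-based code on AIX 5 is the open question in this thread, and the sketch says nothing about that.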


Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> > Here is a stack trace using dbx.
> 
> > semop(??, ??, ??) at 0xd02be73c
> > IpcSemaphoreLock(??, ??, ??), line 425 in "ipc.c"
> > LWLockAcquire(??, ??), line 270 in "lwlock.c"
> > LockAcquire(??, ??, ??, ??, ??), line 482 in "lock.c"
> 
> This process is waiting to acquire the LockMgr lock.  You need to look
> at the rest of the processes and try to figure out who's got the lock.

Strangely enough, there's no other backend here (except the stats
collectors, of course). I verified this with ps and the pg_stat_activity view.

BTW pg_stat_activity view shows:

16556 | test    |  197378 |        1 | postgres | update accounts set abalance = abalance + 406, filler = 'added amount to abalance is 406' where aid = 1447
 
--
Tatsuo Ishii


Re: Problem on AIX with current

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Strangely enough, there's no other backend here (except the stats
> collectors, of course). I verified this with ps and the pg_stat_activity view.

If you have no better way of determining what's going on, it might help
to recompile with LOCK_DEBUG defined, then enable trace_lwlocks in
postgresql.conf (better turn on debug_print_query, log_timestamp, and
log_pid too).  This will generate rather voluminous log output, perhaps
enough to provide a clue.
        regards, tom lane


Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> If you have no better way of determining what's going on, it might help
> to recompile with LOCK_DEBUG defined, then enable trace_lwlocks in
> postgresql.conf (better turn on debug_print_query, log_timestamp, and
> log_pid too).  This will generate rather voluminous log output, perhaps
> enough to provide a clue.

When I recompiled with LOCK_DEBUG and trace_lwlocks = true, it *works*
(and I saw lots of lock debugging messages, of course). However, if I
turn trace_lwlocks off, the backend gets stuck again. Is there anything
I can do?

Note the machine has 4 processors. Is that related?
--
Tatsuo Ishii


Re: Problem on AIX with current

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> When I recompiled with LOCK_DEBUG and trace_lwlocks = true, it *works*
> (and I saw lots of lock debugging messages, of course). However, if I
> turn trace_lwlocks off, the backend gets stuck again.

Ugh ... ye classic Heisenbug ...

> Is there anything I can do?

Apparently the problem is timing-sensitive, which is hardly surprising
for a lock issue.  You might find that it occurs some of the time if
you repeat the test over and over.

> Note the machine has 4 processors. Is that related?

Hard to tell at this point, but considering that no one else has
reported a problem so far, it does seem like multiple CPUs at least
help to make the failure more probable.  But it could just be a
portability problem.  Do you have another machine with identical OS
and fewer processors to try for comparison?

Andreas, have you tried CVS tip lately on AIX?  What are your results?
        regards, tom lane


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> Andreas, have you tried CVS tip lately on AIX?  What are your results?

All 77 ok, no hangs, with make check on single CPU AIX 4.3.2. 
Only problem on AIX is, that the argv[0] stuff does not work anymore
(I think since we don't exec() anymore), which is rather annoying.

Andreas


Re: Problem on AIX with current

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
> Only problem on AIX is, that the argv[0] stuff does not work anymore
> (I think since we don't exec() anymore), which is rather annoying.

Hmm, perhaps we are selecting the wrong PS_STRINGS method for AIX?
Please look at src/backend/utils/misc/ps_status.c and see if one of
the other methods will work on AIX.
        regards, tom lane


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> > Only problem on AIX is, that the argv[0] stuff does not work anymore
> > (I think since we don't exec() anymore), which is rather annoying.
>
> Hmm, perhaps we are selecting the wrong PS_STRINGS method for AIX?
> Please look at src/backend/utils/misc/ps_status.c and see if one of
> the other methods will work on AIX.

Yes, I see. Quite silly that I did not look earlier.
The compiler does not define _AIX4 or _AIX3, no idea who thought that.
It only defines _AIX, _AIX32, _AIX41 and _AIX43.

I am quite sure that all AIX Versions accept the CLOBBER method,
thus I ask you to apply the following patch, to make it work.

Andreas
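
The attached patch is not reproduced in this archive. The following is only a sketch of the kind of preprocessor check the message describes, assuming the old code tested _AIX3 or _AIX4 and the fix tests _AIX instead. All SKETCH_* and sketch_* names are hypothetical and are not the identifiers used in ps_status.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical method codes, named for this sketch only. */
#define SKETCH_PS_NONE    0
#define SKETCH_PS_CLOBBER 1

/*
 * Old-style check (assumed): keyed on macros that, per the message
 * above, the compiler never defines, so it falls through to NONE.
 */
#if defined(_AIX3) || defined(_AIX4)
#define SKETCH_OLD_METHOD SKETCH_PS_CLOBBER
#else
#define SKETCH_OLD_METHOD SKETCH_PS_NONE
#endif

/* Revised check: _AIX is one of the macros reported as defined on AIX. */
#if defined(_AIX)
#define SKETCH_NEW_METHOD SKETCH_PS_CLOBBER
#else
#define SKETCH_NEW_METHOD SKETCH_PS_NONE
#endif

/* Map a method code to a printable name, for inspection at run time. */
static const char *sketch_method_name(int method)
{
    assert(method == SKETCH_PS_NONE || method == SKETCH_PS_CLOBBER);
    return method == SKETCH_PS_CLOBBER ? "clobber" : "none";
}
```

Printing sketch_method_name(SKETCH_OLD_METHOD) and sketch_method_name(SKETCH_NEW_METHOD) on a given machine shows which branch each check takes there. On a non-AIX system both report "none".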

Attachments

Re: Problem on AIX with current

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

>
> > > Only problem on AIX is, that the argv[0] stuff does not work anymore
> > > (I think since we don't exec() anymore), which is rather annoying.
> >
> > Hmm, perhaps we are selecting the wrong PS_STRINGS method for AIX?
> > Please look at src/backend/utils/misc/ps_status.c and see if one of
> > the other methods will work on AIX.
>
> Yes, I see. Quite silly that I did not look earlier.
> The compiler does not define _AIX4 or _AIX3, no idea who thought that.
> It only defines _AIX, _AIX32, _AIX41 and _AIX43.
>
> I am quite sure that all AIX Versions accept the CLOBBER method,
> thus I ask you to apply the following patch, to make it work.
>
> Andreas

Content-Description: ps_status.patch

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> > > Only problem on AIX is, that the argv[0] stuff does not work anymore
> > > (I think since we don't exec() anymore), which is rather annoying.
> >
> > Hmm, perhaps we are selecting the wrong PS_STRINGS method for AIX?
> > Please look at src/backend/utils/misc/ps_status.c and see if one of
> > the other methods will work on AIX.
>
> Yes, I see. Quite silly that I did not look earlier.
> The compiler does not define _AIX4 or _AIX3, no idea who thought that.
> It only defines _AIX, _AIX32, _AIX41 and _AIX43.
>
> I am quite sure that all AIX Versions accept the CLOBBER method,
> thus I ask you to apply the following patch, to make it work.

CLOBBER does not work with AIX5L, nor CHANGE_ARGV. (SETPROCTITLE,
PSTAT and PS_STRINGS can not be used since AIX5L does not have
appropriate header files).
--
Tatsuo Ishii


Re: Problem on AIX with current

From
Bruce Momjian
Date:
Patch rejected, please resubmit:

CLOBBER does not work with AIX5L, nor CHANGE_ARGV. (SETPROCTITLE,
PSTAT and PS_STRINGS can not be used since AIX5L does not have
appropriate header files).
--
Tatsuo Ishii

>
> > > Only problem on AIX is, that the argv[0] stuff does not work anymore
> > > (I think since we don't exec() anymore), which is rather annoying.
> >
> > Hmm, perhaps we are selecting the wrong PS_STRINGS method for AIX?
> > Please look at src/backend/utils/misc/ps_status.c and see if one of
> > the other methods will work on AIX.
>
> Yes, I see. Quite silly that I did not look earlier.
> The compiler does not define _AIX4 or _AIX3, no idea who thought that.
> It only defines _AIX, _AIX32, _AIX41 and _AIX43.
>
> I am quite sure that all AIX Versions accept the CLOBBER method,
> thus I ask you to apply the following patch, to make it work.
>
> Andreas

Content-Description: ps_status.patch

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> > I am quite sure that all AIX Versions accept the CLOBBER method,
> > thus I ask you to apply the following patch, to make it work.
> 
> CLOBBER does not work with AIX5L, nor CHANGE_ARGV. (SETPROCTITLE,
> PSTAT and PS_STRINGS can not be used since AIX5L does not have
> appropriate header files).

Have you actually tried my patch, and what was the effect ? 
The previous code was wrong, since it did not do any PS magic,
it defaulted to PS_USE_NONE.

Else can you please tell me a predefine for AIX5, thanks. 

Andreas


Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> > > I am quite sure that all AIX Versions accept the CLOBBER method,
> > > thus I ask you to apply the following patch, to make it work.
> > 
> > CLOBBER does not work with AIX5L, nor CHANGE_ARGV. (SETPROCTITLE,
> > PSTAT and PS_STRINGS can not be used since AIX5L does not have
> > appropriate header files).
> 
> Have you actually tried my patch, and what was the effect ? 
> The previous code was wrong, since it did not do any PS magic,
> it defaulted to PS_USE_NONE.

To make sure I did everything correctly, I cvsed fresh sources and
applied your patches again. The result: It works fine! I don't know
why, but I must have done something wrong.:-< Sorry for the wrong
info. Bruce, please apply the patches.

BTW, I'm still getting the stuck backends. New info: a snapshot
dated 10/3 works fine.
--
Tatsuo Ishii


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> BTW, I'm still getting the stuck backends. New info: a snapshot
> dated 10/3 works fine.

I always have trouble with those different date formats. Do you
mean that the problem is fixed as of 3. October, or that an old
snapshot from 10. March still worked ?

Snapshot of 1. Oct 2001 does not hang in "make check" on AIX 4.3.2
4 CPU machine.
So it seems to be a problem on AIX5L only :-( Maybe a semaphore bug ? 

Andreas


Re: Problem on AIX with current

From
Tatsuo Ishii
Date:
> > BTW, I'm still getting the stuck backends. New info: a snapshot
> > dated 10/3 works fine.
> 
> I always have trouble with those different date formats. Do you
> mean that the problem is fixed as of 3. October, or that an old
> snapshot from 10. March still worked ?

Of course the working source is 3rd October.

> Snapshot of 1. Oct 2001 does not hang in "make check" on AIX 4.3.2
> 4 CPU machine.

Oh, you have a 4-way machine too?

> So it seems to be a problem on AIX5L only :-( Maybe a semaphore bug ? 

Maybe. BTW, what is your compiler? I'm using xlc.
--
Tatsuo Ishii


Re: Problem on AIX with current

From
"Zeugswetter Andreas SB SD"
Date:
> > > BTW, I'm still getting the stuck backends. New info: a snapshot
> > > dated 10/3 works fine.
> > 
> > I always have trouble with those different date formats. Do you
> > mean that the problem is fixed as of 3. October, or that an old
> > snapshot from 10. March still worked ?
> 
> Of course the working source is 3rd October.

Tom, do you have an idea what you might have fixed to that effect ?

> 
> > Snapshot of 1. Oct 2001 does not hang in "make check" on AIX 4.3.2
> > 4 CPU machine.
> 
> Oh, you have a 4-way machine too?

Well, the company I work for has all sorts of AIX hardware, but no AIX5 yet.
I usually use a 43P 150 with one 604e CPU for development and testing,
but "borrowed" another one to test the 4 CPU hang :-)

> > So it seems to be a problem on AIX5L only :-( Maybe a semaphore bug ? 
> 
> Maybe. BTW, what is your compiler? I'm using xlc.

Same here, xlc from VisualAge C++, maybe a different version: 
vac.C                      5.0.1.3  COMMITTED  C for AIX Compiler

In my experience, gcc-compiled code is somewhat slower.

Andreas


Re: Problem on AIX with current

From
Bruce Momjian
Date:
> 
> > > I am quite sure that all AIX Versions accept the CLOBBER method,
> > > thus I ask you to apply the following patch, to make it work.
> > 
> > CLOBBER does not work with AIX5L, nor CHANGE_ARGV. (SETPROCTITLE,
> > PSTAT and PS_STRINGS can not be used since AIX5L does not have
> > appropriate header files).
> 
> Have you actually tried my patch, and what was the effect ? 
> The previous code was wrong, since it did not do any PS magic,
> it defaulted to PS_USE_NONE.
> 
> Else can you please tell me a predefine for AIX5, thanks. 

Patch applied.  Thanks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: Problem on AIX with current

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> Of course the working source is 3rd October.

> Tom, do you have an idea what you might have fixed to that effect ?

No idea.  I've been fixing some portability issues in dynahash.c,
but AFAIK they only affected the pgstats collector process not backends.
Also, that breakage had existed for months...
        regards, tom lane