I was recently running some tests with huge page tables. I ran them on two different architectures: x86 and PPC64.
I saw some discussion going on over here, so I thought I'd share the results.
For these tests I was using 3 cores, 8GB RAM, and 2 LUNs for the filesystem (1 for dbfiles and 1 for logfiles).
For huge pages I had dedicated
(shared_buffers + 400 bytes * max_connections + wal_buffers) / Pagesize [huge page size from /proc/meminfo]
pages. I also kept some overcommit_hugepages available for work_mem: (max_connections * work_mem) / Pagesize
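As a rough sketch of the arithmetic above (not the exact script I used), the two page counts can be computed like this. The function names and the sample settings (2GB shared_buffers, 16MB wal_buffers, 4MB work_mem, 100 connections, 2MB huge pages) are made up for illustration; only the formula itself comes from the description above. Results are rounded up so that a partial page still gets a whole huge page.

```python
def read_huge_page_size(meminfo_text):
    """Return the huge page size in bytes from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            # The line looks like "Hugepagesize:    2048 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("Hugepagesize not found in meminfo text")


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def huge_pages_needed(shared_buffers, wal_buffers, max_connections,
                      huge_page_size, per_conn_bytes=400):
    """(shared_buffers + 400 bytes * max_connections + wal_buffers) / Pagesize"""
    total = shared_buffers + per_conn_bytes * max_connections + wal_buffers
    return ceil_div(total, huge_page_size)


def overcommit_pages_needed(work_mem, max_connections, huge_page_size):
    """(max_connections * work_mem) / Pagesize"""
    return ceil_div(max_connections * work_mem, huge_page_size)


MB = 1024 * 1024
GB = 1024 * MB

# Hypothetical sample values, not the settings from the tests above.
sample_meminfo = "MemTotal:  8174524 kB\nHugepagesize:       2048 kB\n"
page_size = read_huge_page_size(sample_meminfo)

dedicated = huge_pages_needed(2 * GB, 16 * MB, 100, page_size)
overcommit = overcommit_pages_needed(4 * MB, 100, page_size)
print(dedicated, overcommit)
```

On a real box you would pass in the contents of /proc/meminfo instead of the sample string, and your own postgresql.conf values converted to bytes.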
x86_64 gave me a benefit of 2-5% for a TPC-C workload (I scaled from 1 to 100 users). PPC64, which uses 16MB and 64MB pages, did not give me any benefit; in fact, performance degraded as the concurrency of the system increased.
My 2 cents, hope it helps.