Thread: vacuum problem


vacuum problem

From:
"John R Pierce"
Date:
Got something really odd happening here.

Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat linux
system...   app does a heavy loop of a prepared UPDATE, then Commit, 10000s
of times.   the table has a few columns, nothing fancy at all.    On our
Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum the
table it generates, it won't free the 'dead' rows...

$ vacuumdb -f -d test -t test -v
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 373553 nonremovable row versions in 3492 pages
DETAIL:  373533 dead row versions cannot be removed yet.
Nonremovable row versions range from 72 to 72 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 146596 bytes.
0 pages are or will become empty, including 0 at the end of the table.
1 pages containing 6956 free bytes are potential move destinations.
CPU 0.05s/0.37u sec elapsed 0.41 sec.
INFO:  index "test_pkey" now contains 373553 row versions in 1305 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "test": moved 0 row versions, truncated 3492 to 3492 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM



yet, on an otherwise identical RH Linux 8 system (same java, same app, same
postgres version), it works fine.   vacuum gets the free space, I don't even
have to use -f

there are no other connections to this database.   It doesn't matter if the
app is still running or not (on the RHL8 system, I can vacuum it while the
app is running, the space is freed immediately, and the app returns to
full speed, while on the RHEL2.1 system, I can't vacuum it any which way,
app running or not).

now, if I STOP the postgres server process and restart it, THEN I can
vacuum it, but that's not acceptable to my actual application; it's a 24/7
data extraction process.

we *have* to use RHEL2.1 for this app due to some 3rd party vendor crud
that's only supported there (we had glibc problems on RHEL3 with this
crudware which couldn't be resolved by the libc compatibility libraries)

box that works...

$ uname -a
Linux svxeon1 2.4.20-28.8smp #1 SMP Thu Dec 18 12:25:21 EST 2003 i686 i686
i386 GNU/Linux


box that doesn't work...

$ uname -a
Linux svxeon2.netsys.stsv.seagate.com 2.4.9-e.43enterprise #1 SMP Tue Jun 29
13:57:26 EDT 2004 i686 unknown


And, finally, here's the schema for the test case that shows this problem...

    CREATE TABLE test
    (
      subject varchar(128) NOT NULL,
      seq int4 NOT NULL,
      reset_time varchar(32) NOT NULL,
      CONSTRAINT test_pkey PRIMARY KEY (subject)
    )
    WITH OIDS;

and here's the Java app (compiled and run with Sun JRE 1.4.2_something)...

(invoked with command: java -cp . PGTest -r 20 -l 50 -c 4000)


//package JavaDBI;

//import Flags;
import java.io.*;
import java.sql.*;


public class PGTest
{


   public static void main(String[] args)
   {

     // note: the -r/-l/-c flags are never actually parsed; these defaults
     // happen to match the documented invocation
     int rowCnt = 20, loopCnt = 50, cycle = 4000, delay = 0;
     try {

      Connection con_;
      String driver_;

      Class.forName( "org.postgresql.Driver" );
      con_ = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/test", "test", "test");
      con_.setAutoCommit(false);  // JDBC defaults to autocommit; this makes the explicit commit() calls below meaningful

      PreparedStatement tstStat, updStmt, insStmt, delStmt;
      tstStat = con_.prepareStatement("SELECT subject, seq, reset_time FROM test");
      insStmt = con_.prepareStatement("INSERT INTO test VALUES (?,?,?)");
      updStmt = con_.prepareStatement("UPDATE test SET seq = ?, reset_time = ? WHERE subject = ?");
      delStmt = con_.prepareStatement("DELETE FROM test");

      ResultSet rs1_;

      Timestamp ts1;
      java.util.Date dt1 , dt2, st1, st2;

      st1 = new java.util.Date();
      dt1 = new java.util.Date();
      ts1 = new Timestamp(dt1.getTime() );
      rs1_ = tstStat.executeQuery();

      System.out.println( "PGTest App" );
      /*
      while (rs1_.next())
      {
        System.out.println("-row read-");
        System.out.println("control_id:"+rs1_.getString(1));

      }
      rs1_.close();
      System.out.println( "Read from db");
      */

      delStmt.execute();
      char tmp = 'A';
      String[]  rowS = new String[rowCnt];
      for (int j = 0; j < rowCnt; j++ )
      {
        rowS[j] = Character.toString(tmp);
        insStmt.setString(1, rowS[j]);
        insStmt.setInt(2,0);
        insStmt.setString(3, dt1.toString());
        insStmt.execute();
        tmp++;
      }
      con_.commit();


      for (int h = 0; h < cycle; h++ )
      {
        dt1 = new java.util.Date();
        for (int j = 0; j < loopCnt; j++ )
        {
          for (int i = 0; i < rowCnt; i++ )
          {
            updStmt.setInt(1, i);
            updStmt.setString(2, dt1.toString() );
            updStmt.setString(3, rowS[i]);
            updStmt.execute();
            con_.commit();
            //System.out.println( i );
            if (delay > 0)
            {
              try {
                Thread.sleep(delay); // sleep 'delay' ms between updates
              } catch (Exception ex) {
                System.out.println("Waiting");
                //blah
              }
            }
          }
        }
        dt2 = new java.util.Date();
        System.out.println( "Processing took:" + (dt2.getTime() - dt1.getTime()) +
             " for " + (rowCnt*loopCnt) + " rows");
      }

      con_.commit();
      st2 = new java.util.Date();
      System.out.println( "Total time:" + (st2.getTime() - st1.getTime()) );

     } catch (Exception ex) {
      System.out.println( "Error WR" );
      System.out.println( ex.toString() );
     }
   }

} // end PGTest

Re: vacuum problem

From:
Tom Lane
Date:
"John R Pierce" <pierce@hogranch.com> writes:
> Got something really odd happening here.
> Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat linux
> system...   app does a heavy loop of a prepared UPDATE, then Commit, 10000s
> of times.   the table has a few columns, nothing fancy at all.    On our
> Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum the
> table it generates, it won't free the 'dead' rows...

You've got some background client holding a transaction open.  Or else
the test program isn't really committing when you think it is.

            regards, tom lane

Re: vacuum problem

From:
"John R Pierce"
Date:
> "John R Pierce" <pierce@hogranch.com> writes:
>> Got something really odd happening here.
>> Simple test program, in java, with jdbc, postgres 7.4.5, on a redhat
>> linux
>> system...   app does a heavy loop of a prepared UPDATE, then Commit,
>> 10000s
>> of times.   the table has a few columns, nothing fancy at all.    On our
>> Redhat Enterprise 2.1 server (dual xeon, 3GB ram, etc), I can't vacuum
>> the
>> table it generates, it won't free the 'dead' rows...
>
> You've got some background client holding a transaction open.  Or else
> the test program isn't really committing when you think it is.

there are no background programs.  I've done all the usual checking of `ps'
outputs looking for such.

in the test case I mailed to this list, I had created a standalone database
with one table, and run the test program directly against it.


*HOWEVER*...

About an hour after mailing that, I realized that the buggy RHEL2.1 system
was running with Intel Hyperthreading enabled (so it appeared as 4 CPUs)
while the no-problems RH8.0 system had hyperthreading disabled (we'd
previously been benchmarking some multithreaded stuff both ways).   So, it's
*just* possible that the earlier RHEL2.1 kernel (2.4.9-e.43enterprise) had
issues which were resolved in the later RH8 kernel (2.4.20-28.8smp).    I'll
not be able to test this hypothesis until Monday morning.

Re: vacuum problem

From:
Tom Lane
Date:
"John R Pierce" <pierce@hogranch.com> writes:
>> You've got some background client holding a transaction open.  Or else
>> the test program isn't really committing when you think it is.

> there are no background programs.  I've done all the usual checking of `ps'
> outputs looking for such.

> in the test case I mailed to this list, I had created a standalone database
> with one table, and run the test program directly against it.

That sounds suspiciously like you think that only background clients
connected to the same database count.  Actually, any other client
connected to the same *postmaster* can look like an open transaction
to VACUUM ...

            regards, tom lane
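[Editor's note: Tom's point can be checked directly. VACUUM must preserve any row version that might still be visible to the oldest open transaction anywhere under the same postmaster, in any database. A minimal diagnostic sketch for a 7.4-era server, assuming it was started with stats_command_string = on in postgresql.conf so that pg_stat_activity carries query text (the template1 target is just a convenient database to connect to):]

```sql
-- List clients in every database under this postmaster; a connection
-- holding a transaction open shows current_query = '<IDLE> in transaction'.
SELECT datname, procpid, usename, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE '<IDLE> in transaction%';
```

Any row returned here, no matter which database it is connected to, is enough to keep VACUUM from reclaiming dead rows.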

Re: vacuum problem

From:
Gaetano Mendola
Date:
John R Pierce wrote:

> *HOWEVER*...
>
> About an hour after mailing that, I realized that the buggy RHEL2.1
> system was running with Intel Hyperthreading enabled (so it appeared as
> 4 CPUs) while the no-problems RH8.0 system had hyperthreading disabled
> (we'd previously been benchmarking some multithreaded stuff both
> ways).   So, it's *just* possible that the earlier RHEL2.1 kernel
> (2.4.9-e.43enterprise) had issues which were resolved in the later
> RH8 kernel (2.4.20-28.8smp).    I'll not be able to test this hypothesis
> until Monday morning.

I run RHEL2.1 on the same hardware as yours, with HT enabled, and I've
never had a problem with that.


Regards
Gaetano Mendola

Re: vacuum problem

From:
"John R Pierce"
Date:
>> About an hour after mailing that, I realized that the buggy RHEL2.1
>> system was running with Intel Hyperthreading enabled (so it appeared as 4
>> CPUs) while the no-problems RH8.0 system had hyperthreading disabled
>> (we'd previously been benchmarking some multithreaded stuff both ways).   So,
>> it's *just* possible that the earlier RHEL2.1 kernel
>> (2.4.9-e.43enterprise) had issues which were resolved in the later RH8
>> kernel (2.4.20-28.8smp).    I'll not be able to test this hypothesis
>> until Monday morning.
>
> I run RHEL2.1 on the same your HW with HT enabled and I never had a
> problem
> with that.

are you running PGSql 7.4.x? (we have this same issue with 7.4.2 and
7.4.5).   fyi, my RHEL2.1 has all available up2date fixes, and I did *NOT*
install RH's postgres RPM; rather, I fetched and built 7.4.5 from source.

could you perhaps give my test program a try?   create a database 'test', a
user 'test' with password 'test', and table 'test' as described in my
original message, and compile and run this program:

    http://www.hogranch.com/files/PGTest.java

compile the java program with Sun J2SE JDK 1.4.x..

    $ javac PGTest.java

Temporarily disable any autovacuum cronjobs you may have, and invoke the
program with

    $ java -cp . PGTest -r 20 -l 50 -c 4000

(postgresql.jar has to be in your CLASSPATH, or you can specify it as -cp
.:/path/postgresql.jar .. I'm using the one that compiled with 7.4.5 and my
jdk 1.4.x)

and let it run for several hours.   On our dual 2.8GHz Xeon (Intel SE7501WV2
server), it starts out around 13000 ms per thousand updates; after an hour it
slows to 20000, and a few more hours later it's up to 40000.  That's when I
try a `vacuumdb -v -d test`   (or any other combination of vacuum options
that includes this table in this database).  It doesn't matter if the app is
running or not.

I am unable to vacuum the free space in this table until I stop and restart
the postgresql service.    Put your autovacuum back online when you're done
messing.   I can not reproduce this problem on a RH8 (2.4.20.x) system

Re: vacuum problem

From:
Gaetano Mendola
Date:
John R Pierce wrote:

>>> About an hour after mailing that, I realized that the buggy RHEL2.1
>>> system was running with Intel Hyperthreading enabled (so it appeared
>>> as 4 CPUs) while the no-problems RH8.0 system had hyperthreading
>>> enabled (we'd previously been benchmarking some multithreaded stuff
>>> both ways).   So, its *just* possible that the earlier RHEL2.1
>>> (kernel 2.4.9-e.43enterprise) had issues which the later RH8
>>> (2.4.20-28.8smp) were resolved.    I'll not be able to test this
>>> hypothesis until monday morning.
>>
>>
>> I run RHEL2.1 on the same your HW with HT enabled and I never had a
>> problem
>> with that.
>
>
> are you running PGSql 7.4.x ? (we have this same issue with 7.4.2 and
> 7.4.5).   fyi, my RHEL2.1 has all available up2date fixes, and I did
> *NOT* install RH's postgres RPM, rather, I fetched and built 7.4.5 from
> source

I was running 7.4.2, and for the last three weeks we have been running 7.4.5.
Suddenly, after the upgrade, an ill-coded servlet (in place for a long time)
started to drive Postgres to PANIC; now the servlet is fixed and the
problem has disappeared. See my post: "invalid string enlargement". Tom Lane
was also able to reproduce it, and I bet he has already fixed the problem.

> could you perhaps give my test program a try?   create a database
> 'test', a user 'test' with password 'test', and table 'test' as
> described in my original message, and compile and run this program:
>
>    http://www.hogranch.com/files/PGTest.java
>
> compile the java program with Sun J2SE JDK 1.4.x..
>
>    $ javac PGTest.java

Unfortunately I can not try it, for two reasons: the machine is a
production machine and I can not use it for experiments, and our Java
reference platform is 1.3.x  :-(

> (postgresql.jar has to be in your CLASSPATH, or you can specify it as
> -cp .:/path/postgresql.jar .. I'm using the one that compiled with 7.4.5
> and my jdk 1.4.x)

Which one ? I hope you are using pg74.213.jdbc3.jar

> I am unable to vacuum the freespace in this table until I stop and
> restart the postgresql service.    put your autovacuum back online when
> you're done messing.   I can not reproduce this problem on a RH8
> (2.4.20.x) system

Do you mean that stopping your Java application is not enough to reclaim
the space with a vacuum?  Do you have some connection in status
"idle in transaction"?




Regards
Gaetano Mendola

Re: vacuum problem

From:
"John R Pierce"
Date:
>> there are no background programs.  I've done all the usual checking of
>> `ps'
>> outputs looking for such.
>
>> in the test case I mailed to this list, I had created a standalone
>> database
>> with one table, and run the test program directly against it.
>
> That sounds suspiciously like you think that only background clients
> connected to the same database count.  Actually, any other client
> connected to the same *postmaster* can look like an open transaction
> to VACUUM ...

indeed, I did think this.  A completely different development project on the
same server was in fact leaving a backend connection 'idle in transaction'...
I didn't realize this would matter.

I had that developer kill his 'idle in transaction' process, and voila, I
can vacuum my table now on the fly.      I told him to fix his logic to STOP
doing that; he's a SQL novice.

so, my huge apologies for wasting the buglist's time on this.
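[Editor's note: for anyone hitting this via the archives, a quick OS-side way to spot such a dangling transaction, without connecting to the database at all, is the backend's process title, which PostgreSQL updates to reflect connection state. This sketch assumes the default process-title behaviour; the `[i]` bracket trick just keeps grep from matching its own command line:]

```shell
# A backend holding a transaction open shows a process title like:
#   postgres: user db host idle in transaction
# Print matching backends, or a note if there are none.
ps auxww | grep "[i]dle in transaction" || echo "no dangling transactions found"
```

Killing (or fixing) the client behind any such backend is what finally lets VACUUM reclaim the dead rows, as happened in this thread.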