Обсуждение: PGCopyOutputStream close() v. endCopy()

Поиск
Список
Период
Сортировка

PGCopyOutputStream close() v. endCopy()

От
Rob Sargent
Дата:

This is a repost of original on pgsql-general.

As is often the case, I'm unsure of which of these methods to use, or if I'm using them correctly.

PG10.5, jooq-3.10.8, postgresql-42.1.4, linux (redhat 6.9) and logback to a file.

I have been using close() for a while but thought I would make use of either the returned long from endCopy() or perhaps getHandledRowCount().

Both work perfectly, but when I use endCopy() I always get the exception shown near the bottom of this log excerpt. The COPY is on its own thread from the same connection as the direct jooq writes also listed.  Again, the data is all saved but I am worried that I'm not closing properly even if I use close(). The data here doesn't warrant bulk copy but it's just a quick example to repeat.

13:32:55.785 [pool-1-thread-1] DEBUG edu.utah.camplab.jx.PayloadFromMux - STAGING TABLE CREATED: bulk."flk_g16-forcing very long name to trigger truncation_22_8045c0"
13:32:55.786 [pool-1-thread-1] INFO  edu.utah.camplab.jx.PayloadFromMux - 8045c057-99ec-490b-90c1-85875269afee: started COPY work at 1546979575786
13:32:55.789 [pool-1-thread-1] INFO  edu.utah.camplab.jx.PayloadFromMux - 8045c057-99ec-490b-90c1-85875269afee: Total segment save took 22 ms
13:32:55.790 [pool-1-thread-1] INFO  edu.utah.camplab.jx.AbstractPayload - 8045c057-99ec-490b-90c1-85875269afee: closing process 8045c057-99ec-490b-90c1-85875269afee
13:32:55.790 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - bulk."flk_g16-forcing very long name to trigger truncation_22_8045c0": Begin bulk copy segment
13:32:55.793 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - bulked up to 89, maybe?
13:32:55.793 [pool-1-thread-1] DEBUG org.jooq.tools.LoggerListener - Executing batch query    : insert into "process_input" ("id", "process_id", "input_type", "input_ref") values (?, ?, ?, ?)
13:32:55.795 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - bulked up to 197, maybe?
13:32:55.797 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - bulked up to 318, maybe?
13:32:55.798 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - bulked up to 393, maybe?
13:32:55.799 [8045c057-99ec-490b-90c1-85875269afee] INFO  e.u.camplab.jx.PayloadWriterThread - 393/393 segments delivered in 9 ms
13:32:55.799 [8045c057-99ec-490b-90c1-85875269afee] DEBUG e.u.camplab.jx.PayloadWriterThread - staged in 9 ms
13:32:55.800 [pool-1-thread-1] DEBUG org.jooq.tools.LoggerListener - Executing batch query    : insert into "process_output" ("id", "process_id", "output_type", "output_ref") values (?, ?, ?, ?)
13:32:55.805 [8045c057-99ec-490b-90c1-85875269afee] ERROR e.u.camplab.jx.PayloadWriterThread - bulk."flk_g16-forcing very long name to trigger truncation_22_8045c0": i/o trouble
java.io.IOException: Ending write to copy failed.
    at org.postgresql.copy.PGCopyOutputStream.close(PGCopyOutputStream.java:107) ~[postgresql-42.1.4.jar:42.1.4]
    at edu.utah.camplab.jx.PayloadWriterThread.run(PayloadWriterThread.java:75) ~[transport/:na]
Caused by: org.postgresql.util.PSQLException: Tried to write to an inactive copy operation
    at org.postgresql.core.v3.QueryExecutorImpl.writeToCopy(QueryExecutorImpl.java:978) ~[postgresql-42.1.4.jar:42.1.4]
    at org.postgresql.core.v3.CopyInImpl.writeToCopy(CopyInImpl.java:35) ~[postgresql-42.1.4.jar:42.1.4]
    at org.postgresql.copy.PGCopyOutputStream.endCopy(PGCopyOutputStream.java:166) ~[postgresql-42.1.4.jar:42.1.4]
    at org.postgresql.copy.PGCopyOutputStream.close(PGCopyOutputStream.java:105) ~[postgresql-42.1.4.jar:42.1.4]
    ... 1 common frames omitted
13:32:55.810 [pool-1-thread-1] DEBUG org.jooq.tools.LoggerListener - Executing batch query    : insert into "process_arg" ("id", "process_id", "argname", "argvalue_int", "argvalue_float", "argvalue_text") values (?, ?, ?, ?, ?, ?)

The class doing the bulk work, PayloadWriterThread extends Thread, the thread name is set from the caller, itself in an executor, with the critical parts as follows:

    @Override
    public void run() {
        try (PGCopyOutputStream writer = new PGCopyOutputStream(copyIn)) {
            long startAt = System.currentTimeMillis();
            deliverSegments(writer);
            long postDeliver = System.currentTimeMillis();
            logger.debug("staged in {} ms", postDeliver - startAt);
        }
        catch (SQLException sqle) {
            sqle.printStackTrace();
            logger.error("{}: sql trouble", tableName, sqle);
            throw new RuntimeException("Bulk copy failed on sql", sqle);
        }
        catch (IOException ioe) {
            ioe.printStackTrace();
            logger.error("{}: i/o trouble", tableName, ioe);
            throw new RuntimeException("Bulk copy failed on i/o", ioe);
        }
    }
    private void deliverSegments(PGCopyOutputStream writer) throws IOException, SQLException {
        logger.info("{}: Begin bulk copy segment", tableName);

        long counter = 0;
        long copied = 0;
        //These had better end up the same number
        long st = System.currentTimeMillis();
        for (Map.Entry<String, List<Segment>> seglistME : segmentListMap.entrySet()) {
            String pbsCSV = seglistME.getKey();
            UUID currentPbsId = probandsetIdMap.get(pbsCSV);
            if (currentPbsId == null) {
                throw new RuntimeException("Could not find pbsId by " + pbsCSV);
            }
            counter += seglistME.getValue().size();
            for (Segment segment : seglistME.getValue()) {
                String segmentCSV = makeCSV(segment, currentPbsId);
                writer.write(segmentCSV.getBytes());
            }
            logger.info("bulked up to {}, maybe?", counter);
        }
        copied = writer.endCopy();
        logger.info("{}/{} segments delivered in {} ms", copied, counter, System.currentTimeMillis() - st);
    }

Also, the stack trace always references the line of the closing brace of the try-with-resources in the run() method and not on the call to endCopy. Because it's a caught exception? Or it's closing a closed writer? Or injection from somewhere?


Re: PGCopyOutputStream close() v. endCopy()

От
Sehrope Sarkuni
Дата:
You're getting that error because endCopy() is being called twice. Once by you explicitly in deliverSegments() and again when the PGCopyOutputStream gets automatically closed by the try-with-resources: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/copy/PGCopyOutputStream.java#L98-L112

PGCopyOutputStream is a wrapper CopyIn (write to OutputStream => COPY TO STDIN) that adds some buffering. You shouldn't be calling endCopy() on it.

You can either:

1. Remove the PGCopyOutputStream.endCopy() entirely as close() will handle it. If do not need the final row count then you're done.
2. Use CopyIn directly, manage the buffer yourself, and call endCopy() on it.
3. Manually manage the resources for PGCopyOutputStream so you can reference it for getHandledRowCount() after the close().
4. Call close() instead of endCopy() on PGCopyOutputStream as it handles being invoked more than once (it's a no-op for subsequent close() calls).

I'm going to take a look why PGCopyOutputStream.endCopy() is declared public as currently there's no way to call both it and close() without getting an error, and you're definitely supposed to call close(). Either it should not be public or it should be performing the same work as close(), i.e. clearing the internal "op" (copy operation) member to indicate that the close is complete so subsequent close() calls are not errant.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

Re: PGCopyOutputStream close() v. endCopy()

От
Rob Sargent
Дата:

Thanks you, very much. 

I'll revert to using close() (even though the try-with will also as it make me more comfortable).

On 1/11/19 6:36 AM, Sehrope Sarkuni wrote:
You're getting that error because endCopy() is being called twice. Once by you explicitly in deliverSegments() and again when the PGCopyOutputStream gets automatically closed by the try-with-resources: https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/copy/PGCopyOutputStream.java#L98-L112

PGCopyOutputStream is a wrapper CopyIn (write to OutputStream => COPY TO STDIN) that adds some buffering. You shouldn't be calling endCopy() on it.

You can either:

1. Remove the PGCopyOutputStream.endCopy() entirely as close() will handle it. If do not need the final row count then you're done.
2. Use CopyIn directly, manage the buffer yourself, and call endCopy() on it.
3. Manually manage the resources for PGCopyOutputStream so you can reference it for getHandledRowCount() after the close().
4. Call close() instead of endCopy() on PGCopyOutputStream as it handles being invoked more than once (it's a no-op for subsequent close() calls).

I'm going to take a look why PGCopyOutputStream.endCopy() is declared public as currently there's no way to call both it and close() without getting an error, and you're definitely supposed to call close(). Either it should not be public or it should be performing the same work as close(), i.e. clearing the internal "op" (copy operation) member to indicate that the close is complete so subsequent close() calls are not errant.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/