On 2017-06-13 13:15:57 -0400, Robert Haas wrote:
> On Mon, Jun 12, 2017 at 11:04 PM, Andres Freund <andres@anarazel.de> wrote:
> > ExportSnapshot() has, right at the beginning, the following block:
> >
> > /*
> > * We cannot export a snapshot from a subtransaction because there's no
> > * easy way for importers to verify that the same subtransaction is still
> > * running.
> > */
> > if (IsSubTransaction())
> > ereport(ERROR,
> > (errcode(ERRCODE_ACTIVE_SQL_TRANSACTION),
> > errmsg("cannot export a snapshot from a subtransaction")));
> >
> > that reasoning doesn't seem to make too much sense to me. Given that
> > exported snapshots don't make the exporting-transaction's changes
> > visible, I don't see why that restriction is needed?
> >
> > As long as the exported snapshot enforces xmin to be retained, which it
> > does via the pairingheap, I don't understand why we'd have to enforce
> > that the subtransaction is still running?
>
> I think you're touching on what is perhaps the key issue here, which
> is that if it were possible to remove a tuple that the snapshot ought
> to see before the snapshot got used, that would be bad.
I don't think that's really an issue here. Exported snapshots export a
state of the database of that moment, *except* that changes made by the
exporting transaction are not visible. IOW, a subtransaction's changes
aren't going to be visible anyway.
As far as I can tell that property makes:
> I don't
> immediately see why we couldn't remove a tuple inserted by the aborted
> subtransaction immediately. On a quick look, it seems to me that
> HeapTupleSatisfiesVacuum() could fall through all the way to this
> case:
> Even If I'm wrong about that, it seems like someone might want to make
> me correct in the future - i.e. removing aborted tuples ASAP seems
> like a good idea.
> Another point to consider is whether a relfilenode assignment visible
> to that snapshot might be a file that's since been truncated or
> removed altogether.
All pretty much moot?
The reason subxids are exported right now is to *ignore* changes made by
them. But for that we could just as well set suboverflowed to true, and
just include the toplevel xid in ->xip[].
- Andres