Re: BUG #15427: DROP INDEX did not free up disk space
| From | Andres Freund |
|---|---|
| Subject | Re: BUG #15427: DROP INDEX did not free up disk space |
| Date | |
| Msg-id | 20181012045148.rhohmjjy7ehrczsi@alap3.anarazel.de |
| In response to | Re: BUG #15427: DROP INDEX did not free up disk space (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-bugs |
Hi,
On 2018-10-12 00:33:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-10-11 23:57:16 -0400, Tom Lane wrote:
> >> Uh, what's that got to do with it?
>
> > If you look at the bug report: As soon as the OP, on my suggestion,
> > triggered sinval processing (by issuing a SELECT 1;) the space was
> > freed. So clearly the open FDs were part of the problem.
>
> TBH, I think the space-freeup was more likely driven off a background
> checkpoint completion, which is where the truncation happens.
Uh, as I wrote, mdunlinkfork(), which backs dropping an index via
index_drop()->RelationDropStorage() and then
smgrDoPendingDeletes()->smgrdounlinkall()->mdunlink()->mdunlinkfork(),
unlinks all segments beyond the first itself:
static void
mdunlinkfork(RelFileNodeBackend rnode, ForkNumber forkNum, bool isRedo)
{
    char       *path;
    int         ret;

    path = relpath(rnode, forkNum);

    /*
     * Delete or truncate the first segment.
     */
    if (isRedo || forkNum != MAIN_FORKNUM || RelFileNodeBackendIsTemp(rnode))
    {
        ret = unlink(path);
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m", path)));
    }
    else
    {
        /* truncate(2) would be easier here, but Windows hasn't got it */
        int         fd;

        fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
        if (fd >= 0)
        {
            int         save_errno;

            ret = ftruncate(fd, 0);
            save_errno = errno;
            CloseTransientFile(fd);
            errno = save_errno;
        }
        else
            ret = -1;
        if (ret < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not truncate file \"%s\": %m", path)));

        /* Register request to unlink first segment later */
        register_unlink(rnode);
    }

    /*
     * Delete any additional segments.
     */
    if (ret >= 0)
    {
        char       *segpath = (char *) palloc(strlen(path) + 12);
        BlockNumber segno;

        /*
         * Note that because we loop until getting ENOENT, we will correctly
         * remove all inactive segments as well as active ones.
         */
        for (segno = 1;; segno++)
        {
            sprintf(segpath, "%s.%u", path, segno);
            if (unlink(segpath) < 0)
            {
                /* ENOENT is expected after the last segment... */
                if (errno != ENOENT)
                    ereport(WARNING,
                            (errcode_for_file_access(),
                             errmsg("could not remove file \"%s\": %m", segpath)));
                break;
            }
        }
        pfree(segpath);
    }

    pfree(path);
}
As you can clearly see, unlink() is called directly here for anything
but the first segment (which is registered to be unlinked in the
checkpointer via register_unlink()).
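To make the control flow above easier to try outside the server, here is a minimal standalone sketch of the same pattern using plain POSIX calls. It is not PostgreSQL code: the name drop_segments_sketch() and its return convention are made up for illustration, and it leaves out the isRedo / fork / temp-relation branch as well as the deferred unlink of the first segment.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT PostgreSQL code.
 *
 * Truncates the first segment at "path" to zero length (leaving the file
 * itself in place) and then unlinks "path.1", "path.2", ... until unlink()
 * reports ENOENT.
 *
 * Returns the number of additional segments removed, or -1 on error.
 */
static int
drop_segments_sketch(const char *path)
{
    size_t       len = strlen(path) + 12;   /* '.', up to 10 digits, NUL */
    char        *segpath;
    unsigned int segno;
    int          removed = 0;
    int          fd;

    /* First segment: truncate only, do not unlink. */
    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 0) < 0)
    {
        int save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }
    close(fd);

    segpath = malloc(len);
    if (segpath == NULL)
        return -1;

    /* Additional segments: unlink directly, stop at the first gap. */
    for (segno = 1;; segno++)
    {
        snprintf(segpath, len, "%s.%u", path, segno);
        if (unlink(segpath) < 0)
        {
            /* ENOENT is the expected way out of the loop. */
            if (errno != ENOENT)
                removed = -1;
            break;
        }
        removed++;
    }

    free(segpath);
    return removed;
}
```

Calling drop_segments_sketch("base") with files base, base.1 and base.2 present should leave an empty base behind and remove the other two.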
Note that the checkpointer-based machinery doesn't even *support*
unlinking anything beyond the first segment:
void
mdpostckpt(void)
{
...
    while (pendingUnlinks != NIL)
...
        /* Unlink the file */
        path = relpathperm(entry->rnode, MAIN_FORKNUM);
        if (unlink(path) < 0)
There's no segment handling here.
You're right that mdtruncate() leaves later segments around in
truncated form. But that's because of an orthogonal concern:
* The full and partial segments are collectively the "active" segments.
* Inactive segments are those that once contained data but are currently
* not needed because of an mdtruncate() operation. The reason for leaving
* them present at size zero, rather than unlinking them, is that other
* backends and/or the checkpointer might be holding open file references to
* such segments. If the relation expands again after mdtruncate(), such
* that a deactivated segment becomes active again, it is important that
* such file references still be valid --- else data might get written
* out to an unlinked old copy of a segment file that will eventually
* disappear.
That doesn't apply to dropping relations.
Greetings,
Andres Freund