Discussion: hash_search and out of memory
If OOM happens during expand_table() in hash_search_with_hash_value() for RelationCacheInsert, the hash table entry has already been allocated and linked into the hash table, but idhentry->reldesc remains NULL. Since the OOM error causes AbortTransaction(), AtEOXact_RelationCache() then dereferences this NULL pointer and we hit SIGSEGV.

The fix would be either to catch the OOM error with PG_TRY() and undo the hash entry, or to do the expansion first and insert the entry afterwards. The latter is a bit ugly because we have to re-calculate the hash bucket after we decide to expand, so the former looks better. Can you think of other solutions?

Thanks,

-- Hitoshi Harada
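For readers who don't have relcache.c in front of them, here is a minimal sketch of the insertion pattern that gets into trouble. It is deliberately simplified: the struct and variable names mimic the relcache ones but this is not the literal source.

/*
 * Simplified sketch of the RelationCacheInsert pattern (illustrative
 * only, not the literal relcache.c code).
 */
#include "postgres.h"
#include "utils/hsearch.h"
#include "utils/rel.h"

typedef struct RelIdCacheEnt
{
    Oid         reloid;         /* hash key */
    Relation    reldesc;        /* cached relation descriptor */
} RelIdCacheEnt;

static HTAB *RelationIdCache;   /* long-lived, palloc-backed dynahash table */

static void
relation_cache_insert_sketch(Relation rel)
{
    RelIdCacheEnt *idhentry;
    bool        found;

    idhentry = (RelIdCacheEnt *) hash_search(RelationIdCache,
                                             (void *) &rel->rd_id,
                                             HASH_ENTER,
                                             &found);

    /*
     * With the pre-fix dynahash, expand_table() ran *after* the new entry
     * had been linked into the table.  If palloc() failed inside
     * expand_table(), the ERROR longjmp'd out of hash_search() and the
     * assignment below never ran, leaving reldesc == NULL.  The abort path
     * (AtEOXact_RelationCache) then walked the table and crashed on that
     * NULL pointer.
     */
    idhentry->reldesc = rel;
}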
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> If OOM happens during expand_table() in hash_search_with_hash_value()
> for RelationCacheInsert,

What OOM? expand_table is supposed to return without doing anything if it can't expand the table. If that's not happening, that's a bug in the hash code.

regards, tom lane
I wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> If OOM happens during expand_table() in hash_search_with_hash_value()
>> for RelationCacheInsert,

> What OOM? expand_table is supposed to return without doing anything
> if it can't expand the table. If that's not happening, that's a bug
> in the hash code.

Oh, wait, I take that back --- the palloc-based allocator does throw errors. I think that when that was designed, we were thinking that palloc-based hash tables would be thrown away anyway after an error, but of course that's not true for long-lived tables such as the relcache hash table.

I'm not terribly comfortable with trying to use a PG_TRY block to catch an OOM error - there are too many ways that could break, and this code path is by definition not very testable. I think moving up the expand_table action is probably the best bet. Will you submit a patch?

regards, tom lane
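To make the rejected alternative concrete, the PG_TRY route would have meant wrapping the expansion inside hash_search_with_hash_value() along these lines. This is purely an illustrative fragment of the approach being dismissed, not committed code; hashp, keyPtr and hashvalue are the function's own parameters.

    PG_TRY();
    {
        /* may palloc new buckets/segments, and hence may elog(ERROR) */
        (void) expand_table(hashp);
    }
    PG_CATCH();
    {
        /*
         * Undo the half-made insertion so the table is left consistent,
         * then let the out-of-memory error propagate to the caller.
         */
        hash_search_with_hash_value(hashp, keyPtr, hashvalue,
                                    HASH_REMOVE, NULL);
        PG_RE_THROW();
    }
    PG_END_TRY();

Even this small fragment has to worry about what the removal itself touches while the error is in flight, which is one of the "too many ways that could break".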
On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>>> If OOM happens during expand_table() in hash_search_with_hash_value()
>>> for RelationCacheInsert,
>
> the palloc-based allocator does throw
> errors.  I think that when that was designed, we were thinking that
> palloc-based hash tables would be thrown away anyway after an error,
> but of course that's not true for long-lived tables such as the relcache
> hash table.
>
> I'm not terribly comfortable with trying to use a PG_TRY block to catch
> an OOM error - there are too many ways that could break, and this code
> path is by definition not very testable.  I think moving up the
> expand_table action is probably the best bet.  Will you submit a patch?

Here it is. I factored out the bucket finding code to re-calculate it after expansion.

Thanks,

-- Hitoshi Harada
Attachments
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not terribly comfortable with trying to use a PG_TRY block to catch
>> an OOM error - there are too many ways that could break, and this code
>> path is by definition not very testable.  I think moving up the
>> expand_table action is probably the best bet.  Will you submit a patch?

> Here it is. I factored out the bucket finding code to re-calculate it
> after expansion.

I didn't like that too much. I think a better solution is just to do the table expansion at the very start of the function, along the lines of the attached patch. This requires an extra test of the "action" parameter, but I think that's probably cheaper than an extra function call. It's definitely cheaper than recalculating the hash etc when a split does occur.

regards, tom lane

diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 6d0e0f5366b5a733af9bc712d8ccdb9850721f98..31ac2b25e8ffa2f50dfa4c9bde0d248baf2c0979 100644
*** a/src/backend/utils/hash/dynahash.c
--- b/src/backend/utils/hash/dynahash.c
***************
*** 21,26 ****
--- 21,31 ----
   * lookup key's hash value as a partition number --- this will work because
   * of the way calc_bucket() maps hash values to bucket numbers.
   *
+  * For hash tables in shared memory, the memory allocator function should
+  * match malloc's semantics of returning NULL on failure.  For hash tables
+  * in local memory, we typically use palloc() which will throw error on
+  * failure.  The code in this file has to cope with both cases.
+  *
   * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
*************** hash_search_with_hash_value(HTAB *hashp,
*** 821,826 ****
--- 826,852 ----
  #endif
  
      /*
+      * If inserting, check if it is time to split a bucket.
+      *
+      * NOTE: failure to expand table is not a fatal error, it just means we
+      * have to run at higher fill factor than we wanted.  However, if we're
+      * using the palloc allocator then it will throw error anyway on
+      * out-of-memory, so we must do this before modifying the table.
+      */
+     if (action == HASH_ENTER || action == HASH_ENTER_NULL)
+     {
+         /*
+          * Can't split if running in partitioned mode, nor if frozen, nor if
+          * table is the subject of any active hash_seq_search scans.  Strange
+          * order of these tests is to try to check cheaper conditions first.
+          */
+         if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
+             hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+             !has_seq_scans(hashp))
+             (void) expand_table(hashp);
+     }
+ 
+     /*
       * Do the initial lookup
       */
      bucket = calc_bucket(hctl, hashvalue);
*************** hash_search_with_hash_value(HTAB *hashp,
*** 940,963 ****
              currBucket->hashvalue = hashvalue;
              hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
  
-             /* caller is expected to fill the data field on return */
- 
              /*
!              * Check if it is time to split a bucket.  Can't split if running
!              * in partitioned mode, nor if table is the subject of any active
!              * hash_seq_search scans.  Strange order of these tests is to try
!              * to check cheaper conditions first.
               */
-             if (!IS_PARTITIONED(hctl) &&
-                 hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
-                 !has_seq_scans(hashp))
-             {
-                 /*
-                  * NOTE: failure to expand table is not a fatal error, it just
-                  * means we have to run at higher fill factor than we wanted.
-                  */
-                 expand_table(hashp);
-             }
  
              return (void *) ELEMENTKEY(currBucket);
          }
--- 966,977 ----
              currBucket->hashvalue = hashvalue;
              hashp->keycopy(ELEMENTKEY(currBucket), keyPtr, keysize);
  
              /*
!              * Caller is expected to fill the data field on return.  DO NOT
!              * insert any code that could possibly throw error here, as doing
!              * so would leave the table entry incomplete and hence corrupt the
!              * caller's data structure.
               */
  
              return (void *) ELEMENTKEY(currBucket);
          }
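The new comment at the end of the last hunk states the caller-side contract. A hypothetical long-lived, palloc-backed dynahash user honoring it would look roughly like this; the table and struct names are invented for the example, and this is a sketch rather than code from the tree.

#include "postgres.h"
#include "utils/hsearch.h"

typedef struct MyCacheEnt
{
    Oid         key;            /* hash key: must be the first field */
    void       *payload;        /* caller-managed data */
} MyCacheEnt;

static HTAB *MyCache = NULL;

static void
my_cache_init(void)
{
    HASHCTL     ctl;

    MemSet(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(MyCacheEnt);
    ctl.hash = oid_hash;
    /* local-memory (palloc-backed) table that outlives transactions */
    MyCache = hash_create("my cache", 128, &ctl, HASH_ELEM | HASH_FUNCTION);
}

static void
my_cache_insert(Oid key, void *payload)
{
    MyCacheEnt *ent;
    bool        found;

    /*
     * With the patched dynahash, any OOM from growing the table is raised
     * here, before the new entry exists, so the table stays consistent.
     */
    ent = (MyCacheEnt *) hash_search(MyCache, (void *) &key,
                                     HASH_ENTER, &found);

    /*
     * Fill the data field right away; per the new comment, nothing that
     * could throw an error (palloc, elog, catalog access) should run
     * between hash_search() and this assignment.
     */
    ent->payload = payload;
}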
On Fri, Oct 19, 2012 at 11:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> On Thu, Oct 18, 2012 at 8:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm not terribly comfortable with trying to use a PG_TRY block to catch
>>> an OOM error - there are too many ways that could break, and this code
>>> path is by definition not very testable.  I think moving up the
>>> expand_table action is probably the best bet.  Will you submit a patch?
>
>> Here it is. I factored out the bucket finding code to re-calculate it
>> after expansion.
>
> I didn't like that too much.  I think a better solution is just to do
> the table expansion at the very start of the function, along the lines
> of the attached patch.  This requires an extra test of the "action"
> parameter, but I think that's probably cheaper than an extra function
> call.  It's definitely cheaper than recalculating the hash etc when
> a split does occur.

OK, this looks better. But isn't the nentries test a little bogus now?

+     /*
+      * Can't split if running in partitioned mode, nor if frozen, nor if
+      * table is the subject of any active hash_seq_search scans.  Strange
+      * order of these tests is to try to check cheaper conditions first.
+      */
+     if (!IS_PARTITIONED(hctl) && !hashp->frozen &&
+         hctl->nentries / (long) (hctl->max_bucket + 1) >= hctl->ffactor &&
+         !has_seq_scans(hashp))
+         (void) expand_table(hashp);

Since the check now runs before the entry is inserted, wouldn't hctl->nentries + 1 be more appropriate?

Thanks,

-- Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> OK, this looks better. But isn't the nentries test a little bogus now?

No, I think it's fine as is. Essentially this logic says "attempt to split when the new insertion would make us go over the target fill factor", whereas the old logic split when the just-completed insertion reached the target. There's not a lot of reason to care about the precise boundary anyway, but to the extent that you believe that the target is exact, I think this behavior is actually a bit more precise.

regards, tom lane
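A concrete example of the boundary behavior (numbers chosen only for illustration): suppose ffactor is 10 and the table currently has max_bucket + 1 = 4 buckets, so the target fill is 40 entries. With the old placement the check ran after the insert, so adding the 40th entry gives 40 / 4 = 10 >= 10 and the table split as soon as the target was exactly reached. With the new placement the check runs before the insert, so the split happens when nentries is already 40 and a 41st entry is about to go in, i.e. precisely when the insertion would push the table past the target, which is the behavior described above.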