Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
От | Tomas Vondra |
---|---|
Тема | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop |
Дата | |
Msg-id | 0c3d7ca9-1619-8b78-0a33-277a08db335a@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop (Andres Freund <andres@anarazel.de>) |
Список | pgsql-bugs |
On 12/06/2017 10:12 PM, Andres Freund wrote: > On 2017-12-06 22:05:22 +0100, Tomas Vondra wrote: >> On 12/06/2017 09:46 PM, Andres Freund wrote: >>> On 2017-12-06 21:38:42 +0100, Tomas Vondra wrote: >>>> It's one thing when the hash table takes longer to lookup something or >>> >>> longer aka "forever". >>> >> >> Not necessarily. > >> The datasets I shared are somewhat extreme in the sense that there are >> many contiguous sequences of hash values, but it only takes one such >> sequence with at least SH_GROW_MAX_MOVE values to trigger the issue. So >> the hash table may still be perfectly fine for most keys, and only >> slightly slower for the keys in the sequence. > > Meh, we're talking about adversarial attacks here. > > >>>> when it consumes a bit more memory. Say, ~2x more than needed, give or >>>> take. I'm perfectly fine with that, particularly when it's a worst-case >>>> evil data set like this one. >>> >>> I think the way to prevent that kind of attack is to add randomization. >>> >> >> By randomization you mean universal hashing [1], or something else? > > No, adding in a random seed to the hash. It'd be a lot better if we > had a way to provide internal state to the hashfunction, but even > just using hash_combine()ing a random number into a hash would be a > lot better than what we're doing now. > The universal hashing (of "our" hash value) would have the benefit that we could change it when growing the hash table. I.e. do this h(x) = ((A * hash_any(x) + B) mod P)) mod M and generate new A, B, P whenever we grow the hashtable. That would further reduce the probability of either of the failures described in this thread (I mean, hitting it again after growing the table). I don't know if it's worth the additional CPU cycles, of course. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления: