Обсуждение: Hash table scans outside transactions
Hi, Hash table scans (seq_scan_table/level) are cleaned up at the end of a transaction in AtEOXact_HashTables(). If a hash seq scan continues beyond transaction end it will meet "ERROR: no hash_seq_search scan for hash table" in deregister_seq_scan(). That seems like a limiting the hash table usage. Our use case is 1. Add/update/remove entries in hash table 2. Scan the existing entries and perform one transaction per entry 3. Close scan repeat above steps in an infinite loop. Note that we do not add/modify/delete entries in step 2. We can't use linked lists since the entries need to be updated or deleted using hash keys. Because the hash seq scan is cleaned up at the end of the transaction, we encounter error in the 3rd step. I don't see that the actual hash table scan depends upon the seq_scan_table/level[] which is cleaned up at the end of the transaction. I have following questions 1. Is there a way to avoid cleaning up seq_scan_table/level() when the transaction ends? 2. Is there a way that we can use hash table implementation in PostgreSQL code for our purpose? -- Best Wishes, Ashutosh Bapat
Bumping it to attract some attention. On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > Hi, > Hash table scans (seq_scan_table/level) are cleaned up at the end of a > transaction in AtEOXact_HashTables(). If a hash seq scan continues > beyond transaction end it will meet "ERROR: no hash_seq_search scan > for hash table" in deregister_seq_scan(). That seems like a limiting > the hash table usage. > > Our use case is > 1. Add/update/remove entries in hash table > 2. Scan the existing entries and perform one transaction per entry > 3. Close scan > > repeat above steps in an infinite loop. Note that we do not > add/modify/delete entries in step 2. We can't use linked lists since > the entries need to be updated or deleted using hash keys. Because the > hash seq scan is cleaned up at the end of the transaction, we > encounter error in the 3rd step. I don't see that the actual hash > table scan depends upon the seq_scan_table/level[] which is cleaned up > at the end of the transaction. > > I have following questions > 1. Is there a way to avoid cleaning up seq_scan_table/level() when the > transaction ends? > 2. Is there a way that we can use hash table implementation in > PostgreSQL code for our purpose? > > > -- > Best Wishes, > Ashutosh Bapat -- Best Wishes, Ashutosh Bapat
Aidar Imamov wrote 2025-05-21 00:32: > Ashutosh Bapat wrote 2023-03-28 15:58: >> Bumping it to attract some attention. >> >> On Tue, Mar 21, 2023 at 12:51 PM Ashutosh Bapat >> <ashutosh.bapat.oss@gmail.com> wrote: >>> >>> Hi, >>> Hash table scans (seq_scan_table/level) are cleaned up at the end of >>> a >>> transaction in AtEOXact_HashTables(). If a hash seq scan continues >>> beyond transaction end it will meet "ERROR: no hash_seq_search scan >>> for hash table" in deregister_seq_scan(). That seems like a limiting >>> the hash table usage. >>> >>> Our use case is >>> 1. Add/update/remove entries in hash table >>> 2. Scan the existing entries and perform one transaction per entry >>> 3. Close scan >>> >>> repeat above steps in an infinite loop. Note that we do not >>> add/modify/delete entries in step 2. We can't use linked lists since >>> the entries need to be updated or deleted using hash keys. Because >>> the >>> hash seq scan is cleaned up at the end of the transaction, we >>> encounter error in the 3rd step. I don't see that the actual hash >>> table scan depends upon the seq_scan_table/level[] which is cleaned >>> up >>> at the end of the transaction. >>> >>> I have following questions >>> 1. Is there a way to avoid cleaning up seq_scan_table/level() when >>> the >>> transaction ends? >>> 2. Is there a way that we can use hash table implementation in >>> PostgreSQL code for our purpose? >>> >>> >>> -- >>> Best Wishes, >>> Ashutosh Bapat > > Hi! > I tried to resend this thread directly to myself, but for some reason > it > ended up in the whole hackers list.. > > I thought I'd chime in on this topic since it hasn't really been > discussed > anywhere else (or maybe I missed it). > I've attached two patches: the first one is a little test extension to > demonstrate the problem (just add "hash_test" to > "shared_preload_libraries"), > and the second is a possible solution. Basically, the idea is that we > don't > reset the scan counter if we find scans that started outside of the > current > transaction at the end. I have to admit, though, that I can't > immediately > say what other implications this might have or what else we need to > watch > out for if we try this. > Maybe any thoughts on that? > > regards, > Aidar Imamov Just bumping it again regards, Aidar Imamov
On Wed, May 21, 2025 at 3:02 AM Aidar Imamov <a.imamov@postgrespro.ru> wrote: > > Hi! > I tried to resend this thread directly to myself, but for some reason it > ended up in the whole hackers list.. > > I thought I'd chime in on this topic since it hasn't really been > discussed > anywhere else (or maybe I missed it). > I've attached two patches: the first one is a little test extension to > demonstrate the problem (just add "hash_test" to > "shared_preload_libraries"), > and the second is a possible solution. Basically, the idea is that we > don't > reset the scan counter if we find scans that started outside of the > current > transaction at the end. I have to admit, though, that I can't > immediately > say what other implications this might have or what else we need to > watch > out for if we try this. > Maybe any thoughts on that? I haven't reviewed the complete patch or tested it, but I don't see any issues with it. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On 5/25/25 13:36, Dilip Kumar wrote: > On Wed, May 21, 2025 at 3:02 AM Aidar Imamov <a.imamov@postgrespro.ru> wrote: >> >> Hi! >> I tried to resend this thread directly to myself, but for some reason it >> ended up in the whole hackers list.. >> >> I thought I'd chime in on this topic since it hasn't really been >> discussed >> anywhere else (or maybe I missed it). >> I've attached two patches: the first one is a little test extension to >> demonstrate the problem (just add "hash_test" to >> "shared_preload_libraries"), >> and the second is a possible solution. Basically, the idea is that we >> don't >> reset the scan counter if we find scans that started outside of the >> current >> transaction at the end. I have to admit, though, that I can't >> immediately >> say what other implications this might have or what else we need to >> watch >> out for if we try this. >> Maybe any thoughts on that? > > I haven't reviewed the complete patch or tested it, but I don't see > any issues with it. > I may be wrong, but I'd guess the restriction is there for a reason. Maybe it's not needed anymore, or maybe it's needed only in some cases, or something like that. So the most important thing for a patch removing the restriction it is to explain why it's there and why it's safe to relax it. The extension demonstrating that the restriction exists doesn't really help with that, it shows why we have it. Not hat it's safe to remove it. I'm not a dynahash expert, but isn't the first sentence from dynahash.c relevant: * dynahash.c supports both local-to-a-backend hash tables and hash * tables in shared memory. For shared hash tables, it is the caller's * responsibility to provide appropriate access interlocking. The * simplest convention is that a single LWLock protects the whole hash * table. Searches (HASH_FIND or hash_seq_search) need only shared * lock, but any update requires exclusive lock. In other words, let's say you have a hash table, protected by LWLock. That is, you're holding the lock in shared mode during seqscan. And then we drop the locks for some reason (say, transaction end). The table can be modified - won't that break the seqscan? FWIW I think with the use case from the beginning of this thread: 1. Add/update/remove entries in hash table 2. Scan the existing entries and perform one transaction per entry 3. Close scan Why not to simply build a linked list after step (1)? regards -- Tomas Vondra