Re: [HACKERS] A design for amcheck heapam verification
От | Peter Geoghegan |
---|---|
Тема | Re: [HACKERS] A design for amcheck heapam verification |
Дата | |
Msg-id | CAH2-Wznm5ZOjS0_DJoWrcm9Us19gzbkm0aTKt5hHprvjHFVHpQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] A design for amcheck heapam verification (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: [HACKERS] A design for amcheck heapam verification
|
Список | pgsql-hackers |
On Tue, Nov 28, 2017 at 9:54 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Nov 28, 2017 at 9:50 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >>> Would that address your concern? There would be an SQL interface, but >>> it would be trivial. >> >> That's exactly what I think you should do, and mentioned so upthread. >> A SQL interface can also show a good example of how developers can use >> this API. Attach revision, v5, adds a new test harness -- test_bloomfilter. This can be used to experimentally verify that the meets the well known "1% false positive rate with 9.6 bits per element" standard. It manages to do exactly that: postgres=# set client_min_messages = 'debug1'; SET postgres=# SELECT test_bloomfilter(power => 23, nelements => 873813, seed => -1, tests => 3); DEBUG: beginning test #1... DEBUG: bloom_work_mem (KB): 1024 DEBUG: false positives: 8630 (rate: 0.009876, proportion bits set: 0.517625, seed: 1373191603) DEBUG: beginning test #2... DEBUG: bloom_work_mem (KB): 1024 DEBUG: false positives: 8623 (rate: 0.009868, proportion bits set: 0.517623, seed: 406665822) DEBUG: beginning test #3... DEBUG: bloom_work_mem (KB): 1024 WARNING: false positives: 8840 (rate: 0.010117, proportion bits set: 0.517748, seed: 398116374) test_bloomfilter ------------------ (1 row) Here, we repeat the same test 3 times, varying only the seed value used for each run. The last message is a WARNING because we exceed the 1% threshold (hard-coded into test_bloomfilter.c), though only by a tiny margin, due only to random variations in seed value. We round up to 10 bits per element for the regression tests. That's where the *actual* "nelements" argument comes from within the tests, so pg_regress tests should never see the WARNING (if they do, that counts as a failure). I've experimentally observed that we get the 1% false positive rate with any possible bitset when "nelements" works out at 9.6 bitset bits per element. Inter-run variation is tiny. With 50 tests, I didn't observe these same Bloom filter parameters produce a false positive rate that came near to 1.1%. 1.01% or 1.02% was about as bad as it got. There is a fairly extensive README, which I hope will clear up the theory behind the bloomfilter.c strategy on bitset size and false positives. Also, there was a regression that I had to fix in bloomfilter.c, in seeding. It didn't reliably cause variation in the false positives. And, there was bitrot with the documentation that I fixed up. -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: