Re: WIP: BRIN multi-range indexes
| From | Tomas Vondra |
|---|---|
| Subject | Re: WIP: BRIN multi-range indexes |
| Date | |
| Msg-id | 88b3831d-8afa-4135-a420-27b12bf9fae4@enterprisedb.com |
| In reply to | Re: WIP: BRIN multi-range indexes (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
| Responses | Re: WIP: BRIN multi-range indexes |
| | Re: WIP: BRIN multi-range indexes |
| List | pgsql-hackers |
Hi,

Here's a rebased version of the patch series, to keep the cfbot happy. I've also restricted the false positive rate to [0.001, 0.1] instead of the original [0.0, 1.0], per discussion on this thread.

I've done a bunch of experiments, comparing the "regular" bloom indexes with the one-hashing scheme proposed by John Naylor. I've been wondering whether there's some measurable difference, especially in:

* efficiency (query duration)
* false positive rate depending on the fill factor

So I ran a bunch of tests on synthetic data sets, varying the parameters affecting the BRIN bloom indexes:

1) different pages_per_range
2) number of distinct values per range
3) fill factor of the bloom filter (66%, 100%, 200%)

Attached is a script I used to test this, and a simple spreadsheet summarizing the results for each combination of parameters. For each combination it shows the average query duration (over 10 runs) and the scan fraction (what fraction of the table was scanned).

Overall, I think there's very little difference, particularly in the "match" cases, where we're searching for a value that we know is in the table. The one-hash variant seems to perform a bit better, but the difference is fairly small.

In the "mismatch" cases (searching for a value that is not in the table) the differences are more significant, but it might be noise. It does seem much more "green" than "red", i.e. the one-hash variant seems to be faster (although this does depend on the values used for formatting).

To sum this up, I think the one-hash approach seems interesting. It's not going to give us huge speedups, because we're only hashing int32 values anyway (not the source data), but it's worth exploring.

I've started looking at the one-hash code changes, and I've discovered a couple of issues. I've been wondering how expensive the naive prime sieve is - it's not an extremely hot code path, as we're only running it once for each page range. But still.
So my plan was to create the largest bloom filter possible, and see how much time generate_primes() takes. So I initialized a cluster with 32kB blocks and tried to do this:

    create index on t using brin (a int4_bloom_ops(n_distinct_per_range=120000, false_positive_rate=0.1));

which ends up using nbits=575104 (which is 2x the page size, but let's ignore that) and nhashes=3. That however crashes and burns, because:

a) set_bloom_partitions does this:

    while (primes[pidx + nhashes - 1] <= target && primes[pidx] > 0)
        pidx++;

which is broken, because the second part of the condition only checks the current index - we may end up using nhashes primes after that, and some of them may be 0. So this needs to be:

    while (primes[pidx + nhashes - 1] <= target && primes[pidx + nhashes] > 0)
        pidx++;

(We know there's always at least one 0 at the end, so it's OK not to check the length explicitly.)

b) set_bloom_partitions does this to generate primes:

    /*
     * Increase the limit to ensure we have some primes higher than
     * the target partition length. The 100 value is arbitrary, but
     * should be well over what we need.
     */
    primes = generate_primes(target_partlen + 100);

It's not clear to me why 100 is sufficient, particularly for large page sizes. AFAIK the primes get more and more sparse, so how does this guarantee we'll get enough "sufficiently large" primes?

c) generate_primes uses uint16 to store the primes, so it can only generate primes up to 32768. That's (probably) enough for 8kB pages, but for 32kB pages it's clearly insufficient.

I've fixed these issues in a separate WIP patch, with some simple debugging logging. As for the original question of how expensive this naive sieve is, I haven't been able to measure any significant timings.
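For reference, a naive sieve of Eratosthenes along these lines can be sketched as follows. This is not the patch's generate_primes(), and the names are illustrative, but it incorporates two of the fixes discussed above: 32-bit storage (so limits beyond 32768 work for 32kB pages) and an explicit zero terminator, which the partition-selection loop relies on.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Return a zero-terminated array of all primes <= limit, using a
 * naive sieve of Eratosthenes. Stored as uint32 so limits above
 * 32768 (needed for 32kB pages) don't overflow. Caller frees the
 * result. Illustrative sketch, not the patch's actual code.
 */
static uint32_t *
sieve_primes(uint32_t limit, size_t *nprimes)
{
    uint8_t  *composite = calloc((size_t) limit + 1, 1);
    /* overallocated: there are at most `limit` primes, plus terminator */
    uint32_t *primes = malloc(((size_t) limit + 2) * sizeof(uint32_t));
    size_t    n = 0;

    for (uint32_t i = 2; i <= limit; i++)
    {
        if (composite[i])
            continue;
        primes[n++] = i;
        /* mark multiples, starting at i*i (64-bit to avoid overflow) */
        for (uint64_t j = (uint64_t) i * i; j <= limit; j += i)
            composite[j] = 1;
    }
    primes[n] = 0;              /* zero terminator at the end */
    free(composite);
    if (nprimes)
        *nprimes = n;
    return primes;
}
```

Even for a limit around target_partlen=191701 (the 32kB-page example above), a sieve like this only touches a few hundred kB of memory, which is consistent with the timings not being measurable.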
The logging around generate_primes usually looks like this:

    2020-11-07 20:36:10.614 CET [232789] LOG:  generating primes nbits 575104 nhashes 3 target_partlen 191701
    2020-11-07 20:36:10.614 CET [232789] LOG:  primes generated

So it takes 0.000 seconds even for this extreme page size. I don't think we need to invent anything more elaborate.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
- i5.ods
- 0001-Pass-all-scan-keys-to-BRIN-consistent-funct-20201107.patch
- 0002-Move-IS-NOT-NULL-handling-from-BRIN-support-20201107.patch
- 0003-Optimize-allocations-in-bringetbitmap-20201107.patch
- 0004-BRIN-bloom-indexes-20201107.patch
- 0005-use-one-hash-bloom-variant-20201107.patch
- 0006-one-hash-tweaks-20201107.patch
- 0007-BRIN-minmax-multi-indexes-20201107.patch
- 0008-Ignore-correlation-for-new-BRIN-opclasses-20201107.patch
- bloom-test.sh