Обсуждение: Wrong example in the bloom documentation
Hi, as this does not get any attention on the docs-list, once again here. Any thoughts on this? Regards Daniel From: Daniel Westermann (DWE) Sent: Sunday, September 27, 2020 17:58 To: Pg Docs <pgsql-docs@lists.postgresql.org> Subject: Wrong example in the bloom documentation Hi, I've briefly discussed this with Bruce some time ago in [1]. Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq scanwith the default configuration: -- tested on head postgres=# CREATE TABLE tbloom AS postgres-# SELECT postgres-# (random() * 1000000)::int as i1, postgres-# (random() * 1000000)::int as i2, postgres-# (random() * 1000000)::int as i3, postgres-# (random() * 1000000)::int as i4, postgres-# (random() * 1000000)::int as i5, postgres-# (random() * 1000000)::int as i6 postgres-# FROM postgres-# generate_series(1,10000000); SELECT 10000000 postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 3333333 Planning Time: 0.895 ms JIT: Functions: 6 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms Execution Time: 2288.056 ms (12 rows) As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct: postgres=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit (1 row) postgres=# CREATE TABLE tbloom AS postgres-# SELECT postgres-# (random() * 1000000)::int as i1, postgres-# (random() * 1000000)::int as i2, postgres-# (random() * 1000000)::int as i3, postgres-# (random() * 1000000)::int as i4, postgres-# (random() * 1000000)::int as i5, postgres-# (random() * 1000000)::int as i6 postgres-# FROM postgres-# generate_series(1,10000000); SELECT 10000000 postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tbloom (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1) Recheck Cond: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Index Recheck: 2329 Heap Blocks: exact=2288 -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1) Index Cond: ((i2 = 898732) AND (i5 = 123451)) Planning time: 282.059 ms Execution time: 2286.138 ms (8 rows) The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well: postgres=# set max_parallel_workers_per_gather = 2; SET postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbloom (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 3333333 Planning time: 0.580 ms Execution time: 1148.247 ms (8 rows) Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The newexample starts with 100x reduced rows (as suggested by Bruce in [1] and adds a note that the behavior changes as soonas parallel execution is cheaper than the index access. Thoughts? Regards Daniel [1] https://www.postgresql.org/message-id/flat/20191105231854.GA26542%40momjian.us#7859b106ce614dd9530941196dad5bbc
Вложения
On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote: > Hi, > > as this does not get any attention on the docs-list, once again here. > Any thoughts on this? I was hoping someone more experienced with this would comment, but seeing none, I will apply it in a day or two to all supported versions? Have you tested this output back to 9.5? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Hi Bruce,
>On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
>> Hi,
>>
>> as this does not get any attention on the docs-list, once again here.
>> Any thoughts on this?
>I was hoping someone more experienced with this would comment, but
>seeing none, I will apply it in a day or two to all supported versions?
>Have you tested this output back to 9.5?
>> Hi,
>>
>> as this does not get any attention on the docs-list, once again here.
>> Any thoughts on this?
>I was hoping someone more experienced with this would comment, but
>seeing none, I will apply it in a day or two to all supported versions?
>Have you tested this output back to 9.5?
I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
Regards
Daniel
On Thu, Oct 8, 2020 at 06:12:47PM +0000, Daniel Westermann (DWE) wrote: > Hi Bruce, > > >On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote: > >> Hi, > >> > >> as this does not get any attention on the docs-list, once again here. > >> Any thoughts on this? > > >I was hoping someone more experienced with this would comment, but > >seeing none, I will apply it in a day or two to all supported versions? > >Have you tested this output back to 9.5? > > I hoped that as well. No, I tested down to 9.6 because the change happened in > 10. OK, so the patch should be applied only to PG 10 and later? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: >> I was hoping someone more experienced with this would comment, but >> seeing none, I will apply it in a day or two to all supported versions? >> Have you tested this output back to 9.5? > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. The patch assumes that parallel query is enabled, which is not true by default before v10, so it should certainly not be applied before v10 (at least not without significant revisions). regards, tom lane
On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: > "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > >> I was hoping someone more experienced with this would comment, but > >> seeing none, I will apply it in a day or two to all supported versions? > >> Have you tested this output back to 9.5? > > > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. > > The patch assumes that parallel query is enabled, which is not true by > default before v10, so it should certainly not be applied before v10 > (at least not without significant revisions). I think we should just go for simple application cases for this. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Hi Bruce, Tom, On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: >> >> I was hoping someone more experienced with this would comment, but >> >> seeing none, I will apply it in a day or two to all supported versions? >> >> Have you tested this output back to 9.5? >> >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. >> >> The patch assumes that parallel query is enabled, which is not true by >> default before v10, so it should certainly not be applied before v10 >> (at least not without significant revisions). Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwisethe whole example needs to be rewritten. Regards Daniel
On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote: > Hi Bruce, Tom, > > On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: > >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > >> >> I was hoping someone more experienced with this would comment, but > >> >> seeing none, I will apply it in a day or two to all supported versions? > >> >> Have you tested this output back to 9.5? > >> > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. > >> > >> The patch assumes that parallel query is enabled, which is not true by > >> default before v10, so it should certainly not be applied before v10 > >> (at least not without significant revisions). > > Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwisethe whole example needs to be rewritten. Agreed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote: > On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote: > > Hi Bruce, Tom, > > > > On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: > > >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > > >> >> I was hoping someone more experienced with this would comment, but > > >> >> seeing none, I will apply it in a day or two to all supported versions? > > >> >> Have you tested this output back to 9.5? > > >> > > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. > > >> > > >> The patch assumes that parallel query is enabled, which is not true by > > >> default before v10, so it should certainly not be applied before v10 > > >> (at least not without significant revisions). > > > > Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwisethe whole example needs to be rewritten. > > Agreed. This is not applying to PG 12 or earlier because the patch mentions JIT, which was only mentioned in the PG bloom docs in PG 13+. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote: > On Fri, Oct 9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote: > > Hi Bruce, Tom, > > > > On Thu, Oct 8, 2020 at 03:43:32PM -0400, Tom Lane wrote: > > >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes: > > >> >> I was hoping someone more experienced with this would comment, but > > >> >> seeing none, I will apply it in a day or two to all supported versions? > > >> >> Have you tested this output back to 9.5? > > >> > > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10. > > >> > > >> The patch assumes that parallel query is enabled, which is not true by > > >> default before v10, so it should certainly not be applied before v10 > > >> (at least not without significant revisions). > >> > >> Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version, otherwisethe whole example needs to be rewritten. >> >> Agreed. >This is not applying to PG 12 or earlier because the patch mentions JIT, >which was only mentioned in the PG bloom docs in PG 13+. Does that mean we need separate patches for each release starting with 10? As I am not frequently writing patches, I would need some help here. Regards Daniel
On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote: > On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote: > >This is not applying to PG 12 or earlier because the patch mentions JIT, > >which was only mentioned in the PG bloom docs in PG 13+. > > Does that mean we need separate patches for each release starting with 10? > As I am not frequently writing patches, I would need some help here. I can regenerate the output for older versions using your patch. However, I am confused about the parallelism you are seeing. Your patch shows: Without the two indexes being created, a parallel sequential scan will happen for the query below: ------------------- <programlisting> =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on tbloom (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 10000 Planning Time: 0.257 ms Execution Time: 2.764 ms (5 rows) However, I don't see any parallelism in this output. Also, I don't see any parallelism once the indexes are created. What PG version is this? and what config settings did you use? Thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Mon, Oct 26, 2020 at 05:04:09PM -0400, Bruce Momjian wrote: > On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote: > > On Fri, Oct 9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote: > > >This is not applying to PG 12 or earlier because the patch mentions JIT, > > >which was only mentioned in the PG bloom docs in PG 13+. > > > > Does that mean we need separate patches for each release starting with 10? > > As I am not frequently writing patches, I would need some help here. > > I can regenerate the output for older versions using your patch. > However, I am confused about the parallelism you are seeing. Your patch > shows: > > Without the two indexes being created, a parallel sequential scan will happen for the query below: > ------------------- > <programlisting> > =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Seq Scan on tbloom (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1) > Filter: ((i2 = 898732) AND (i5 = 123451)) > Rows Removed by Filter: 10000 > Planning Time: 0.257 ms > Execution Time: 2.764 ms > (5 rows) > > However, I don't see any parallelism in this output. Also, I don't see > any parallelism once the indexes are created. What PG version is this? > and what config settings did you use? Thanks. I figured it out --- you have to use the larger generate_series value to get the parallel output. I have adjusted all the docs back to 9.6 to show accurate output for that version, and simplified the query ordering --- patch to master attached. The other releases are similar. Daniel, please let me know if I have left out any details. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Вложения
>I figured it out --- you have to use the larger generate_series value to >get the parallel output. I have adjusted all the docs back to 9.6 to >show accurate output for that version, and simplified the query >ordering --- patch to master attached. The other releases are similar. >Daniel, please let me know if I have left out any details. Thanks for your support on this. Looks good to me. Regards Daniel