Обсуждение: Wrong example in the bloom documentation

Поиск

Список

Период

Сортировка

Wrong example in the bloom documentation

От

"Daniel Westermann (DWE)"

Дата:

08 октября 2020 г., 06:34:32

Hi,

as this does not get any attention on the docs-list, once again here.
Any thoughts on this?

Regards
Daniel





From: Daniel Westermann (DWE)
Sent: Sunday, September 27, 2020 17:58
To: Pg Docs <pgsql-docs@lists.postgresql.org>
Subject: Wrong example in the bloom documentation
 
Hi,

I've briefly discussed this with Bruce some time ago in [1].
Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq
scanwith the default configuration: 

-- tested on head
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                         QUERY
PLAN                                                          

-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0
loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning Time: 0.895 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms
 Execution Time: 2288.056 ms
(12 rows)


As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct:
postgres=# select version();
                                                 version                                                 
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit
(1 row)
postgres=# CREATE TABLE tbloom AS
postgres-#    SELECT
postgres-#      (random() * 1000000)::int as i1,
postgres-#      (random() * 1000000)::int as i2,
postgres-#      (random() * 1000000)::int as i3,
postgres-#      (random() * 1000000)::int as i4,
postgres-#      (random() * 1000000)::int as i5,
postgres-#      (random() * 1000000)::int as i6
postgres-#    FROM
postgres-#   generate_series(1,10000000);
SELECT 10000000
postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                          QUERY
PLAN                                                           

-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0
loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2329
   Heap Blocks: exact=2288
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329
loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning time: 282.059 ms
 Execution time: 2286.138 ms
(8 rows)

The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as
well:

postgres=# set max_parallel_workers_per_gather = 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY
PLAN                                                         

---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on tbloom  (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0
loops=3)
         Filter: ((i2 = 898732) AND (i5 = 123451))
         Rows Removed by Filter: 3333333
 Planning time: 0.580 ms
 Execution time: 1148.247 ms
(8 rows)

Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The
newexample starts with 100x reduced rows (as suggested by Bruce in [1] and adds a note that the behavior changes as
soonas parallel execution is cheaper than the index access. 

Thoughts?

Regards
Daniel

[1] https://www.postgresql.org/message-id/flat/20191105231854.GA26542%40momjian.us#7859b106ce614dd9530941196dad5bbc

Вложения

bloom-doc-fix-v01.patch

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

08 октября 2020 г., 16:23:10

On Thu, Oct  8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
> Hi,
> 
> as this does not get any attention on the docs-list, once again here.
> Any thoughts on this?

I was hoping someone more experienced with this would comment, but
seeing none, I will apply it in a day or two to all supported versions? 
Have you tested this output back to 9.5?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

"Daniel Westermann (DWE)"

Дата:

08 октября 2020 г., 18:12:47

Hi Bruce,

>On Thu, Oct 8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
>> Hi,
>>
>> as this does not get any attention on the docs-list, once again here.
>> Any thoughts on this?

>I was hoping someone more experienced with this would comment, but
>seeing none, I will apply it in a day or two to all supported versions?
>Have you tested this output back to 9.5?

I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

Regards

Daniel

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

08 октября 2020 г., 18:32:00

On Thu, Oct  8, 2020 at 06:12:47PM +0000, Daniel Westermann (DWE) wrote:
> Hi Bruce,
>  
> >On Thu, Oct  8, 2020 at 06:34:32AM +0000, Daniel Westermann (DWE) wrote:
> >> Hi,
> >>
> >> as this does not get any attention on the docs-list, once again here.
> >> Any thoughts on this?
> 
> >I was hoping someone more experienced with this would comment, but
> >seeing none, I will apply it in a day or two to all supported versions?
> >Have you tested this output back to 9.5?
> 
> I hoped that as well. No, I tested down to 9.6 because the change happened in
> 10.

OK, so the patch should be applied only to PG 10 and later?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

Tom Lane

Дата:

08 октября 2020 г., 19:43:32

"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
>> I was hoping someone more experienced with this would comment, but
>> seeing none, I will apply it in a day or two to all supported versions?
>> Have you tested this output back to 9.5?

> I hoped that as well. No, I tested down to 9.6 because the change happened in 10.

The patch assumes that parallel query is enabled, which is not true by
default before v10, so it should certainly not be applied before v10
(at least not without significant revisions).

            regards, tom lane

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

08 октября 2020 г., 19:53:52

On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> >> I was hoping someone more experienced with this would comment, but
> >> seeing none, I will apply it in a day or two to all supported versions?
> >> Have you tested this output back to 9.5?
> 
> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> 
> The patch assumes that parallel query is enabled, which is not true by
> default before v10, so it should certainly not be applied before v10
> (at least not without significant revisions).

I think we should just go for simple application cases for this.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

"Daniel Westermann (DWE)"

Дата:

09 октября 2020 г., 05:44:57

Hi Bruce, Tom,

On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
>> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
>> >> I was hoping someone more experienced with this would comment, but
>> >> seeing none, I will apply it in a day or two to all supported versions?
>> >> Have you tested this output back to 9.5?
>>
>> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
>>
>> The patch assumes that parallel query is enabled, which is not true by
>> default before v10, so it should certainly not be applied before v10
>> (at least not without significant revisions).

Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version,
otherwisethe whole example needs to be rewritten. 

Regards
Daniel

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

09 октября 2020 г., 15:08:32

On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:
> Hi Bruce, Tom,
> 
> On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> >> >> I was hoping someone more experienced with this would comment, but
> >> >> seeing none, I will apply it in a day or two to all supported versions?
> >> >> Have you tested this output back to 9.5?
> >> 
> >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> >> 
> >> The patch assumes that parallel query is enabled, which is not true by
> >> default before v10, so it should certainly not be applied before v10
> >> (at least not without significant revisions).
> 
> Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version,
otherwisethe whole example needs to be rewritten.
 

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

15 октября 2020 г., 23:50:19

On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:
> > Hi Bruce, Tom,
> > 
> > On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> > >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> > >> >> I was hoping someone more experienced with this would comment, but
> > >> >> seeing none, I will apply it in a day or two to all supported versions?
> > >> >> Have you tested this output back to 9.5?
> > >> 
> > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> > >> 
> > >> The patch assumes that parallel query is enabled, which is not true by
> > >> default before v10, so it should certainly not be applied before v10
> > >> (at least not without significant revisions).
> > 
> > Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version,
otherwisethe whole example needs to be rewritten.
 
> 
> Agreed.

This is not applying to PG 12 or earlier because the patch mentions JIT,
which was only mentioned in the PG bloom docs in PG 13+.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

"Daniel Westermann (DWE)"

Дата:

17 октября 2020 г., 13:50:26

On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> On Fri, Oct  9, 2020 at 05:44:57AM +0000, Daniel Westermann (DWE) wrote:
> > Hi Bruce, Tom,
> >
> > On Thu, Oct  8, 2020 at 03:43:32PM -0400, Tom Lane wrote:
> > >> "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> > >> >> I was hoping someone more experienced with this would comment, but
> > >> >> seeing none, I will apply it in a day or two to all supported versions?
> > >> >> Have you tested this output back to 9.5?
> > >>
> > >> > I hoped that as well. No, I tested down to 9.6 because the change happened in 10.
> > >>
> > >> The patch assumes that parallel query is enabled, which is not true by
> > >> default before v10, so it should certainly not be applied before v10
> > >> (at least not without significant revisions).
> >>
> >> Yes, the behavior change was in 10. Before 10 the example is fine, I would not apply that to any prior version,
otherwisethe whole example needs to be rewritten. 
>>
>> Agreed.

>This is not applying to PG 12 or earlier because the patch mentions JIT,
>which was only mentioned in the PG bloom docs in PG 13+.

Does that mean we need separate patches for each release starting with 10?
As I am not frequently writing patches, I would need some help here.

Regards
Daniel

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

26 октября 2020 г., 21:04:09

On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote:
> On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> >This is not applying to PG 12 or earlier because the patch mentions JIT,
> >which was only mentioned in the PG bloom docs in PG 13+.
> 
> Does that mean we need separate patches for each release starting with 10? 
> As I am not frequently writing patches, I would need some help here.

I can regenerate the output for older versions using your patch.
However, I am confused about the parallelism you are seeing.  Your patch
shows:

       Without the two indexes being created, a parallel sequential scan will happen for the query below:
                                                -------------------
    <programlisting>
    =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Seq Scan on tbloom  (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1)
       Filter: ((i2 = 898732) AND (i5 = 123451))
       Rows Removed by Filter: 10000
     Planning Time: 0.257 ms
     Execution Time: 2.764 ms
    (5 rows)

However, I don't see any parallelism in this output.  Also, I don't see
any parallelism once the indexes are created.  What PG version is this?
and what config settings did you use?  Thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Wrong example in the bloom documentation

От

Bruce Momjian

Дата:

26 октября 2020 г., 23:18:52

On Mon, Oct 26, 2020 at 05:04:09PM -0400, Bruce Momjian wrote:
> On Sat, Oct 17, 2020 at 01:50:26PM +0000, Daniel Westermann (DWE) wrote:
> > On Fri, Oct  9, 2020 at 11:08:32AM -0400, Bruce Momjian wrote:
> > >This is not applying to PG 12 or earlier because the patch mentions JIT,
> > >which was only mentioned in the PG bloom docs in PG 13+.
> > 
> > Does that mean we need separate patches for each release starting with 10? 
> > As I am not frequently writing patches, I would need some help here.
> 
> I can regenerate the output for older versions using your patch.
> However, I am confused about the parallelism you are seeing.  Your patch
> shows:
> 
>        Without the two indexes being created, a parallel sequential scan will happen for the query below:
>                                                 -------------------
>     <programlisting>
>     =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
>                                                 QUERY PLAN
>     ---------------------------------------------------------------------------------------------------
>      Seq Scan on tbloom  (cost=0.00..214.00 rows=1 width=24) (actual time=2.729..2.731 rows=0 loops=1)
>        Filter: ((i2 = 898732) AND (i5 = 123451))
>        Rows Removed by Filter: 10000
>      Planning Time: 0.257 ms
>      Execution Time: 2.764 ms
>     (5 rows)
> 
> However, I don't see any parallelism in this output.  Also, I don't see
> any parallelism once the indexes are created.  What PG version is this?
> and what config settings did you use?  Thanks.

I figured it out --- you have to use the larger generate_series value to
get the parallel output.  I have adjusted all the docs back to 9.6 to
show accurate output for that version, and simplified the query
ordering --- patch to master attached.  The other releases are similar. 
Daniel, please let me know if I have left out any details.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Вложения

master.diff

Re: Wrong example in the bloom documentation

От

"Daniel Westermann (DWE)"

Дата:

28 октября 2020 г., 09:14:50

>I figured it out --- you have to use the larger generate_series value to
>get the parallel output.  I have adjusted all the docs back to 9.6 to
>show accurate output for that version, and simplified the query
>ordering --- patch to master attached.  The other releases are similar.
>Daniel, please let me know if I have left out any details.

Thanks for your support on this. Looks good to me.

Regards
Daniel

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Wrong example in the bloom documentation

Вложения

Вложения