On Fri, Aug 6, 2021 at 2:00 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Experiment #1:
> As part of this experiment, I have modified the sender to keep a
> local copy of "mq_bytes_read" and "mq_bytes_written" in the local mqh
> handle so that we don't need to frequently read/write the cache-sensitive
> shared memory variables. Now we only read/write the shared
> memory under the following conditions:
>
> 1) If the number of available bytes is not enough to send the tuple,
> read the updated value of bytes read and also inform the reader about
> the new writes.
> 2) After every 4 kB written, update the shared memory variable and
> inform the reader.
> 3) On detach, to send any remaining data.
...
> Results: (query EXPLAIN ANALYZE SELECT * FROM t;)
> 1) Non-parallel (default)
> Execution Time: 31627.492 ms
>
> 2) Parallel with 4 workers (forced by setting parallel_tuple_cost to 0)
> Execution Time: 37498.672 ms
>
> 3) Same as above (2) but with the patch.
> Execution Time: 23649.287 ms
Here is the POC patch for the same. Apart from this extreme case, I
also see an improvement with this patch for normal parallel queries.
Next, I will run more tests with different sets of queries to
measure the improvement and post the results. I will also try to
optimize the reader along similar lines.
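To make the three flush conditions from Experiment #1 concrete, here is a minimal single-process sketch in C. All names (SharedRing, SenderHandle, publish, sender_send, sender_detach) are hypothetical and are not the shm_mq API or the code in the POC patch. A real implementation would read and write mq_bytes_read/mq_bytes_written atomically with the appropriate memory barriers and would set the receiver's latch rather than bump a counter; those details are deliberately omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 65536        /* hypothetical queue size in bytes */
#define FLUSH_THRESHOLD 4096   /* publish after every 4 kB, as in Experiment #1 */

/* Hypothetical stand-in for the queue header that lives in shared memory. */
typedef struct SharedRing
{
    uint64_t bytes_read;        /* advanced by the receiver (cf. mq_bytes_read) */
    uint64_t bytes_written;     /* advanced by the sender (cf. mq_bytes_written) */
    int      wakeups;           /* stands in for waking the receiver */
    char     data[RING_SIZE];
} SharedRing;

/* Hypothetical sender-local handle caching the shared counters. */
typedef struct SenderHandle
{
    SharedRing *ring;
    uint64_t    local_read;     /* last bytes_read value fetched from shared memory */
    uint64_t    local_written;  /* bytes copied into the ring, published or not */
    uint64_t    published;      /* last value stored into ring->bytes_written */
} SenderHandle;

/* Store the local write counter into shared memory and wake the receiver. */
static void publish(SenderHandle *h)
{
    h->ring->bytes_written = h->local_written;
    h->published = h->local_written;
    h->ring->wakeups++;
}

/* Returns 1 if msg was queued, 0 if the ring is full even after refreshing. */
static int sender_send(SenderHandle *h, const char *msg, size_t len)
{
    assert(len <= RING_SIZE);

    /* Condition 1: the cached counters say there is not enough free space. */
    if (RING_SIZE - (h->local_written - h->local_read) < len)
    {
        h->local_read = h->ring->bytes_read;   /* fetch the receiver's progress */
        publish(h);                            /* and expose our pending writes */
        if (RING_SIZE - (h->local_written - h->local_read) < len)
            return 0;                          /* real code would wait here */
    }

    /* Copy into the ring, wrapping around the end if necessary. */
    size_t off = (size_t) (h->local_written % RING_SIZE);
    size_t first = len < RING_SIZE - off ? len : RING_SIZE - off;
    memcpy(&h->ring->data[off], msg, first);
    memcpy(h->ring->data, msg + first, len - first);
    h->local_written += len;

    /* Condition 2: publish once at least FLUSH_THRESHOLD bytes are pending. */
    if (h->local_written - h->published >= FLUSH_THRESHOLD)
        publish(h);
    return 1;
}

/* Condition 3: on detach, publish whatever has not been published yet. */
static void sender_detach(SenderHandle *h)
{
    if (h->local_written != h->published)
        publish(h);
}
```

In this sketch the common path (enough cached free space and less than 4 kB pending) touches only the sender-local counters. The shared counters are read or written only when space runs out, when 4 kB have accumulated, or on detach.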
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com