Re: Add LogicalTapeSetExtend() to logtape.c

Поиск
Список
Период
Сортировка
От Adam Lee
Тема Re: Add LogicalTapeSetExtend() to logtape.c
Дата
Msg-id 20200228061641.GA883@earth.local
обсуждение исходный текст
Ответ на Add LogicalTapeSetExtend() to logtape.c  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Add LogicalTapeSetExtend() to logtape.c  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Thu, Feb 27, 2020 at 01:01:08PM -0800, Jeff Davis wrote:
> Attached is a patch that exports a new function from logtape.c:
> 
>    extern LogicalTapeSet *LogicalTapeSetExtend(
>             LogicalTapeSet *lts, int nAdditional);
> 
> The purpose is to allow adding new tapes dynamically for the upcoming
> Hash Aggregation work[0]. HashAgg doesn't know in advance how many
> tapes it will need, though only a limited number are actually active at
> a time.
> 
> This was proposed and originally written by Adam Lee[1] (extract only
> the changes to logtape.c/h from his posted patch). Strangely, I'm
> seeing ~5% regression with his version when doing:
> 
>     -- t20m_1_int4 has 20 million random integers
>     select * from t20m_1_int4 order by i offset 100000000;
> 
> Which seems to be due to using a pointer rather than a flexible array
> member (I'm using gcc[2]). My version (attached) still uses a flexible
> array member, which doesn't see the regression; but I repalloc the
> whole struct so the caller needs to save the new pointer to the tape
> set.
> 
> That doesn't entirely make sense to me, and I'm wondering if someone
> else can repro that result and/or make a suggestion, because I don't
> have a good explanation. I'm fine with my version of the patch, but it
> would be nice to know why there's such a big difference using a pointer
> versus a flexible array member.
> 
> Regards,
>     Jeff Davis

I noticed another difference, I was using palloc0(), which could be one of the
reason, but not sure.

Tested your hashagg-20200226.patch on my laptop(Apple clang version 11.0.0),
the average time is 25.9s:

```
create table t20m_1_int4(i int);
copy t20m_1_int4 from program 'shuf -i 1-4294967295 -n 20000000';
analyze t20m_1_int4;
```
```
explain analyze select * from t20m_1_int4 order by i offset 100000000;
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual time=25666.471..25666.471 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=20000096 width=4) (actual time=20663.065..24715.269 rows=20000000
loops=1)
         Sort Key: i
         Sort Method: external merge  Disk: 274056kB
         ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=20000096 width=4) (actual time=0.075..2749.385
rows=20000000loops=1)
 
 Planning Time: 0.109 ms
 Execution Time: 25911.542 ms
(7 rows)
```

But if use the palloc0() or do the MemSet() like:

```
lts = (LogicalTapeSet *) palloc0(offsetof(LogicalTapeSet, tapes) +
                                ntapes * sizeof(LogicalTape));
...
MemSet(lts->tapes, 0, lts->nTapes * sizeof(LogicalTape)); <--- this line doesn't matter as I observed,
                                                               which makes a little sense the compiler
                                                               might know it's already zero.
```

The average time goes up to 26.6s:

```
explain analyze select * from t20m_1_int4 order by i offset 100000000;
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3310741.20..3310741.20 rows=1 width=4) (actual time=26419.712..26419.712 rows=0 loops=1)
   ->  Sort  (cost=3260740.96..3310741.20 rows=20000096 width=4) (actual time=21430.044..25468.661 rows=20000000
loops=1)
         Sort Key: i
         Sort Method: external merge  Disk: 274056kB
         ->  Seq Scan on t20m_1_int4  (cost=0.00..288496.96 rows=20000096 width=4) (actual time=0.060..2839.452
rows=20000000loops=1)
 
 Planning Time: 0.111 ms
 Execution Time: 26652.255 ms
(7 rows)
```

-- 
Adam Lee



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: pgbench: option delaying queries till connectionsestablishment?
Следующее
От: Richard Guo
Дата:
Сообщение: Trying to pull up EXPR SubLinks