Re: Initdb-time block size specification

From: Tomas Vondra
Subject: Re: Initdb-time block size specification
Date:
Msg-id: 5ac974d6-18d0-5dfe-b2ff-93ea8c2217f9@enterprisedb.com
In reply to: Re: Initdb-time block size specification  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Initdb-time block size specification  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On 9/1/23 16:57, Robert Haas wrote:
> On Thu, Aug 31, 2023 at 2:32 PM David Christensen
> <david.christensen@crunchydata.com> wrote:
>> Here's a patch atop the series which converts to 16-bit uints and
>> passes regressions, but I don't consider well-vetted at this point.
> 
> For what it's worth, my gut reaction to this patch series is similar
> to that of Andres: I think it will be a disaster. If the disaster is
> not evident to us, that's far more likely to mean that we've failed to
> test the right things than it is to mean that there is no disaster.
> 

Perhaps. The block size certainly affects a lot of places - both in
terms of the actual value, and being known (constant) at compile time.

> I don't see that there is a lot of upside, either. I don't think we
> have a lot of evidence that changing the block size is really going to
> help performance.

I don't think that's quite true. We have plenty of empirical evidence
that smaller block sizes bring significant improvements for certain
workloads. And we also have theoretical explanations for why that is.

> In fact, my guess is that there are large amounts of
> code that are heavily optimized, without the authors even realizing
> it, for 8kB blocks, because that's what we've always had. If we had
> much larger or smaller blocks, the structure of heap pages or of the
> various index AMs used for blocks might no longer be optimal, or might
> be less optimal than they are for an 8kB block size. If you use really
> large blocks, your blocks may need more internal structure than we
> have today in order to avoid CPU inefficiencies. I suspect there's
> been so little testing of non-default block sizes that I wouldn't even
> count on the code to not be outright buggy.
> 

Sure, and there are even various places where the page size implies hard
limits (e.g. index key size for btree indexes).

But so what? If that matters for your workload, keep using 8kB ...

> If we could find a safe way to get rid of full page writes, I would
> certainly agree that that was worth considering. I'm not sure that
> anything in this thread adds up to that being a reasonable way to go,
> but the savings would be massive.
> 

That's true, that'd be great. But that's clearly just the next level of
optimization. It doesn't mean that if you can't eliminate FPWs for
whatever reason, the rest is worthless.

> I feel like the proposal here is a bit like deciding to change the
> speed limit on all American highways from 65 mph or whatever it is to
> 130 mph or 32.5 mph and see which way works out best. The whole
> infrastructure has basically been designed around the current rules.
> The rate of curvature of the roads is appropriate for the speed that
> you're currently allowed to drive on them. The vehicles are optimized
> for long-term operation at about that speed. The people who drive the
> vehicles are accustomed to driving at that speed, and the people who
> maintain them are accustomed to the problems that happen when you
> drive them at that speed. Just changing the speed limit doesn't change
> all that other stuff, and changing all that other stuff is a truly
> massive undertaking. Maybe this example somewhat overstates the
> difficulties here, but I do think the difficulties are considerable.
> The fact that we have 8kB block sizes has affected the thinking of
> hundreds of developers over decades in making thousands or tens of
> thousands or hundreds of thousands of decisions about algorithm
> selection and page format and all kinds of stuff. Even if some other
> page size seems to work better in a certain context, it's pretty hard
> to believe that it has much chance of being better overall, even
> without the added overhead of run-time configuration.
> 

Except that no one is forcing you to actually go 130mph or 32mph, right?
You make it seem like this patch forces people to use some other page
size, but that's clearly not what it's doing - it gives you the option
to use smaller or larger blocks, if you choose to. Just like increasing
the speed limit to 130mph doesn't mean you can't keep going 65mph.

The thing is - we *already* allow using a different block size, except
that you have to do a custom build. This just makes it easier.
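For context, the existing knob is configure's --with-blocksize option (value in
kilobytes); something like the following already works today (paths and job
counts below are just illustrative):

```shell
# Build a PostgreSQL with 4kB blocks; --with-blocksize takes kilobytes.
./configure --with-blocksize=4 --prefix=$HOME/pg-4k
make -j4 && make install

# Every cluster created by this binary uses 4kB pages:
$HOME/pg-4k/bin/initdb -D $HOME/pg-4k/data
$HOME/pg-4k/bin/pg_ctl -D $HOME/pg-4k/data -l logfile start
$HOME/pg-4k/bin/psql -c "SHOW block_size;"
```

The patch moves that decision from build time to initdb time, nothing more.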

I don't have strong opinions on how the patch actually does that, and
there certainly can be negative effects of making it dynamic. And yes,
we will have to do more testing with non-default block sizes. But
frankly, that's a gap we probably need to address anyway, considering we
allow changing the block size.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


