Re: Raw device on PostgreSQL

From Jose Luis Tallon
Subject Re: Raw device on PostgreSQL
Date
Msg-id 435d05a4-acd6-856c-3050-4dae70b85d00@adv-solutions.net
In reply to Re: Raw device on PostgreSQL  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On 30/4/20 6:22, Thomas Munro wrote:
> On Thu, Apr 30, 2020 at 12:26 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Yeah, I think the question is what are the expected benefits of using
>> raw devices. It might be an interesting exercise / experiment, but my
>> understanding is that most of the benefits can be achieved by using file
>> systems but with direct I/O and async I/O, which would allow us to
>> continue reusing the existing filesystem code with much less disruption
>> to our code base.
> Agreed.
>
> [snip] That's probably the main work
> required to make this work, and might be a valuable thing to have
> independently of whether you stick it on a raw device, a big data
> file, NV RAM
    ^^^^^^  THIS, with NV DIMMs / PMEM (persistent memory) possibly 
becoming a hot topic in the not-too-distant future
> or some other kind of storage system -- but it's a really
> difficult project.

Indeed. But you might have already pointed out the *only* feature
required for this to work: a "database" mapping each relfilenode to a
"set of extents" where the current I/O primitives would be able to
write. The key is actually an int, or rather a tuple
(relfilenode, segment) where both components are currently 32-bit: that
is, a 64-bit "objectID" of sorts. And yes, extents, not blocks:
sequential I/O is still faster in all known persistent storage (vs RAM)
systems.
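
To make the "64-bit objectID" idea concrete, here is a minimal sketch in C of how the (relfilenode, segment) tuple could be packed into one key for such a mapping. This is only an illustration under my own assumptions: the names RawObjectId, raw_object_id, raw_object_relfilenode and raw_object_segno are hypothetical and are not part of PostgreSQL, and putting relfilenode in the high half is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 64-bit key for the extent "database":
 * relfilenode in the high 32 bits, segment number in the low 32 bits.
 */
typedef uint64_t RawObjectId;

static_assert(sizeof(RawObjectId) == 8, "key must be exactly 64 bits");

/* Pack the (relfilenode, segment) tuple into a single key. */
static inline RawObjectId
raw_object_id(uint32_t relfilenode, uint32_t segno)
{
    return ((uint64_t) relfilenode << 32) | (uint64_t) segno;
}

/* Recover the relfilenode component from a key. */
static inline uint32_t
raw_object_relfilenode(RawObjectId id)
{
    return (uint32_t) (id >> 32);
}

/* Recover the segment component from a key. */
static inline uint32_t
raw_object_segno(RawObjectId id)
{
    return (uint32_t) (id & 0xFFFFFFFFu);
}
```

With this layout, all segments of one relation sort next to each other when keys are compared as plain integers, which might be convenient for an ordered index over the map.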

Some conversion from "absolute" (within the "file") to "relative"
(within the "tablespace") offsets would need to happen before delegating
to the kernel (or even dereferencing a pointer into an mmap'd region!),
but not much more, ISTM (but I'm far from an expert in this area).
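
As a rough sketch of that offset conversion, assuming each segment is described by an ordered list of extents on the device: the function below walks the list and turns an offset within the "file" into an offset within the "tablespace". The RawExtent type and raw_translate_offset function are hypothetical names made up for this example, not existing PostgreSQL code, and a linear walk is used only to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent descriptor: a contiguous byte range on the device. */
typedef struct RawExtent
{
    uint64_t dev_offset;    /* start of the extent within the tablespace */
    uint64_t length;        /* extent length in bytes */
} RawExtent;

/*
 * Translate an offset within the "file" into an offset within the
 * "tablespace". The extents are assumed to be listed in logical order.
 *
 * Returns 0 and stores the result in *dev_off on success, or -1 if
 * file_off lies beyond the space covered by the extents.
 */
static int
raw_translate_offset(const RawExtent *extents, size_t nextents,
                     uint64_t file_off, uint64_t *dev_off)
{
    uint64_t logical_start = 0;     /* file offset where extent i begins */

    assert(extents != NULL || nextents == 0);
    assert(dev_off != NULL);

    for (size_t i = 0; i < nextents; i++)
    {
        /* file_off >= logical_start always holds here, so no underflow. */
        uint64_t rel = file_off - logical_start;

        if (rel < extents[i].length)
        {
            *dev_off = extents[i].dev_offset + rel;
            return 0;
        }
        logical_start += extents[i].length;
    }
    return -1;
}
```

The resulting device offset is what would then be handed to something like pread()/pwrite() on the raw device, or added to the base address of an mmap'd region.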

Off the top of my head:

CREATE TABLESPACE tblspcname [other_options] LOCATION '/dev/nvme1n2' 
WITH (kind=raw, extent_min=4MB);

   or something similar to that approach might do it.

     Please note that I have purposefully specified "namespace 2" on an
"enterprise" NVMe device, to show the possibility.

OR

   use some filesystem (e.g. XFS) with DAX[1] (mount -o dax) where
available, along with something equivalent to WITH (kind=mmaped)


... though the locking we currently get "for free" from the kernel would 
need to be replaced by something else.


Indeed, it seems like an enormous amount of work... but it may well pay
off. I can't fully assess the effort, though.


Just my .02€

[1] https://www.kernel.org/doc/Documentation/filesystems/dax.txt


Thanks,

     / J.L.




