Re: SP-GiST confusing introductory paragraph

Поиск

Список

Период

Сортировка

От	Alvaro Herrera
Тема	Re: SP-GiST confusing introductory paragraph
Дата	16 октября 2023 г. 15:09:37
Msg-id	202310161509.jeennsgfk5st@alvherre.pgsql обсуждение исходный текст
Ответ на	SP-GiST confusing introductory paragraph (PG Doc comments form <noreply@postgresql.org>)
Ответы	Re: SP-GiST confusing introductory paragraph
Список	pgsql-docs

Дерево обсуждения

On 2023-Oct-16, PG Doc comments form wrote:

> I'm confused by this paragraph:
> 
> > These popular data structures were originally developed for in-memory
> > usage. In main memory, they are usually designed as a set of dynamically
> > allocated nodes linked by pointers. This is not suitable for direct storing
> > on disk, since these chains of pointers can be rather long which would
> > require too many disk accesses. In contrast, disk-based data structures
> > should have a high fanout to minimize I/O. The challenge addressed by
> > SP-GiST is to map search tree nodes to disk pages in such a way that a
> > search need access only a few disk pages, even if it traverses many nodes.
> 
> In particular, "These popular datastructures" is ambiguous -- based on how
> the previous paragraph ends, it sounds like the "popular datastructures" or
> SP-GiSTs, but then it goes on to say they were designed for in-memory, and
> then it mentions that SP-GiST's space partitioning (with high fanout) is
> more appropriate to minimize disk access. I think maybe the solution here
> would be to replace "These popular data structures" with something like
> "Classic Postgres indexes such as B-tree indexes..." or something like
> that.

Yeah, this paragraph is a rewording of Oleg's[1]

   SP-GiST is an abbreviation of space-partitioned GiST - the search
   tree, which allows to implement a wide range of different
   non-balanced disk-based data structures, such as quadtree, kd-tree,
   tries - popular data structures, originally developed for memory
   storage. Main memory access structures usually designed as a set of
   dynamically allocated nodes linked by pointers, which is not suitable
   for direct storing on disk, since these chains of pointers can be
   rather long and require too many disk accesses. In opposite, disk
   based data structures have a high fanout to minimize I/O. The
   challenge is to map nodes of tree to disk pages in such a way, so
   search algorithm accesses only a few disk pages, even if it traverse
   many nodes.

where the point is (IMO) much clearer.

[1] http://www.sai.msu.su/~megera/wiki/spgist_dev

> Also, I think a short parenthetical definition of "fanout" would be useful
> here, something like "high fanout (i.e. where each node has a potentially
> large number of child nodes that reside in the same disk page)". Took me
> some googling to realize what fanout meant in this context. 

Hmm.  We also use the term (hypenated as fan-out) in the reference to
recursive_worktable_factor.  Maybe we should list it in the glossary in
a way that works for both.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)

В списке pgsql-docs по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: SP-GiST confusing introductory paragraph