Re: Contributing some code

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Contributing some code
Дата
Msg-id CAMsr+YHw1738jaAk+0KWVF-NX1h=40ZA+xsSdKb2Druy=voOsA@mail.gmail.com
обсуждение исходный текст
Ответ на Contributing some code  (Antonio Belloni <antonio.belloni@gmail.com>)
Ответы Re: Contributing some code  (Antonio Belloni <antonio.belloni@gmail.com>)
Список pgsql-hackers
On 28 December 2017 at 01:40, Antonio Belloni <antonio.belloni@gmail.com> wrote:
Hi,

This is my first post on the list. My name is Antonio. I am a CS grad student and my field of study is about databases and information retrieval. To get some practical knowledge, I've been studying Postgresql codebase for a while.

Now I would like to contribute with some code and I've chosen the following topic of the TODO list : 

Allow reporting of which objects are in which tablespaces

This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.

The topic suggests to use the pg_tablespace_databases to discover which database is using a specific tablespace and then connect to each database and find the objects in the tablespaces.

I checked the code of pg_tablespace_databases, defined in src/backend/utils/adt/misc.c, and see that it uses a much simpler approach : It just reads the tablespaces directories and return the name of the directories that represents databases OIDs. 

Although the function works as expected, I  can see some issues not addressed in the code :

- It does not check for permissions. Any user can execute it;
- It does not check if the platform supports symlinks, which can cause an error because the function is trying to follow the links defined in base/pg_tblspc.

I could use the same approach and write a function that goes down one more level in the directory structure and find the objects' OIDs inside each database directory, but I don't know if this is the better way to do that.



There's a bit of a trap hidden here. The names of relation extents look like oids, possibly with an extent number for relations bigger than 1GB. But they aren't.  They're relfilenode numbers.

PostgreSQL maps relation oids to relfilenodes. By default on a new system, relations will often have the same relfilenode as oid. That's a pity IMO; it'd be way less confusing if we allocated relfilenodes from a wholly different counter, because as it is, it gives people the false impression they can expect the filename relfilenode to be the relation oid.

In fact, what happens (per my probably imperfect understanding) is that PostgreSQL checks pg_class (via the relcache) for the oid of the table. It then uses RelationIsMapped to see if it's a normal relation with the filenode number in pg_class or not. If it's a normal (non-mapped) relation, it uses the Relation's rd_node to find the relation's physical address tablespace, dboid, and relfilenode. If it's a mapped relation, it instead consults the relmapper to find the relation's storage; see src/backend/utils/cache/relmapper.c . 

See also src/backend/storage/smgr/README, 

This means you can't determine relation oids from ondisk state without scanning pg_class. And pg_class is per-database, not a shared relation, so you must look at each db in turn, since Pg doesn't support cross-DB queries. Logical decoding handles this with the RelidByRelfilenode function, but there are issues there around making sure you have the right snapshot etc.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: RE: [HACKERS] Transactions involving multiple postgres foreignservers
Следующее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath