Re: Large # of Tables, Getting ready for the enterprise

Поиск
Список
Период
Сортировка
От Ross J. Reedstrom
Тема Re: Large # of Tables, Getting ready for the enterprise
Дата
Msg-id 20000817103547.C17023@rice.edu
обсуждение исходный текст
Ответ на Large # of Tables, Getting ready for the enterprise  ("carl garland" <carlhgarland@hotmail.com>)
Список pgsql-hackers
On Thu, Aug 17, 2000 at 06:17:01AM -0400, carl garland wrote:
> As postgres becomes better and more in the spotlight there are a couple of 
> issues I think that the hacker group might want to address to better prepare 
> it for the enterprise and highend production systems.  Currently postgres 
> will support an incredible amount of tables whereas Interbase only supports 
> 64K, but the efficiency and performance of the pg backend quickly 
> degenerates after 1000 tables.  I know that most people will think that 
> filesystem will be the bottleneck but as XFS nears completion the problem 

Realize that pg is a cross platform program. The existance (or not) of a 
particular filesystem cannot be assumed, nor does the release of a new FS
on linux, for example, impact the entire postgresql community.

> will shift back to pg.  It is my understanding that the system tables where 
> lookups on tables occur are always done sequentially and not using any more 
> optimized (btree etc) solution.  I also think this may be applicable to the 

Nope. The pg_class table has indicices on oid and relname, which are used when
appropriate. In addition, there is a cache for system tables (syscache).

> toastable objects where large # of objects occur.  I want to start to look 
> at the code to maybe help out but have a few questions:
> 1)  When referencing a table is it only looked up once and then cached or 
> does a scan of the system table occur only once per session.

This would be the syscache, for the system tables, or the relcache, which
caches relation (i.e. table) descriptors for the general case. There's
been some discussion and work regarding cleaning up the seperation between
accessing relations and storing them in particular files. At one point,
Tom Lane mentioned that he was wondering if the knowledge of the relcache
needs to be move out of the bufmgr/smgr interfaces (that's the buffer
manager and storage manager, respectively) as part of that cleanup.

> 2)  Which files should I look at in tree.

Hmm, an awful lot of them. 

find . -name \*.[chyl] |xargs grep -l syscache | wc -l   98
find . -name \*.[chyl] |xargs grep -l relcache | wc -l   43 

And that's not nearly exhaustive.

> 3)  Any tips, suggestions, pitfalls I should remember.

Take a look at:

http://postgresql.org/docs/faq-dev-english.html

and 

file:/where/ever/you/put/pgsql/src/tools/backend/index.html

Very useful for getting the broad overview and links into your
local filetree, with descriptions.

This is a very complex bit of code, right at the heart of the
database. I've found it a might steep learning curve. Perhaps not the
best place to start for a first backend coding project. But dig in: for
me, there's no better way to learn than breaking code, and fixing it. The
core developers are good at not letting stupid ideas get commited to CVS.

One thing not mentioned above: memory management is quite complex, as well,
and underlies all the other code (of course). Take a look at the mmgr code.

In fact, if you've got any skill at code analysis and documentation,
further diagramming and describing of the interrelationships between
the different caches and managers (syscache, relcache, bufcache, bufmgr,
smgr, mmgr etc.) would be a welcome addition to the work Bruce has done
with the developers FAQ, above.

Ross
-- 
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> 
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jerome Raupach
Дата:
Сообщение: Optimization in C
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: Optimization in C