Optimizer fed bad data about some system-table indexes
От | Tom Lane |
---|---|
Тема | Optimizer fed bad data about some system-table indexes |
Дата | |
Msg-id | 28894.925487663@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: [HACKERS] Optimizer fed bad data about some system-table indexes
|
Список | pgsql-hackers |
Last night I was looking into optimizer misbehavior on the sample query explain select * from pg_class, pg_description where pg_class.oid = pg_description.objoid; As of yesterday the system was generating Hash Join (cost=86.59 rows=1007 width=101) -> Seq Scan on pg_description (cost=41.23 rows=1007 width=16) -> Hash (cost=0.00rows=0 width=0) -> Index Scan using pg_class_oid_index on pg_class (cost=5.57 rows=138 width=85) which was pretty stupid; why use an index scan to load the hashtable? The reason was that the optimizer was actually estimating the index scan to be cheaper than a sequential scan (cost of sequential scan was figured at 6.55). When I poked into this, I found that costsize.c was being fed a size of zero for pg_class_oid_index, and was generating a bogus cost for the index scan because of it. I changed costsize.c to ensure that cost_index with a selectivity of 1 will always return a larger value than cost_seqscan does with the same relation-size stats, regardless of what it's told about the index size. This fixes the immediate problem, but it's still bad that costsize is getting a bogus index size value; the cost estimates won't be very accurate. And considering that there are reasonable stats for pg_class_oid_index in pg_class, you'd sort of expect those numbers to get passed to the optimizer. As near as I can tell, the bogus data is the fault of the relation cache. Info about pg_class_oid_index and a couple of other indexes on system relations is preloaded into the relcache and locked there on startup --- and it is *not* coming from pg_class, but from an initialization file that evidently was made when these system tables were empty. Bottom line is that optimization estimates that involve these critical system indexes will be wrong. That's not a show-stopper, but it seems to me that it must be costing us performance somewhere along the line. I'd like to see if it can be fixed. Does anyone understand: (a) why does the relcache need an initialization file for the system index cache entries in the first place? If I'm reading the code correctly, it is able to build the initialization file from the info in pg_class, so one would think it'd be better to just do that during every startup and forget the initialization file. (b) if we can't just get rid of the init file, how about dropping and rebuilding it at the end of the initdb process (after template1 has been vacuumed)? Then at least it'd show a size of a few hundred for pg_class, instead of zero. regards, tom lane
В списке pgsql-hackers по дате отправления: