Re: Scrub one large table against another
| От | Brendan Curran |
|---|---|
| Тема | Re: Scrub one large table against another |
| Дата | |
| Msg-id | 452E83D0.5060309@gmail.com обсуждение исходный текст |
| Ответ на | Re: Scrub one large table against another ("Jim C. Nasby" <jim@nasby.net>) |
| Ответы |
Re: Scrub one large table against another
|
| Список | pgsql-performance |
> Well, IN != EXISTS != JOIN. Exists just stops as soon as it finds a > record. For some cases, it's equivalent to IN, but not all. IN has to > de-duplicate it's list in some fashion. For small IN lists, you can do > this with an OR, but at some point you need to switch to an actual > unique (actually, I suspect the difference in PostgreSQL just depends on > if you passed values into IN or a subquery). A join on the other hand > doesn't worry about duplicates at all. There may be some brains in the > planner that realize if a subquery will return a unique set (ie: you're > querying on a primary key). > I agree, and it makes sense now that I consider it that IN would force the planner to implement some form of unique check - possibly leveraging a PK or unique index if one is already available. Maybe I'll tack up a note to the online documentation letting people know so that it's a little more explicitly clear that when you choose IN on data that isn't explicitly unique (to the planner i.e. post-analyze) you get the baggage of a forced unique whether you need it or not. Or perhaps someone that knows the internals of the planner a little better than me should put some info up regarding that? > >> Just one more thing... I have found that maintaining a btree index on a >> varchar(255) value is extremely expensive on insert/update/delete. It is >> unfortunately necessary for me to maintain this index for queries and >> reports so I am transitioning to using an unindexed staging table to >> import data into before merging it with the larger table. All the docs >> and posts recommend is to drop the index, import your data, and then >> create the index again. This is untenable on a daily / bi-weekly basis. >> Is there a more elegant solution to this indexing problem? > > You might be happier with tsearch than a regular index. Thanks, I'll look into using tsearch2 as a possibility. From what I've seen so far it would add quite a bit of complexity (necessary updates after inserts, proprietary query syntax that might require a large amount of specialization from client apps) but in the end the overhead may be less than that of maintaining the btree. Thanks and Regards, B
В списке pgsql-performance по дате отправления: