Re: performance question (something to do w/ parameterized
От | Jeffrey Tenny |
---|---|
Тема | Re: performance question (something to do w/ parameterized |
Дата | |
Msg-id | 445FA13E.2090304@comcast.net обсуждение исходный текст |
Ответ на | Re: performance question (something to do w/ (Mark Lewis <mark.lewis@mir3.com>) |
Список | pgsql-performance |
Mark Lewis wrote: > Doing a SELECT with a large list of variables inside an IN runs slowly > on every database we've tested. We've tested mostly in Oracle and > PostgreSQL, and both get very slow very quickly (actually Oracle refuses > to process the query at all after it gets too many bind parameters). > > In our case, we have a (potentially quite large) set of external values > that we want to look up in the database. We originally thought that > doing a single select with a large IN clause was the way to go, but then > we did some performance analysis on the optimal batch size (number of > items to include per IN clause), and discovered that for most databases, > the optimal batch size was 1. For PostgreSQL I think it was 2. So that is for parameterized queries (the batch size?). In my case, I was concerned about latency between app and database server, so I try to minimize the number of queries I send to the database server. (My app servers can be anywhere, they /should/ be close to the database server, but there are no guarantees and I can't control it). The last time I tested for optimal batch size using non-parameterized queries with same-host database and app, I got a batch size of approximately 700 IN list elements (again, not variables in that test). That was on postgres 7.X.Y. Guess I'll have to try a test where I turn the parameterized statements into regular statements. I'm pretty sure it would be a bad idea for me to send one IN list element at a time in all cases. Even if the query query prep was fast, the network latency could kill my app. > > The moral of the story is that you're probably better off running a > bunch of small selects than in trying to optimize things with one > gargantuan select. The algorithm currently tries to ensure that IN-lists of not more than 700 elements are sent to the database server, and breaks them into multiple queries. If it has to break it into at least 3 queries, it uses parameterized statements for the first 2+ and then a non-parameterized statement for the last one (which may have a different number of IN list elements than the prior batches).
В списке pgsql-performance по дате отправления: