Re: distinct estimate of a hard-coded VALUES list
| From | Alvaro Herrera |
|---|---|
| Subject | Re: distinct estimate of a hard-coded VALUES list |
| Date | |
| Msg-id | 20160822174214.GA133273@alvherre.pgsql |
| In reply to | Re: distinct estimate of a hard-coded VALUES list (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: distinct estimate of a hard-coded VALUES list |
| List | pgsql-hackers |
Robert Haas wrote:
> On Sat, Aug 20, 2016 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jeff Janes <jeff.janes@gmail.com> writes:
> >> On Thu, Aug 18, 2016 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> It does know it; what it doesn't know is how many duplicates there are.
> >
> >> Does it know whether the count comes from a parsed query-string list/array,
> >> rather than being an estimate from something else? If it came from a join,
> >> I can see why it would be dangerous to assume they are mostly distinct.
> >> But if someone throws 6000 things into a query string with only 200 distinct
> >> values among them, they have no one to blame but themselves when it makes
> >> bad choices off of that.
> >
> > I am not exactly sold on this assumption that applications have
> > de-duplicated the contents of a VALUES or IN list. They haven't been
> > asked to do that in the past, so why do you think they are doing it?
>
> It's hard to know, but my intuition is that most people would
> deduplicate. I mean, nobody is going to want their query generator
> to send X IN (1, 1, <repeat a zillion more times>) to the server if it
> could have just sent X IN (1).

Also, if we patch it this way and somebody has a slow query because of a lot of duplicate values, it's easy to solve the problem by de-duplicating. But with the current code, people who have the opposite problem have no way to work around it.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
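To make the de-duplication workaround concrete, here is a minimal sketch; it is not from the thread, and the table name `orders` and column `customer_id` are made up for illustration:

```sql
-- Hypothetical example: an application-built IN list full of duplicates.
-- If the planner assumes the list entries are distinct, its estimate of
-- how many values the list contributes will be too high.
SELECT *
FROM orders                                  -- hypothetical table
WHERE customer_id IN (1, 1, 1, 2, 2, 3, 3);  -- 7 entries, 3 distinct values

-- The client-side workaround described above: de-duplicate before sending,
-- so the number of list entries matches the number of distinct values.
SELECT *
FROM orders
WHERE customer_id IN (1, 2, 3);
```

The point of the argument is the asymmetry: with the proposed behavior, an application hurt by duplicates can fix its query as shown above, whereas with the current behavior an application that already sends distinct values has no comparable knob.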