Abbreviated keys for text cost model fix
От | Peter Geoghegan |
---|---|
Тема | Abbreviated keys for text cost model fix |
Дата | |
Msg-id | CAM3SWZQeCekUShZXgW8_nPuuF5s9eUVt+Fi7LkvBSHmXUUDcZg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Abbreviated keys for text cost model fix
|
Список | pgsql-hackers |
Over in the abbreviated keys for numeric thread, Tomas Vondra reports a case where the ad-hoc cost model of abbreviated keys/sortsupport for text is far too conservative, and aborts a case where abbreviated keys can still greatly help. Previously, I proposed additions to the cost model that dealt with all this, by passing down a reltuples hint from the optimizer or a relcache entry to tuplesort. Robert seemed to think that this was a little bit much, and that it could be discussed at a later point. It's clear that we need to do something about cases like the one reported by Tomas, though. Attached patch adds a decay to the threshold that (estimated) abbreviated key cardinality must cross as a proportion of the (estimated) cardinality of the original set of strings to be sorted, while also quadrupling the initial required proportional cardinality to 20% of full key cardinality (but for just the first 10,000 strings, before this threshold is decayed aggressively). This seems significantly less invasive than what I previously proposed, while still being effective. I suggest that this be treated as a bugfix, since Tomas reports a 5% -7% regression with his case. OTOH, with this patch, I saw the same case have a ~300% improvement in performance. The adjusted cost model strongly prefers to decide to abort early, which seems desirable. In one way this makes it a little more vulnerable to skewed sets, since early data matters more, but in another more important way it's less vulnerable, since perhaps that skew indicates a general skew in the data, a skew that makes the idea of measuring the cardinality of abbreviated keys as a proxy for full key cardinality less useful. Thoughts? -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: