Re: pgsql: Extended statistics on expressions
От | Tomas Vondra |
---|---|
Тема | Re: pgsql: Extended statistics on expressions |
Дата | |
Msg-id | c61cfe32-251f-e9cb-1be1-0e15218cda17@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: pgsql: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Ответы |
Re: pgsql: Extended statistics on expressions
|
Список | pgsql-committers |
On 3/31/21 7:54 PM, Tomas Vondra wrote: > On 3/31/21 7:08 PM, Alvaro Herrera wrote: >> On 2021-Mar-31, Tomas Vondra wrote: >> >>> Thanks for the report, I'll take a look. You're right this seems like an >>> out-of-bounds access, but mcv_match_expression is only expected to be >>> run on expressions we know are in the statistics (because we pick the >>> statistics like that). Clearly, that does not happen here, not sure why. >>> >>> It's quite weird that we end up running textlike(), when the statistics >>> is on (double precision, boolean) columns ... >> >> Uninitialized values somewhere? Maybe valgrind would help. >> > > Unlikely, I've ran it through valgrind repeatedly, including right > before commit (both on x86_64 and arm). > > FWIW I'm unable to reproduce it, so not sure what's going on. David, > what configure option are you using? Anything special? > > > It's a bit strange, because statext_mcv_clauselist_selectivity should > only estimate "matching" clauses on the statistics. So how come this > estimates such a complex expression using textlike(), when neither of > those columns is text? > > It'd be interesting to know what's happening in the code after > > stat = choose_best_statistics(...); > > i.e. what clauses it considers "compatible" with the statistics and why. > In fact, I wouldn't have expected the statistics to be used at all. > OK, I managed to reproduce/trigger the issue. The simplest query that triggers the issue for me is this: SELECT t1.c0 FROM ONLY t1 WHERE ( upper('x') LIKE ('x'||('[0,1]'::int4range)) AND (t1.c0 IN (0, 1) OR t1.c1) ) I think the code matching clauses to the statistics gets a bit confused when processing the AND clause. It extracts 2 attnums for the OR part, but the first part should be "incompatible" with the statistics. But after picking the statistics to apply, it gets confused and includes the first expression (the whole LIKE clause) as compatible too. The attached patch fixes this for me. David, can you check if this resolves the issue for you? I don't feel like I want to push a fix at midnight, and I'd like to think about maybe making this part of the code a bit clearer tomorrow. It's not very comprehensible, I'm afraid. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-committers по дате отправления: