Re: Missing update of all_hasnulls in BRIN opclasses
От | Tomas Vondra |
---|---|
Тема | Re: Missing update of all_hasnulls in BRIN opclasses |
Дата | |
Msg-id | a18e4ec1-86d7-8250-90ee-3337a382c0bb@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Missing update of all_hasnulls in BRIN opclasses (Justin Pryzby <pryzby@telsasoft.com>) |
Ответы |
Re: Missing update of all_hasnulls in BRIN opclasses
|
Список | pgsql-hackers |
Thanks Justin! I've applied all the fixes you proposed, and (hopefully) improved a couple other comments. I've been working on this over the past couple days, trying to polish and commit it over the weekend - both into master and backbranches. Sadly, the backpatching part turned out to be a bit more complicated than I expected, because of the BRIN reworks in PG14 (done by me, as foundation for the new opclasses, so ... well). Anyway, I got it done, but it's a bit uglier than I hoped for and I don't feel like pushing this on Sunday midnight. I think it's correct, but maybe another pass to polish it a bit more is better. So here are two patches - one for 11-13, the other for 14-master. There's also a separate patch with pageinspect tests, but only as a demonstration of various (non)broken cases, not for commit. And then also a bash script generating indexes with random data, randomized summarization etc. - on unpatched systems this happens to fail in about 1/3 of the runs (at least for me). I haven't seen any failures with the patches attached (on any branch). As for the issue / fix, I don't think there's a better solution than what the patch does - we need to distinguish empty / all-nulls ranges, but we can't add a flag because of on-disk format / ABI. So using the existing flags seems like the only option - I haven't heard any other ideas so far, and I couldn't come up with any myself either. I've also thought about alternative "encodings" into allnulls/hasnulls, instead of treating (true,true) as "empty" - but none of that ended up being any simpler, quite the opposite actually, as it would change what the individual flags mean etc. So AFAICS this is the best / least disruptive option. I went over all the places touching these flags, to double check if any of those needs some tweaks (similar to union_tuples, which I missed for a long time). But I haven't found anything else, so I think this version of the patches is complete. As for assessing how many indexes are affected - in principle, any index on columns with NULLs may be broken. But it only matters if the index is used for IS NULL queries, other queries are not affected. I also realized that this only affects insertion of individual tuples into existing all-null summaries, not "bulk" summarization that sees all values at once. This happens because in this case add_values_to_range sets hasnulls=true for the first (NULL) value, and then calls the addValue procedure for the second (non-NULL) one, which resets the allnulls flag to false. But when inserting individual rows, we first set hasnulls=true, but brin_form_tuple ignores that because of allnulls=true. And then when inserting the second row, we start with hasnulls=false again, and the opclass quietly resets the allnulls flag. I guess this further reduces the number of broken indexes, especially for data sets with small null_frac, or for append-only (or -mostly) tables where most of the summarization is bulk. I still feel a bit uneasy about this, but I think the patch is solid. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: