Re: row filtering for logical replication
От | Andres Freund |
---|---|
Тема | Re: row filtering for logical replication |
Дата | |
Msg-id | 20210128022032.eq2qqc6zxkqn5syt@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: row filtering for logical replication (Önder Kalacı <onderkalaci@gmail.com>) |
Список | pgsql-hackers |
Hi, On 2020-12-17 09:43:30 +0300, Önder Kalacı wrote: > The above part can be considered the core of the logic, executed per tuple. > As far as I can see, it has two downsides. > > First, calling `expression_planner()` for every tuple can be quite > expensive. I created a sample table, loaded data and ran a quick benchmark > to see its effect. I attached the very simple script that I used to > reproduce the issue on my laptop. I'm pretty sure you can find nicer ways > of doing similar perf tests, just sharing as a reference. > > The idea of the test is to add a WHERE clause to a table, but none of the > tuples are filtered out. They just go through this code-path and send it to > the remote node. > > #rows Patched | Master > 1M 00:00:25.067536 | 00:00:16.633988 > 10M 00:04:50.770791 | 00:02:40.945358 > > > So, it seems a significant overhead to me. What do you think? That seems almost prohibitively expensive. I think at the very least some of this work would need to be done in a cached manner, e.g. via get_rel_sync_entry(). > Secondly, probably more importantly, allowing any operator is as dangerous > as allowing any function as users can create/overload operator(s). That's not safe, indeed. It's not even just create/overloading operators, as far as I can tell the expression can contain just plain function calls. The issue also isn't primarily that the user can overload functions, it's that logical decoding is a limited environment, and not everything is safe to do within. You e.g. only catalog tables can be accessed. Therefore I don't think we can allow arbitrary expressions. > The other problematic area was the performance, as calling > `expression_planner()` for every tuple can be very expensive. To avoid > that, it might be considered to ask users to provide a function instead of > a free form WHERE clause, such that if the function returns true, the tuple > is sent. The allowed functions need to be immutable SQL functions with bool > return type. As we can parse the SQL functions, we should be able to allow > only functions that rely on the above mentioned procs. We can apply as many > restrictions (such as no modification query) as possible. For example, see > below: > ``` I don't think that would get us very far. From a safety aspect: A function's body can be changed by the user at any time, therefore we cannot rely on analyses of the function's body. From a performance POV: SQL functions are planned at every invocation, so that'd not buy us much either. I think what you would have to do instead is to ensure that the expression is "simple enough", and then process it into a cheaply executable format in get_rel_sync_entry(). I'd suggest that in the first version you just allow a simple ANDed list of 'foo.bar op constant' expressions. Does that make sense? Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: