Обсуждение: Sudden FTS-related error from parallel worker in 9.6
I have a Postgres 9.6.0 instance for testing purposes. It turned out that in postgresql.conf, this postgres cluster has "default_text_search_config" being set to "ru" for some reason. Suddenly, I encountered with an error relating to full text configuration in test database, where I didn't use FTS at all, I just tried to test parallel aggregation feature: test=# create table test as test-# select test-# i as id, test-# now() - random() * '5 years'::interval as ts, test-# (random() * 100000000)::int4 as number, test-# repeat('x', (10 + 40 * random())::int4) as txt test-# from test-# generate_series(1, 10000000) i; SELECT 10000000 test=# alter table test add primary key (id); ALTER TABLE test=# explain analyze select min(ts) from test; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=226505.45..226505.46 rows=1 width=8) (actual time=1460.227..1460.227 rows=1 loops=1) -> Seq Scan on test (cost=0.00..201504.76 rows=10000276 width=8) (actual time=0.013..697.676 rows=10000000 loops=1) Planning time: 0.098 ms Execution time: 1460.257 ms (4 rows) test=# set max_parallel_workers_per_gather to 8; SET test=# explain analyze select min(ts) from test; *ERROR: invalid value for parameter "default_text_search_config": "ru"* *CONTEXT: parallel worker* Such an error would be OK if I tried to use some text search function (since really no 'ru' configuration is present in 'test' database), but I ovbviously didn't. As you can see, no FTS features are involved in table definition, so it seems really strange to see FTS-related error Bottom line, steps to reproduce: 1) put not existing text search default configurtion 2) set max_parallel_workers_per_gather to 8; 3) do some aggregation query that would involved parallel workers and doesn't use FTS features. What I expected: no FTS-related error, because I FTS was not used by the query.
More observations. I've fixed that issue by obvious change -- commenting out "default_text_search_config" in postgresql.conf. The error gone. Then I altered database, placing incorrect FTS default configuration back: test=# alter database test set default_text_search_config to 'ru'; NOTICE: text search configuration "ru" does not exist ALTER DATABASE Now the query with aggregation produces only warnings (from each worker), no errors: test=# explain analyze select min(ts) from test; WARNING: invalid value for parameter "default_text_search_config": "ru" WARNING: invalid value for parameter "default_text_search_config": "ru" WARNING: invalid value for parameter "default_text_search_config": "ru" WARNING: invalid value for parameter "default_text_search_config": "ru" WARNING: invalid value for parameter "default_text_search_config": "ru" QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=127503.21..127503.22 rows=1 width=8) (actual time=494.276..494.276 rows=1 loops=1) -> Gather (cost=127502.69..127503.20 rows=5 width=8) (actual time=494.207..494.271 rows=6 loops=1) Workers Planned: 5 Workers Launched: 5 -> Partial Aggregate (cost=126502.69..126502.70 rows=1 width=8) (actual time=487.290..487.290 rows=1 loops=6) -> Parallel Seq Scan on test (cost=0.00..121502.55 rows=2000055 width=8) (actual time=0.019..311.045 rows=1666667 loops=6) Planning time: 0.122 ms Execution time: 509.248 ms (8 rows) On Tue, Oct 4, 2016 at 3:38 AM, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > I have a Postgres 9.6.0 instance for testing purposes. It turned out that > in postgresql.conf, this postgres cluster has "default_text_search_config" > being set to "ru" for some reason. > > Suddenly, I encountered with an error relating to full text configuration > in test database, where I didn't use FTS at all, I just tried to test > parallel aggregation feature: > > test=# create table test as > test-# select > test-# i as id, > test-# now() - random() * '5 years'::interval as ts, > test-# (random() * 100000000)::int4 as number, > test-# repeat('x', (10 + 40 * random())::int4) as txt > test-# from > test-# generate_series(1, 10000000) i; > SELECT 10000000 > > test=# alter table test add primary key (id); > ALTER TABLE > > test=# explain analyze select min(ts) from test; > QUERY PLAN > ------------------------------------------------------------ > ------------------------------------------------------------- > Aggregate (cost=226505.45..226505.46 rows=1 width=8) (actual > time=1460.227..1460.227 rows=1 loops=1) > -> Seq Scan on test (cost=0.00..201504.76 rows=10000276 width=8) > (actual time=0.013..697.676 rows=10000000 loops=1) > Planning time: 0.098 ms > Execution time: 1460.257 ms > (4 rows) > > test=# set max_parallel_workers_per_gather to 8; > SET > test=# explain analyze select min(ts) from test; > *ERROR: invalid value for parameter "default_text_search_config": "ru"* > *CONTEXT: parallel worker* > > Such an error would be OK if I tried to use some text search function > (since really no 'ru' configuration is present in 'test' database), but I > ovbviously didn't. > > As you can see, no FTS features are involved in table definition, so it > seems really strange to see FTS-related error > > Bottom line, steps to reproduce: > 1) put not existing text search default configurtion > 2) set max_parallel_workers_per_gather to 8; > 3) do some aggregation query that would involved parallel workers and > doesn't use FTS features. > > What I expected: no FTS-related error, because I FTS was not used by the > query. > >
On 10/3/2016 5:38 PM, Nikolay Samokhvalov wrote: > I have a Postgres 9.6.0 instance for testing purposes. It turned out > that in postgresql.conf, this postgres cluster has > "default_text_search_config" being set to "ru" for some reason. what OS, and how was 9.6.0 installed ? what language/encoding is this OS using by default ? -- john r pierce, recycling bits in santa cruz
On 4 October 2016 at 13:38, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > test=# explain analyze select min(ts) from test; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=226505.45..226505.46 rows=1 width=8) (actual > time=1460.227..1460.227 rows=1 loops=1) > -> Seq Scan on test (cost=0.00..201504.76 rows=10000276 width=8) > (actual time=0.013..697.676 rows=10000000 loops=1) > Planning time: 0.098 ms > Execution time: 1460.257 ms > (4 rows) > > test=# set max_parallel_workers_per_gather to 8; > SET > test=# explain analyze select min(ts) from test; > ERROR: invalid value for parameter "default_text_search_config": "ru" > CONTEXT: parallel worker The reason that you receive this message is that the main process must serialise all GUC settings into text before passing to the worker processes, and when the worker receive those serialised settings from the master, it must apply them so that workers behave the same way as the main process. Probably the question here is why is the worker process validating the GUC when it was not validated when postgresql.conf is loaded -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 4, 2016 at 6:33 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > > The reason that you receive this message is that the main process must > serialise all GUC settings into text before passing to the worker > processes, and when the worker receive those serialised settings from > the master, it must apply them so that workers behave the same way as > the main process. > Thanks for the explanation. I understand that it's side-effect, but nevertheless it's a bug leading to inability to use parallelization and it can break some applications. > > Probably the question here is why is the worker process validating the > GUC when it was not validated when postgresql.conf is loaded > Yes, the question is why the behavior is different. I also noticed that if I set default_text_search_config in postgresql.conf, I don't get warning when starting psql: postgres@dev:~$ psql test Timing is on. psql (9.6.0) Type "help" for help. But again, if I alter database test, setting the same option, I get the warning: postgres@dev:~$ psql test WARNING: invalid value for parameter "default_text_search_config": "ru" Timing is on. psql (9.6.0) Type "help" for help.
Hi, On Tue, 2016-10-04 at 03:38 +0300, Nikolay Samokhvalov wrote: > I have a Postgres 9.6.0 instance for testing purposes. It turned out that > in postgresql.conf, this postgres cluster has "default_text_search_config" > being set to "ru" for some reason. The message says: ERROR: invalid value for parameter "default_text_search_config": "ru" ru is not a valid value here, you should set it to "russian". Regards, -- Devrim GÜNDÜZ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
On Tue, Oct 4, 2016 at 7:57 AM, Devrim G=C3=BCnd=C3=BCz <devrim@gunduz.org>= wrote: > > The message says: > > ERROR: invalid value for parameter "default_text_search_config": "ru" > > ru is not a valid value here, you should set it to "russian". > No. It's valid for my instance. I have manually created configuration called 'ru'. But not in the databases 'test'.
Hi, On Tue, 2016-10-04 at 08:13 +0300, Nikolay Samokhvalov wrote: > No. It's valid for my instance. > I have manually created configuration called 'ru'. But not in the databases > 'test'. I am not 100% familiar with this, but IIRC you need to create this in that database, too. I first created the tables as you did, and then: postgres=# SELECT version(); version ---------------------------------------------------------------------------------------------------------PostgreSQL 9.6.0on x86_64-pc-linux-gnu, compiled by gcc (GCC) 6.2.1 20160916 (Red Hat 6.2.1-2), 64-bit (1 row) postgres=# CREATE TEXT SEARCH CONFIGURATION ru (COPY=russian); CREATE TEXT SEARCH CONFIGURATION postgres=# EXPLAIN ANALYZE SELECT min(ts) FROM test; QUERY PLAN -----------------------------------------------------------------------------------------------------------------Aggregate (cost=2266.00..2266.01 rows=1 width=8) (actual time=15.740..15.740 rows=1 loops=1) -> Seq Scan on test (cost=0.00..2016.00rows=100000 width=8) (actual time=0.006..8.032 rows=100000 loops=1)Planning time: 0.212 msExecution time:15.782 ms (4 rows) postgres=# SET default_text_search_config TO ru; SET postgres=# EXPLAIN ANALYZE SELECT min(ts) FROM test; QUERY PLAN -----------------------------------------------------------------------------------------------------------------Aggregate (cost=2266.00..2266.01 rows=1 width=8) (actual time=13.292..13.293 rows=1 loops=1) -> Seq Scan on test (cost=0.00..2016.00rows=100000 width=8) (actual time=0.009..6.572 rows=100000 loops=1)Planning time: 0.069 msExecution time:13.317 ms (4 rows) postgres=# SET max_parallel_workers_per_gather TO 8; SET postgres=# EXPLAIN ANALYZE SELECT min(ts) FROM test; QUERY PLAN -----------------------------------------------------------------------------------------------------------------Aggregate (cost=2266.00..2266.01 rows=1 width=8) (actual time=13.701..13.701 rows=1 loops=1) -> Seq Scan on test (cost=0.00..2016.00rows=100000 width=8) (actual time=0.009..6.882 rows=100000 loops=1)Planning time: 0.063 msExecution time:13.722 ms (4 rows) postgres=# \c test You are now connected to database "test" as user "postgres". test=# SET default_text_search_config TO ru; ERROR: invalid value for parameter "default_text_search_config": "ru" -HTH -- Devrim GÜNDÜZ EnterpriseDB: http://www.enterprisedb.com PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer Twitter: @DevrimGunduz , @DevrimGunduzTR
On Tue, Oct 4, 2016 at 8:37 AM, Devrim G=C3=BCnd=C3=BCz <devrim@gunduz.org>= wrote: > > > I am not 100% familiar with this, but IIRC you need to create this in tha= t > database, too. > No I don't. I'm not using FTS in that database. I undertand that setting this FTS option cluster-wide might be not a good solution, but nobody prohibits doing so. > > I first created the tables as you did, and then: > > postgres=3D# SELECT version(); > version > ------------------------------------------------------------ > --------------------------------------------- > PostgreSQL 9.6.0 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 6.2.1 > 20160916 (Red Hat 6.2.1-2), 64-bit > (1 row) > > postgres=3D# CREATE TEXT SEARCH CONFIGURATION ru (COPY=3Drussian); > CREATE TEXT SEARCH CONFIGURATION > postgres=3D# EXPLAIN ANALYZE SELECT min(ts) FROM test; > QUERY PLAN > ------------------------------------------------------------ > ----------------------------------------------------- > Aggregate (cost=3D2266.00..2266.01 rows=3D1 width=3D8) (actual > time=3D15.740..15.740 rows=3D1 loops=3D1) > -> Seq Scan on test (cost=3D0.00..2016.00 rows=3D100000 width=3D8) (= actual > time=3D0.006..8.032 rows=3D100000 loops=3D1) > Planning time: 0.212 ms > Execution time: 15.782 ms > (4 rows) > > postgres=3D# SET default_text_search_config TO ru; > SET > postgres=3D# EXPLAIN ANALYZE SELECT min(ts) FROM test; > QUERY PLAN > ------------------------------------------------------------ > ----------------------------------------------------- > Aggregate (cost=3D2266.00..2266.01 rows=3D1 width=3D8) (actual > time=3D13.292..13.293 rows=3D1 loops=3D1) > -> Seq Scan on test (cost=3D0.00..2016.00 rows=3D100000 width=3D8) (= actual > time=3D0.009..6.572 rows=3D100000 loops=3D1) > Planning time: 0.069 ms > Execution time: 13.317 ms > (4 rows) > postgres=3D# SET max_parallel_workers_per_gather TO 8; > SET > postgres=3D# EXPLAIN ANALYZE SELECT min(ts) FROM test; > QUERY PLAN > ------------------------------------------------------------ > ----------------------------------------------------- > Aggregate (cost=3D2266.00..2266.01 rows=3D1 width=3D8) (actual > time=3D13.701..13.701 rows=3D1 loops=3D1) > -> Seq Scan on test (cost=3D0.00..2016.00 rows=3D100000 width=3D8) (= actual > time=3D0.009..6.882 rows=3D100000 loops=3D1) > Planning time: 0.063 ms > Execution time: 13.722 ms > (4 rows) > In your case parallel execution didn't happen. Try setting force_parallel_mode to on, for testing purposes. Also, to repeat my case, you need to try running this query in 'test' database, where 'ru' configuration doesn't exist. > > postgres=3D# \c test > You are now connected to database "test" as user "postgres". > test=3D# SET default_text_search_config TO ru; > ERROR: invalid value for parameter "default_text_search_config": "ru" > It's ok and runs like expected. But it by no means relates to my case, where the query wasn't meant to be touching FTS stuff.
On Tue, Oct 4, 2016 at 9:03 AM, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 4 October 2016 at 13:38, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: >> test=# explain analyze select min(ts) from test; >> QUERY PLAN >> ------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=226505.45..226505.46 rows=1 width=8) (actual >> time=1460.227..1460.227 rows=1 loops=1) >> -> Seq Scan on test (cost=0.00..201504.76 rows=10000276 width=8) >> (actual time=0.013..697.676 rows=10000000 loops=1) >> Planning time: 0.098 ms >> Execution time: 1460.257 ms >> (4 rows) >> >> test=# set max_parallel_workers_per_gather to 8; >> SET >> test=# explain analyze select min(ts) from test; >> ERROR: invalid value for parameter "default_text_search_config": "ru" >> CONTEXT: parallel worker > > The reason that you receive this message is that the main process must > serialise all GUC settings into text before passing to the worker > processes, and when the worker receive those serialised settings from > the master, it must apply them so that workers behave the same way as > the main process. > > Probably the question here is why is the worker process validating the > GUC when it was not validated when postgresql.conf is loaded > The validation is done at postmaster startup time as well, but there are few differences. First is during postmaster startup, we are not in any transaction whereas while restoring GUC's in worker we are in a transaction, this error is mainly due to that reason (refer check_TSCurrentConfig()). Second difference is that during postmaster start for this case (where source is PGC_S_FILE) we use elevel as DEBUG3 or LOG depending on whether the process is under postmaster where as in a worker, it is always ERROR. So, I think here one argument is that we can use error level other than ERROR during worker startup, but not sure if that is improvement over current behaviour. Certainly, if we are not able to restore any guc, it is better not to proceed for execution in worker as I think that can lead to erroneous results. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > ... So, I think here one > argument is that we can use error level other than ERROR during worker > startup, but not sure if that is improvement over current behaviour. > Certainly, if we are not able to restore any guc, it is better not to > proceed for execution in worker as I think that can lead to erroneous > results. The real problem here, I think, is that this report indicates that it's possible for a worker to try to execute a query with GUC settings different from its parent. Which means it could deliver results different than the parent would get. That's unacceptable. In Nikolay's report, we get an error because postgresql.conf contains an invalid value, but what if it had contained a valid value that was different from the parent session's? That's entirely legit, in case the file has been edited but the DBA hasn't yet issued SIGHUP. The mechanism needs to be fixed so that the worker absorbs EXACTLY the settings that prevail in the parent, no matter what is currently in postgresql.conf. regards, tom lane
On Sat, Oct 8, 2016 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> ... So, I think here one >> argument is that we can use error level other than ERROR during worker >> startup, but not sure if that is improvement over current behaviour. >> Certainly, if we are not able to restore any guc, it is better not to >> proceed for execution in worker as I think that can lead to erroneous >> results. > > The real problem here, I think, is that this report indicates that it's > possible for a worker to try to execute a query with GUC settings > different from its parent. Which means it could deliver results different > than the parent would get. That's unacceptable. > > In Nikolay's report, we get an error because postgresql.conf contains > an invalid value, but what if it had contained a valid value that was > different from the parent session's? That's entirely legit, in case > the file has been edited but the DBA hasn't yet issued SIGHUP. > > The mechanism needs to be fixed so that the worker absorbs EXACTLY the > settings that prevail in the parent, no matter what is currently in > postgresql.conf. > Hmm, currently the leader process serialize the guc values that prevail before the launch of workers and then workers restore those values (refer SerializeGUCState and RestoreGUCState). I am not sure what makes you think that we are doing something else. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Sat, Oct 8, 2016 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The mechanism needs to be fixed so that the worker absorbs EXACTLY the >> settings that prevail in the parent, no matter what is currently in >> postgresql.conf. > Hmm, currently the leader process serialize the guc values that > prevail before the launch of workers and then workers restore those > values (refer SerializeGUCState and RestoreGUCState). I am not sure > what makes you think that we are doing something else. OK, I'd misunderstood the problem --- I supposed that the worker was getting the bad value directly from postgresql.conf, but I now see that that's not happening; the bad value is in place in the leader, and the worker is failing to accept it when it's passed over. This evidently goes back to the comment you made before: >> The validation is done at postmaster startup time as well, but there >> are few differences. First is during postmaster startup, we are not >> in any transaction whereas while restoring GUC's in worker we are in a >> transaction, this error is mainly due to that reason (refer >> check_TSCurrentConfig()). So I guess the question is why in the world are we doing that inside a transaction. Seems like a waste of cycles as well as a cause of inappropriate failures. Or in other words, my comment above still holds, mutatis mutandis: the worker should absorb EXACTLY the settings that prevail in the parent, even if they're nominally invalid. regards, tom lane
On Mon, Oct 10, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Sat, Oct 8, 2016 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The mechanism needs to be fixed so that the worker absorbs EXACTLY the >>> settings that prevail in the parent, no matter what is currently in >>> postgresql.conf. > >> Hmm, currently the leader process serialize the guc values that >> prevail before the launch of workers and then workers restore those >> values (refer SerializeGUCState and RestoreGUCState). I am not sure >> what makes you think that we are doing something else. > > OK, I'd misunderstood the problem --- I supposed that the worker was > getting the bad value directly from postgresql.conf, but I now see > that that's not happening; the bad value is in place in the leader, > and the worker is failing to accept it when it's passed over. > > This evidently goes back to the comment you made before: > >>> The validation is done at postmaster startup time as well, but there >>> are few differences. First is during postmaster startup, we are not >>> in any transaction whereas while restoring GUC's in worker we are in a >>> transaction, this error is mainly due to that reason (refer >>> check_TSCurrentConfig()). > > So I guess the question is why in the world are we doing that inside > a transaction. > I think that is required for some of the parameters like "session_authorization". If user has set this for leader backend and we want it to be reflected in worker backend, then without starting transaction it won't allow (refer check_session_authorization()). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Amit Kapila <amit.kapila16@gmail.com> writes: > On Mon, Oct 10, 2016 at 9:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So I guess the question is why in the world are we doing that inside >> a transaction. > I think that is required for some of the parameters like > "session_authorization". If user has set this for leader backend and > we want it to be reflected in worker backend, then without starting > transaction it won't allow (refer check_session_authorization()). Well, we're going to need to fix that. Using a transaction here is basically a bad idea IMO, and the current complaint shows that it creates its own set of problems. Pointing at check_session_authorization is no argument why it should be that way. Without having consumed any caffeine, I'd wonder whether looking at the "source" parameter would help resolve matters. regards, tom lane