Обсуждение: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attrdef

Поиск
Список
Период
Сортировка

[HACKERS] per-sesson errors after interrupting CLUSTER pg_attrdef

От
Justin Pryzby
Дата:
This was briefly scary but seems to have been limited to my psql session (no
other errors logged).  Issue with catcache (?)

I realized that the backup job I'd kicked off was precluding the CLUSTER from
running, but that CLUSTER was still holding lock and stalling everything else
under the sun.

psql (10.0, server 9.6.5)
...

ts=# CLUSTER pg_attribute USING pg_attribute_relid_attnum_index ;
                                                                                           ^CCancel request sent

                                                             ERROR:  canceling statement due to user request
 

ts=# \df qci_add*
ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# \dt+ pg_att

ts=# \dt+ pg_attrdef
ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
ts=# ^C
ts=# \q

postgres=# SELECT session_line ln, user_name, error_severity sev, left(message,66) , left(query,66) FROM
postgres_log_2017_10_19_1700WHERE session_id='59e93953.20c9'  ORDER BY 1,2 DESC ;ln | user_name |  sev  |
                left                                |                                left

 

----+-----------+-------+--------------------------------------------------------------------+--------------------------------------------------------------------
1| pryzbyj   | LOG   | statement: CLUSTER pg_attribute USING pg_attribute_relid_attnum_in |  2 | pryzbyj   | ERROR |
cancelingstatement due to user request                            | CLUSTER pg_attribute USING
pg_attribute_relid_attnum_index; 3 | pryzbyj   | LOG   | statement: SELECT n.nspname as "Schema",
  +|    |           |       |   p.proname as "Name",                                            +|    |           |
 |                                                                    |  4 | pryzbyj   | ERROR | could not read block 8
infile "base/16400/999225102": read only 0 | SELECT n.nspname as "Schema",                                     +   |
      |       |                                                                    |   p.proname as "Name",
                              +   |           |       |
  |   pg_catalog. 5 | pryzbyj   | LOG   | statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalo |  6 |
pryzbyj  | ERROR | could not read block 1 in file "base/16400/999225102": read only 0 | SELECT
pg_catalog.quote_ident(c.relname)FROM pg_catalog.pg_class  7 | pryzbyj   | LOG   | statement: SELECT
pg_catalog.quote_ident(c.relname)FROM pg_catalo |  8 | pryzbyj   | ERROR | could not read block 1 in file
"base/16400/999225102":read only 0 | SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class  9 | pryzbyj   |
LOG  | statement: SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalo | 10 | pryzbyj   | ERROR | could not read
block1 in file "base/16400/999225102": read only 0 | SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class
11| pryzbyj   | LOG   | statement: SELECT n.nspname as "Schema",                          +|    |           |       |
c.relnameas "Name",                                            +|    |           |       |
                                     | 12 | pryzbyj   | ERROR | could not read block 8 in file "base/16400/999225102":
readonly 0 | SELECT n.nspname as "Schema",                                     +   |           |       |
                                                   |   c.relname as "Name",
+  |           |       |                                                                    |   CASE c.relk
 
(12 rows)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] per-sesson errors after interrupting CLUSTER pg_attrdef

От
Michael Paquier
Дата:
On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> This was briefly scary but seems to have been limited to my psql session (no
> other errors logged).  Issue with catcache (?)
>
> I realized that the backup job I'd kicked off was precluding the CLUSTER from
> running, but that CLUSTER was still holding lock and stalling everything else
> under the sun.
>
> ts=# \df qci_add*
> ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
> ts=# \dt+ pg_att
>
> ts=# \dt+ pg_attrdef
> ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes

Perhaps that's too late, but to which relation is this relfilenode
actually referring to?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] per-sesson errors after interrupting CLUSTERpg_attribute (not attrdef)

От
Justin Pryzby
Дата:
On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:
> On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > This was briefly scary but seems to have been limited to my psql session (no
> > other errors logged).  Issue with catcache (?)
> >
> > I realized that the backup job I'd kicked off was precluding the CLUSTER from
> > running, but that CLUSTER was still holding lock and stalling everything else
> > under the sun.
> >
> > ts=# \df qci_add*
> > ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
> > ts=# \dt+ pg_att
> >
> > ts=# \dt+ pg_attrdef
> > ERROR:  could not read block 8 in file "base/16400/999225102": read only 0 of 8192 bytes
> 
> Perhaps that's too late, but to which relation is this relfilenode
> actually referring to?

Sorry for the late response :)

..but I've had mixed success reproducign this, and wasn't sure if it even an
issue under current postgres (the original issue was a 9.6 server+10.0client).

Now I know, this is still an issue under PG10.2:

[pryzbyj@united-telsasoft ~]$ psql ts
psql (10.2)
Type "help" for help.

ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO:  "pg_attribute": found 18074 removable, 2488511 nonremovable row versions in 80769 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 9.66 s, system: 1.92 s, elapsed: 22.29 s.
^CCancel request sent
ERROR:  canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
INFO:  "pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
^CCancel request sent
ERROR:  canceling statement due to user request
ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
ERROR:  could not open file "base/16400/948150297": No such file or directory
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;
ERROR:  column "pg_relation_filenode" does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150...
                                         ^
ts=# SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;
ERROR:  function pg_relation_filenode() does not exist
LINE 1: SELECT * FROM pg_attribute WHERE pg_relation_filenode()=9481...
                                         ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
ts=# SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
ERROR:  could not open file "base/16400/948150297": No such file or directory
ts=# 

In another session:
ts=# SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;
 relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples |
relallvisible| reltoastrelid | relhasindex | relisshared | relpersistence | relkind | relnatts | relchecks | relhasoids
|rel
 
haspkey | relhasrules | relhastriggers | relhassubclass | relrowsecurity | relforcerowsecurity | relispopulated |
relreplident| relispartition | relfrozenxid | relminmxid | relacl | reloptions | relpartbound
 

---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+----

--------+-------------+----------------+----------------+----------------+---------------------+----------------+--------------+----------------+--------------+------------+--------+------------+--------------
(0 rows)

postgres=# SELECT session_line, message, query FROM postgres_log_2018_03_20_1200 WHERE pid=29855 AND
session_id='5ab147d9.749f'ORDER BY 1;
 
session_line|message|query
1|statement: SELECT pg_catalog.quote_ident(c2.relname)   FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2,
pg_catalog.pg_indexi WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid       and (0 = pg_catalog.length(''))       and
pg_catalog.quote_ident(c1.relname)='pg_attribute'      and pg_catalog.pg_table_is_visible(c2.oid)
 
LIMIT 1000|
2|statement: SELECT pg_catalog.quote_ident(c2.relname)   FROM pg_catalog.pg_class c1, pg_catalog.pg_class c2,
pg_catalog.pg_indexi WHERE c1.oid=i.indrelid and i.indexrelid=c2.oid       and (24 =
pg_catalog.length('pg_attribute_relid_attnu'))      and pg_catalog.quote_ident(c1.relname)='pg_attribute'       and
pg_catalog.pg_table_is_visible(c2.oid)
LIMIT 1000|
3|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
4|clustering "pg_catalog.pg_attribute" using sequential scan and sort|
5|temporary file: path "base/pgsql_tmp/pgsql_tmp29855.0", size 374013952|CLUSTER VERBOSE pg_attribute USING
pg_attribute_relid_attnum_index;
 
6|"pg_attribute": found 18074 removable, 2488511 nonremovable row versions in 80769 pages|
7|canceling statement due to user request|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
8|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
9|clustering "pg_catalog.pg_attribute" using sequential scan and sort|
10|temporary file: path "base/pgsql_tmp/pgsql_tmp29855.1", size 374013952|CLUSTER VERBOSE pg_attribute USING
pg_attribute_relid_attnum_index;
 
11|"pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages|
12|canceling statement due to user request|CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
13|statement: CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;|
14|could not open file "base/16400/948150297": No such file or directory|CLUSTER VERBOSE pg_attribute USING
pg_attribute_relid_attnum_index;
 
15|statement: SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;|
16|column "pg_relation_filenode" does not exist|SELECT * FROM pg_attribute WHERE pg_relation_filenode=948150297;
17|statement: SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;|
18|function pg_relation_filenode() does not exist|SELECT * FROM pg_attribute WHERE pg_relation_filenode()=948150297;
19|statement: SELECT * FROM pg_class WHERE pg_relation_filenode(oid)=948150297;|
20|could not open file "base/16400/948150297": No such file or directory|SELECT * FROM pg_class WHERE
pg_relation_filenode(oid)=948150297;
(20 rows)

I believe lines 1-2 were tab completion, which was also present in my original report.

In my original report, I believe \df wasn't working (or maybe tab completion
for a function?).  This is different, as the CLUSTER isn't working.

Also note the subject of my original report was wrong (pg_attrdef not pg_attribute).

I'll leave the broken session opened for awhile in cases someone has questions
about it other than relfilenode, which apparently isn't working..

I don't know much but..is this related to pg_filenode.map?

Justin


Re: [HACKERS] per-sesson errors after interrupting CLUSTERpg_attribute (not attrdef)

От
Justin Pryzby
Дата:
On Tue, Mar 20, 2018 at 01:03:57PM -0500, Justin Pryzby wrote:
> On Tue, Oct 24, 2017 at 04:56:27PM -0700, Michael Paquier wrote:
> > On Fri, Oct 20, 2017 at 9:01 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > This was briefly scary but seems to have been limited to my psql session (no
> > > other errors logged).  Issue with catcache (?)
> 
> Now I know, this is still an issue under PG10.2:
> 
> [pryzbyj@united-telsasoft ~]$ psql ts
> psql (10.2)
> ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
> INFO:  clustering "pg_catalog.pg_attribute" using sequential scan and sort
> INFO:  "pg_attribute": found 18102 removable, 2488511 nonremovable row versions in 80769 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> CPU: user: 10.00 s, system: 1.59 s, elapsed: 27.60 s.
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> ts=# CLUSTER VERBOSE pg_attribute USING pg_attribute_relid_attnum_index ;
> ERROR:  could not open file "base/16400/948150297": No such file or directory

> I don't know much but..is this related to pg_filenode.map?

I gather this is the "prefer that it not happen" case here:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/cache/relmapper.c;hb=HEAD#l449

So I think this is a known, accepted/expected behavior and no issue.

Justin