Segmentation Fault

Поиск
Список
Период
Сортировка
От Robert Sanford
Тема Segmentation Fault
Дата
Msg-id CABa+nRvwZy_5t1QF9NJNGwAf03tv_PO_Sg1FsN1+-3Odb1XgBA@mail.gmail.com
обсуждение исходный текст
Ответы Re: Segmentation Fault  (Robert Sanford <wobbet@gmail.com>)
Re: Segmentation Fault  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-bugs
PostgreSQL v15.4
Running on Ubuntu Jammy
ARM64 x8
64 GB RAM

I have been trying to tune the performance of a query, and wasn't having a lot of luck.I decided to run VACUUM ANALYZE to see if that would help.

The Postgres server crashed.

I ran through each of the tables individually and ran VACUUM ANALYZE one by one. That way completed successfully.

I ran my query to see if that had helped performance. Segmentation fault. The server restarted itself and came back to life. Reran the query and it crashed again. Every time I run the query I get the same segmentation fault. Different queries run just fine. 

I tried attaching GDB to the PID that the log file says is the main pid for startup. That seems to have worked in that it tells me that there were various libraries that were loaded. I'm not sure if it is sufficient or not. I'm pretty sure it isn't.

I tried getting GDB based on DebuggingProgramCrash - Ubuntu Wiki which says "If you are on Ubuntu Jammy or later, you do not need to worry about installing debug symbol packages anymore." I am running on Jammy so that should be good.

I then tried getting a Valgrind trace from PostgreSQL. I used "systemctl stop postgresql" to stop the service from running. I launched Valgrind using the command line below my sig. And everything worked just fine except the query took 10x the time that it was taking before I started the VACUUM ANALYZE. 

So I restarted the service, reran my query, and it failed with the same log entries showing a Signal 11: Segmentation fault.

Running it inside of Valgrind stopped the error from happening which is strange to me. 

What can I do to provide enough information to figure out why this is crashing?

rjsjr



Valgrind Command
valgrind --leak-check=no --gen-suppressions=all \
--suppressions=src/tools/valgrind.supp --time-stamp=yes \
--error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
--log-file=/datadrive/%p.log --trace-children=yes \
/usr/lib/postgresql/15/bin/postgres --config-file=/etc/postgresql/15/main/postgresql.conf \
--log_line_prefix="%m %p " \
--log_statement=all --shared_buffers=64MB 2>&1 | tee /datadrive/postmaster.log


GDB Backtrace command
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0x0000ffffb9c6e844 in __GI___select (nfds=9, readfds=0xffffd0849280, writefds=0x0, exceptfds=0x0, timeout=0xffffd08491e8) at ../sysdeps/unix/sysv/linux/select.c:69
69      ../sysdeps/unix/sysv/linux/select.c: No such file or directory.
(gdb) bt
#0  0x0000ffffb9c6e844 in __GI___select (nfds=9, readfds=0xffffd0849280, writefds=0x0, exceptfds=0x0, timeout=0xffffd08491e8) at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x0000aaaab91b4a1c in ?? ()
#2  0x0000aaaab91b63cc in PostmasterMain ()
#3  0x0000aaaab8f053d8 in main ()


PostgreSQL Log File
2023-09-13 20:20:21.198 UTC [295031] LOG:  starting PostgreSQL 15.4 (Ubuntu 15.4-1.pgdg20.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
2023-09-13 20:20:21.198 UTC [295031] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-09-13 20:20:21.198 UTC [295031] LOG:  listening on IPv6 address "::", port 5432
2023-09-13 20:20:21.203 UTC [295031] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-09-13 20:20:21.218 UTC [295037] LOG:  database system was shut down at 2023-09-13 20:20:20 UTC
2023-09-13 20:20:21.228 UTC [295031] LOG:  database system is ready to accept connections
2023-09-13 20:21:21.604 UTC [295031] LOG:  server process (PID 295102) was terminated by signal 11: Segmentation fault
2023-09-13 20:21:21.604 UTC [295031] DETAIL:  Failed process was running: --explain analyze
        select
                a.equipmenttype
                , a.technology
                , count(*)
        from
                newfaultentrylineitems a
        where
                a.databaseguid = '9a277d83-30eb-4736-99f2-d028ff84611e'
                and a.assessmentyearmonth >= 202209
                and a.assessmentyearmonth < 202309
        group by a.equipmenttype, a.technology
        order by count(*) desc
2023-09-13 20:21:21.604 UTC [295031] LOG:  terminating any other active server processes
2023-09-13 20:21:21.610 UTC [295031] LOG:  all server processes terminated; reinitializing
2023-09-13 20:21:22.074 UTC [295109] LOG:  database system was interrupted; last known up at 2023-09-13 20:20:21 UTC
2023-09-13 20:21:22.137 UTC [295112] FATAL:  the database system is in recovery mode
2023-09-13 20:21:22.187 UTC [295109] LOG:  database system was not properly shut down; automatic recovery in progress
2023-09-13 20:21:22.195 UTC [295109] LOG:  redo starts at 17/6AFACF40
2023-09-13 20:21:22.195 UTC [295109] LOG:  invalid record length at 17/6AFACF78: wanted 24, got 0
2023-09-13 20:21:22.195 UTC [295109] LOG:  redo done at 17/6AFACF40 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-09-13 20:21:22.208 UTC [295110] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-09-13 20:21:22.383 UTC [295113] FATAL:  the database system is not yet accepting connections
2023-09-13 20:21:22.383 UTC [295113] DETAIL:  Consistent recovery state has not been yet reached.
2023-09-13 20:21:22.428 UTC [295110] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.199 s, sync=0.004 s, total=0.224 s; sync files=2, longest=0.003>
2023-09-13 20:21:22.435 UTC [295031] LOG:  database system is ready to accept connections
2023-09-13 20:40:24.416 UTC [295031] LOG:  server process (PID 295140) was terminated by signal 11: Segmentation fault
2023-09-13 20:40:24.416 UTC [295031] DETAIL:  Failed process was running: --explain analyze
        select
                a.equipmenttype
                , a.technology
                , count(*)
        from
                newfaultentrylineitems a
        where
                a.databaseguid = '9a277d83-30eb-4736-99f2-d028ff84611e'
                and a.assessmentyearmonth >= 202209
                and a.assessmentyearmonth < 202309
        group by a.equipmenttype, a.technology
        order by count(*) desc
2023-09-13 20:40:24.416 UTC [295031] LOG:  terminating any other active server processes
2023-09-13 20:40:24.666 UTC [295523] FATAL:  the database system is in recovery mode
2023-09-13 20:40:24.922 UTC [295524] FATAL:  the database system is in recovery mode
2023-09-13 20:40:25.182 UTC [295525] FATAL:  the database system is in recovery mode
2023-09-13 20:40:25.379 UTC [295526] FATAL:  the database system is in recovery mode
2023-09-13 20:40:25.574 UTC [295527] FATAL:  the database system is in recovery mode
2023-09-13 20:40:25.775 UTC [295528] FATAL:  the database system is in recovery mode
2023-09-13 20:40:25.977 UTC [295529] FATAL:  the database system is in recovery mode
2023-09-13 20:40:26.179 UTC [295530] FATAL:  the database system is in recovery mode
2023-09-13 20:40:26.377 UTC [295531] FATAL:  the database system is in recovery mode
2023-09-13 20:40:26.578 UTC [295532] FATAL:  the database system is in recovery mode
2023-09-13 20:40:26.777 UTC [295533] FATAL:  the database system is in recovery mode
2023-09-13 20:40:26.973 UTC [295534] FATAL:  the database system is in recovery mode
2023-09-13 20:40:27.166 UTC [295535] FATAL:  the database system is in recovery mode
2023-09-13 20:40:27.372 UTC [295536] FATAL:  the database system is in recovery mode
2023-09-13 20:40:27.568 UTC [295537] FATAL:  the database system is in recovery mode
2023-09-13 20:40:27.766 UTC [295538] FATAL:  the database system is in recovery mode
2023-09-13 20:40:27.961 UTC [295539] FATAL:  the database system is in recovery mode
2023-09-13 20:40:28.163 UTC [295540] FATAL:  the database system is in recovery mode
2023-09-13 20:40:28.389 UTC [295541] FATAL:  the database system is in recovery mode
2023-09-13 20:40:28.589 UTC [295542] FATAL:  the database system is in recovery mode
2023-09-13 20:40:28.784 UTC [295543] FATAL:  the database system is in recovery mode
2023-09-13 20:40:28.978 UTC [295544] FATAL:  the database system is in recovery mode
2023-09-13 20:40:29.114 UTC [295031] LOG:  issuing SIGKILL to recalcitrant children
2023-09-13 20:40:29.178 UTC [295545] FATAL:  the database system is in recovery mode
2023-09-13 20:40:29.403 UTC [295546] FATAL:  the database system is in recovery mode
2023-09-13 20:40:29.603 UTC [295547] FATAL:  the database system is in recovery mode
2023-09-13 20:40:29.798 UTC [295548] FATAL:  the database system is in recovery mode
2023-09-13 20:40:30.017 UTC [295549] FATAL:  the database system is in recovery mode
2023-09-13 20:40:30.214 UTC [295550] FATAL:  the database system is in recovery mode
2023-09-13 20:40:58.414 UTC [295031] LOG:  all server processes terminated; reinitializing
2023-09-13 20:40:58.876 UTC [295553] LOG:  database system was interrupted; last known up at 2023-09-13 20:21:22 UTC
2023-09-13 20:40:58.986 UTC [295553] LOG:  database system was not properly shut down; automatic recovery in progress
2023-09-13 20:40:58.992 UTC [295553] LOG:  redo starts at 17/6AFACFF0
2023-09-13 20:40:58.992 UTC [295553] LOG:  invalid record length at 17/6AFAD028: wanted 24, got 0
2023-09-13 20:40:58.992 UTC [295553] LOG:  redo done at 17/6AFACFF0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2023-09-13 20:40:59.006 UTC [295554] LOG:  checkpoint starting: end-of-recovery immediate wait
2023-09-13 20:40:59.228 UTC [295554] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.198 s, sync=0.004 s, total=0.228 s; sync files=2, longest=0.003>
2023-09-13 20:40:59.237 UTC [295031] LOG:  database system is ready to accept connections





В списке pgsql-bugs по дате отправления:

Предыдущее
От: Gokul Krishnan
Дата:
Сообщение: Re: BUG #18098: Console code page issue Postgresql 14.7
Следующее
От: Robert Sanford
Дата:
Сообщение: Re: Segmentation Fault