Re: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
| От | Laurenz Albe |
|---|---|
| Тема | Re: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC) |
| Дата | |
| Msg-id | c4e25cb1ce97f866915a9451b9e15fbe7672457f.camel@cybertec.at обсуждение исходный текст |
| Ответ на | [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC) ("第108次明天" <87326549@qq.com>) |
| Список | pgsql-bugs |
On Thu, 2025-12-18 at 14:42 +0800, 第108次明天 wrote: > Environment Information > * PostgreSQL Version: 14.5 > * OS: Debian GNU/Linux 11 (bullseye) > * Architecture: aarch64 (8-core CPU) > * Memory: 8GB > * Storage Medium: eMMC > Problem Description > Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). > Queries targeting this record fail with: > ERROR: invalid memory alloc request size 18446744073709551613 > Corruption details of the record: > * Only the guid field remains valid; > * All int-type fields (e.g., upload_status) are set to 0; > * All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00; > * String-type fields (e.g., operator, content) are inaccessible (queryable only after forcing UPDATE to null). > Investigation Details 1. Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found). > 2. Dynamic Library Check: libpq.so had environment variable misconfiguration (not pointing to PG install dir), > but the business app does NOT depend on this library — low corruption probability. > 3. eMMC Storage Analysis: > - Used dd to extract the corrupted block; hex analysis shows only guid bytes are valid (others are 0, > confirming physical data corruption). > 4. fstrim Check: > - fstrim executed at 2025-12-01 00:27 (asynchronous, runs on disk idle); anomaly occurred after this. > - Reproduction attempt: Simulated data insertion + 10+ fstrim runs — corruption not reproduced. > Impact * Business failure: Query errors trigger business alerts. > * Backup failure: Database backups are affected by corrupted data. > Additional Notes * No system/app/PG restarts during the anomaly window. > * No writes to tb_workstation_message after 2025-12-01 05:34:00. > * No online operations/terminals between 05:34:00 ~ 09:02:31. > Request: > We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community > help analyze the root cause, or guide further troubleshooting? "fstrim" should not destroy data... I am wondering how this problem can lead to backup failure - unless you are using "pg_dump" for backups. You probably figured out that you will have to delete the row to get rid of the problem. In addition, you should dump and restore the database to a new cluster - who knows what other data corruption is lurking there. I don't know what caused your problem, but I would suspect hardware failure. Perhaps you should check or replace the disk. Yours, Laurenz Albe
В списке pgsql-bugs по дате отправления: