Thread: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
To: pgsql-bugs@postgresql.org
Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
Environment Information
- PostgreSQL Version: 14.5
- OS: Debian GNU/Linux 11 (bullseye)
- Architecture: aarch64 (8-core CPU)
- Memory: 8GB
- Storage Medium: eMMC
Problem Description
Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). Queries targeting this record fail with:

```plaintext
ERROR: invalid memory alloc request size 18446744073709551613
```
Corruption details of the record:
- Only the guid field remains valid;
- All int-type fields (e.g., upload_status) are set to 0;
- All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;
- String-type fields (e.g., operator, content) are inaccessible (queryable only after forcing an UPDATE of those fields to NULL; see the sketch below).
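For reference, the alloc size 18446744073709551613 is (uint64)-3, i.e. a garbage variable-length (varlena) length header, which is why only reads of the string columns fail while the fixed-width columns stay readable. A minimal sketch of the probing and the UPDATE workaround, assuming the column names listed above:

```sql
-- Sketch: probe the damaged row via fixed-width columns only; reading
-- guid/int/timestamp fields does not detoast anything, so this works
-- even though SELECT * fails on the garbage varlena length.
SELECT ctid, guid, upload_status, create_time
FROM tb_workstation_message
WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

-- Workaround from the report: the string columns become accessible
-- again only after both are forced to NULL, so the new tuple version
-- never reads the corrupt varlena data. Note this destroys evidence;
-- copy the underlying data file first if further analysis is planned.
UPDATE tb_workstation_message
SET operator = NULL, content = NULL
WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
```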
Timeline
- Record insertion: 2025-11-25 21:23:51
- Last modification: 2025-11-27 09:41:10
- Anomaly window: 2025-12-01 03:00:00 ~ 09:02:31 (record was intact before 03:00:00)
- First error detected: 2025-12-01 09:02:31 (triggered by a 30s-interval business cron job querying upload_status = 0)
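Note that because all int fields of the damaged row were zeroed, the row newly matched the job's upload_status = 0 predicate, which explains why this poll was the first thing to hit the error. The query shape below is illustrative; only the predicate is known from the job:

```sql
-- Assumed shape of the 30-second polling query; the upload_status = 0
-- predicate is real, the selected columns are illustrative. Reading
-- the string columns forces detoasting, which is where the
-- "invalid memory alloc request" error is raised for the damaged row.
SELECT guid, operator, content, create_time
FROM tb_workstation_message
WHERE upload_status = 0;
```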
Error Output (Query Example)

```sql
postgres=# select * from tb_workstation_message where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';
ERROR: invalid memory alloc request size 18446744073709551613
```
Investigation Details
1. Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found).
2. Dynamic Library Check: libpq.so had an environment-variable misconfiguration (not pointing to the PG install dir), but the business app does not depend on this library, so this is an unlikely cause of the corruption.
3. eMMC Storage Analysis: Used dd to extract the corrupted block; hex analysis shows that only the guid bytes are valid (all other bytes are 0), confirming on-disk data corruption. (An in-database cross-check with pageinspect is sketched after this list.)
4. fstrim Check: fstrim ran at 2025-12-01 00:27 (asynchronous, runs when the disk is idle); the anomaly occurred after this run. Reproduction attempt: simulated data insertion plus 10+ fstrim runs did not reproduce the corruption (a sketch of the synthetic load follows below).
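As referenced in item 3: while the instance is still running, the dd findings can be cross-checked from inside the database with the pageinspect contrib extension. The block number 1234 below is a placeholder; the real value is the first component of the damaged row's ctid (obtained in the probe sketch earlier).

```sql
-- Cross-check of the dd block dump using the pageinspect contrib extension.
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- 1234 is a placeholder: use the block number from the damaged row's
-- ctid (e.g. ctid '(1234,5)' means block 1234, line pointer 5).
SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_infomask, t_data
FROM heap_page_items(get_raw_page('tb_workstation_message', 1234));
```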
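As referenced in item 4, the synthetic load between fstrim runs was along these lines (a minimal sketch; the real workload's row sizes, rates, and tooling differ):

```sql
-- Minimal sketch of a synthetic load run between fstrim invocations
-- (illustrative only; real message sizes and insert rates differ).
-- gen_random_uuid() is built in since PostgreSQL 13; drop or adjust
-- the cast to match the actual type of the guid column.
INSERT INTO tb_workstation_message (guid, upload_status, create_time, operator, content)
SELECT gen_random_uuid()::text, 0, now(), 'tester', repeat('x', 512)
FROM generate_series(1, 100000);
```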
Impact
- Business failure: Query errors trigger business alerts.
- Backup failure: Database backups are affected by corrupted data.
Additional Notes
- No system/app/PG restarts during the anomaly window.
- No writes to tb_workstation_message after 2025-12-01 05:34:00.
- No online operations/terminals between 05:34:00 ~ 09:02:31.
Request:
We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community help analyze the root cause, or guide further troubleshooting?

Attachments:
- postgresql_2025-12-01_000000(1).csv (739.5 KB, 2026-01-17 14:42)
  Download page: https://wx.mail.qq.com/ftn/download?func=3&k=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&key=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&code=9259cf38&from=
On Thu, 2025-12-18 at 14:42 +0800, 第108次明天 wrote:
> Data corruption occurred in the tb_workstation_message table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649).
> Queries targeting this record fail with:
> ERROR: invalid memory alloc request size 18446744073709551613
> [...]
> Impact
> * Business failure: Query errors trigger business alerts.
> * Backup failure: Database backups are affected by corrupted data.
> [...]
> We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community
> help analyze the root cause, or guide further troubleshooting?

"fstrim" should not destroy data...

I am wondering how this problem can lead to backup failure - unless you are using "pg_dump" for backups.

You probably figured out that you will have to delete the row to get rid of the problem. In addition, you should dump and restore the database to a new cluster - who knows what other data corruption is lurking there.

I don't know what caused your problem, but I would suspect hardware failure. Perhaps you should check or replace the disk.

Yours,
Laurenz Albe
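A minimal sketch of that row removal, for the archives. DELETE does not need to detoast the damaged string columns, so deleting by the still-intact guid should work; the ctid value in the fallback is a placeholder:

```sql
-- Remove the damaged row by its intact guid column.
DELETE FROM tb_workstation_message
WHERE guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649';

-- Fallback by physical tuple id if even that fails; '(1234,5)' is a
-- placeholder taken from: SELECT ctid FROM ... WHERE guid = '...'.
DELETE FROM tb_workstation_message
WHERE ctid = '(1234,5)'::tid;
```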
Re: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
From: Greg Sabino Mullane
> - PostgreSQL Version: 14.5

That is way too old. Upgrade to 14.20.

> or guide further troubleshooting?
https://www.postgresql.org/docs/14/app-pgchecksums.html
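Note that pg_checksums is a command-line tool run against a stopped server, and it only helps if the cluster was initialized with checksums. A quick way to check that from a live session:

```sql
-- Shows 'on' if the cluster was initialized with data checksums
-- (or had them enabled later via the offline pg_checksums tool).
SHOW data_checksums;
```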
> Database backups are affected by corrupted data.
I assume you are not using a good backup system then. pgBackRest, for example, will not only check for corrupted pages during the backup, but report on it and allow the backup to proceed.