Thread

  1. [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)

    第108次明天 <87326549@qq.com> — 2025-12-18T06:42:56Z

    To: pgsql-bugs@postgresql.org
    
    Subject: [BUG] PostgreSQL 14.5: Data corruption in table tb_workstation_message (possible fstrim-related on eMMC)
    
    Environment Information
    
    PostgreSQL Version: 14.5
    
    OS: Debian GNU/Linux 11 (bullseye)
    
    Architecture: aarch64 (8-core CPU)
    
    Memory: 8GB
    
    Storage Medium: eMMC
    
    Problem Description
    Data corruption occurred in the tb_workstation_message&nbsp;table (record guid: ce04bd3f-232d-4c7f-9a91-540d0e581649). Queries targeting this record fail with:
    
    plaintext
    
    
    
    ERROR: invalid memory alloc request size 18446744073709551613 
    
    
    
    
    
    Corruption details of the record:
    
    
    Only the guid&nbsp;field remains valid;
    
    All int-type fields (e.g., upload_status) are set to 0;
    
    All timestamp-type fields (e.g., create_time) are reset to 2000-01-01 00:00:00;
    
    String-type fields (e.g., operator, content) are inaccessible (queryable only after forcing UPDATE&nbsp;to null).
    
    Timeline
    
    Record insertion: 2025-11-25 21:23:51
    
    Last modification: 2025-11-27 09:41:10
    
    Anomaly window: 2025-12-01 03:00:00 ~ 09:02:31 (record was intact before 03:00:00)
    
    First error detected: 2025-12-01 09:02:31 (triggered by a 30s-interval business cron job querying upload_status = 0)
    
    Error Output (Query Example)
    sql
    
    
    
    postgres=# select * from tb_workstation_message where guid = 'ce04bd3f-232d-4c7f-9a91-540d0e581649'; ERROR:  invalid memory alloc request size 18446744073709551613 
    
    
    
    
    
    Investigation Details
    
    Database Logs: No output between 2025-12-01 05:37:05.932 ~ 09:02:31.533 (no abnormalities found).
    
    Dynamic Library Check: libpq.so&nbsp;had environment variable misconfiguration (not pointing to PG install dir), but the business app does NOT depend on this library — low corruption probability.
    
    eMMC Storage Analysis:
    
    Used dd&nbsp;to extract the corrupted block; hex analysis shows only guid&nbsp;bytes are valid (others are 0, confirming physical data corruption).
    
    fstrim Check:
    
    fstrim&nbsp;executed at 2025-12-01 00:27 (asynchronous, runs on disk idle); anomaly occurred after this.
    
    Reproduction attempt: Simulated data insertion + 10+ fstrim runs — corruption not reproduced.
    
    Impact
    
    Business failure: Query errors trigger business alerts.
    
    Backup failure: Database backups are affected by corrupted data.
    
    Additional Notes
    
    No system/app/PG restarts during the anomaly window.
    
    No writes to tb_workstation_message&nbsp;after 2025-12-01 05:34:00.
    
    No online operations/terminals between 05:34:00 ~ 09:02:31.
    
    Request:
    
    We suspect this may relate to fstrim on eMMC (with PG 14.5) but cannot reproduce it. Could the community help analyze the root cause, or guide further troubleshooting?
    
    
    
    
    
    
    第108次明天
    87326549@qq.com
    
    
    
     	
     	 		 			从QQ邮箱发来的超大附件 	
     	 		  				 					 						 							 						 					
     					 						 							postgresql_2025-12-01_000000(1).csv 							 (739.5KB, 2026年1月17日 14:42) 						 						 							进入下载页面 							:https://wx.mail.qq.com/ftn/download?func=3&k=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&key=c6cd4a393619073df4be1b3961663338e7bf3e3963663338111d585f4c530b5e0e05060d534b57090f54180d025650155805500f4e5f030f0153575d5a5302590b1135490c15475f4b5746480f3901080b071808514b03096602050953560310081b1b5a1010272c6b8d6cf59bcb17826e22a5c9c7abe665fb3a453863663338393235&code=9259cf38&from=