A problem with WAL

John Summerfield <summer@os2.ami.com.au> — 2001-08-29T02:05:13Z
I've read the documentation for PostgreSQL 7.1.3 and I don't see how to 
tell when I can remove these:
[root@dugite data]# ls pg_xlog/00000000000000* | wc -l
    168
[root@dugite data]#
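
For a sense of scale, here is a minimal sketch that turns a segment count like the one above into disk usage. It assumes each WAL segment is 16 MB and that segment files are named with 16 hexadecimal digits; neither is stated in this message, and the helper names are made up for illustration.

```python
import os
import re

# Assumption: each WAL segment file is 16 MB.
SEGMENT_BYTES = 16 * 1024 * 1024

# Assumption: segment files are named with exactly 16 hex digits,
# matching the 00000000000000* pattern in the listing above.
SEGMENT_NAME = re.compile(r"^[0-9A-Fa-f]{16}$")


def estimated_bytes(segment_count, segment_bytes=SEGMENT_BYTES):
    """Estimate the disk space taken by a given number of segments."""
    return segment_count * segment_bytes


def wal_usage(xlog_dir):
    """Return (segment_count, total_bytes) for segment files in xlog_dir."""
    count = 0
    total = 0
    for name in os.listdir(xlog_dir):
        if SEGMENT_NAME.match(name):
            count += 1
            total += os.path.getsize(os.path.join(xlog_dir, name))
    return count, total


if __name__ == "__main__":
    # 168 segments at the assumed 16 MB each
    print(estimated_bytes(168) // 2**20, "MB")
```

Under those assumptions, 168 segments come to roughly 2.6 GB.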

I'd have more if this hadn't happened:
[summer@possum summer]$ cat pglog
2001-08-29 04:39:56 [21803]  DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
2001-08-29 04:42:28 [21803]  DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
2001-08-29 04:45:47 [21803]  DEBUG:  XLogWrite: new log file created - consider increasing WAL_FILES
2001-08-29 04:48:32 [4945]   FATAL 2:  ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.4945) failed: No such file or directory
Server process (pid 4945) exited with status 512 at Wed Aug 29 04:48:33 2001
Terminating any active server processes...
2001-08-29 04:48:33 [21803]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2001-08-29 04:48:33 [21722]  NOTICE:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
Server processes were terminated at Wed Aug 29 04:48:33 2001
Reinitializing shared memory and semaphores
2001-08-29 04:48:34 [4946]   DEBUG:  database system was interrupted at 2001-08-29 04:48:32 WST
2001-08-29 04:48:34 [4946]   DEBUG:  CheckPoint record at (0, 3099970512)
2001-08-29 04:48:34 [4946]   DEBUG:  Redo record at (0, 3099659748); Undo record at (0, 294566376); Shutdown FALSE
2001-08-29 04:48:34 [4946]   DEBUG:  NextTransactionId: 9951; NextOid: 829728
2001-08-29 04:48:34 [4946]   DEBUG:  database system was not properly shut down; automatic recovery in progress...
2001-08-29 04:48:34 [4946]   DEBUG:  redo starts at (0, 3099659748)
2001-08-29 04:48:34 [4946]   DEBUG:  ReadRecord: record with zero len at (0, 3099987408)
2001-08-29 04:48:34 [4946]   DEBUG:  redo done at (0, 3099987344)
2001-08-29 04:48:36 [4946]   FATAL 2:  ZeroFill(/var/lib/pgsql/data/pg_xlog/xlogtemp.4946) failed: No such file or directory
/usr/bin/postmaster: Startup proc 4946 exited with status 512 - abort
 
[summer@possum summer]$

The problem arises when PG consumes all available disk space.
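
One way to catch this before the server falls over is to watch free space on the filesystem holding pg_xlog. The following is only a sketch: the 16 MB segment size, the threshold of four segments, and the function names are assumptions chosen for illustration, not anything PostgreSQL itself provides.

```python
import shutil

# Assumption: each new WAL segment needs 16 MB of free space.
SEGMENT_BYTES = 16 * 1024 * 1024


def segments_that_fit(free_bytes, segment_bytes=SEGMENT_BYTES):
    """How many whole segments could still be created in free_bytes."""
    return free_bytes // segment_bytes


def low_on_space(path, min_segments=4):
    """True if fewer than min_segments segments fit on path's filesystem."""
    free = shutil.disk_usage(path).free
    return segments_that_fit(free) < min_segments


if __name__ == "__main__":
    # Path taken from the log above; adjust for the local installation.
    xlog_dir = "/var/lib/pgsql/data/pg_xlog"
    try:
        if low_on_space(xlog_dir):
            print("warning: little room left for new WAL segments")
    except OSError as exc:
        print("could not check", xlog_dir, "-", exc)
```

Run from cron, a check like this would at least give warning before ZeroFill starts failing.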

The problems I see:
1) Documentation - it doesn't say when old log files can be removed.
2) Performance - there's no obvious need to use all this space.
3) Performance - there's no (or insufficient) removal of old logs.
4) Recovery - I see no means of recovering short of either adding disk 
space or deleting the entire database.




-- 
Cheers
John Summerfield
