Thread

  1. Re: The ability of postgres to determine loss of files of the main fork

    Jakub Wartak <jakub.wartak@enterprisedb.com> — 2025-10-01T12:05:53Z

    On Wed, Oct 1, 2025 at 1:46 PM Aleksander Alekseev
    <aleksander@tigerdata.com> wrote:
    >
    > Hi Jakub,
    >
    > > IMHO all files should be opened at least on startup to check integrity,
    >
    > That might be a lot of files to open.
    
    I was afraid of that, but let's say a modern high-end DB is 200TB;
    that's about 200*1024 = 204,800 1GB files. These are the time(1)
    timings I'm getting for 204k files on ext4:
    
    $ time ./createfiles                     # real 0m2.157s, it's open(O_CREAT)+close()
    $ time ls -l many_files_dir/ > /dev/null # real 0m0.734s
    $ time ./openfiles                       # real 0m0.297s, for already existing ones (hot)
    $ time ./openfiles                       # real 0m1.456s, for already existing ones (cold, echo 3 > drop_caches sysctl)
    
    Not bad in my book for a one-time activity. It could potentially be
    a problem where open() calls have high latency, maybe NFS or
    something remote, I guess.
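
    For anyone wanting to reproduce the measurement: the createfiles and
    openfiles helpers are not included in this mail, so the following is
    only a rough Python sketch of the same idea (create N empty files,
    then open()+close() each one and time both passes). The file naming
    (seg_N), the function names and the use of a temporary directory are
    all assumptions made for illustration, and absolute timings will
    differ from the numbers above.

```python
import os
import tempfile
import time


def create_files(directory, count):
    """Create `count` empty files via open(O_CREAT) + close()."""
    for i in range(count):
        path = os.path.join(directory, f"seg_{i}")  # hypothetical naming
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        os.close(fd)


def open_files(directory, count):
    """open() + close() each expected file; return how many could be opened."""
    opened = 0
    for i in range(count):
        path = os.path.join(directory, f"seg_{i}")
        try:
            fd = os.open(path, os.O_RDONLY)
        except FileNotFoundError:
            continue  # a missing file is what a startup check would flag
        os.close(fd)
        opened += 1
    return opened


def run(count):
    """Time the create pass and the open pass in a throwaway directory."""
    with tempfile.TemporaryDirectory() as directory:
        t0 = time.perf_counter()
        create_files(directory, count)
        t1 = time.perf_counter()
        opened = open_files(directory, count)
        t2 = time.perf_counter()
    return opened, t1 - t0, t2 - t1


if __name__ == "__main__":
    n = 200 * 1024  # 200TB in 1GB files = 204,800 files
    opened, create_s, open_s = run(n)
    print(f"files={n} opened={opened} create={create_s:.3f}s open={open_s:.3f}s")
```

    This only measures the hot-cache case; the cold number would need
    the page/dentry caches dropped between the two passes, as in the
    timings above.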
    
    > Even if you can open a file it doesn't mean it's not empty
    
    Correct, I haven't investigated that rabbit hole...
    
    > or is not corrupted.
    
    I think checksums guard users well in this case, as they would get
    notified that stuff is wonky (much better than wrong results or
    silent data loss).
    
    -J.