Thread

Compression (was Re: [HACKERS] varchar/char size)

Andrew Martin <martin@biochemistry.ucl.ac.uk> — 1998-01-12T14:36:20Z
> My CA/Ingres Admin manual points out that there is a tradeoff between
> compressing tuples to save disk storage and the extra processing work
> required to uncompress for use. They suggest that the only case where you
> would consider compressing on disk is when your system is very I/O bound,
> and you have CPU to burn.
> 
> The default for Ingres is to not compress anything, but you can specify
> compression on a table-by-table basis.
> 
> btw, char() is a bit trickier to handle correctly if you do compress it on
> disk, since trailing blanks must be handled correctly all the way through.
> For example, you would want 'hi' = 'hi   ' to be true, which is not a
> requirement for varchar().
> 
>                                                         - Tom
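
The trailing-blank point in the quoted text can be sketched roughly as follows. This is only an illustration: the helper names char_equal and varchar_equal are hypothetical, and modelling varchar() as an exact comparison is an assumption made here, not something the quoted manual states.

```python
def char_equal(a: str, b: str) -> bool:
    """Compare two values the way a blank-padded char() column would:
    trailing blanks are ignored, so stripping them on disk must not
    change the result of the comparison."""
    return a.rstrip(" ") == b.rstrip(" ")


def varchar_equal(a: str, b: str) -> bool:
    """Assumed varchar() behaviour for this sketch: compare exactly,
    so trailing blanks are significant."""
    return a == b


# 'hi' = 'hi   ' should hold for char(), but need not for varchar().
char_result = char_equal("hi", "hi   ")
varchar_result = varchar_equal("hi", "hi   ")
```

If char() values are stored with their padding stripped, every comparison path has to behave like char_equal above, which is the extra care the quoted text refers to.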

Has anybody thought about real gzip-style compression? There's a specialised
RDBMS called Iditis (written specifically for one task) which, like
PostgreSQL, stores data at the file level and uses a gzip-based library
to access the files. I gather this is transparent to the software. Has
anyone thought of anything equivalent for PG/SQL?

To be honest I haven't looked into how Iditis does it (it's a commercial
program and I don't have the source). I don't actually see how this
could be done for small writes of data - how does it build the lookup
tables for the compression? However, it might be worth considering for
use with the text field type.
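
A rough way to see the small-write problem is to compare compressed sizes with Python's standard zlib module (the same DEFLATE family that gzip uses). This is only a sketch: it says nothing about how Iditis works, and the sample values and the helper name compressed_size are made up for illustration.

```python
import zlib


def compressed_size(data: bytes, level: int = 6) -> int:
    """Return the number of bytes zlib produces for `data`."""
    return len(zlib.compress(data, level))


# A tiny value, like a short char()/varchar() field (hypothetical sample).
small = b"hi"

# A long, repetitive value, standing in for a large text field
# (hypothetical sample).
large = b"The quick brown fox jumps over the lazy dog. " * 200

small_out = compressed_size(small)
large_out = compressed_size(large)

# The round trip must give back exactly the original bytes.
restored = zlib.decompress(zlib.compress(large))

print(f"small: {len(small)} bytes in, {small_out} bytes out")
print(f"large: {len(large)} bytes in, {large_out} bytes out")
```

zlib derives its tables from the data in each stream, so a value of a few bytes gives it nothing to work with and the fixed header and checksum make the output larger than the input. A long text value leaves enough repetition for the compression to pay off, which fits the suggestion of trying it on the text type first.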

Andrew
----------------------------------------------------------------------------
Dr. Andrew C.R. Martin                             University College London
EMAIL: (Work) martin@biochem.ucl.ac.uk    (Home) andrew@stagleys.demon.co.uk
URL:   http://www.biochem.ucl.ac.uk/~martin
Tel:   (Work) +44(0)171 419 3890                    (Home) +44(0)1372 275775