28 January 2010
Astronomers have long used Unix commands like gzip to compress their data. Gzip was designed to work with text files and does a poor job at best with astronomical data. Other programs like bzip2 may do a better job of compressing data, but are woefully slow. More fundamentally, a gzipped FITS file is no longer FITS and is largely unreadable to familiar FITS tools.
Given all this, the FITS working group created the “Tile Compression” standard in 2001. Tile Compression is a nifty way to handle the bookkeeping for compressing data within the FITS standard itself. This is similar to other graphics standards like jpeg, gif and png, which include compression as an integral part of the format. Tile Compression has long been supported by the widely used CFITSIO library.
The “.fz” extension indicates a FITS tile-compressed file, for example, “a001.fits.fz”.
One very useful feature is that the image headers remain readable for tile-compressed data. The division of the image into small rectangular tiles also permits rapid access on a line-by-line basis without having to uncompress the rest of the pixels. At the bottom of this document are pointers to several documents that describe other important features.
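As a rough illustration of why the headers stay readable, the sketch below reads FITS header records with nothing but the Python standard library. It relies only on the basic FITS layout (2880-byte blocks holding 80-character ASCII cards, terminated by an END card). The function names and the synthetic demo header are hypothetical, made up for this example; real work would normally go through CFITSIO or another FITS library.

```python
import io

BLOCK = 2880   # FITS files are organized in 2880-byte blocks
CARD = 80      # each header record ("card") is 80 ASCII characters

def read_header_cards(stream):
    """Return the header cards of the HDU that starts at the current stream position."""
    cards = []
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise ValueError("truncated FITS header")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            # The END card closes the header; the rest of the block is padding.
            if card.startswith("END") and card[3:].strip() == "":
                return cards
            cards.append(card.rstrip())

def make_demo_header():
    """Build a tiny synthetic header (hypothetical values) so the demo is self-contained."""
    cards = [
        "SIMPLE  =                    T",
        "BITPIX  =                   16",
        "NAXIS   =                    0",
        "END",
    ]
    raw = "".join(c.ljust(CARD) for c in cards)
    return raw.ljust(BLOCK).encode("ascii")

demo_cards = read_header_cards(io.BytesIO(make_demo_header()))
for card in demo_cards:
    print(card)
```

The same function could be pointed at a file opened with open(name, "rb"), since the header blocks of a ".fz" file are stored as ordinary uncompressed text.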
Most “.fz” files that users encounter will have been compressed using the “Rice” algorithm. Rice compression has been used for many years to achieve high compression ratios for space missions. It is also very fast. Rice has been benchmarked in the NOAO Archive at about 10 times as fast as gzip.
Around 2006 it was recognized that achieving more widespread adoption of “.fz” files would benefit from a stand-alone tool like gzip. That tool is FPACK. FPACK is available for all platforms supported by CFITSIO from:
A detailed user's guide is at:
If you have received tile-compressed “.fz” files from the NOAO Archive, these may be uncompressed into the original FITS format using the FUNPACK command:
% funpack cp5001776.fits.fz
% ls -l cp*
-rw-r--r-- 1 bits bits 2396160 Jan 28 18:04 cp5001776.fits
-rw-r--r-- 1 bits bits 1198080 Jan 28 18:03 cp5001776.fits.fz
By default, FUNPACK (and FPACK) retains the original file in the same directory.
As usual, there are multiple options:
% funpack -H
funpack, decompress fpacked files. Version 1.4.1 (Jan 2010) CFITSIO version 3.240
usage: funpack [-E <HDUlist>] [-P <pre>] [-O <name>] [-Z] -v <FITS>
more: [-F] [-D] [-S] [-L] [-C] [-H] [-V]
Flags must be separate and appear before filenames:
-E <HDUlist> Unpack only the list of HDU names or numbers in the file.
-P <pre> Prepend <pre> to create new output filenames.
-O <name> Specify full output file name.
-Z Recompress the output file with host GZIP program.
-F Overwrite input file by output file with same name.
-D Delete input file after writing output.
-S Output uncompressed file to STDOUT file stream.
-L List contents, files unchanged.
-C Don't update FITS checksum keywords.
-v Verbose mode; list each file as it is processed.
-H Show this message.
-V Show version number.
<FITS> FITS files to unpack; enter '-' (a hyphen) to read from stdin.
Refer to the fpack User's Guide for more extensive help.
Since FPACK creates these files, there are even more options to select for different compression algorithms and so forth. Please see the User's Guide.
Support for tile compression is built into the CFITSIO library. A program linked against a recent version of CFITSIO can already read (or write) “.fz” files:
The IRAF FITSUTIL package now includes support for FITS Tile Compression:
As use of this format expands, we anticipate more community software packages will feature support for FITS Tile Compression. As with jpeg, the ultimate goal is not to separately compress and uncompress each file to restore an original FITS file, but rather to have the ability to maintain the data in its compressed state throughout a processing workflow.
This is explained in an article from the March 2010 NOAO Newsletter:
What is FITS tile compression?
As announced in the accompanying article, the NOAO archive is transitioning to a new flavor of the FITS (Flexible Image Transport System) format. Tile-compressed FITS is a way to represent compressed data within FITS itself, not through the use of some external compression program like gzip. This is similar to how the jpeg, gif or png standards contain built-in compression algorithms.
The IAU FITS working group has recognized the tile compression format since 2001 (http://fits.gsfc.nasa.gov/registry/tilecompression.html). Tile compression has numerous benefits. Images are encoded as FITS binary tables and many standard FITS tools can be used. For instance, image headers remain fully readable. Access is very rapid since each rectangular tile (default is one image line) can be accessed individually without having to uncompress any other pixels.
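The benefit of independent tiles can be sketched in a few lines of Python. This is only a toy model under stated assumptions: each image row is one tile, zlib stands in for the Rice coder because it ships with Python, and the blob-plus-index layout and the names compress_by_rows and read_row are hypothetical, not the actual FITS binary table structure.

```python
import struct
import zlib

def compress_by_rows(rows):
    """Compress each image row separately; return (blob, index of (offset, length))."""
    chunks, index, offset = [], [], 0
    for row in rows:
        raw = struct.pack(f">{len(row)}h", *row)   # big-endian 16-bit pixels
        packed = zlib.compress(raw)                # stand-in for Rice
        chunks.append(packed)
        index.append((offset, len(packed)))
        offset += len(packed)
    return b"".join(chunks), index

def read_row(blob, index, row_number, width):
    """Decompress only the requested row, leaving every other tile untouched."""
    offset, length = index[row_number]
    raw = zlib.decompress(blob[offset:offset + length])
    return list(struct.unpack(f">{width}h", raw))

# A small synthetic 64 x 32 image (hypothetical data).
image = [[(x * y) % 1000 for x in range(64)] for y in range(32)]
blob, index = compress_by_rows(image)
row_17 = read_row(blob, index, 17, 64)
print(row_17 == image[17])
```

Because the index records where each compressed tile begins, reading row 17 touches only that tile's bytes.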
Multiple image compression algorithms are supported to allow each class of data to benefit from a tailored choice (both lossless and lossy options are supported). For most astronomical data, the lossless Rice algorithm appears to be the best trade-off between speed and compression factor. In fact, Rice is both significantly faster than gzip and produces higher compression ratios, thus smaller files.
By contrast, gzip is a dictionary-based compression algorithm well designed for text files. Astronomical images are numerical, and it is not surprising that gzip is not ideal for such data. Numerical compression algorithms like Rice have highly beneficial features such as compressing 16-bit and 32-bit pixels of the same data into the same absolute size. This is critical for efficiently representing data (as from 18-bit ADCs) that fall between these short and long integer sizes.
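The width-independence point can be illustrated with a simplified Rice-style bit count. The sketch below is not the CFITSIO coder: it assumes a single Rice parameter k for the whole array (real implementations choose one per small block of pixels), it only counts bits rather than writing a bitstream, and the function name rice_bits and the synthetic pixel values are hypothetical. It stores the same pixels as 16-bit and as 32-bit integers and computes the Rice cost of each, with zlib sizes alongside for comparison.

```python
import random
import struct
import zlib

def rice_bits(raw, fmt_char):
    """Count the bits a simple Rice code needs for the pixel differences in raw."""
    size = struct.calcsize(fmt_char)
    pixels = struct.unpack(f">{len(raw) // size}{fmt_char}", raw)
    # Difference each pixel from its predecessor, then fold signed to unsigned.
    diffs = [b - a for a, b in zip(pixels, pixels[1:])]
    folded = [2 * d if d >= 0 else -2 * d - 1 for d in diffs]
    # Each value costs (u >> k) unary bits, one stop bit, and k low-order bits.
    # Try every k and keep the cheapest total.
    return min(sum((u >> k) + 1 + k for u in folded) for k in range(32))

rng = random.Random(42)
pixels = [1000 + int(rng.gauss(0, 10)) for _ in range(4096)]   # synthetic noisy sky
as_int16 = struct.pack(f">{len(pixels)}h", *pixels)
as_int32 = struct.pack(f">{len(pixels)}i", *pixels)

rice16, rice32 = rice_bits(as_int16, "h"), rice_bits(as_int32, "i")
gzip16, gzip32 = len(zlib.compress(as_int16)), len(zlib.compress(as_int32))
print("Rice bits:", rice16, rice32)
print("zlib bytes:", gzip16, gzip32)
```

The Rice cost depends only on the pixel values and their differences, so the two counts are identical even though the 32-bit input is twice as large. A byte-oriented compressor sees two different byte streams.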
Support for on-the-fly tile compression is built into the widely used CFITSIO library, and is available for numerous computer platforms via the standalone FPACK and FUNPACK tools (http://heasarc.nasa.gov/fitsio/fpack). The FITSUTIL package provides support for IRAF users (see http://iraf.noao.edu/extern.html).
Compression is intimately related to the noise within an image. This is discussed in a recent paper, "Lossless Astronomical Image Compression and the Effects of Noise" (http://arxiv.org/abs/0903.2140), with Bill Pence (NASA/GSFC) and Rick White (STScI). FPACK via the underlying tile compression format provides a tool for properly managing that noise.
In particular, FPACK supports noise-sensitive scaling of floating point data to achieve high compression ratios while preserving the scientific content of data. Similar sigma-scaling benefits have been widely discussed recently, for example for the JDEM mission (http://arxiv.org/pdf/0910.4571) and by the Astrometry.net project (http://arxiv.org/pdf/0910.2375). The remarkable results from the Kepler mission rely on noise-scaled data (http://arxiv.org/pdf/1001.0216, section 3.2). To this, FPACK adds the beneficial feature of subtractive dithering (http://www.adass2009.jp/poster/files/PenceWilliam.pdf).
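The idea of noise-scaled quantization with subtractive dithering can be sketched as follows. This is a minimal illustration, not FPACK's implementation: the step size sigma/q with q = 4 is an assumption chosen for the example, the seeded Python generator stands in for whatever reproducible sequence a real implementation uses, and the function names are hypothetical. Floating point values are scaled to integers with a step that is a fraction of the noise, a pseudo-random offset is added before rounding, and the same offset is subtracted again on restore.

```python
import random

def quantize(values, sigma, q=4.0, seed=1):
    """Scale floats to integers with step sigma/q, adding a reproducible dither."""
    delta = sigma / q
    rng = random.Random(seed)
    dither = [rng.random() for _ in values]
    ints = [round(v / delta + r - 0.5) for v, r in zip(values, dither)]
    return ints, delta

def restore(ints, delta, seed=1):
    """Regenerate the same dither sequence and subtract it again."""
    rng = random.Random(seed)
    return [(i - rng.random() + 0.5) * delta for i in ints]

noise_rng = random.Random(7)
sigma = 2.5                                   # assumed background noise level
data = [100.0 + noise_rng.gauss(0.0, sigma) for _ in range(1000)]
ints, delta = quantize(data, sigma)
restored = restore(ints, delta)
max_error = max(abs(a - b) for a, b in zip(data, restored))
print("step:", delta, "largest restore error:", max_error)
```

The restored values differ from the originals by at most half a quantization step, which is small compared with the noise, while the integers themselves are far more compressible than the raw floating point bits.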
This has been a dense article even with several features and references omitted (e.g., a truly gripping discussion of integrated FITS checksum support). Even so, I hope I have conveyed some of my personal excitement over the galvanizing opportunities facing astronomical data compression. This is a transformative technology that will be key to meeting the aggressive data handling requirements for near future projects relying on rapid-readout gigapixel cameras such as the Dark Energy Survey, the WIYN One Degree Imager, and the Large Synoptic Survey Telescope. The soul of data compression is not the static storage of data, but rather the dynamic optimization of throughput throughout Observatory data flow and the community O/IR System.