28 January 2010
Astronomers have long used Unix commands like gzip to
compress their data. Gzip was
designed to work with text files and does a poor job at best with astronomical
data. Other programs like bzip2
may do a better job of compressing data, but are woefully slow. More fundamentally, a gzipped FITS file
is no longer FITS and is largely unreadable to familiar FITS tools.
Given all this, the FITS working group created the “Tile
Compression” standard in 2001.
Tile Compression is a nifty way to handle the bookkeeping for
compressing data within the FITS standard itself. This is similar to other graphics standards like jpeg, gif
and png, which include compression as an integral part of the format. Tile Compression has long been
supported by the widely used CFITSIO library.
The “.fz” extension indicates a FITS tile-compressed file, for example, “a001.fits.fz”.
One very useful feature is that the image headers remain
readable for tile-compressed data.
The division of the image into small rectangular tiles also permits
rapid access on a line-by-line basis without having to uncompress the rest of
the pixels. At the bottom of this
document are pointers to several documents that describe other important
features.
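The benefit of tiling can be illustrated with a short Python sketch. This is not the FITS tile-compression format itself: it uses the standard-library zlib module as a stand-in for the real algorithms, and the helper names (compress_rows, read_row) and the sample data are hypothetical. It only shows the idea that each row, compressed as an independent tile, can be recovered without uncompressing the others.

```python
import struct
import zlib

def compress_rows(image):
    """Compress each row of a 2-D list of 16-bit integers as its own tile."""
    tiles = []
    for row in image:
        # Pack the row as big-endian 16-bit integers, then compress it alone.
        raw = struct.pack(f">{len(row)}h", *row)
        tiles.append(zlib.compress(raw))
    return tiles

def read_row(tiles, index, width):
    """Decompress only the requested tile; all other tiles stay compressed."""
    raw = zlib.decompress(tiles[index])
    return list(struct.unpack(f">{width}h", raw))

# A small synthetic 4 x 8 "image" (hypothetical data).
image = [[100 + x + 10 * y for x in range(8)] for y in range(4)]
tiles = compress_rows(image)
row2 = read_row(tiles, 2, width=8)
print(row2)
```

In a real tile-compressed file the same bookkeeping is handled inside the FITS file, so tools built on CFITSIO do not need helper code of this kind.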
Users will find that most “.fz” files have been compressed
using the “Rice” algorithm. Rice
compression has been used for many years to achieve high compression ratios for
space missions. It is also very fast. Rice has been benchmarked in the NOAO
Archive at about 10 times as fast as gzip.
Around 2006 it was recognized that achieving more widespread
adoption of “.fz” files would benefit from a stand-alone tool like gzip. That tool is FPACK. FPACK
is available for all platforms supported by CFITSIO from:
http://heasarc.nasa.gov/fitsio/fpack
A detailed user's guide is at:
http://heasarc.nasa.gov/fitsio/fpack/fpackguide.pdf
If you have received tile-compressed “.fz” files from the NOAO
Archive, these may be uncompressed into the original FITS format using the
FUNPACK command:
% funpack cp5001776.fits.fz
% ls -l cp*
-rw-r--r-- 1 bits bits 2396160 Jan 28 18:04 cp5001776.fits
-rw-r--r-- 1 bits bits 1198080 Jan 28 18:03 cp5001776.fits.fz
By default, FUNPACK (and FPACK) retains the original file
in the same directory.
As usual, there are multiple options:
% funpack -H
funpack, decompress fpacked files. Version 1.4.1 (Jan 2010)
CFITSIO version 3.240
usage: funpack [-E <HDUlist>] [-P <pre>] [-O <name>] [-Z] -v <FITS>
more: [-F] [-D] [-S] [-L] [-C] [-H] [-V]
Flags must be separate and appear before filenames:
 -E <HDUlist>  Unpack only the list of HDU names or numbers in the file.
 -P <pre>      Prepend <pre> to create new output filenames.
 -O <name>     Specify full output file name.
 -Z            Recompress the output file with host GZIP program.
 -F            Overwrite input file by output file with same name.
 -D            Delete input file after writing output.
 -S            Output uncompressed file to STDOUT file stream.
 -L            List contents, files unchanged.
 -C            Don't update FITS checksum keywords.
 -v            Verbose mode; list each file as it is processed.
 -H            Show this message.
 -V            Show version number.
 <FITS>        FITS files to unpack; enter '-' (a hyphen) to read from stdin.
Refer to the fpack User's Guide for more extensive help.
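For scripted use, the command line can be assembled from Python. The following is a minimal sketch, assuming that the funpack executable is installed and on the PATH. The function names (build_funpack_command, run_funpack) and the output file name are hypothetical, and only flags listed in the help text above are used.

```python
import subprocess

def build_funpack_command(fz_path, output_name=None, delete_input=False, verbose=False):
    """Assemble a funpack argument list using flags from the funpack help text."""
    cmd = ["funpack"]
    if output_name is not None:
        cmd += ["-O", output_name]   # -O <name>: specify full output file name
    if delete_input:
        cmd.append("-D")             # -D: delete input file after writing output
    if verbose:
        cmd.append("-v")             # -v: list each file as it is processed
    cmd.append(fz_path)              # flags must appear before filenames
    return cmd

def run_funpack(fz_path, **options):
    """Run funpack; assumes the executable is installed and on the PATH."""
    subprocess.run(build_funpack_command(fz_path, **options), check=True)

# Only build and show the command here; call run_funpack() to execute it.
command = build_funpack_command("cp5001776.fits.fz", output_name="image.fits", verbose=True)
print(command)
```

Building the argument list separately from running it makes it easy to inspect the command before any file is written or deleted.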
Because FPACK creates these files, it offers even more options, for example
to select among different compression algorithms. Please see the User's Guide.
Support for tile compression is built into the CFITSIO
library. A program linked against
a recent version of CFITSIO can already read (or write)
“.fz” files:
http://heasarc.gsfc.nasa.gov/fitsio
The IRAF FITSUTIL package now includes support for FITS Tile
Compression:
http://iraf.noao.edu/extern.html
As use of this format expands, we anticipate more community
software packages will feature support for FITS Tile Compression. As with jpeg, the ultimate goal is not
to separately compress and uncompress each file to restore an original FITS
file, but rather to have the ability to maintain the data in its compressed
state throughout a processing workflow.
This is explained in an article from the March 2010 NOAO Newsletter:
What is FITS tile compression?
As announced in the accompanying article, the NOAO
archive is transitioning to a new flavor of the FITS (Flexible Image Transport
System) format. Tile-compressed FITS is a way to represent compressed
data within FITS itself, not through the use of some external compression
program like gzip. This is similar to how the jpeg, gif or png standards
contain built-in compression algorithms.
The IAU FITS working group has recognized the tile
compression format since 2001 (http://fits.gsfc.nasa.gov/registry/tilecompression.html).
Tile compression has numerous benefits. Images are encoded as FITS
binary tables and many standard FITS tools can be used. For instance,
image headers remain fully readable. Access is very rapid since each
rectangular tile (default is one image line) can be accessed
individually without having to uncompress any other pixels.
Multiple image compression algorithms are supported
to allow each class of data to benefit from a tailored choice (both lossless
and lossy options are supported). For most astronomical data, the lossless Rice
algorithm appears to be the best trade-off between speed and compression
factor. In fact, Rice is both significantly faster than gzip and produces
higher compression ratios, thus smaller files.
By contrast, gzip is a dictionary-based compression
algorithm well designed for text files. Astronomical images are numerical
and it is not surprising that gzip is not ideal for such data. Numerical
compression algorithms like Rice have highly beneficial features such as
compressing 16-bit and 32-bit pixels of the same data into the same absolute
size. This is critical for efficiently representing data (as from 18-bit
ADCs) that fall between these short and long integer sizes.
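The reason the storage width does not matter can be seen in a simplified Python sketch of Rice-style coding. It is an illustration under stated assumptions, not the CFITSIO implementation: the function names are hypothetical, the parameter k is fixed by hand (production coders typically adapt it to the data), and the bits are kept in a text string for readability. The coder works on the differences between neighboring pixel values, so the output length depends on how much the values vary, not on whether they were stored as 16-bit or 32-bit integers.

```python
def zigzag(n):
    """Map a signed integer to a non-negative one: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Invert zigzag()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def rice_encode(pixels, k):
    """Return (first_pixel, bitstring); requires k >= 1 in this sketch."""
    bits = []
    for prev, cur in zip(pixels, pixels[1:]):
        u = zigzag(cur - prev)                       # code the pixel difference
        bits.append("1" * (u >> k) + "0")            # quotient in unary
        bits.append(format(u & ((1 << k) - 1), f"0{k}b"))  # remainder in k bits
    return pixels[0], "".join(bits)

def rice_decode(first, bits, count, k):
    """Rebuild `count` pixels from the first pixel and the bitstring."""
    pixels, pos = [first], 0
    while len(pixels) < count:
        q = 0
        while bits[pos] == "1":                      # read the unary quotient
            q += 1
            pos += 1
        pos += 1                                     # skip the terminating '0'
        r = int(bits[pos:pos + k], 2)                # read the k-bit remainder
        pos += k
        pixels.append(pixels[-1] + unzigzag((q << k) | r))
    return pixels

# Hypothetical pixel values: a constant level plus small noise.
pixels = [1000, 1003, 999, 1001, 1002, 998, 1000, 1004]
first, bits = rice_encode(pixels, k=2)
decoded = rice_decode(first, bits, len(pixels), k=2)
print(len(bits), "coded bits for", len(pixels) - 1, "differences")
```

The same list of values yields the same coded length whether it came from a 16-bit or a 32-bit image, whereas a byte-oriented compressor sees two different byte streams.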
Support for on-the-fly tile compression is built into
to the widely used CFITSIO library, and is available for numerous computer
platforms via the standalone FPACK and FUNPACK tools (http://heasarc.nasa.gov/fitsio/fpack).
The FITSUTIL package provides support for IRAF users (see http://iraf.noao.edu/extern.html).
Compression is intimately related to the noise within
an image. This is discussed in a recent paper, "Lossless
Astronomical Image Compression and the Effects of Noise" (http://arxiv.org/abs/0903.2140),
with Bill Pence (NASA/GSFC) and Rick White (STScI). FPACK via the
underlying tile compression format provides a tool for properly managing that
noise.
In particular, FPACK supports noise-sensitive scaling
of floating point data to achieve high compression ratios while preserving the
scientific content of data. Similar sigma-scaling benefits have been widely
discussed recently, for example for the JDEM mission (http://arxiv.org/pdf/0910.4571)
and by the Astrometry.net project (http://arxiv.org/pdf/0910.2375). The
remarkable results from the Kepler mission rely on noise-scaled data (http://arxiv.org/pdf/1001.0216,
section 3.2). To this, FPACK adds the beneficial feature of subtractive
dithering (http://www.adass2009.jp/poster/files/PenceWilliam.pdf).
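The idea of noise-scaled quantization with subtractive dithering can be sketched in a few lines of Python. This is only an illustration of the principle, not FPACK's actual scaling or random-number scheme: the function names, the parameter q (the number of quantization steps per noise sigma), and the seed handling are assumptions. The floating-point values are divided by a step that is a fraction of the noise level, a reproducible pseudo-random offset is added before rounding to integers, and the same offset is subtracted again on restoration.

```python
import random

def quantize(values, sigma, q=4.0, seed=1):
    """Scale floats to integers with step sigma/q, adding a dither before rounding."""
    step = sigma / q
    rng = random.Random(seed)
    dither = [rng.random() for _ in values]          # uniform in [0, 1)
    ints = [round(v / step + d - 0.5) for v, d in zip(values, dither)]
    return ints, step

def restore(ints, step, seed=1):
    """Regenerate the same dither sequence and subtract it again."""
    rng = random.Random(seed)
    return [(i - rng.random() + 0.5) * step for i in ints]

# Hypothetical data: a flat background of 100.0 with Gaussian noise, sigma = 2.0.
noise = random.Random(0)
sigma = 2.0
values = [noise.gauss(100.0, sigma) for _ in range(1000)]

ints, step = quantize(values, sigma)
restored = restore(ints, step)
worst = max(abs(r - v) for r, v in zip(restored, values))
print("step:", step, "largest error:", worst)
```

Each restored value differs from the original by at most half a step, which is well below the noise, and the resulting integers can then be passed to a lossless integer coder such as Rice.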
This has been a dense article even with several
features and references omitted (e.g., a truly gripping discussion of integrated
FITS checksum support). Even so, I hope I have conveyed some of my
personal excitement over the galvanizing opportunities facing astronomical data
compression. This is a transformative technology that will be key to
meeting the aggressive data handling requirements for near future projects
relying on rapid-readout gigapixel cameras such as the Dark Energy Survey, the
WIYN One Degree Imager, and the Large Synoptic Survey Telescope. The soul
of data compression is not the static storage of data, but rather the dynamic
optimization of throughput throughout Observatory data flow and the community
O/IR System.
Rob Seaman