Reading a GZIP file in Java

The java.util.zip package provides classes for reading and writing files in GZIP format. The GZIP format, not to be confused with ZIP, is a popular file format used on UNIX systems to compress a single file. The underlying compression mechanism is the DEFLATE algorithm. Beyond that, the file format is simple: a header, the compressed data and a trailer that includes a CRC of the decompressed data. Note that, unlike a ZIP file, a GZIP file per se has no concept of subfiles or individual "entries" in the archive; it is a single compressed stream of data. (In practice, it is common to GZIP a file that is itself a concatenation of different subfiles, such as a .tar file.)
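
To make the header and trailer concrete, here is a minimal sketch that compresses a few bytes in memory and then inspects the result by hand. It relies on details from the GZIP specification (RFC 1952) rather than anything stated above: the stream starts with the two bytes 0x1f 0x8b, and it ends with an 8-byte trailer holding the CRC-32 of the decompressed data followed by its length modulo 2^32, both little-endian. The class and method names (GzipLayoutDemo, trailerCrc and so on) are purely illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

class GzipLayoutDemo {

  /** Compresses the given bytes into an in-memory GZIP stream. */
  static byte[] gzip(byte[] data) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(data);
    } // closing the stream writes the trailer
    return bytes.toByteArray();
  }

  /** True if the data starts with the two GZIP magic bytes, 0x1f 0x8b. */
  static boolean hasGzipMagic(byte[] gz) {
    return gz.length >= 2 && (gz[0] & 0xff) == 0x1f && (gz[1] & 0xff) == 0x8b;
  }

  /** Reads the CRC-32 stored in the first four bytes of the 8-byte trailer. */
  static long trailerCrc(byte[] gz) {
    ByteBuffer trailer = ByteBuffer.wrap(gz, gz.length - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
    return trailer.getInt() & 0xffffffffL; // treat the 32-bit value as unsigned
  }

  /** Reads the decompressed length (modulo 2^32) stored in the last four bytes. */
  static long trailerLength(byte[] gz) {
    ByteBuffer trailer = ByteBuffer.wrap(gz, gz.length - 4, 4).order(ByteOrder.LITTLE_ENDIAN);
    return trailer.getInt() & 0xffffffffL;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "hello, gzip".getBytes(StandardCharsets.UTF_8);
    byte[] gz = gzip(data);

    // Compute the CRC-32 of the original data ourselves for comparison
    CRC32 crc = new CRC32();
    crc.update(data);

    System.out.println("magic bytes present: " + hasGzipMagic(gz));
    System.out.println("trailer CRC matches: " + (trailerCrc(gz) == crc.getValue()));
    System.out.println("trailer length matches: " + (trailerLength(gz) == data.length));
  }
}
```

In normal use there is no need to parse any of this by hand: GZIPInputStream checks the header when it is constructed and verifies the trailer when it reaches the end of the stream.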

To read the decompressed data from a GZIP file, we construct a GZIPInputStream around the corresponding FileInputStream:

InputStream in = new GZIPInputStream(new FileInputStream(f));
// ... read decompressed data from 'in' as usual

(Of course, the data needn't actually be in a file. We could pass in any InputStream: for example, the raw GZIP data could be cached in a byte array and read from a ByteArrayInputStream.)
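
As a rough illustration of the byte array case, the sketch below builds some GZIP data in memory and then reads it back as lines of text through a ByteArrayInputStream. The class name InMemoryGzipDemo, its helper methods and the choice of UTF-8 are assumptions made for the example; real data may of course use another encoding or not be text at all.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class InMemoryGzipDemo {

  /** Builds some GZIP data in memory so that the example needs no file. */
  static byte[] gzip(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  /** Decompresses the cached GZIP data and returns it as lines of UTF-8 text. */
  static List<String> readLines(byte[] gzipData) throws IOException {
    List<String> lines = new ArrayList<>();
    // Same pattern as with a file: wrap the raw stream in a GZIPInputStream
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new ByteArrayInputStream(gzipData)), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    byte[] cached = gzip("first line\nsecond line\n");
    for (String line : readLines(cached)) {
      System.out.println(line);
    }
  }
}
```

Only the innermost stream differs from the file-based version; everything layered on top of the GZIPInputStream is unchanged.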

Decompressing to a file

Reading the data from a GZIP file and writing the decompressed data to another file is straightforward. We repeatedly read a block of decompressed data into a buffer and write the contents of the buffer to the output file each time:

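import java.io.*;
import java.util.zip.*;

public class Gunzipper implements Closeable {
  // Decompressing stream, wrapped once here so that unzip() never re-wraps it
  private final InputStream in;

  public Gunzipper(File f) throws IOException {
    InputStream fileIn = new FileInputStream(f);
    try {
      this.in = new GZIPInputStream(fileIn); // reads and checks the GZIP header
    } catch (IOException e) {
      fileIn.close(); // not valid GZIP data: don't leak the file handle
      throw e;
    }
  }

  public void unzip(File fileTo) throws IOException {
    // try-with-resources closes the output file and reports any error on close
    try (OutputStream out = new FileOutputStream(fileTo)) {
      byte[] buffer = new byte[65536];
      int noRead;
      // read() returns -1 once the end of the decompressed data is reached
      while ((noRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, noRead);
      }
    }
  }

  @Override
  public void close() throws IOException {
    in.close(); // also closes the underlying FileInputStream
  }
}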
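
Where a reusable class is not needed, the same copy can be sketched as a single method. The version below is one possible approach rather than the only one: it assumes Java 9 or later for InputStream.transferTo, which performs the read-and-write loop internally, and the names GunzipSketch and gunzip are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

class GunzipSketch {

  /** Decompresses the GZIP file 'from' into 'to' and returns the number of bytes written. */
  static long gunzip(Path from, Path to) throws IOException {
    // Declaring the raw stream separately means it is still closed
    // if the GZIPInputStream constructor rejects the header
    try (InputStream raw = Files.newInputStream(from);
         InputStream in = new GZIPInputStream(raw);
         OutputStream out = Files.newOutputStream(to)) {
      return in.transferTo(out); // Java 9+: copies until end of stream
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: GunzipSketch <input.gz> <output>");
      return;
    }
    long written = gunzip(Paths.get(args[0]), Paths.get(args[1]));
    System.out.println("Wrote " + written + " decompressed bytes");
  }
}
```

On Java 8, the explicit buffer loop remains the way to do the copy.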

Reading from a ZIP file

The GZIP file format is particularly common on UNIX systems. On other systems, such as Windows, the ZIP file format is more common. The ZIP format is also used for Java archive (jar) files. On the next page, we look at how to read ZIP files in Java.



Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.