Data compression in Java

In this tutorial, we'll look at how to compress data in Java using the built-in compression library. The standard JDK includes the Deflater class for general-purpose compression. This is an implementation of the DEFLATE algorithm (actually a wrapper around the commonly used zlib library), which can reduce many types of data to between 20% and 50% of their original size. We'll look at what types of data it favours in a moment.

If you need data compression and you have little development time, then passing your data through Deflater will give some compression for many types of data. On the next page, we'll delve straight in and look at how to use Deflater to compress data in Java "out of the box".
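As a first taste, here is a minimal sketch of a compress/decompress round trip with Deflater and its counterpart Inflater. The class name DeflaterSketch, the helper method names and the 4096-byte buffer size are our own choices for illustration. The amount of compression you actually get will depend on your data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative class name; not part of the JDK.
final class DeflaterSketch {

    /** Compresses the whole input array with default Deflater settings. */
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish(); // no more input will follow
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096]; // arbitrary buffer size
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib resources
        }
    }

    /** Reverses compress() using Inflater. */
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // Guard against looping forever on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("The quick brown fox jumps over the lazy dog. ");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        byte[] restored = decompress(packed);
        System.out.println("Original size:   " + original.length);
        System.out.println("Compressed size: " + packed.length);
        System.out.println("Restored size:   " + restored.length);
    }
}
```

Note the calls to end() in the finally blocks: because Deflater and Inflater wrap native zlib state, it is generally worth releasing them explicitly rather than waiting for garbage collection.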

Reading common archive files

Java provides out-of-the-box support for reading GZIP files and ZIP files, which are both based on the DEFLATE algorithm. Unfortunately, it does not provide out-of-the-box support for reading tar files, encrypted ZIP files, or indeed other types of archive. For these, you can use the Java archive reader Arcmexer, which you can download free of charge from this web site. It allows you to read encrypted ZIPs in Java, along with tar and gzipped tar archives (tarballs).
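For the two formats that the JDK does handle, a rough sketch of reading them with the standard GZIPInputStream and ZipInputStream classes might look as follows. The class name ArchiveReadSketch and its two helper methods are our own illustrative names, and the example only lists ZIP entry names rather than extracting their contents.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative class name; not part of the JDK.
final class ArchiveReadSketch {

    /** Reads a GZIP stream to the end and returns the decompressed bytes. */
    static byte[] readGzip(InputStream in) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(in)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns the entry names of an unencrypted ZIP stream, in archive order. */
    static List<String> listZipEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ArchiveReadSketch.java <file.zip>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String name : listZipEntries(in)) {
                System.out.println(name);
            }
        }
    }
}
```

Both helpers take a plain InputStream, so the same code should work whether the archive comes from a file, a network connection or a byte array in memory.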

Advanced uses: understand the compression algorithm

For more advanced cases, it can help to know a little about how the compression algorithm works. We'll take a look at how the Deflater works in broad terms.

Then we'll also see how to apply this knowledge: if you have a little bit more development time available, it may be possible to transform your input data to improve the compression ratio offered by the Deflater class. We'll look at the case of improving text compression by using the FILTERED strategy on pre-transformed data.
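To give a flavour of the idea, the sketch below selects a strategy with Deflater.setStrategy() and compares compressed sizes with and without a pre-transform. Be aware that the delta transform used here is a stand-in we have assumed purely for illustration: it is not the text transform discussed later in this tutorial. The class and method names are likewise hypothetical. The zlib documentation describes the FILTERED strategy as intended for data produced by a filter or predictor, typically consisting mostly of small values, which is the kind of output a delta transform tends to produce on slowly changing data.

```java
import java.util.zip.Deflater;

// Illustrative class name; not part of the JDK.
final class FilteredStrategySketch {

    /** Hypothetical pre-transform: store each byte as its difference from the previous byte. */
    static byte[] deltaEncode(byte[] data) {
        byte[] out = new byte[data.length];
        byte prev = 0;
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] - prev);
            prev = data[i];
        }
        return out;
    }

    /** Inverse of deltaEncode(), needed after decompression to recover the original data. */
    static byte[] deltaDecode(byte[] deltas) {
        byte[] out = new byte[deltas.length];
        byte prev = 0;
        for (int i = 0; i < deltas.length; i++) {
            out[i] = (byte) (deltas[i] + prev);
            prev = out[i];
        }
        return out;
    }

    /** Returns the number of bytes produced when compressing input with the given strategy. */
    static int compressedSize(byte[] input, int strategy) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setStrategy(strategy); // e.g. Deflater.DEFAULT_STRATEGY or Deflater.FILTERED
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        // Made-up sample data: a noisy, slowly rising sequence of byte values.
        java.util.Random random = new java.util.Random(42);
        byte[] data = new byte[20000];
        int value = 0;
        for (int i = 0; i < data.length; i++) {
            value += random.nextInt(4); // small steps, so the deltas are small values
            data[i] = (byte) value;
        }
        byte[] transformed = deltaEncode(data);
        System.out.println("Raw, default strategy:    " + compressedSize(data, Deflater.DEFAULT_STRATEGY));
        System.out.println("Delta, default strategy:  " + compressedSize(transformed, Deflater.DEFAULT_STRATEGY));
        System.out.println("Delta, FILTERED strategy: " + compressedSize(transformed, Deflater.FILTERED));
    }
}
```

Whether FILTERED actually beats the default strategy depends on the data and on the transform, so it is worth measuring on a representative sample before committing to it.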


Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.