I recently needed to take a set of files, each shipped with an MD5 digest, and determine whether each file matched its digest. If you have not used MD5 digests before, the Wikipedia definition describes what they are and how they are used.
Anyway, I had to write an automated process that would look for a set of files and compare those files to the included MD5 digest. I started with the Java MessageDigest, but it seemed to be more of a pain than anything. So I browsed around for an alternative and found the Commons Codec DigestUtils. It was almost too easy. DigestUtils includes the method DigestUtils.md5Hex(byte[] data), which was exactly what I needed. To use it, all I needed to do was:
- Read the file that is supposed to match the MD5 digest.
- Run it through the String DigestUtils.md5Hex(byte[] data) method.
- Compare the generated MD5 String to the MD5 String included in the matching MD5 digest file.
Here is a simple example of how this works.
First, to make this example work, we need a simple file to generate a digest from and a tool to generate the digest itself. I am on a Mac, which includes the md5 application. On Linux there is an equivalent application, md5sum. I don’t have access to a PC, but there is a utility, md5sums, that looks easy enough to use.
So, say we have the following file:
A simple file to be run through the digester.
and we call this file test.txt. To generate an MD5 digest from this file, we would do the following:
md5 test.txt > test.md5
This would generate an MD5 digest file that (depending upon your digest tool) would look something like the following:
MD5 (test.txt) = 5c87da642b2286cfb6ac8e0ad0d8e036
Note: This file is what would be included with test.txt in something like a zip or a tar.gz. When you receive the archive containing both files, you would uncompress it and perform the comparison that follows.
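In an automated process, the hex digest has to be pulled out of that digest file before anything can be compared. The following is a minimal sketch of one way to do it, not part of the original FileUtil: the class name Md5FileParser and the method extractDigest() are hypothetical. It assumes the line contains exactly one standalone run of 32 hex characters, which holds for both the md5 output shown above and the usual md5sum layout (digest first, then the file name), but would be fooled by a file name that itself contains such a run.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown for illustration only.
final class Md5FileParser {

    // A standalone run of exactly 32 hex characters is treated as the digest.
    private static final Pattern HEX_32 = Pattern.compile("\\b[0-9a-fA-F]{32}\\b");

    /** Returns the first 32-character hex run found in the line, lowercased. */
    static String extractDigest(String line) {
        Matcher m = HEX_32.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("No MD5 digest found in: " + line);
        }
        return m.group().toLowerCase(Locale.ROOT);
    }
}
```

Calling Md5FileParser.extractDigest() on the first line of test.md5 would give the String to hand to the comparison step.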
The String, 5c87da642b2286cfb6ac8e0ad0d8e036, is the actual digest. The next thing we need to do is write a class that will read in the same file and generate an MD5 digest to compare. I have a simple FileUtil class that will do this for you.
package org.gs.util;

import org.apache.commons.codec.digest.DigestUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FileUtil {

    /** Reads the entire file into memory and returns its contents. */
    public static byte[] readFile(File file) throws IOException {
        byte[] bytes = new byte[(int) file.length()];
        // try-with-resources closes the stream even if a read fails
        try (InputStream is = new FileInputStream(file)) {
            int offset = 0;
            int numRead;
            while (offset < bytes.length
                    && (numRead = is.read(bytes, offset, bytes.length - offset)) >= 0) {
                offset += numRead;
            }
            // A short read would otherwise produce a digest of partial data
            if (offset < bytes.length) {
                throw new IOException("Could not completely read file " + file.getName());
            }
        }
        return bytes;
    }

    /** Returns true if the file's MD5 digest matches the given hex digest. */
    public static boolean isFileValid(String file, String digest) {
        try {
            byte[] buffer = readFile(new File(file));
            String md5sum = DigestUtils.md5Hex(buffer);
            return md5sum.equalsIgnoreCase(digest);
        } catch (IOException e) {
            throw new RuntimeException("Unable to process file for MD5 comparison.", e);
        }
    }
}
As you can see, there are only two methods in this class. The first takes a File and returns its contents as a byte array, which is pretty simple. The second method, isFileValid(), is where DigestUtils is actually used. It takes the name of the file to be digested and the digest it is being compared to. In this case the values are testData/test.txt and 5c87da642b2286cfb6ac8e0ad0d8e036.
It begins with a call to the previously mentioned readFile() method.
byte[] buffer = readFile(new File(file));
The version of the DigestUtils.md5Hex() method we are using requires the value to be a byte array. After we have the byte array, we invoke the static md5Hex() method. This method returns a hex String representation of the generated MD5 digest. If you step through this code with a debugger, you would see the value:
5c87da642b2286cfb6ac8e0ad0d8e036
returned from the md5Hex() method. The final line, excluding the catch block, returns a boolean value indicating whether the passed-in digest is equal to the digest that md5Hex() generates from the file.
To test this code, I put a simple unit test together.
package org.gs.util;

import junit.framework.TestCase;

public class FileUtilTests extends TestCase {

    public void testIsFileComplete() throws Exception {
        String digest = "5c87da642b2286cfb6ac8e0ad0d8e036";
        String file = "testData/test.txt";
        assertTrue(FileUtil.isFileValid(file, digest));
    }
}
This test takes the digest value generated by the utility, md5 or md5sum, and the path to the file used to generate the digest and passes them to the FileUtil.isFileValid() method. In this instance our test will obviously pass with no troubles.
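One limitation worth keeping in mind: readFile() loads the whole file into memory and casts file.length() to an int, so it is not a good fit for very large files. Below is a rough sketch of a streaming alternative that feeds the file to the JDK's MessageDigest in chunks. The class name StreamingMd5 is hypothetical, the 8 KB buffer size is an arbitrary choice, and this is not the Commons Codec implementation, just a JDK-only approach that should produce the same hex String.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, shown for illustration only.
final class StreamingMd5 {

    /** Computes the MD5 digest of a file as a lowercase hex String, one chunk at a time. */
    static String md5Hex(Path path) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this is unexpected.
            throw new IllegalStateException(e);
        }
        try (InputStream in = Files.newInputStream(path)) {
            byte[] chunk = new byte[8192]; // arbitrary buffer size
            int n;
            while ((n = in.read(chunk)) != -1) {
                md.update(chunk, 0, n); // only the bytes actually read
            }
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b)); // two hex digits per byte
        }
        return hex.toString();
    }
}
```

The result can be compared with equalsIgnoreCase() exactly as in isFileValid(). If Commons Codec is already on the classpath, newer releases also have an md5Hex(InputStream) overload that may be simpler than rolling this by hand.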
I hope this helps.