Homework 7: Steganography

0. Background

A. Goals

The purpose of this assignment is to gain practice with object-oriented programming, bitwise operators, number systems, and cryptography. The specific goals are to:

continue to familiarize yourself with objects in Java
understand the differences between static libraries and non-static objects
implement practical examples of bits and bitwise functions
get a glimpse into the world of modern cryptography

At the end of the assignment, you will be able to encrypt and decrypt text using a pseudo-random number stream, and hide and reveal messages into images.

B. Background

Encryption is the process of encoding messages or information that can only be decoded by people with authorization (usually a passkey or the like). Good encryption algorithms produce output that looks random to a bystander but is easily decipherable with the correct passkey.

In Hail, Caesar!, the key was a single character which shifted all characters in the message up the alphabet. This algorithm was far from perfect, and you demonstrated this by writing a function to crack the cipher without having authorization. This exploited the fact that the outputted cipher wasn’t entirely random – some letters appeared an abnormal amount, and this allowed you to find the offset passkey.

Over the two millennia since Julius Caesar, cryptography has gone a long way. Computers have played an enormous role in developing information communications as well as keeping those communications safe. In this assignment, you will implement and use a linear feedback shift register (LFSR) to create a stream of pseudo-random bits. This will lead to a significantly more random cipher.

Once you have written a crypto library, you will use it for a practical application: steganography -- the practice of hiding secret messages in images in plain view.

C. Understanding the Problem

First you will write an LFSR object. Once seeded, it produces a boundless stream of seemingly random bits (1s and 0s). The key feature is that given the same seed, the LFSR will produce the same stream of bits. This means that if Alice and Bob both know the same seed, they can produce identical random bit streams.

To encrypt the message, each bit in the message will be exclusive ored with the LFSR’s sequence of pseudo-random bits. The resulting cipher will appear to be nonsense, like "Kao y{u(x kp }`1rkz~|g2rj`f@r)", but when it is XORed by the same sequence of bits, the encrypted message can be decrypted into the original. Since an LFSR’s bit stream is completely determined by its given parameters, Alice can produce the same bit stream that Bob used to encrypt a cipher if she knows the seed, and so can decrypt the message through the same process. For this reason, an LFSR encryption scheme is symmetric key encryption technique (the alternative is an asymmetric scheme, in which different passwords are used for encrypting and decrypting).

D. Designing the Requirements and Interface

You will be writing four classes. Their APIs are listed below, but each will be elaborated on in their section.

public class LFSR 
-----------------
public        LFSR(String seed, int tapPosition);
public String toString()
public int    getTapPosition()
public int    nextBit()

public class Codec
------------------
public static int[]  encode(String str)
public static String decode(int[] bits)
public static void  encrypt(int[] message, String seed, int tapPosition)
public static void  decrypt(int[] cipher,  String seed, int tapPosition)

public class RetrieveMessage
----------------------------
public static void main(String[] args)

public class HideMessage
----------------------------
public static void main(String[] args)

In any class, you may add any functions you’d like, but they must be private. No additional public functions may be added.

1. Linear Feedback Shift Register

A. Understanding the Problem

In the first part of this assignment, you will be creating a linear feedback shift register (LFSR). An LFSR is a structure that can produce a stream of pseudo-random bits, which has many practical uses, particularly in cryptography.

The LFSR consists of a register of bits and a tap position. The register is simply a list of bits that has a fixed size (which should suggest to you a good data structure to implement it). The tap position is simply an index in the register that will be used to create the pseudo-random bits. When we create an LFSR, we must seed it by providing the initial values in the register.

There are two steps in producing a pseudo-random bit with an LFSR:

Shift all bits by one place towards the most significant bit (towards the left in the diagram).
Replace the least significant bit (the rightmost bit) with the exclusive or of the most significant bit that was shifted off and the bit previously in the tap position.

The new least significant bit will be the pseudo-random bit produced by the LFSR.

Consider an example. The following figure shows an LFSR seeded with the initial seed 01101000010 and tap position 8 during the process of producing one pseudo-random bit. Keep in mind that the tap position is counted from the rightmost (least significant) bit! This is opposite from the way positions are counted in arrays and strings, but is consistent with the way bit/digit positions are counted in numbers.

B. Shift Register Implementation

Before you begin writing your LFSR.java class, you will want to decide how you want to represent the shift register within your class. We are leaving this decision up to you, and any working implementation will be accepted. Here are our suggestions:

an int[] containing only 1s and 0s;
a char[] containing only '1's and '0's;
a boolean[] containing only true and false;
a single int, where each bit represents a position in the shift register. This is a more challenging implementation, but will give you a chance to play around with and understand bits and binary better. In this case, your shift register will be limited to 32 bits, and you will need to use the bitwise operators & (bitwise AND), | (bitwise OR), << (shift left), and >> (shift right).

C. Constructor and Methods

Your LFSR class will implement the API described above. This section will provide the details for the constructor and methods:

public LFSR(String seed, int tapPosition) takes a String parameter seed whose characters are a sequence of 0s and 1s, and an int parameter tapPosition specifying which position in the register to use as the tap. The constructor should throw a RuntimeException with a useful error message if seed contains any characters other than 0 or 1, if tapPosition refers to an impossible position in the register (for example, -1), or if seed is null.

Note: remember the String.charAt() method, which will be helpful when parsing through seed.

public String toString() returns the current bit sequence in the shift register as a String of 1s and 0s. For example,

LFSR lfsr = new LFSR(“110011", 3);
System.out.println(lfsr.toString());

Should print 110011. This method will help you when debugging.

public int getTapPosition() returns the tap position, as given by the constructor.

public int nextBit() performs one step of the LFSR, as described in section 1A and returns the least significant bit (the rightmost bit) in the shift register after the step has been performed as an int with the value 0 or 1.

For testing, you should ensure you can run

LFSR lfsr = new LFSR("01101000010", 8);
for (int i = 0; i < 10; i++) {
    int bit = lfsr.nextBit();
    System.out.println(lfsr.toString() + " " + bit);
}

should print

11010000101 1
10100001011 1
01000010110 0
10000101100 0
00001011001 1
00010110010 0
00101100100 0
01011001001 1
10110010010 0
01100100100 0

Before moving on, you should have tested all methods in LFSR.java and be confident that it is correctly implemented.

2. Codec Library

A. ASCII Conversion

Before we can begin encrypting and decrypting using our LFSR, we need to implement two functions to encode and decode our messages into ASCII. This process is very similar to converting into the symbol arrays in Hail, Caesar! In that assignment, you were encrypting character by character, so you converted CONSUL into [ 2, 14, 13, 18, 20, 11 ]. In this assignment, you will be encrypting bit by bit, so instead you will need to convert CONSUL into 100001110011111001110101001110101011001100.

Recall the ASCII table from Hail, Caesar! ASCII maps all English letters, digits, and punctuation to numbers between 0 and 127. For example, A is encoded as 65. If you’d like a refresher, read back over homework 3’s section 1A.

B. Encode and Decode

For your Codec.java library, begin by writing two functions:

public static int[] encode(String str) takes in a String and will return an int[] where each element is a single bit in the ASCII encoding of the String. Each character in str will encode into 7 bits, so the output array should be 7 times larger than str.length().

For example, encode(“C") should output the array

[ 1, 0, 0, 0, 0, 1, 1 ]

Note: if str is null, encode() should return null

Hint: once you have the int ASCII representation of a char, you will need to find the 7 bits representing that number. You can do this using standard arithmetic operations or bit-level operators. You’ll need to think about how to do this exactly, but here are some hints:

You will need to perform two actions in order to encode,
First, determining the least significant (rightmost) bit. For example, 0000011 has a 1 as the least significant bit. Consider what kind of numbers have 1 and what kind of numbers have 0.
Second, shifting all the bits to the right. Consider the relationship between 0011000 and 0001100 (begin by figuring out what these are in decimal!).

You should test encode() and be confident before you move on to the next function.

public static String decode(int[] bits) will accept an array of int representing a message in ASCII, and will return the decoded String. This is simply a reversal of the encode() process. Remember, every 7 bits corresponds to one char.

Once you have written encode() and decode(), you should be able to encode any String into binary and decode it back into the original string.

C. Encrypt and Decrypt

Your next task is to write functions to encrypt and decrypt messages using a sequence of pseudo-random bits generated by the LFSR.

In Hail, Caesar!, you encrypted each character by shifting it up the alphabet by a certain offset. This time, you will be encrypting bit by bit, and instead of shifting, you will be using exclusive or. For instance, consider the letter “C" with binary representation 1000011. Say we construct an LFSR with 01101000010 as the seed and 8 as the tap. This will produce a bit stream of 1100100100. Encrypting the message “C" with this LFSR would be done by:

message 1000011
random  1100100
cipher  0100111

Note that the ith bit in the cipher is the XOR of the ith bit in the message and the ith bit in the random bit stream.

Write public static void encrypt(int[] message, String seed, int tapPosition) to perform this encryption. message is the message in binary to encrypt, and seed and tapPosition are the parameters for the LFSR to be used when encrypting. You should not be returning anything. Instead, you should be changing the message array in place.

Once you have tested and are confident in encrypt() you should write public static void decrypt(int[] cipher, String seed, int tapPosition). Similarly to the Caesar Cipher, decrypt() should only be one line.

Hint: Exclusive or is symmetric. In practice, this means that if c == r ^ m then m == r ^ c.

3. Image Steganography

A. Understanding the Problem

Now that you have written a more modern encryption library, you will use it for a practical application: steganography. Image steganography is the science of hiding secret messages inside of images. Think of it as 21st century disappearing ink. The casual observer simply sees an ordinary image; only someone who knows how to look for it will notice or find the message.

Image steganography has many practical uses, particularly for digital watermarking, where a message is hidden so that the image source can be tracked or verified. The FBI has even alleged that Russian intelligence services have used steganography to communicate with agents abroad.

You will implement a simple, but very effective form of image steganography. The idea is to fiddle with each pixel's color in a way that isn't perceivable to the human eye, but that the computer can interpret. Since the human eye is least sensitive to blue wavelengths, we'll slightly adjust the amount of blue in each pixel in a way that encodes a single bit of information.

As far as the computer is concerned, an image is just a 2-D array of pixels. The color of each pixel is an integer between 0 and 16.8 million (2²⁴ - 1 to be precise), with 8 bits dedicated to each of the red, green, and blue primaries.

Flipping the ones bit of this number (i.e. subtracting 1 from an odd number or adding 1 to an even number) changes the amount of blue in the color by a minute amount that is indistinguishable to the human eye. These two blues are 001001011000010110101110 and 001001011000010110101111:

Using your Codec.java library, you will be encoding a message into binary, encrypting the message using an LFSR, and hiding the message in the least significant bits of an image.

B. `ImageData.java`

We have provided you with the ImageData.java library for reading and writing images as 2D integer arrays. ImageData.java provides three functions:

 public static int[][] load(String filename)  // load image filename and return it as a 2-D array of integers
public static void    show(int[][] img)      // display img in a window
public static void    save(int[][] img, String filename) // save img to filename

Each element in the 2D integer array corresponds to one pixel in the image. Refer to section 3uA to see how each pixel is represented as an int.

C. Retrieve Message

First, write RetrieveMessage.java that should accept three command-line arguments:

the name of the image file (jpg or png format),
the seed for the LFSR as a string of 0s and 1s
the tap position for the LFSR

RetrieveMessage.java should load the specified image using ImageData.load(), extract the embedded cipher, and print out the decrypted message.

Read the bits from left to right, top to bottom, meaning start with the upper-left pixel, and work across rows. You should be extracting one bit per pixel (the least significant bit). If the number of pixel in the image is not a multiple of 7, ignore the extra pixels at the end (i.e. for a 5x10 pixel image, extract from the first 49 pixels and ignore the last).

Once you have extracted and decrypted the message, if there is a character whose numeric code is 0, only print out the characters up to, but not including that character. The 0 character is called the NULL character or ASCII NULL, and can be written in Java as ’\0’. Do not confuse the NULL character with the character ’0’, which corresponds to the ASCII code 48 and represents the digit zero. You may decrypt characters one by one until you reach your NULL character, or you may decode and decrypt all possible characters, then search for a NULL in the result. Using String.indexOf() and String.substring() will be useful if you choose the second route.

Notes:

If no seed or tapPosition command line arguments are given (there is only one arguments), retrieve the message and print without decrypting.
If no NULL character is found, simply print the entire retrieved message.

In order to test your RetrieveMessage.java, we have provided stegosaur_embedded.png which has a hidden message which is not encrypted. We also have stegosaur_encrypted.png which contains the same hidden message, but this time encrypted with the seed 01101000010 and tap 8.

You will be able to test more after completing the next part.

D. Hide Message

Next, you should write HideMessage.java that should accept four command-line arguments:

the name of the image file (jpg or png format),
the name of the text file containing your message,
the seed for the LFSR as a string of 0s and 1s
the tap position for the LFSR

HideMessage should load the specified image using ImageData.load(), encrypt the message using your Codec.java library, embed the encrypted cipher into the image, and display the image result using ImageData.show().

Before the message is encrypted, it should be NULL terminated. This simply means adding the ASCII NULL character to the end, which can be done by adding "\0" to the end of the message String.

Notes:

If no seed or tapPosition command line arguments are given (there are only two arguments), simply embed the message without encrypting.
If the message is too long for the image, you can simply print an error message and abort.

Once you are displaying embedded messages, you should be able to save the image and retrieve the message using RetrieveMessage.java.

4. Readme and Submission

A. README

Complete readme_steg.txt in the same way you have done for previous assignments.

B. Course Survey

If you have not already done so, complete the Mid-semester Feedback Form, which helps us make improvements to the course.

C. Submission

Submit LFSR.java, Codec.java, RetrieveMessage.java, HideMessage.java, and readme_steg.txt on the course website.

Before submission remove any print statements that were used for debugging or testing your functions.

Be sure that every method has an appropriate header comment, and that your code is well-documented.

Homework 7: Steganography

0. Background

A. Goals

B. Background

C. Understanding the Problem

D. Designing the Requirements and Interface

1. Linear Feedback Shift Register

A. Understanding the Problem

B. Shift Register Implementation

C. Constructor and Methods

2. Codec Library

A. ASCII Conversion

B. Encode and Decode

C. Encrypt and Decrypt

3. Image Steganography

A. Understanding the Problem

B. ImageData.java

C. Retrieve Message

D. Hide Message

4. Readme and Submission

A. README

B. Course Survey

C. Submission

B. `ImageData.java`