summaryrefslogtreecommitdiff
path: root/25_break_encr/README
blob: fdbc68d780139ecef96f16b037caaacb6b7bb779 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
 In the last problem, you saw an implementation of a simple
"encryption" program.  In this problem, you will write a program
that breaks that encryption---that is, it will take as input
a text file encrypted with the encryption program you just
saw, and prints out the key used to encrypt it!

Breaking the Caesar Cipher uses a technique called "frequency counting."
This technique relies on the fact that the distribution of letters
in the English alphabet is far from uniform: 'e' is by far the most
common letter (~13%), followed by 't' (9%), and 'a' (8%).  Note
that the average frequency is 100/26 ~= 4%. 

This frequency distribution means that if you know (or suspect)
that a file contains English text encrypted with a Caesar Cipher,
you can simply count the frequency of all letters in it, and guess
that the letter which occurs most often is 'e'.   Once you know
which letter is 'e', you can backsolve for the key and decrypt
the file.  Note that in practice this requires a large enough
text that "the law of large numbers" applies---and while it is
not guaranteed to work, it typically does.


Requirements:
=============

 - Your program will take one command line argument: the name
   of a file to read as input
 - Your program will then perform frequency counting on the letters
   in that text file.  Your program should ignore all non-letter
   characters (those un-modified by your problem 1 program).
 - Your program should use the frequency count information to 
   determine which letter is 'e', and solve for the key.
 - On success, your program should print a single line of output to
   stdout, which should contain one decimal (base 10) integer (followed by a
   newline character).  This number should be the encryption key used
   on the text.  It should be in the range [0,26).  That is, the
   number you print should obey 0 <= answer < 26.
 - On failure, your program should print an appropriate error
   message to stderr, then exit with EXIT_FAILURE.
 - Provide a Makefile which compiles your program into
   a binary called "breaker"
Hints:
 - Divide this problem into sub-problems.  You should probably
   write at least two functions (other than main) to do this.
 - Think about how you want to represent your data.  An array
   might be handy somewhere.
 - You may find a function you wrote in a previous classwork 
   useful for this assignment.  Feel free to use or adapt it.
 - Remember that everything (including characters) are numbers.
   You might look back at the "Read encryption" assignment
   to see an example of doing math on characters.