James' Security Blog

28th June 2012 || Frequency Analysis with Python

Frequency analysis is commonly used in the cryptanalysis of classical ciphers as a step in deducing the plain text from the cipher text. It works on the principle that certain letters appear more frequently on average than others. For example, “E” and “T” are the most common letters in English. A monoalphabetic cipher replaces each plain text letter with the same cipher letter every time, so the most common letter found during frequency analysis is likely to stand for one of the most common letters in English.
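As a rough illustration of that principle, here is a minimal sketch in Python 3. The helper `caesar_shift` and the sample message are my own assumptions for the example; a Caesar shift is just one simple kind of monoalphabetic cipher. The sketch counts the letters in the cipher text, assumes the most common one stands for E, and derives the shift from that guess.

```python
from collections import Counter
import string

def caesar_shift(text, shift):
    """Shift each uppercase letter by `shift` places, leaving other characters alone."""
    shifted = []
    for ch in text:
        if ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)
    return "".join(shifted)

# Hypothetical sample message, chosen so that E really is its most common letter.
plain_text = "MEET ME NEAR THE GREEN TREE"
cipher_text = caesar_shift(plain_text, 3)

# Count only the letters, ignoring spaces.
letter_counts = Counter(ch for ch in cipher_text if ch.isalpha())
most_common_letter, occurrences = letter_counts.most_common(1)[0]

# Assume the most common cipher letter stands for E and work out the shift from it.
guessed_shift = (ord(most_common_letter) - ord("E")) % 26
recovered_text = caesar_shift(cipher_text, -guessed_shift)

print(most_common_letter, occurrences)
print(guessed_shift, recovered_text)
```

The guess only works here because the sample is E-heavy. On real cipher text the top letter could just as well stand for T or A, so it is worth trying a few candidates.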

Frequency analysis works better on longer texts. In a short sentence a handful of less common letters can easily skew the results, whereas over a larger sample the counts tend to settle closer to the typical distribution of the language.
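To make the skew concrete, the sketch below (Python 3) compares the top letter of a very short sample with that of a longer one. Both samples are made up for this example and the function name `letter_frequencies` is my own, so treat it as a toy demonstration and not a measurement of real English.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, share) pairs, most common first, ignoring non-letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    return [(letter, count / total) for letter, count in Counter(letters).most_common()]

# Made-up samples: a tiny one dominated by a rare letter, and a longer one.
short_sample = "A jazzy quiz"
long_sample = ("the secret message tells the reader where the hidden "
               "letters were left beneath the tree")

short_top = letter_frequencies(short_sample)[0]
long_top = letter_frequencies(long_sample)[0]

print(short_top)  # top letter and its share in the short sample
print(long_top)   # top letter and its share in the longer sample
```

In the short sample the rare letter Z comes out on top, which would send a naive analysis in the wrong direction. In the longer sample E takes the lead, as you would expect for English.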

This script can analyse single letters or groups of consecutive letters, known as n-grams. This is useful because common English letter pairs like TH and ER can be matched to the most frequent letter pairs in the analysed cipher text.
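The n-gram idea can be sketched as a sliding window over the text. This is a minimal Python 3 version using `collections.Counter`; the function name `count_ngrams` and the sample phrase are assumptions made for the example.

```python
from collections import Counter

def count_ngrams(text, n):
    """Count every run of n consecutive characters in text, with spaces removed."""
    text = text.replace(" ", "")
    # A window starting at i covers text[i:i + n]; the last full window
    # starts at len(text) - n, so no short leftovers are produced.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical sample phrase with a few repeated letter pairs.
bigram_counts = count_ngrams("THE OTHER THEME", 2)

for bigram, occurrences in bigram_counts.most_common():
    print(bigram, occurrences)
```

Because the spaces are removed first, some n-grams span word boundaries (EO and RT in this sample). That matches how cipher text is usually written, without the original word breaks.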

This is my second Python program. You enter some cipher text at the prompt, then enter the size of n-gram you want to analyse and press enter. The program prints each n-gram it finds along with its number of occurrences, sorted by count from highest to lowest.

# Written for Python 3 (input() and dict.items()).
inputText = input("Please enter the cipher text to be analysed:").replace(" ", "") #Input used to enter the cipher text. replace used to strip spaces.
ngramDict = {}
def ngram(n): #Function used to populate ngramDict with n-grams. The argument is the number of characters per n-gram.
    for count in range(len(inputText) - n + 1): #Stops at the last position where a full n-gram still fits, so no short leftovers are counted
        currentNgram = inputText[count : count + n]
        if currentNgram in ngramDict: #Check if the current n-gram is in ngramDict
            ngramDict[currentNgram] = ngramDict[currentNgram] + 1 #Increments its value by 1
        else:
            ngramDict[currentNgram] = 1 #Adds the n-gram and assigns it the value 1
ngram(int(input("Please enter the n-gram value. (eg bigrams = 2 trigrams = 3)")))
ngramList = [ (v,k) for k,v in ngramDict.items() ] #Iterates through ngramDict. Swaps the keys and values and places them in tuples in a list so they can be sorted by count.
ngramList.sort(reverse=True) #Sorts the list by count, highest first
for v,k in ngramList: #Iterates through the list and prints each n-gram along with its number of occurrences
    print("There are " + str(v) + " " + str(k))