A shift cipher is a primitive version of a substitution cipher. A substitution cipher uses a translation map for characters: each character in the text is translated into another character, which may be a letter, a number, or a symbol. The result may look unreadable, but it is still very easy to crack.

In this lab, you will be using Python to encrypt and decrypt messages using shift and substitution ciphers.

Shift Cipher

A shift cipher encrypts a message by adding a key to the numeric position of each letter and reducing the result modulo 26; decryption subtracts the key instead. The key is a single letter, represented by an integer from 0 to 25.
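
As a minimal sketch of the arithmetic, assuming the lowercase letters a to z are numbered 0 to 25 (the names ALPHABET, shift_char, and unshift_char are illustrative and not part of the lab):

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def shift_char(ch, key):
    """Encrypt one lowercase letter by moving it `key` places forward, wrapping past z."""
    return ALPHABET[(ALPHABET.index(ch) + key) % 26]

def unshift_char(ch, key):
    """Decrypt one lowercase letter by moving it `key` places back."""
    return ALPHABET[(ALPHABET.index(ch) - key) % 26]

print(shift_char("y", 3))     # y is position 24; (24 + 3) % 26 = 1, which is b
print(unshift_char("b", 3))   # (1 - 3) % 26 = 24, which is y again
```

Python's % operator returns a non-negative result for a positive modulus, so the subtraction in unshift_char needs no special handling for wrap-around.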

You will write a program that encrypts a plaintext file to a ciphertext file with a shift cipher. You will also write a program to decrypt ciphertext that has been encrypted with a shift cipher.

Part 1: Using a Shift Cipher

  1. Inspect the file plain.txt, which is the plaintext that will be encrypted.
  2. Write a program shift.py that encrypts the plaintext file with a shift cipher and writes the result to cipher.txt. The user must be able either to enter a key or to have the script choose one at random. Begin with the script provided below, but note that it could be hardened.
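
The two requirements in step 2 (a user-supplied or random key, and some hardening) might be approached along these lines. This is only a sketch under stated assumptions: the function names choose_key and shift_text are hypothetical, and passing non-letter characters through unchanged is one possible hardening choice, not a requirement of the lab.

```python
import random

def choose_key(user_input=""):
    """Return an integer key in 0..25 from user input, or a random key if the input is blank."""
    text = user_input.strip()
    if not text:
        return random.randrange(26)
    key = int(text)  # raises ValueError for non-numeric input
    if not 0 <= key <= 25:
        raise ValueError("key must be between 0 and 25")
    return key

def shift_text(plaintext, key):
    """Lowercase the text, shift the letters a-z by key, and pass other characters through."""
    out = []
    for ch in plaintext.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            out.append(ch)  # assumption: digits, spaces, and punctuation are left as they are
    return "".join(out)

print(shift_text("Hello, World!", 3))
```

In shift.py, the key could come from choose_key(input("Key (blank for random): ")), with the contents of plain.txt passed to shift_text and the result written to cipher.txt.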

Part 2: Breaking a Shift Cipher

  1. Write a program unshift.py which, given an encrypted English text file cipher.txt, outputs a list of 5 candidate encryption keys that could have been used assuming a shift cipher.
    1. Include a function frequency() that calculates the frequency of each letter in the ciphertext file and stores the results in an array of size 26.
    2. Find the keys k1, k2, k3, k4, and k5 for which the distance between the English frequency distribution and the frequency distribution of the ciphertext decrypted under ki is smaller than for all other possible keys. The key k1 should give the smallest distance and k5 the fifth smallest. The distance function is the sum, over all 26 letters, of the absolute difference between the letter's frequency in the text decrypted under ki and its frequency in English, as listed below.

      Letter Frequency
      a 0.084627428
      b 0.016961567
      c 0.036707885
      d 0.035940521
      e 0.114373392
      f 0.021112892
      g 0.022535084
      h 0.041796001
      i 0.076631591
      j 0.002229119
      k 0.006418945
      l 0.044323935
      m 0.029782184
      n 0.070328274
      o 0.073646115
      p 0.024114353
      q 0.003789439
      r 0.065010218
      s 0.064936840
      t 0.090387643
      u 0.029310615
      v 0.010343799
      w 0.014445338
      x 0.003275467
      y 0.015229260
      z 0.001742096
    3. The program should output these keys, the value of the distance function, and the resulting plaintext.
  2. Please write up a paragraph answering the following questions.
    1. Did you have to edit plain.txt before it was input to shift.py?
    2. What hardening did you perform on the shift.py script?
    3. Was your unshift.py program able to find the correct key?
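
The frequency and distance steps in Part 2 can be sketched as follows. The table values are copied from the list above; the names ENGLISH, distance, and candidate_keys are illustrative assumptions, and only frequency() is named by the lab. The sketch works on a string rather than on cipher.txt directly.

```python
# English letter frequencies for a..z, copied from the table in this lab.
ENGLISH = [
    0.084627428, 0.016961567, 0.036707885, 0.035940521, 0.114373392,
    0.021112892, 0.022535084, 0.041796001, 0.076631591, 0.002229119,
    0.006418945, 0.044323935, 0.029782184, 0.070328274, 0.073646115,
    0.024114353, 0.003789439, 0.065010218, 0.064936840, 0.090387643,
    0.029310615, 0.010343799, 0.014445338, 0.003275467, 0.015229260,
    0.001742096,
]

def frequency(text):
    """Return a 26-element list of relative letter frequencies, ignoring non-letters."""
    counts = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 26

def distance(freqs):
    """Sum of absolute differences between the given frequencies and English."""
    return sum(abs(f - e) for f, e in zip(freqs, ENGLISH))

def candidate_keys(ciphertext, how_many=5):
    """Return the (distance, key) pairs with the smallest distances, best first."""
    cipher_freqs = frequency(ciphertext)
    scored = []
    for key in range(26):
        # Decrypting with `key` maps ciphertext letter (i + key) % 26 back to letter i,
        # so the decrypted distribution is a rotation of the ciphertext distribution.
        decrypted = [cipher_freqs[(i + key) % 26] for i in range(26)]
        scored.append((distance(decrypted), key))
    return sorted(scored)[:how_many]

for dist, key in candidate_keys("wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj"):
    print(key, dist)
```

Rotating the frequency array avoids decrypting the whole file 26 times; the full plaintext only needs to be produced for the five keys that are reported. Short ciphertexts give noisy frequencies, so the correct key is not guaranteed to rank first.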

Substitution Cipher

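Substitution ciphers are simple encryption methods that replace plaintext characters with ciphertext symbols. Shannon's theory of secrecy uses entropy to measure how much of the plaintext's statistical structure survives in the ciphertext, which makes it possible to show that such ciphers are insecure.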
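
To make the idea of a translation map concrete, here is a small sketch of a single-character substitution. It assumes a lowercase a to z alphabet and leaves other characters unchanged; the function names make_key_map, substitute, and invert are hypothetical and not taken from the lab scripts.

```python
import random
import string

def make_key_map():
    """Build a random one-to-one map from each lowercase letter to a lowercase letter."""
    letters = list(string.ascii_lowercase)
    shuffled = random.sample(letters, len(letters))  # a random permutation of the alphabet
    return dict(zip(letters, shuffled))

def substitute(text, key_map):
    """Replace every character found in key_map; leave anything else untouched."""
    return "".join(key_map.get(ch, ch) for ch in text.lower())

def invert(key_map):
    """Swap keys and values so that the map decrypts instead of encrypts."""
    return {cipher: plain for plain, cipher in key_map.items()}

key_map = make_key_map()
secret = substitute("attack at dawn", key_map)
print(secret)
print(substitute(secret, invert(key_map)))
```

Because the map is a permutation, every letter always encrypts to the same symbol, so the letter frequencies of the plaintext carry over to the ciphertext unchanged apart from relabeling.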

Part 1: Using a Substitution Cipher

  1. Write a program to encrypt plain.txt with a substitution cipher and an m-gram of 2.
  2. Use the following in your program:
     import math
     import random
        
     filename = "plain.txt"
     outputfile1 = "cipher.txt"
     outputfile2 = "key_map.txt"
     file2 = open(outputfile1, "w")
     file3 = open(outputfile2, "w")
    
     # Build all 676 two-letter blocks in order: 'aa', 'ab', ..., 'zz'.
     letters = "abcdefghijklmnopqrstuvwxyz"
     alphabet2 = [first + second for first in letters for second in letters]
        
     key2 = random.sample(range(676),676)
     file3.write("Plain Cipher"+'\n')
        
     for j in range(0,676):
         plain = str(alphabet2[j])
         cipher = str(alphabet2[key2[j]])
         file3.write(plain+' '+cipher+'\n')
        
     print (key2)
    
     for length in range(2,3):
         a = {}
         file1 = open(filename, "r")
         count = 0
         # Encrypt the first pair with the same mapping that was written to key_map.txt.
         string1 = file1.read(length).lower()
         k = key2[alphabet2.index(string1)]
         substring2 = alphabet2[k]
         file2.write(substring2)
         while len(string1) >= length:
             # Tally how often each plaintext pair occurs.
             if string1 in a:
                 a[string1][0] += 1
             else:
                 a[string1] = [1,0]
             count += 2
             file1.seek(count,0)
             string1 = file1.read(length).lower()
             if len(string1) >= length:
                 k = key2[alphabet2.index(string1)]
                 substring2 = alphabet2[k]
                 file2.write(substring2)
             else:
                 # Fewer than two characters remain: finish with a random padding pair.
                 gap = random.sample(range(676),1)
                 substring2 = alphabet2[gap[0]]
                 file2.write(substring2)
        
     file1.close()
     file2.close()
     file3.close()
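
One way to check the output of the script is to decrypt the ciphertext again with the key map. The sketch below assumes the ciphertext was produced with the same mapping that key_map.txt records, in the "plain cipher" line format written by the script; the function names load_inverse_map and decrypt_digrams are hypothetical, and the sample map is invented for illustration.

```python
def load_inverse_map(lines):
    """Parse 'plain cipher' lines (skipping the header) into a cipher -> plain dict."""
    inverse = {}
    for line in lines[1:]:  # the first line is the "Plain Cipher" header
        parts = line.split()
        if len(parts) == 2:
            plain, cipher = parts
            inverse[cipher] = plain
    return inverse

def decrypt_digrams(ciphertext, inverse):
    """Translate the ciphertext two characters at a time using the inverse map."""
    blocks = [ciphertext[i:i + 2] for i in range(0, len(ciphertext) - 1, 2)]
    return "".join(inverse[block] for block in blocks)

# Tiny invented key map, standing in for the 676 lines of key_map.txt.
sample_lines = ["Plain Cipher", "he xq", "lp ab"]
inverse = load_inverse_map(sample_lines)
print(decrypt_digrams("xqab", inverse))
```

The script ends the ciphertext with one randomly chosen padding pair, so the final two decrypted characters of a real cipher.txt are not expected to be meaningful.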
    

Part 2: Using Shannon Entropy

  1. Write a program that calculates the Shannon entropy for block sizes of 1, 2, 3, and 4 characters, for both the plaintext file and the ciphertext generated from it by the substitution cipher with an m-gram of 2.
  2. Alter the programs to handle an m-gram of 1 (single characters).
     import math
        
     filename = "cipher.txt"
     outputfile = "entropy_blocks.txt"
     file2 = open(outputfile, "w")
        
     for length in range(1,5):
         a = {}
         file1 = open(filename,"r")
         count = 0
         counter = 0
         string1 = file1.read(length).lower()
         while len(string1) >= length:
             if string1 in a:
                 a[string1][0] = a[string1][0] + 1
             else:
                 a[string1] = [1,0]
             count += length
             counter += 1
             file1.seek(count,0)
             string1 = file1.read(length).lower()
         file1.close()

         # Convert each block count to a probability and accumulate -p * log2(p).
         entropy = 0.0
         for x in a:
             a[x][1] = float(a[x][0])/float(counter)
             entropy -= a[x][1] * math.log(a[x][1],2)

         # Sort key: each entry is (block, [count, probability]).
         def get_count(entry):
             return entry[1]

         # print(a.values())
         b = sorted(a.items(), key = get_count, reverse = True)
    
         c = len(b)
    
         d = str(c)
         e = str(entropy)
         f = str(counter)
    
         file2.write(f+' '+d+' '+e+' \n' )
        
     file2.close()
    
  3. Please write up a paragraph answering the following questions.
    1. Was the Shannon entropy program able to find similarities in the entropies of the plaintext and ciphertext for an m-gram of 2? What were the differences?
    2. How did decreasing the m-gram to 1 affect this?
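
For reference, the Shannon entropy of a set of blocks is H = -sum(p * log2(p)), where p is the relative frequency of each distinct block. A compact in-memory sketch of the same calculation is shown here, assuming non-overlapping blocks and lowercased text as in the script above; the function name block_entropy is hypothetical.

```python
import math
from collections import Counter

def block_entropy(text, block_size):
    """Shannon entropy, in bits per block, over non-overlapping blocks of the text."""
    text = text.lower()
    blocks = [text[i:i + block_size]
              for i in range(0, len(text) - block_size + 1, block_size)]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    total = len(blocks)
    # Each distinct block contributes -p * log2(p), where p is its relative frequency.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

for size in range(1, 5):
    print(size, block_entropy("abababababababab", size))
```

A one-to-one substitution on blocks of m characters only relabels those blocks, so the entropy measured at that block size, starting from the same offset, should be the same for plaintext and ciphertext apart from the effect of the padding pair; other block sizes are where differences can appear.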

More Info