##### Document Text Contents

Page 1

Mathematics SL and HL teacher support material 1

Example 1: Student work

Breaking the Code

“To keep your secret is wisdom; but to expect others to keep it is folly.”

Samuel Johnson

“Secrets are made to be found out with time”

Charles Sanford

Codes have been used by the military to keep secrets from the enemy for thousands

of years. In the information age, when many people use the internet for banking and

shopping, they do not want the information they enter to be available to unauthorised

people so such information is encrypted on secure websites.

This report looks at some of the codes which have been used over time and how

mathematics is used in making and breaking them.

The Caesar Shift Cipher

The first documented use of codes for military purposes was by Julius Caesar (100-

44BC). The type of code he used is commonly known as a Caesar shift. In the

following table the middle row gives plain text and the bottom row gives the

corresponding cipher text for a Caesar shift of 2 places. It is conventional to write

uncoded plain text using lower case letters and coded text using upper case.

Position 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Plain a b c d e f g h i j k l m n o p q r s t u v w x y z

Code C D E F G H I J K L M N O P Q R S T U V W X Y Z A B

This code can be represented using the mapping x → x + 2 where x stands for the

position of the letter in the alphabet. So k, with postion number 10, becomes M with

position number 12. Towards the end of the alphabet, y would go to the letter with

position number 24 + 2 = 26 but 25 is the maximum position number as there are

only 26 letters in the alphabet and we used 0 for a. 26 is subtracted to give 0. This is

called addition modulo 26.

Positions 0 to 25 have been used (rather than 1 to 26) because another way to think

of modular arithmetic is as a remainder from division. 26 is equivalent to 0 modulo 26

because 26 ÷ 26 = 1 rem 0 .

Using this code, the second quote at the start of this report would be encoded as

follows:

secretsaremadetobefoundoutwithtime

UGETGVUCTGOCFGVQDGHQWPFQWVYKVJVKOG

Spaces between words have been removed to make it more difficult to break the

code but if someone found this message and knew that it was a Caesar shift code it

would not take long to break. There are only 25 possible Caesar shifts; 26 if we

include the one that does not move the letters at all but that one is not very secret!

Page 2

Mathematics SL and HL teacher support material 2

Example 1: Student work

It is possible to write out all 25 possibilities using nothing more complicated than pen

and paper and once you have a message that makes sense you know you have

decoded it. Using a computer to do all 25 possible decodings would make the

process of decoding very quick.

Substitution ciphers

In a substitution cipher each letter is replaced by another letter. A Caesar shift is one

type of substitution cipher but the pattern of substitutions in a Caesar shift makes it

easy to break. A general subsitution cipher can be made by rearranging the 26 letters

of the alphabet and writing this random permutation below the ordinary alphabet.

There would be 26! ≈ 4 ×1026 possible codes. It would take a long time to go through

all the possible decodings to find the right one! However, some of the possible

substitution codes are too easy to break; the one where each letter is encoded by the

same letter is no use but nor are those where a large number of letters do not

change.

Permutations where nothing stays in its original place are called derangements. The

number of derangements of 26 letters is the subfactorial of 26, denoted !26. To see

how to work this out, consider a much shorter alphabet of 4 letters. The total number

of permutations is 4! = 24. These are shown below.

abcd abdc acbd acdb adbc adcb

bacd badc bcad bcda bdac bdca

cabd cadb cbad cbda cdab cdba

dabc dacb dbac dbca dcab dcba

The 9 derangements are shown in bold, with a box round them. To calculate the

number of derangements, start with the total number of permutations and take away

the ones which have at least one letter in its original position.

There are 6 arrangements with a in its original position, these are on a grey

background. This is because, with a in its original position, there are 3 other letters

which can be arranged in 3! = 6 ways. Likewise, there are 3! permutations with b in its

original place and the same for c and d. However, subtracting 4 × 3! will be too much

as arrangements with both a and b in their original places have been counted twice.

So 4!− 4 × 3! (1.1) is too small.

If a and b are both in their places, there are 2 letters left which can be arranged in 2!

ways. The same is true for other pairs of letters. There are 4C2 pairs of letters and so

we need to add on 4C2 × 2! to (1.1) to give

4!− 4 × 3!+ 4C2 × 2! (1.2)

However, arrangements which have 3 letters in the same places have been counted

more than once. In fact, it is not possible for 3 letters to be in the same place and the

4th letter to be in a different place so it is only the one arrangement with all four letters

in the same place that has been counted too many times. abcd has been counted 4

times but it should only be counted once.

Page 3

Mathematics SL and HL teacher support material 3

Example 1: Student work

The number of derangements is

4!− 4×3!+4C2 × 2!− 4 +1 (1.3)

This can be written as

4! 4! 4! 4! 1 1 1 1

!4 4! 4! 1

1! 2! 3! 4! 1! 2! 3! 4!

= − + − + = − + − +

(1.4)

In general

1 1 1 1

! ! 1 ....... ( 1)

1! 2! 3! !

nn n

n

= − + − + + −

(1.5)

The Maclaurin series for ex is

2 3

e 1 ......... .........

2! 3! !

r

x x x xx

r

= + + + + + +

so 1

1 1 1 1

1 ...... ( 1) e

1! 2! 3! !

n

n

−− + − + + − ≈

!

!

e

n

n ≈

In fact, rounding

!

e

n

gives !n so the number of derangements of the alphabet is

2626!!26 1.5 10

e

≈ ≈ × . This is still far more possibilities than could be run through in a

reasonable time but substitution codes can be broken quickly and easily.

Breaking a substitution cipher

In English, some letters are used more often than others. The following graph shows

how often the different letters are used.

% frequency in English

14

12

10

8

6

4

2

0

a b c d e f g h i j k l m n o p q r s t u v w x y z

If a substitution code is used then the letter that has been used to encode e will be

the most frequent letter. Using computers, counting frequencies of letters is quick and

accurate so substitution codes are not secure. The earliest known description of

frequency analysis for breaking codes dates from the 9th century and is by the Arabic

Page 4

Mathematics SL and HL teacher support material 4

Example 1: Student work

scholar Al-Kindi. In addition to looking at frequencies of single letters, the frequencies

of pairs of letters can be compared with commonly occuring pairs such as th, er, on

and so on.

The Vigenère Cipher

In the 16th century Blaise de Vigenère built on the work of others to develop a cipher

which was not vulnerable to frequency analysis but which was simple to use. It

makes use of Caesar shifts but does not use the same shift all the time. To use the

cipher, a keyword is needed. The person doing the coding and the person receiving

the message need to know the keyword but that is all they need to remember. By

contrast, for a substitiution cipher which is not a Caesar shift, the whole table for

coding needs to be either remembered or written down.

The method is illustrated using the keyword STELLA. Caesar shift alphabets which

start with S, T, E, L and A will be used.

Plain a b c d e f g h i j k l m n o p q r s t u v w x y z

Code S T U V W X Y Z A B C D E F G H I J K L M N O P Q R

Code T U V W X Y Z A B C D E F G H I J K L M N O P Q R S

Code E F G H I J K L M N O P Q R S T U V W X Y Z A B C D

Code L M N O P Q R S T U V W X Y Z A B C D E F G H I J K

Code A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

To code the initial quote from this report, the keyword is written repeatedly above the

text to be coded:

s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t

t o k e e p y o u r s e c r e t i s w i s d o m b u t t o e x p e c t o t h e r s t o k e e p i t i s f o l l y

The first letter of the message is encoded using the Caesar alphabet which starts

with S so it becomes L, the second letter of the message is encoded using the

alphabet starting with T so it becomes H. The encoded message is as follows:

LHOPPPQHYCDEUKIETSOBWOZMTNXEZEPIINEOLAICDTGDIPAILBWQZLDR

The Vigenère cipher is not susceptible to breaking by frequency analysis as each

letter is coded in different ways. However, it was not widely used because to code

each message manually is time consuming as you have to choose the alphabet to

use and then find the appropriate encoded letter. Mechanising the process would

make it much easier but the use of technology was also what led to the code being

broken around 400 years after it was first developed.

Breaking the Vigenère cipher

Charles Babbage, who developed an early computer, realised that to break the

Vigenère code the first thing to do was to decide the length of the keyword.

Consider the following encoded text:

JERFIWFLXFIWFLXVHXEAMCNWVHXHIWFLX

Repeated sets of letters in the encoded message are shown in red. They are likely to

come from the same letters in the original message being encoded in the same way

Page 5

Mathematics SL and HL teacher support material 5

Example 1: Student work

and, in this case, it is likely that the keyword is 3 letters long because the distance

between repeats is a multiple of 3.

123123123123123123123123123123123

JERFIWFLXFIWFLXVHXEAMCNWVHXHIWFLX

If the code word has 3 letters then all the letters numbered 1 above have been coded

with the same Caesar shift. The same applies for letters numbered 2 and letters

numbered 3. Frequency analysis can now be used separately for each set of letters

with the same number to break the code. There is a useful computerised tool for

breaking Vigenère codes at

http://www.simonsingh.net/The_Black_Chamber/cracking_tool.html

The Enigma Machine

To make adaptations of the Vigenère cipher more secure, a longer keyword needs to

be used and it should not be a real word so that guesswork cannot be used in working

it out. This makes it much harder to use the Vigenère cipher unless the process can

be automated. In 1918, a German inventor, Arthur Scherbius, invented the Enigma

machine. It was further developed and used by the German army during the second

world war.

The Enigma machine had a keyboard. Operators typed in a message and the

machine encoded it. The basic idea is that a rotor is used to move on and encode the

next letter using a different substitution alphabet. With one rotor and 26 positions, this

would give 26 possible initial starting postions and so 26 possible codes. However 3

rotors were used and each was set to a different starting position every day giving

26 × 26 × 26 = 17 576 possible codes.

Operators also chose 3 rotors from 5 possible rotors, ie 5P3 = 60 ways of positioning

the rotors so this now gives 60×17 576 = 1 054 560 possible codes.

In addition to the 3 rotors, pairs of letters could be connected on the keyboard. If a

was connected to b, this would swap the encryption of a with the encryption of b. 10

connections were made, choosing 10 pairs of letters from 26.

The number of ways of choosing the first pair then the second pair and so on is

26 24 22 8

2 2 2 2

10

26! 24! 22! 8!

.... .....

24! 2! 22! 2! 20! 2! 6! 2!

26!

6! 2

× × × × = × × × ×

× × × ×

=

×

C C C C

(1.6)

However, the order of selecting the pairs does not matter so each set of 10 pairs has

been counted too many times; the answer in (1.6) needs to be divided by 10!

14

10

26!

1.5 10

6! 2 10!

≈ ×

× ×

Taken together with the options for the rotors, this gives a total of

1.5 ×1014 ×1 054 560 ≈ 1.59 ×1020 possible codes. With no possibility of using

frequency analysis, the Enigma code was considered to be unbreakable.

Page 6

Mathematics SL and HL teacher support material 6

Example 1: Student work

Breaking the Enigma Code

Operators of Enigma machines were issued with books which gave them the settings

for the day. They were told which letters to connect on the keyboard, which rotors to

use and the initial setting for the rotors. If the code breakers had an Enigma machine

and could find out the settings for the day, they would be able to decipher all the

messages for that day.

Messages would start with the Enigma machine in this position and then the setting of

the rotors for the message would be transmitted; the rotors were then adjusted before

transmitting the rest of the message. This meant that even if the code breakers had

all the messages for the day, each one would use a different code (apart from the first

3 letters of each message). This did not give the code breakers very much to go on.

To break the Enigma code, the code breakers used a copy of an Enigma machine and

constructed machines, called bombes, which could work through a large number of

possible codes. The machines could not work through anything like all possible codes

in one day so further information and deduction was necessary.

The Enigma machine never encoded a letter as itself. This helped with the decoding

of some messages. One day a message came through which had no letter L in it.

The operator had been sending a test message and had just pressed the letter L

every time so it was the only letter that did not feature in the encoded message.

Knowing that messages which were sent at particular times of day were about the

weather enabled the code breakers to guess part of the message. Knowing that a

letter could not be encoded as itself, helped them narrow down the possibilities for

which of the possible codes was being used that day and the bombes could then

search for that day’s settings.

Suppose the word “weather” is known to be part of the plain text for the following

coded message. The word can be tried in different positions, as follows

Coded message: QZWPM ZHVVG YGQOZ UQZX

Positioning plain text: weath er

weat her

wea ther

The third of these possible positions cannot be the place where the word “weather”

was in the text because w cannot code to W and h cannot code to H.

Public key cryptography

All the above systems of encoding rely on the key being kept secret from anyone

unauthorised who might try to decode the message. The key might be the shift used

for a Caesar shift, the keyword for a Vigenère cipher or the initial settings for Enigma

machine but once someone knows it they can decode a message.

In the 1970s, Rivest, Shamir and Adleman came up with a way to encode messages

which did not rely on the key being kept secret. The system they invented is

sometimes known as RSA. The system encodes numbers rather than letters but it is

possible to have a system for making letters or words into numbers. The table below

shows the steps in the process by using a simple example.

Page 7

Mathematics SL and HL teacher support material 7

Example 1: Student work

Step Simple example

Choose 2 prime numbers, p and q. For

security, these need to be large.

p = 2 , q = 11

m = pq m = 22

Work out A = ( p −1)(q −1) and choose a

number E which is smaller than this and

has no factors in common with it.

A = (p −1)(q −1) = 1×10 = 10

E = 3

Find a number D such that DE − 1 is a

multiple of A

3 D −1 must be a muliple of 10

D = 7

To encipher a number M, work out

C = M E (modulo m)

Examples are given below

To decipher a number C, work out

M = C D (modulo m)

E and m need not be kept secret as long as D is kept secret.

Enciphering

M C

1 13 (mod 22) = 1

2 23 (mod 22) = 8 (mod 22) = 8

3 33 (mod 22) = 27 (mod 22) = 5

4 43 (mod 22) = 64 (mod 22) = 20

Remember from the section on Caesar ciphers that to find 64 (mod 22) you can find

the remainder when 64 is divided by 22. However, working with powers of numbers

can lead to calculations with very large numbers. Fortunately mathematics can help.

Suppose you want to work out ab (modulo 15). Suppose a = r (modulo 15) and

b = R (modulo 15) .

So a = 15x + r and b = 15 y + R where x and y are integers.

ab = (15x + r )(15 y + R ) = 152 xy + 15xR + 15r + rR

The remainder when dividing ab by 15 is the same as the remainder when dividing

rR by 15. So to find what a large number is modulo 15, it is easier to write the large

number as a product of numbers and find each of those modulo 15 and then

multiply the answers. This can be summed up as follows:

ab (modulo m) = ((a modulo m )(b modulo m )) modulo m .

So

3 215 (mod 22) (15 15)(mod 22) (225mod 22) 15(mod 22)

(5 15) mod 22 75mod 22 9

= × = ×

= × = =

Page 8

Mathematics SL and HL teacher support material 8

Example 1: Student work

Deciphering

M C

1 17 (mod 22) = 1

8 87 (mod 22) = (82 mod22) × (82 mod22) × (82 mod22) × (8 mod 22) (mod 22)

= ((64 mod 22)3 × 8) mod 22

= (203 × 8) mod 22

= 64 000 mod 22

= 2

5 57 (mod 22) = (52 mod22) × (52 mod22) × (52 mod22) × (5 mod 22) (mod 22)

= ((33 mod 22) × 5) mod 22

= ((27 mod 22) × 5) mod 22

= (5 × 5) mod 22

= 3

20 207 ≡ 202 × 202 × 202 × 20

≡ 4 × 4 × 4 × 20

≡ 64 × 20

≡ 20 × 20

≡ 400 ≡ 4

To keep writing “mod 22” is making this look more complicated than necessary so, for

deciphering 20, a different notation has been used. ≡ means “has the same

remainder when divided by 22”.

The tables above demonstrate that deciphering takes you back to the original number

for the particular examples tested. This will always work. The proof depends on

Euler’s theorem, which states that

M φ(m) ≡ 1mod m

φ(m) is the number of integers smaller than m and having no common factor with with

m. If m = pq where p and q are prime, ϕ(m) = ( p −1)(q −1) and this is what we called

A in the description of the RSA algorithm.

So Euler’s theorem implies that M A ≡ 1mod m

Using ≡ to mean “has the same remainder when divided by m” and remembering the

formula for deciphering is M = C D (modulo m)

C ≡ M E

C D ≡ (M E )D

≡ M ED = M ED−1 × M

ED −1 is a multiple of A so ED −1 = kA where k is an integer.

So

C D ≡ M kA × M = (M A )k × M

≡ 1k × M = M

Page 9

Mathematics SL and HL teacher support material 9

Example 1: Student work

Breaking the RSA code

m = pq is known; to break the code needs the individual primes, p and q to be known.

m = 22 does not give a secure code as it is obvious that the primes were 2 and 11.

The primes need to be much larger for the code to be secure.

655 427 is the product of two primes. To find the primes a spreadsheet could be

used. This is the start of a spreadsheet showing one way to find the primes:

n 655427/n Integer part of 655427/n Factor?

1 655427 655427 yes

2 327713.5 327713 no

3 218475.6667 218475 no

4 163856.75 163856 no

5 131085.4 131085 no

6 109237.8333 109237 no

7 93632.42857 93632 no

Rather than just trying to divide by primes, all integers are tested so 1 does go into

655 427 but this does not help. The prime factors are found further down:

438 1496.408676 1496 no

439 1493 1493 yes

440 1489.606818 1489 no

Constructing the spreadsheet and finding the factors took about 1 minute so 655 427

would not give a secure code.

The factorising can be speeded up by using Fermat’s factorising method. This

depends on the following result.

2 2

2 2

p q p q

m pq

+ −

= = −

Multiplying out the brackets on the right hand side gives

2 2 2 22 2

4 4

p q pq p q pq

pq

+ + + −

− =

If p and q are large prime numbers, they will both be odd so

2

p q+

and

2

p q−

are

whole numbers.

Suppose m = x2 − y2 then x2 − m = y 2. This means x2 ≥ m as square numbers cannot

be negative so the smallest m could be is m .

We are trying to factorise m = 655 427 . 655 427 809.58...= so the smallest possible

value of x is 810.

For each possible value of x find the value of x2 − m . When this is a square number,

y2 , the factors are ( x − y )( x + y ) . Doing the calculations on a spreadsheet gives the

following:

Mathematics SL and HL teacher support material 1

Example 1: Student work

Breaking the Code

“To keep your secret is wisdom; but to expect others to keep it is folly.”

Samuel Johnson

“Secrets are made to be found out with time”

Charles Sanford

Codes have been used by the military to keep secrets from the enemy for thousands

of years. In the information age, when many people use the internet for banking and

shopping, they do not want the information they enter to be available to unauthorised

people so such information is encrypted on secure websites.

This report looks at some of the codes which have been used over time and how

mathematics is used in making and breaking them.

The Caesar Shift Cipher

The first documented use of codes for military purposes was by Julius Caesar (100-

44BC). The type of code he used is commonly known as a Caesar shift. In the

following table the middle row gives plain text and the bottom row gives the

corresponding cipher text for a Caesar shift of 2 places. It is conventional to write

uncoded plain text using lower case letters and coded text using upper case.

Position 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Plain a b c d e f g h i j k l m n o p q r s t u v w x y z

Code C D E F G H I J K L M N O P Q R S T U V W X Y Z A B

This code can be represented using the mapping x → x + 2 where x stands for the

position of the letter in the alphabet. So k, with postion number 10, becomes M with

position number 12. Towards the end of the alphabet, y would go to the letter with

position number 24 + 2 = 26 but 25 is the maximum position number as there are

only 26 letters in the alphabet and we used 0 for a. 26 is subtracted to give 0. This is

called addition modulo 26.

Positions 0 to 25 have been used (rather than 1 to 26) because another way to think

of modular arithmetic is as a remainder from division. 26 is equivalent to 0 modulo 26

because 26 ÷ 26 = 1 rem 0 .

Using this code, the second quote at the start of this report would be encoded as

follows:

secretsaremadetobefoundoutwithtime

UGETGVUCTGOCFGVQDGHQWPFQWVYKVJVKOG

Spaces between words have been removed to make it more difficult to break the

code but if someone found this message and knew that it was a Caesar shift code it

would not take long to break. There are only 25 possible Caesar shifts; 26 if we

include the one that does not move the letters at all but that one is not very secret!

Page 2

Mathematics SL and HL teacher support material 2

Example 1: Student work

It is possible to write out all 25 possibilities using nothing more complicated than pen

and paper and once you have a message that makes sense you know you have

decoded it. Using a computer to do all 25 possible decodings would make the

process of decoding very quick.

Substitution ciphers

In a substitution cipher each letter is replaced by another letter. A Caesar shift is one

type of substitution cipher but the pattern of substitutions in a Caesar shift makes it

easy to break. A general subsitution cipher can be made by rearranging the 26 letters

of the alphabet and writing this random permutation below the ordinary alphabet.

There would be 26! ≈ 4 ×1026 possible codes. It would take a long time to go through

all the possible decodings to find the right one! However, some of the possible

substitution codes are too easy to break; the one where each letter is encoded by the

same letter is no use but nor are those where a large number of letters do not

change.

Permutations where nothing stays in its original place are called derangements. The

number of derangements of 26 letters is the subfactorial of 26, denoted !26. To see

how to work this out, consider a much shorter alphabet of 4 letters. The total number

of permutations is 4! = 24. These are shown below.

abcd abdc acbd acdb adbc adcb

bacd badc bcad bcda bdac bdca

cabd cadb cbad cbda cdab cdba

dabc dacb dbac dbca dcab dcba

The 9 derangements are shown in bold, with a box round them. To calculate the

number of derangements, start with the total number of permutations and take away

the ones which have at least one letter in its original position.

There are 6 arrangements with a in its original position, these are on a grey

background. This is because, with a in its original position, there are 3 other letters

which can be arranged in 3! = 6 ways. Likewise, there are 3! permutations with b in its

original place and the same for c and d. However, subtracting 4 × 3! will be too much

as arrangements with both a and b in their original places have been counted twice.

So 4!− 4 × 3! (1.1) is too small.

If a and b are both in their places, there are 2 letters left which can be arranged in 2!

ways. The same is true for other pairs of letters. There are 4C2 pairs of letters and so

we need to add on 4C2 × 2! to (1.1) to give

4!− 4 × 3!+ 4C2 × 2! (1.2)

However, arrangements which have 3 letters in the same places have been counted

more than once. In fact, it is not possible for 3 letters to be in the same place and the

4th letter to be in a different place so it is only the one arrangement with all four letters

in the same place that has been counted too many times. abcd has been counted 4

times but it should only be counted once.

Page 3

Mathematics SL and HL teacher support material 3

Example 1: Student work

The number of derangements is

4!− 4×3!+4C2 × 2!− 4 +1 (1.3)

This can be written as

4! 4! 4! 4! 1 1 1 1

!4 4! 4! 1

1! 2! 3! 4! 1! 2! 3! 4!

= − + − + = − + − +

(1.4)

In general

1 1 1 1

! ! 1 ....... ( 1)

1! 2! 3! !

nn n

n

= − + − + + −

(1.5)

The Maclaurin series for ex is

2 3

e 1 ......... .........

2! 3! !

r

x x x xx

r

= + + + + + +

so 1

1 1 1 1

1 ...... ( 1) e

1! 2! 3! !

n

n

−− + − + + − ≈

!

!

e

n

n ≈

In fact, rounding

!

e

n

gives !n so the number of derangements of the alphabet is

2626!!26 1.5 10

e

≈ ≈ × . This is still far more possibilities than could be run through in a

reasonable time but substitution codes can be broken quickly and easily.

Breaking a substitution cipher

In English, some letters are used more often than others. The following graph shows

how often the different letters are used.

% frequency in English

14

12

10

8

6

4

2

0

a b c d e f g h i j k l m n o p q r s t u v w x y z

If a substitution code is used then the letter that has been used to encode e will be

the most frequent letter. Using computers, counting frequencies of letters is quick and

accurate so substitution codes are not secure. The earliest known description of

frequency analysis for breaking codes dates from the 9th century and is by the Arabic

Page 4

Mathematics SL and HL teacher support material 4

Example 1: Student work

scholar Al-Kindi. In addition to looking at frequencies of single letters, the frequencies

of pairs of letters can be compared with commonly occuring pairs such as th, er, on

and so on.

The Vigenère Cipher

In the 16th century Blaise de Vigenère built on the work of others to develop a cipher

which was not vulnerable to frequency analysis but which was simple to use. It

makes use of Caesar shifts but does not use the same shift all the time. To use the

cipher, a keyword is needed. The person doing the coding and the person receiving

the message need to know the keyword but that is all they need to remember. By

contrast, for a substitiution cipher which is not a Caesar shift, the whole table for

coding needs to be either remembered or written down.

The method is illustrated using the keyword STELLA. Caesar shift alphabets which

start with S, T, E, L and A will be used.

Plain a b c d e f g h i j k l m n o p q r s t u v w x y z

Code S T U V W X Y Z A B C D E F G H I J K L M N O P Q R

Code T U V W X Y Z A B C D E F G H I J K L M N O P Q R S

Code E F G H I J K L M N O P Q R S T U V W X Y Z A B C D

Code L M N O P Q R S T U V W X Y Z A B C D E F G H I J K

Code A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

To code the initial quote from this report, the keyword is written repeatedly above the

text to be coded:

s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t e l l a s t

t o k e e p y o u r s e c r e t i s w i s d o m b u t t o e x p e c t o t h e r s t o k e e p i t i s f o l l y

The first letter of the message is encoded using the Caesar alphabet which starts

with S so it becomes L, the second letter of the message is encoded using the

alphabet starting with T so it becomes H. The encoded message is as follows:

LHOPPPQHYCDEUKIETSOBWOZMTNXEZEPIINEOLAICDTGDIPAILBWQZLDR

The Vigenère cipher is not susceptible to breaking by frequency analysis as each

letter is coded in different ways. However, it was not widely used because to code

each message manually is time consuming as you have to choose the alphabet to

use and then find the appropriate encoded letter. Mechanising the process would

make it much easier but the use of technology was also what led to the code being

broken around 400 years after it was first developed.

Breaking the Vigenère cipher

Charles Babbage, who developed an early computer, realised that to break the

Vigenère code the first thing to do was to decide the length of the keyword.

Consider the following encoded text:

JERFIWFLXFIWFLXVHXEAMCNWVHXHIWFLX

Repeated sets of letters in the encoded message are shown in red. They are likely to

come from the same letters in the original message being encoded in the same way

Page 5

Mathematics SL and HL teacher support material 5

Example 1: Student work

and, in this case, it is likely that the keyword is 3 letters long because the distance

between repeats is a multiple of 3.

123123123123123123123123123123123

JERFIWFLXFIWFLXVHXEAMCNWVHXHIWFLX

If the code word has 3 letters then all the letters numbered 1 above have been coded

with the same Caesar shift. The same applies for letters numbered 2 and letters

numbered 3. Frequency analysis can now be used separately for each set of letters

with the same number to break the code. There is a useful computerised tool for

breaking Vigenère codes at

http://www.simonsingh.net/The_Black_Chamber/cracking_tool.html

The Enigma Machine

To make adaptations of the Vigenère cipher more secure, a longer keyword needs to

be used and it should not be a real word so that guesswork cannot be used in working

it out. This makes it much harder to use the Vigenère cipher unless the process can

be automated. In 1918, a German inventor, Arthur Scherbius, invented the Enigma

machine. It was further developed and used by the German army during the second

world war.

The Enigma machine had a keyboard. Operators typed in a message and the

machine encoded it. The basic idea is that a rotor is used to move on and encode the

next letter using a different substitution alphabet. With one rotor and 26 positions, this

would give 26 possible initial starting postions and so 26 possible codes. However 3

rotors were used and each was set to a different starting position every day giving

26 × 26 × 26 = 17 576 possible codes.

Operators also chose 3 rotors from 5 possible rotors, ie 5P3 = 60 ways of positioning

the rotors so this now gives 60×17 576 = 1 054 560 possible codes.

In addition to the 3 rotors, pairs of letters could be connected on the keyboard. If a

was connected to b, this would swap the encryption of a with the encryption of b. 10

connections were made, choosing 10 pairs of letters from 26.

The number of ways of choosing the first pair then the second pair and so on is

26 24 22 8

2 2 2 2

10

26! 24! 22! 8!

.... .....

24! 2! 22! 2! 20! 2! 6! 2!

26!

6! 2

× × × × = × × × ×

× × × ×

=

×

C C C C

(1.6)

However, the order of selecting the pairs does not matter so each set of 10 pairs has

been counted too many times; the answer in (1.6) needs to be divided by 10!

14

10

26!

1.5 10

6! 2 10!

≈ ×

× ×

Taken together with the options for the rotors, this gives a total of

1.5 ×1014 ×1 054 560 ≈ 1.59 ×1020 possible codes. With no possibility of using

frequency analysis, the Enigma code was considered to be unbreakable.

Page 6

Mathematics SL and HL teacher support material 6

Example 1: Student work

Breaking the Enigma Code

Operators of Enigma machines were issued with books which gave them the settings

for the day. They were told which letters to connect on the keyboard, which rotors to

use and the initial setting for the rotors. If the code breakers had an Enigma machine

and could find out the settings for the day, they would be able to decipher all the

messages for that day.

Messages would start with the Enigma machine in this position and then the setting of

the rotors for the message would be transmitted; the rotors were then adjusted before

transmitting the rest of the message. This meant that even if the code breakers had

all the messages for the day, each one would use a different code (apart from the first

3 letters of each message). This did not give the code breakers very much to go on.

To break the Enigma code, the code breakers used a copy of an Enigma machine and

constructed machines, called bombes, which could work through a large number of

possible codes. The machines could not work through anything like all possible codes

in one day so further information and deduction was necessary.

The Enigma machine never encoded a letter as itself. This helped with the decoding

of some messages. One day a message came through which had no letter L in it.

The operator had been sending a test message and had just pressed the letter L

every time so it was the only letter that did not feature in the encoded message.

Knowing that messages which were sent at particular times of day were about the

weather enabled the code breakers to guess part of the message. Knowing that a

letter could not be encoded as itself, helped them narrow down the possibilities for

which of the possible codes was being used that day and the bombes could then

search for that day’s settings.

Suppose the word “weather” is known to be part of the plain text for the following

coded message. The word can be tried in different positions, as follows

Coded message: QZWPM ZHVVG YGQOZ UQZX

Positioning plain text: weath er

weat her

wea ther

The third of these possible positions cannot be the place where the word “weather”

was in the text because w cannot code to W and h cannot code to H.

Public key cryptography

All the above systems of encoding rely on the key being kept secret from anyone

unauthorised who might try to decode the message. The key might be the shift used

for a Caesar shift, the keyword for a Vigenère cipher or the initial settings for Enigma

machine but once someone knows it they can decode a message.

In the 1970s, Rivest, Shamir and Adleman came up with a way to encode messages

which did not rely on the key being kept secret. The system they invented is

sometimes known as RSA. The system encodes numbers rather than letters but it is

possible to have a system for making letters or words into numbers. The table below

shows the steps in the process by using a simple example.

Page 7

Mathematics SL and HL teacher support material 7

Example 1: Student work

Step Simple example

Choose 2 prime numbers, p and q. For

security, these need to be large.

p = 2 , q = 11

m = pq m = 22

Work out A = ( p −1)(q −1) and choose a

number E which is smaller than this and

has no factors in common with it.

A = (p −1)(q −1) = 1×10 = 10

E = 3

Find a number D such that DE − 1 is a

multiple of A

3 D −1 must be a muliple of 10

D = 7

To encipher a number M, work out

C = M E (modulo m)

Examples are given below

To decipher a number C, work out

M = C D (modulo m)

E and m need not be kept secret as long as D is kept secret.

Enciphering

M C

1 13 (mod 22) = 1

2 23 (mod 22) = 8 (mod 22) = 8

3 33 (mod 22) = 27 (mod 22) = 5

4 43 (mod 22) = 64 (mod 22) = 20

Remember from the section on Caesar ciphers that to find 64 (mod 22) you can find

the remainder when 64 is divided by 22. However, working with powers of numbers

can lead to calculations with very large numbers. Fortunately mathematics can help.

Suppose you want to work out ab (modulo 15). Suppose a = r (modulo 15) and

b = R (modulo 15) .

So a = 15x + r and b = 15 y + R where x and y are integers.

ab = (15x + r )(15 y + R ) = 152 xy + 15xR + 15r + rR

The remainder when dividing ab by 15 is the same as the remainder when dividing

rR by 15. So to find what a large number is modulo 15, it is easier to write the large

number as a product of numbers and find each of those modulo 15 and then

multiply the answers. This can be summed up as follows:

ab (modulo m) = ((a modulo m )(b modulo m )) modulo m .

So

3 215 (mod 22) (15 15)(mod 22) (225mod 22) 15(mod 22)

(5 15) mod 22 75mod 22 9

= × = ×

= × = =

Page 8

Mathematics SL and HL teacher support material 8

Example 1: Student work

Deciphering

M C

1 17 (mod 22) = 1

8 87 (mod 22) = (82 mod22) × (82 mod22) × (82 mod22) × (8 mod 22) (mod 22)

= ((64 mod 22)3 × 8) mod 22

= (203 × 8) mod 22

= 64 000 mod 22

= 2

5 57 (mod 22) = (52 mod22) × (52 mod22) × (52 mod22) × (5 mod 22) (mod 22)

= ((33 mod 22) × 5) mod 22

= ((27 mod 22) × 5) mod 22

= (5 × 5) mod 22

= 3

20 207 ≡ 202 × 202 × 202 × 20

≡ 4 × 4 × 4 × 20

≡ 64 × 20

≡ 20 × 20

≡ 400 ≡ 4

To keep writing “mod 22” is making this look more complicated than necessary so, for

deciphering 20, a different notation has been used. ≡ means “has the same

remainder when divided by 22”.

The tables above demonstrate that deciphering takes you back to the original number

for the particular examples tested. This will always work. The proof depends on

Euler’s theorem, which states that

M φ(m) ≡ 1mod m

φ(m) is the number of integers smaller than m and having no common factor with with

m. If m = pq where p and q are prime, ϕ(m) = ( p −1)(q −1) and this is what we called

A in the description of the RSA algorithm.

So Euler’s theorem implies that M A ≡ 1mod m

Using ≡ to mean “has the same remainder when divided by m” and remembering the

formula for deciphering is M = C D (modulo m)

C ≡ M E

C D ≡ (M E )D

≡ M ED = M ED−1 × M

ED −1 is a multiple of A so ED −1 = kA where k is an integer.

So

C D ≡ M kA × M = (M A )k × M

≡ 1k × M = M

Page 9

Mathematics SL and HL teacher support material 9

Example 1: Student work

Breaking the RSA code

m = pq is known; to break the code needs the individual primes, p and q to be known.

m = 22 does not give a secure code as it is obvious that the primes were 2 and 11.

The primes need to be much larger for the code to be secure.

655 427 is the product of two primes. To find the primes a spreadsheet could be

used. This is the start of a spreadsheet showing one way to find the primes:

n 655427/n Integer part of 655427/n Factor?

1 655427 655427 yes

2 327713.5 327713 no

3 218475.6667 218475 no

4 163856.75 163856 no

5 131085.4 131085 no

6 109237.8333 109237 no

7 93632.42857 93632 no

Rather than just trying to divide by primes, all integers are tested so 1 does go into

655 427 but this does not help. The prime factors are found further down:

438 1496.408676 1496 no

439 1493 1493 yes

440 1489.606818 1489 no

Constructing the spreadsheet and finding the factors took about 1 minute so 655 427

would not give a secure code.

The factorising can be speeded up by using Fermat’s factorising method. This

depends on the following result.

2 2

2 2

p q p q

m pq

+ −

= = −

Multiplying out the brackets on the right hand side gives

2 2 2 22 2

4 4

p q pq p q pq

pq

+ + + −

− =

If p and q are large prime numbers, they will both be odd so

2

p q+

and

2

p q−

are

whole numbers.

Suppose m = x2 − y2 then x2 − m = y 2. This means x2 ≥ m as square numbers cannot

be negative so the smallest m could be is m .

We are trying to factorise m = 655 427 . 655 427 809.58...= so the smallest possible

value of x is 810.

For each possible value of x find the value of x2 − m . When this is a square number,

y2 , the factors are ( x − y )( x + y ) . Doing the calculations on a spreadsheet gives the

following: