The Book of Gehn

In XOR we trust

March 1, 2018

This is the first set of exercises for the Matasano Challenge (also known as the Cryptopals Challenge)

It starts from the very begin, really easy, but it goes up to more challenging exercises quickly.

– Spoiler Alert! –

Ready? Go!

Warming up

During this challenge I will be using and implementing a set of tools to break crypo: cryptonita

Working with bytes can be a mess so let’s use some nice object that would help us in our journey.

This unlocks the Convert hex to base 64 challenge.

We can use bytestring or just B to convert strings encoded in base 16 or 64 into bytes.

>>> from cryptonita import B         # byexample: +timeout=10

>>> b = B('49276d206b696c6c696e6720796f75'
...       '7220627261696e206c696b65206120'
...       '706f69736f6e6f7573206d757368726f6f6d', encoding=16)

>>> b.encode(64)
b'SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t'

But bytestring is a little more than a decoder: it has a convenient interface to manipulate bytes.

For example, you can perform a xor between two strings in one instruction:

>>> a = B('1c0111001f010100061a024b53535009181c', encoding=16)
>>> b = B('686974207468652062756c6c277320657965', encoding=16)

>>> c = a ^ b
>>> c.encode(16)
b'746865206B696420646F6E277420706C6179'

These last two examples solve the challenges Fixed XOR and Implement repeating-key XOR

Even you can perform the xor of two strings of different lengths: you just say that the shorter string will be repeated to infinitum and everything will work.

>>> plaintext = B("Burning 'em, if you ain't quick and nimble\n"
...               "I go crazy when I hear a cymbal")

>>> key = B("ICE")

>>> c = plaintext ^ key.inf()
>>> print(c.encode(16))
b'0B3637272A2B2E63622C2E69692A23693A2A3C6324202D623D63343C2A26226324272765272A282B2F20430A652E2C652A3124333A653E2B2027630C692B20283165286326302E27282F'

Break 1-byte key XOR

Break it by Brute Force

With a so small key space (1 byte means 256 different keys) we can brute force the decryption of the ciphertext just trying all the possible keys.

If we want to automate the process we will need a scoring function to rank how likely the decrypted text is the real plaintext.

The scoring function will depend of the our knowledge about the real plaintext.

If we assume that the text is written in human ascii we could assign a higher value to the plaintexts that have only printable symbols (letters, numbers, punctuation symbols and whitespaces).

A plain text with a byte 0xf1 is unlikely to be a human ascii text. (Such weird bytes could be part of a human text using another encoding like utf-8)

>>> from cryptonita.scoring import all_ascii_printable         # byexample: +timeout 10

>>> all_ascii_printable(B('hello!'))
1

>>> all_ascii_printable(B('hi\x00!'))
0

Now, the attack

>>> from cryptonita.attacks import brute_force

>>> ciphertext = B('1b37373331363f78151b7f2b783'
...                '431333d78397828372d363c7837'
...                '3e783a393b3736', encoding=16)

>>> keys = brute_force(ciphertext, all_ascii_printable, key_space=1)
>>> len(keys)
21

Not bad, but we are smarter than this.

Frequency attack

Brute forcing is expensive even for a small key space. And it is not very cleaver either as we are not using any information about the plaintext to our favor.

ETAOIN SHRDLU Achievement Unlocked

If we assume that the plaintext is in English, it is likely that one of the most common bytes in the ciphertext is actually one of the most common bytes in English but encrypted.

The xor between them will give us the key or at least we will narrow to a small subset of possible keys.

>>> from cryptonita.attacks import freq_attack

>>> most_common_plain_ngrams = [B(b) for b in 'etaoin shrdlu']
>>> cipher_ngram_top = 1

>>> keys = freq_attack(ciphertext, most_common_plain_ngrams, cipher_ngram_top)

>>> len(keys)
13

We got 13 different possible keys, doing a small brute force we can reduce the set further:

>>> keys = brute_force(ciphertext, score_func=all_ascii_printable,
...                                 key_space=keys)

>>> keys
{'X' -> 1.0000}

Single-byte XOR cipher challenge done.

Finally, the plaintext is

>>> ciphertext ^ B('X').inf()
"Cooking MC's like a pound of bacon"

Related tags: cryptography, matasano, cryptonita, repeating key

In XOR we trust - March 1, 2018 - Martin Di Paola