Cryptoanalysis & Decypher of Vigenere code
1. Cypher principle
The Vigenere code is a simple alphabet substitution code.
It uses a matrix that contains 26 different lines.
Each
line of the matrix is a complete alphabet but the
alphabet is shifted right by one each level so that at level
2 (the
second line) the alphabet was shifted twice.
0 - A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 - B C
D E F G H I J K L M N O P Q R S T U V W X Y Z A
2 - C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
3
- D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
4 - E F G H I J K L M N O P Q R S T U V W X Y Z A B C
D
5 - F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
6 - G H I J K L M N O P Q R S T U
V W X Y Z A B C D E F
7 - H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
8 - I J K L M
N O P Q R S T U V W X Y Z A B C D E F G H
9 - J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
10
- K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
11 - L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
12
- M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
13 - N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
14
- O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
15 - P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
16
- Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
17 - R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
18
- S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
19 - T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
20
- U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
21 - V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
22
- W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
23 - X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
24
- Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
25 - Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
It uses a key of n integers, each of these integer having a
value between 1 and 26. there is no constraint on the length
of the key. Each integer of the key being standing for one
line of the matrix.
Let's take an example of the encypher principle:
Le'ts choose the following key : {12 20 3}
and the text to encypher
: "HELLO"
Let's consider all the characters of the original text one by one :
character
1: H
character 2: E
character 3: L
character 4: L
character 5: O
Now as the key's length is 3, we will have to map the key onto the string to
encypher as follows :
01201
HELLO
Which means that the character 'H' is assigned the first integer of the key.
the 'E' is assigned the second integer
of the key, and so on.
and if we consider that in order we have :
key (0) = 12
key (1) = 20
key (2) = 3
So the actual association becomes :
12 20 3 12 20
H E L L O
So now every character of the string to encypher was assigned an integer using
the chosen key, we said previously that
these integers were actually line numbers
of the matrix, so now all we have to do to encypher is to find the crossing point
between
the line associated with a given character, and this character.
Let's follow up with our example :
character 1: 'H',
line associated with it : 12 => 'H' encyphered into 'T'
character 2: 'E', line associated with it : 20 => 'E' encyphered
into 'Y'
character 3: 'L', line associated with it : 3 => 'L' encyphered into 'O'
character 4: 'L', line associated
with it : 12 => 'L' encyphered into 'X'
character 5: 'O', line associated with it : 20 => 'O' encyphered into 'I'
To
find the encyphered value, consiser the integer associated with the character
to encypher, this integer is the line of
the matrix you must use.
Then look at the top line of the matrix where you can see the character to
encypher. From
this column go down until you reach the line previously found.
The character located at the intersection between this line
and this column is
the encyphered character.
So the text "HELLO" is encyphered as "TYOXI" with the key {12
20 3}
2. Decypher principle
To decypher the text "TYOXI" using the secret key {12 20 3} you must do as
follows :
The key's length is 3, so we will do as when we encyphered: we will associate a part of
the key with each
character of the text to encypher :
0 1 2 0 1
T Y O X I
and if we consider that in order we have :
key (0) = 12
key (1) = 20
key (2) = 3
So the actual association becomes :
12 20 3 12 20
T Y O X I
So
now every character of the string to decypher was assigned an integer using
the chosen key, we said previously that these
integers were actually line numbers
of the matrix. The only thing to do is to consider for each line of the matrix
associated
with the encyphered character which character of the top first line
of the matrix gives this encyphered character: for
example for the first character:
'T', consider the line 20 of the matrix: follow this line until you find a 'T', now
follow
upward the column containing this 'T' until you reach the top of the matrix.
The character at the top line of the matrix
in this column is the decyphered character.
So in our example :
following line 12, the column that gives a 'T' has a 'H' as top first character.
following line
20, the column that gives a 'Y' has a 'E' as top first character.
following line 3, the column that gives a 'O' has
a 'L' as top first character.
following line 12, the column that gives a 'X' has a 'L' as top first character.
following
line 20, the column that gives a 'I' has a 'O' as top first character.
3. Cryptoanalysis & automatic decypher
First of all let's consider the conditions of the analysis :
- we have an encyphered text
- we know it is either
a french or an english text
- we know it was encyphered using the vigenere code.
- we know neither the key length, nor
the key value.
The goal is to decypher any text found in these conditions.
But to achieve the goal we must first develop
a technique that
will be first validated by hand before being automated using a
C++ program.
To develop a valid cryptoanalysis we must first create an example:
we must choose a text, a key, and we must encrypt
this text:
For example the text to encypher will be :
ALORS QUE LE NOMBRE DE RESEAUX NE CESSE D AUGMENTER LA PROTECTION
DES DONNEES NUMERIQUES ET LA SECURISATION
DES COMMUNICATIONS DEVIENNENT
DES ENJEUX POUR TOUTES LES ORGANISATIONS CET OUVRAGE VOUS EXPLIQUE LES
CONCEPTS ET LES
TECHNIQUES DE LA CRYPTOGRAPHIE EN REPRESENTANT VISUELLEMENT
LES NOTIONS LES PLUS DIFFICILES ET EN VOUS FOURNISSANT DES
INFORMATIONS HISTORIQUES
INSTRUCTIVES DES EXPLICATIONS FONCTIONNELLES SUR LA TERMINOLOGIE ET DE NOMBREUX
EXEMPLES.
VOUS
VOUS FAMILIARISEREZ AVEC LES COMPOSANTS CRYPTOGRAPHQIUES ELEMENTAIRES,
PUIS VOUS DECOUVRIREZ LES SYSTEMES CRYPTOGRAPHIQUES
EXISTANTS CERTAINES
ATTAQUES POSSIBLES SUR CES SYSTEMES ET LES MOYENS DE PROTEGER VOS CLES.
Using the key {22 11 17} for example, this French encyphered text is :
WWFND HQP CA YFIMIA OV NPJALLT YV YPJOP U WFXIPEPPI HL GNZKANKEZE
ZPJ ZZEJPVO YLIPIEBLAD VP WR OPTQCZOLKEZE
ZPJ YZDIFEENRPTFJD UAGZAYEAYK
ZPJ AYAAFO LZLN EFQEVO WVO ZICLEEDRPTFJD TAE FQGIWRV RZLO POLWZMFV HPJ
YZEYPGPD VP WVO
EVYSEEBLAD UA WR YCPLEFCCRLSZA PE NPGNPJAYKWYK RTJQPCHPDAYK
HPJ JZKEZEO WVO ACQD UEQWENZHPJ AE VJ GFQD WKFIJTJOLEP OVO
TEBZIILKEZEO SZOEFNTHQPJ
EYJPCLYEZRPJ ZPJ AIGHTTWEZKYJ BZEYEZKYEAWCAD JQC CW EVNXZJZCKRZA PK ZP EKXSNPLT
POAXGHPJ.
RZLO
GFQD WWXZHTRNTJACVV LMAN CAD TKXGKDRJEJ YCPLEFCCRLSHEFVO PCAXVJERECVO,
ALED MKFJ ZPTKFMNTIAK CAD JUDKAXVO NIUAKKRIWAYEBLAD
VTTJPLEPD TACKWTEAD
RPERMFVO AFODZXWVO DLN NVO DPOEVIPJ AE CAD DKJVJD UA AIKEVCPI RZJ YWVO.
The first idea is to have a look at the encyphered text for a while.
You can observe that some part of the encyphered
text are present several
times. For example some repetitions have been embraced with '[' and ']' :
WWFND HQP CA YFIMIA OV NPJALLT YV YPJOP U WFXIPEPPI HL GNZKANKEZE
[ZPJ]
ZZEJPVO YLIPIEBLAD VP WR OPTQCZOLKEZE ZPJ YZDIFEENRPTFJD UAGZAYEAYK
[ZPJ] AYAAFO LZLN EFQEVO
WVO ZICLEEDRPTFJD TAE FQGIWRV RZLO POLWZMFV [HPJ]
YZEYPGPD VP [WVO]
EVYSEEBLAD UA WR YCPLEFCCRLSZA PE NPGNPJAYKWYK RTJQPCHPDAYK
[HPJ] JZKEZEO [WVO]
ACQD UEQWENZHPJ [AE] VJ [GFQD] WKFIJTJOLEP OVO TEBZIILKEZEO SZOEFNTHQPJ
EYJPCLYEZRPJ
[ZPJ] AIGHTTWEZKYJ BZEYEZKYEAWCAD JQC CW EVNXZJZCKRZA PK ZP EKXSNPLT
POAXGHPJ.
RZLO [GFQD] WWXZHTRNTJACVV LMAN [CAD] TKXGKDRJEJ YCPLEFCCRLSHEFVO PCAXVJERECVO,
ALED
MKFJ ZPTKFMNTIAK [CAD] JUDKAXVO NIUAKKRIWAYEBLAD VTTJPLEPD TACKWTEAD
RPERMFVO AFODZXWVO DLN
NVO DPOEVIPJ [AE] CAD DKJVJD UA AIKEVCPI RZJ YWVO.
Why on earth do we have these patterns appearing repeatedly ?
Just because in any language (here in French) we have
several words
that repeat very often, and as the key has a finite length, the probability
to have the same word being
encrypted with the same lines of the matrix is high if
the text is long enough.
Now the idea is the following one : we will consider only the words of 2 and 3 letters
to
decypher the text. Why ? because the words of 2 characters have the highest
probability of appearance, then in
order the words of 3 letters have also a high probability.
In a language there are a lot of words of 2 and 3 letters, we
won't consider them all, instead
we will consider only of 2 letters, and 1 of 3 letters: those whose appearance probability
is
highest :
in French we will consider "LE" and "DES"
in
English we will consider "IS" and "AND"
This first step will help us guess the length of the key.
Now that we have chosen these words, we will first consider all digrams
(words of two letters, a trigram being a word
of three letters) of the
encyphered text, if one or more of them match with each other
(for example we have "AE" repeated
several times)
then we will measure the distance between these words. Each distance will
be collected in a set of distances.
For example the encyphered text in our example is composed of :
=== 13 digrams ===
CA
OV
YV
HL
VP
WR
UA
PE
AE
VJ
CW
PK
ZP
===
11 trigrams ===
HQP
ZPJ
WVO
TAE
HPJ
OVO
JQC
CAD
DLN
NVO
RZJ
Notice that not all these digrams and trigrams have several occurences
in the encyphered text. If we list all the possible
distances that
separate the words that appear several times in the encyphered text,
then we obtain the following list
of distances :
{ 36, 27, 111, 51, 126, 69, 66, 210, 231, 123, 60, 285, 78, 366 }
Now to determine the key length (that we are supposed to ignore of course)
we must calculate the biggest common divisor
of all these figures (PGCD)
taking them two by two :
pgcd (36, 27) = 9
pgcd (27, 111)
= 3
pgcd (111, 51) = 3
pgcd (51, 126) = 3
pgcd
(126, 69) = 3
pgcd (69, 66) = 3
pgcd (66, 210) = 6
pgcd
(210, 231) = 21
pgcd (231, 123) = 3
pgcd (123, 60) = 3
pgcd
(60, 285) = 15
pgcd (285, 78) = 3
pgcd (78, 366) = 6
Now to determine the key's length we just choose the smallest value that is
common to all these number : we find 3.
Great ! we have guessed the length of the key using simple probability
conjecture !
Now we must find the key's value using the key's length.
For that we will first numerate the encyphered text using the
index of each integer
of the key, for example if we consider the first line of the encyphered text :
01201 201 20 120120 12 0120120 12 01201 2 012012012 01 2012012012
WWFND
HQP CA YFIMIA OV NPJALLT YV YPJOP U WFXIPEPPI HL GNZKANKEZE
You can notice that we use the fact that the key's length is 3 to
assign either 0,1 or 2 to all characters of the encyphered
text.
Of course we must do it for the entire encyphered text (I show only
one line for clarity purpose).
So now what
have we got ? an encyphered text with each character being
assigned an integer being the position of an integer in the
key.
The next step will consist in grouping all digrams (words of 2 letters)
with their assigned integer sequence and group
them by integer sequence:
For the digrams :
sequence 01: { HL, AE, ZP }
sequence 12: {
OV, YV, WR, PE PK }
sequence 20: { CA, VP, UA, VJ CW }
The
first sequence encountered of the digrams is 01, all digrams that
are associated with the 01 sequence are HL, AE and ZP.
Now to use this result, we will need to remember that in any given
language some words have a very high probability
of appearance :
in French we will consider "LE" and "DES"
in
English we will consider "IS" and "AND"
Let's call them most probable
digram and trigram.
So to be able to find the key's value we will make the following
assumption : "all the digrams found have a high probability
to be
the most probable digram, and thus we will assume each of them IS
the most probable digram". That is to say we
assume that:
HL, when decyphered becomes : ET
AE, when decyphered becomes : ET
ZP, when decyphered becomes : ET
OV,
when decyphered becomes : ET
YV, when decyphered becomes : ET
WR, when decyphered becomes : ET
PE, when decyphered
becomes : ET
PK, when decyphered becomes : ET
CA, when decyphered becomes : ET
VP, when decyphered becomes : ET
UA,
when decyphered becomes : ET
VJ, when decyphered becomes : ET
CW, when decyphered becomes : ET
You will tell me that it's crazy because they just can't be
all the most probable digram ! You are right, but the probablily
and
the set theory will help us to overcome this problem.
You must first remember the sequences :
sequence 01: { HL, AE, ZP }
sequence
12: { OV, YV, WR, PE PK }
sequence 20: { CA, VP,
UA, VJ CW }
for each sequence we can now find the value of the key
in the hypothesis that the digram is actually the most probable
digram:
For example for the first digram: HL, it's associated sequence is 01
and
it's associated decyphered value is : ET
So knowing the cyphered value, the
decyphered value, we can easily
determine the line which was used to encypher this character:
using the matrix, if you
want to encrypt 'L' into a 'H'
you
must use the line 22.
As the sequence associated with HL
is 01, then the integer of the key
associated with H
is 0, thus we can now associate 22
as a possible value
for the 0th integer of the key.
By performing the same
technique repeatedly, we obtain :
If you want to encrypt 'L' into a 'H' you must use the line 22
If you want to encrypt 'L' into a 'A' you must use the
line 15
If you want to encrypt 'L' into a 'Z' you must use the line 18
associate the possible values to key 0: 22 15 18
If you want to encrypt 'E' into a 'L' you must use the line 7
If you want to encrypt 'E' into a
'E' you must use the line 0
If you want to encrypt 'E' into a 'P' you must use the line 11
associate the possible values
to key
1:
7 0 11
If you want to encrypt 'L' into a 'O' you must use the line 3
If you want to encrypt 'L' into a 'Y' you must use
the line 13
If you want to encrypt 'L' into a 'W' you must use the line 11
If you want to encrypt 'L' into a 'P' you
must use the line 4
If you want to encrypt 'L' into a 'P' you must use the line 4
associate the possible values to key
1: 3 13 11 4 4
If you want to encrypt 'E' into a 'V' you must use the line 17
If you want to encrypt 'E' into a 'V' you must use
the line 17
If you want to encrypt 'E' into a 'R' you must use the line 13
If you want to encrypt 'E' into a 'E' you
must use the line 0
If you want to encrypt 'E' into a 'K' you must use the line 6
associate the possible values to key
2: 17 17 13 0 6
If you want to encrypt 'L' into a 'C' you must use the line 17
If you want to encrypt 'L' into a 'V' you must use
the line 10
If you want to encrypt 'L' into a 'U' you must use the line 9
If you want to encrypt 'L' into a 'V' you
must use the line 10
If you want to encrypt 'L' into a 'C' you must use the line 17
associate the possible values to
key 2: 17 10 9 10 17
If you want to encrypt 'E' into a 'A' you must use the line 22
If you want to encrypt 'E' into a 'P' you must use
the line 11
If you want to encrypt 'E' into a 'A' you must use the line 22
If you want to encrypt 'E' into a 'J' you
must use the line 5
If you want to encrypt 'E' into a 'W' you must use the line 18
associate the possible values to
key 0: 22 11 22 5 18
So what have we got now ?
the key length, the probable values for each integer that compose the
key. Now you see
that we created one or more set of possible values for
each keys. So we will reduce the possible values using the intersection
operator
of the set theory:
intersection of {22 11 22 5 18 } with {22
15 18} is {22 18}
intersection of {3
13 11 4 4 } with {7 0 11 } is {11
}
intersection of {17 10 9 10 17 } with {17
17 13 0 6 } is {17 }
So that possible values for the key are
{22 11 17 } {18 11 17}
We have now 2 possible solutions for the key, how to find the
good one ? Simple ! you remember the trigrams
? we will use the
most probable trigram :
in French it is "DES"
in
English it is "AND"
considering each possible key one by one, we will try to decypher
each trigram of the encyphered text using this key,
and the key that
generates the biggest number of appearance of the most probable trigram
after decyphering is the key
we are looking for :
using the possible key : {22 11 17 }
trigram converted using solution {22
11 17 } was : HQP => became: QUE
trigram converted using solution {22 11 17 } was : ZPJ => became: DES <--
trigram
converted using solution {22 11 17 } was : WVO => became: LES
trigram converted using solution {22 11 17 } was : TAE
=> became: CET
trigram converted using solution {22 11 17 } was : HPJ => became: LES
trigram converted using solution
{22 11 17 } was : OVO => became: DES <--
trigram converted using solution {22 11 17 } was : JQC => became: SUR
trigram
converted using solution {22 11 17 } was : CAD => became: LES
trigram converted using solution {22 11 17 } was : DLN
=> became: SUR
trigram converted using solution {22 11 17 } was : NVO => became: CES
trigram converted using solution
{22 11 17 } was : RZJ => became: VOS
key 1 has 2 matching
using the possible key : {18 11 17 }
trigram converted using solution
{18 11 17 } was : HQP => became: AUE
trigram converted using solution {18 11 17 } was : ZPJ => became: DWS
trigram
converted using solution {18 11 17 } was : WVO => became: LRS
trigram converted using solution {18 11 17 } was : TAE
=> became: SYT
trigram converted using solution {18 11 17 } was : HPJ => became: GGS
trigram converted using solution
{18 11 17 } was : OVO => became: KPS
trigram converted using solution {18 11 17 } was : JQC => became: SOR
trigram
converted using solution {18 11 17 } was : CAD => became: PPS
trigram converted using solution {18 11 17 } was : DLN
=> became: YUR
trigram converted using solution {18 11 17 } was : NVO => became: AWS
trigram converted using solution
{18 11 17 } was : RZJ => became: VTS
key 2 has 0 matching
You see that the key 1 : {22 11 17} generates 2 occurences
of the
most probable trigram, whereas the key 2 generates 0 occurences of this
most probable trigram,
thus the key to find is {22 11 17}.
Congratulations ! The job is done ! Why ? simply because we have been able to generate the secret key using only
the encyphered text, now the last step would be to use this secret key to decypher the cyphered text, but as it is easy I
let you do it for fun using the key, the cyphered text, and the matrix.
Epilogue:
- we have first found all the digrams that appear repeatedly in the
encypher text.
- we have measured the
distances that separate all the same digrams.
- we have computed the smallest divisor of these distances using the
Euclyde algorithm (PGCD) to find the key's length.
- using the key's length we have associated an integer index with each
character of the encyphered text.
- we have associated the most probable digram as the probable value for
all
encyphered digrams.
- having each encyphered digrams, and their decyphered probable value
we have deduced the
line of the matrix that produced this encypher for
each character. All these lines have been grouped into a set
by integer
index. Each integer being an integer position in the key.
- all the sets generated for a given index
were intersected to find only the
common values between several sets.
- the remaining premitted to create the
possible key solutions using the cross
product operator.
- now the cross product created one or more possible
solutions.
- we used the most probable trigram to guess the good key using each
key to decypher all trigrams
of the cyphered text. The key that produces
the biggest number of the most probable trigram in the decyphered text
is the key we are looking for.
3. C++ program to cypher / decypher any French/English text
///////////////////////////////////////////////////
// vigenere.cpp
// encypher /
decypher automatically any
// French or English text.
// (The text must use only upper case characters)
///////////////////////////////////////////////////
#include <iostream>
#include <cstdio>
#include
<cstdlib>
#include <vector>
#include <string>
#include <map>
#include <stack>
#include
<algorithm>
#include <sys/time.h>
#include <unistd.h>
using namespace std;
enum possible_language_enum {
ENGLISH_LANGUAGE=2344,
FRENCH_LANGUAGE,
};
void generate_key (int _min_length, int _max_length, vector<int>& _v_key) {
_v_key.clear ();
int key_length = (_min_length + rand ()) %
_max_length;
cerr << "key length: " << key_length << " -> ";
int chosen_index;
for
(int i=0 ; i<key_length ; i++) {
chosen_index = rand() % 26;
_v_key.push_back (chosen_index);
cerr
<< chosen_index << " ";
}
cerr << endl;
}
void prepare_matrice (vector<vector<int> >& _matrice) {
_matrice.clear ();
vector<int> v;
for
(int i=0 ; i<26 ; i++) {
for (int k=0 ; k<26 ; k++) {
v.push_back ((k + i) % 26);
}
_matrice.push_back
(v);
v.clear ();
}
char result;
for (int i=0 ; i<26 ; i++) {
cerr << i << " - ";
for
(int k=0 ; k<26 ; k++) {
result = 'A' + _matrice[i][k];
cerr << result
<< " ";
}
cerr << endl;
}
}
int encypher (int _c, int _index, vector<int>& _v_key, const vector<vector<int> >& _matrice)
{
if ((_c < 'A') || (_c > 'Z')) {
cerr << "ERROR: bad character: " << (char) _c
<< endl;
}
int his_index = _c - 'A';
cerr << "c=" << (char) _c << ", corrected
to index = " << his_index << endl;
int key_size = _v_key.size();
int line = _v_key[_index %
key_size];
int cyphered_result = 'A' + _matrice[line][his_index];
cerr << "line=" << line <<
", his_index="<< his_index << ", " << (char) _c << " => " << (char) cyphered_result <<
endl;
return (cyphered_result);
}
void ancilate_one_text (map<string, int>::iterator& iter, map<string, int>& m, vector<int>&
v_delta, string& current_word, int& current_offset, int& current_word_size, vector<string>& _v) {
iter = m.find (current_word);
if (iter == m.end()) {
_v.push_back (current_word);
m[current_word]
= current_offset;
cerr << "ngram : " << current_word << " @ offset " << current_offset
<< endl;
}
else {
cerr << "delta = " << current_offset - iter->second
<< endl;
v_delta.push_back (current_offset - iter->second);
iter->second = current_offset;
cerr
<< "[PRESENT] ngram : " << current_word << " @ offset " << current_offset << endl;
}
}
void numerate_ngrams (string& _full_text, int _key_length, vector<vector<int> >& _vv_digram, vector<vector<int>
>& _vv_trigram, vector<vector<int> >& _vv_other) {
_vv_digram.clear ();
_vv_trigram.clear
();
_vv_other.clear ();
int size = _full_text.size ();
int current_key = 0;
string current_word;
map<string, bool> m;
map<string, bool>::iterator iter;
vector<int> v;
int
his_size;
for (int i=0 ; i<size ; i++) {
if ((_full_text[i] < 'A') || (_full_text[i]
> 'Z')) {
his_size = v.size();
if (his_size > 0) {
iter
= m.find (current_word);
if (iter == m.end()) {
m[current_word]
= true;
if (his_size == 2) {
_vv_digram.push_back
(v);
}
else if (his_size == 3) {
_vv_trigram.push_back
(v);
}
else {
_vv_other.push_back
(v);
}
}
v.clear ();
current_word
= "";
}
}
else {
current_word += _full_text[i];
v.push_back
(current_key);
current_key++;
if (current_key >= _key_length) {
current_key
= 0;
}
}
}
// ancilate last word
his_size = v.size();
if (his_size > 0) {
iter = m.find
(current_word);
if (iter == m.end()) {
if (his_size == 2) {
_vv_digram.push_back
(v);
}
else if (his_size == 3) {
_vv_trigram.push_back
(v);
}
else {
_vv_other.push_back (v);
}
}
}
//
display
/*
int nb_v = vv_numerate.size();
int current_v = 0;
int v_size;
int
current_local_v = 0;
int nb2display;
for (int i=0 ; i<size ; i++) {
if ((_full_text[i]
< 'A') || (_full_text[i] > 'Z')) {
cerr << " ";
}
else {
if
(current_v < nb_v) {
if (current_local_v >= vv_numerate[current_v].size()) {
current_local_v
= 0;
current_v++;
}
if (current_v
< nb_v) {
cerr << vv_numerate[current_v][current_local_v];
current_local_v++;
}
}
}
}
cerr
<< endl;*/
}
void ancilate_full_text (FILE * _fp, string& _full_text, vector<string>& _v_digrames, vector<string>&
_v_trigrames, vector<string>& _v_others, vector<int>& _v_delta) {
_full_text = "";
_v_digrames.clear
();
_v_trigrames.clear ();
_v_others.clear ();
string current_word = "";
int current_word_size;
map<string, int> m_digrams;
map<string, int> m_trigrams;
map<string, int> m_others;
map<string,
int>::iterator iter;
int ic = fgetc (_fp);
int current_offset = 0;
while (ic != EOF) {
if ((ic
>= 'A') && (ic <= 'Z')) {
current_word += ic;
}
else {
current_word_size
= current_word.size();
if (current_word_size > 0) {
if (current_word_size
== 2) {
ancilate_one_text (iter, m_digrams, _v_delta, current_word, current_offset, current_word_size,
_v_digrames);
}
else if (current_word_size == 3) {
ancilate_one_text
(iter, m_trigrams, _v_delta, current_word, current_offset, current_word_size, _v_trigrames);
}
else
{
ancilate_one_text (iter, m_others, _v_delta, current_word, current_offset, current_word_size,
_v_others);
}
current_word = "";
}
}
_full_text
+= ic;
if ((ic >= 'A') && (ic <= 'Z')) {
current_offset++;
}
ic
= fgetc (_fp);
}
current_word_size = current_word.size();
if (current_word_size > 0) {
if
(current_word_size == 2) {
ancilate_one_text (iter, m_digrams, _v_delta, current_word, current_offset,
current_word_size, _v_digrames);
}
else
if (current_word_size == 3) {
ancilate_one_text (iter, m_trigrams, _v_delta, current_word, current_offset,
current_word_size, _v_trigrames);
}
else
{
ancilate_one_text (iter, m_others, _v_delta, current_word, current_offset, current_word_size, _v_others);
}
current_word
= "";
}
int nb_delta = _v_delta.size ();
cerr << "=== number of delta found : " << nb_delta <<
" ===" << endl;
for (int i=0 ; i<nb_delta ; i++) {
cerr << _v_delta[i] << endl;
}
}
void display_ancilate_result (const vector<string>& _v_digrames, const vector<string>& _v_trigrames,
const vector<string>& _v_others, string _full_text) {
int size = _v_digrames.size ();
cerr <<
"=== "<< size << " digrams ===" << endl;
for (int i=0 ; i<size ; i++) {
cerr
<< _v_digrames[i] << endl;
}
size = _v_trigrames.size ();
cerr
<< "=== "<< size << " trigrams ===" << endl;
for (int
i=0 ; i<size ; i++) {
cerr
<< _v_trigrames[i] << endl;
}
size = _v_others.size ();
cerr
<< "=== "<< size << " others ===" << endl;
for (int
i=0 ; i<size ; i++) {
cerr
<< _v_others[i] << endl;
}
cerr << "full text:" << endl << _full_text << endl;
}
int compute_pgcd (int _a, int _b) {
cerr << "pgcd (" << _a << ", " << _b <<
") = ";
int r;
while (_b > 0) {
r = _a % _b;
_a = _b;
_b
= r;
}
cerr << _a << endl;
return _a;
}
int compute_pgcd (const vector<int>& _v_delta) {
map<int, int> m_occurence;
map<int,
int>::iterator iter;
int pgcd = -1;
int his_pgcd;
int nb_elem = _v_delta.size ();
for (int i=0 ; i<nb_elem
; i++) {
if ((i+1) < nb_elem) {
his_pgcd = compute_pgcd (_v_delta[i], _v_delta[i+1]);
iter
= m_occurence.find (his_pgcd);
if (iter == m_occurence.end()) {
m_occurence[his_pgcd]
= 1;
}
else {
(iter->second)++;
}
}
}
int max_occurence = -1;
int max_occurence_value = -1;
for (iter=m_occurence.begin() ; iter != m_occurence.end()
; ++iter) {
if (iter->second > max_occurence) {
max_occurence = iter->second;
max_occurence_value =
iter->first;
}
}
pgcd = max_occurence_value;
return (pgcd);
}
int lookup_line (int _crypted, int _decrypted, vector<vector<int> >& _matrice) {
int line = -1;
int corrected_decrypted = _decrypted - 'A';
int corrected_crypted = _crypted
- 'A';
for (int i=0 ; i<26 ; i++) {
if (_matrice[i][corrected_decrypted] == corrected_crypted) {
line = i;
break;
}
}
cerr << "If you want to encrypt '" << (char) _decrypted <<
"' into a '" << (char) _crypted << "' you must use the line " << line << endl;
return (line);
}
int compute_key_length (vector<int> _v_delta) {
return (compute_pgcd (_v_delta));
}
string vector_2_string (const vector<int>& _v) {
string s;
int nb_elem = _v.size ();
char
tmp[32];
for (int i=0 ; i<nb_elem ; i++) {
snprintf (tmp, 32, "%d", _v[i]);
tmp[31]
= '\0';
s += tmp;
}
cerr << "vector_2_string -> " << s << endl;
return
(s);
}
void compute_intersection (vector<int> _v_challenger, vector<int>& _v_intersection) {
vector<int>
vtmp;
int nb_elem = _v_challenger.size ();
vector<int>::iterator iter;
// display
cerr
<< "intersection of {";
for (int i=0 ; i<nb_elem ; i++) {
cerr << _v_challenger[i]
<< " ";
}
cerr << "} with {";
nb_elem = _v_intersection.size ();
for
(int i=0 ; i<nb_elem ; i++) {
cerr << _v_intersection[i] << " ";
}
cerr <<
"} is {";
nb_elem = _v_challenger.size ();
for (int i=0 ; i<nb_elem ; i++) {
iter = find (_v_intersection.begin(),
_v_intersection.end(), _v_challenger[i]);
if (iter != _v_intersection.end()) {
iter
= find (vtmp.begin(), vtmp.end(), _v_challenger[i]);
if (iter == vtmp.end())
{
vtmp.push_back (_v_challenger[i]);
}
}
}
_v_intersection
= vtmp;
nb_elem = _v_intersection.size ();
for (int i=0 ; i<nb_elem ; i++) {
cerr
<< _v_intersection[i] << " ";
}
cerr << "}\n";
}
void generate_solution (vector<int> _current_path, int _index, vector<vector<int> >& _v_total, map<int,
vector<int> >& _mv_remaining, int _key_length) {
int nb_elem = _mv_remaining.size ();
if
(_index >= nb_elem) {
_v_total.push_back (_current_path);
}
else {
map<int,
vector<int> >::iterator iter = _mv_remaining.begin ();
for (int i=0 ; i<_index ; i++) {
++iter;
}
nb_elem
= (iter->second).size();
vector<int> tmp_path;
for (int i=0 ; i<nb_elem ; i++) {
tmp_path
= _current_path;
tmp_path.push_back ((iter->second)[i]);
generate_solution (tmp_path,
_index+1, _v_total, _mv_remaining, _key_length);
}
}
}
int lookup_original (int _crypted, int _line, vector<vector<int> >& _matrice) {
int uncrypted
= -1;
int converted_crypted = _crypted - 'A';
for (int i=0 ; i<26 ; i++) {
if (_matrice[_line][i] == converted_crypted) {
uncrypted
= 'A' + i;
break;
}
}
return (uncrypted);
}
void convert_one_trigram (int _key_length, vector<int>& current_solution, string& _trigram, vector<int>&
_numerate_trigram, string& _reference_trigram, vector<vector<int> >& _matrice, string& _result) {
_result
= "";
int his_line;
int original;
for (int i=0 ; i<3 ; i++) {
his_line = current_solution[_numerate_trigram[i]];
original
= lookup_original (_trigram[i], his_line, _matrice);
_result += original;
}
cerr
<< "trigram converted using solution {";
for (int i=0 ; i<current_solution.size() ; i++) {
cerr
<< current_solution[i] << " ";
}
cerr << "} was : " << _trigram << "
=> became: " << _result << endl;
}
bool use_trigrams (int _key_length, vector<vector<int> >& _matrice, vector<string>& _v_trigrames,
vector<vector<int> >& _vv_numerate_trigram, int _language, map<int, vector<int> >& _mv_remaining)
{
string reference_trigram;
if (_language == ENGLISH_LANGUAGE) {
reference_trigram
= "AND";
}
else if (_language == FRENCH_LANGUAGE) {
reference_trigram = "DES";
}
//
generate the possible solutions
vector<int> current_path;
vector<vector<int> > v_total;
generate_solution (current_path, 0, v_total, _mv_remaining, _key_length);
cerr << "== possible solutions : ==" << endl;
int nb_elem = v_total.size();
for (int i=0 ; i<nb_elem ; i++) {
cerr << "{";
for (int
k=0 ; k<_key_length ; k++) {
cerr << v_total[i][k] << " ";
}
cerr << "}\n";
}
nb_elem = v_total.size();
int nb_trigram = _v_trigrames.size ();
string result;
int
nb_match = 0;
vector<int> v_nb_match;
for (int i=0 ; i<nb_elem ; i++) { // try each possible arrangement
nb_match
= 0;
for (int k=0 ; k<nb_trigram ; k++) {
convert_one_trigram (_key_length,
v_total[i], _v_trigrames[k], _vv_numerate_trigram[k], reference_trigram, _matrice, result);
if (result
== reference_trigram) {
nb_match++;
}
}
v_nb_match.push_back
(nb_match);
}
// determine which solution was the best
nb_elem = v_nb_match.size ();
int
max_occurence = -1;
int max_id = -1;
for (int i=0 ; i<nb_elem
; i++) {
cerr << "key " << i << " has " << v_nb_match[i] << " matching" <<
endl;
if (v_nb_match[i] > max_occurence) {
max_occurence = v_nb_match[i];
max_id =
i;
}
}
cerr << "the best key is : {";
for (int i=0 ; i<_key_length;
i++) {
cerr << v_total[max_id][i] << " ";
}
cerr << "}\n";
}
void find_key_value (int _key_length, vector<vector<int> >& _matrice, vector<string>& _v_digrames,
vector<string>& _v_trigrames, vector<vector<int> >& _vv_numerate_digram, vector<vector<int>
>& _vv_numerate_trigram, int _language) {
string reference_digram;
if (_language == ENGLISH_LANGUAGE)
{
reference_digram = "IS";
}
else if (_language == FRENCH_LANGUAGE) {
reference_digram
= "LE";
}
int nb_digram = _v_digrames.size ();
int line;
if (nb_digram != _vv_numerate_digram.size())
{
cerr << "+++ ERROR: find_key_value: nb_digram != _vv_numerate_digram.size()" << endl;
cerr
<< "nb_digram = " << nb_digram << endl;
cerr << "_vv_numerate_digram.size() = " <<
_vv_numerate_digram.size() << endl;
exit (2);
}
map<string, vector<int> > m;
map<string, vector<int> >::iterator iter;
string
s;
for (int i=0 ; i<nb_digram ; i++) {
s = vector_2_string (_vv_numerate_digram[i]);
iter
= m.find (s);
if (iter == m.end()) {
vector<int> vtmp;
vtmp.push_back
(i);
m[s] = vtmp;
}
else {
(iter->second).push_back
(i);
}
}
// display
cerr << "=== associations by family ===" << endl;
int his_index;
for (iter=m.begin() ; iter != m.end() ; ++iter) {
cerr << iter->first
<< " ";
for (int i=0 ; i<(iter->second).size() ; i++) {
his_index =
(iter->second)[i];
cerr << _v_digrames[his_index][0] << _v_digrames[his_index][1] <<
" ";
}
cerr << endl;
}
// prepare the data structure
// that associate each key with it's possible sets
map<int, vector
<vector<int> > > mvv;
map<int, vector <vector<int> > >::iterator iter_mvv;
vector
<vector<int> > vv;
for (int i=0 ; i<_key_length ; i++) {
mvv[i] = vv;
}
int nb_elem;
int his_key;
vector<int> v;
for (iter=m.begin() ; iter != m.end()
; ++iter) {
for (int k=0 ; k<2 ; k++) {
nb_elem = (iter->second).size();
v.clear
();
for (int i=0 ; i<nb_elem ; i++) {
his_index = (iter->second)[i];
line
= lookup_line (_v_digrames[his_index][k], reference_digram[k], _matrice);
v.push_back (line);
}
his_key
= _vv_numerate_digram[his_index][k];
iter_mvv = mvv.find (his_key);
(iter_mvv->second).push_back
(v);
// display
cerr << "associate the possible values to key " << his_key
<< ": ";
for(int i=0 ; i<v.size() ; i++) {
cerr
<< v[i] << " ";
}
cerr << endl;
}
}
// now reduce the sets using the intersection operator for each key
vector<int> v_intersection;
map<int,
vector<int> > mv_remaining;
for (iter_mvv=mvv.begin() ; iter_mvv != mvv.end() ; ++iter_mvv) { // iterate
through all keys
nb_elem = (iter_mvv->second).size();
if (nb_elem > 0) {
v_intersection
= (iter_mvv->second)[0];
for (int i=1 ; i<nb_elem ; i++) {
compute_intersection
((iter_mvv->second)[i], v_intersection);
}
mv_remaining[iter_mvv->first] =
v_intersection;
}
}
// ensure that no key has an empty set
map<int, vector<int> >::iterator
iter_verif;
bool abort = false;
for (iter_verif=mv_remaining.begin() ; iter_verif
!= mv_remaining.end() ; ++iter_verif) {
if ((iter_verif->second).size()
== 0) {
cerr << "+++ WARNING: key " << iter_verif->first << " has no possible challengers !" << endl;
abort = true;
}
}
if (abort == true) {
cerr << "+++ one or more
keys have no challenger ! Aborting decyphering sequence." << endl;
exit (4);
}
else {
use_trigrams (_key_length, _matrice, _v_trigrames,
_vv_numerate_trigram, _language, mv_remaining);
}
}
void decypher (const char * _filename, vector<vector<int> >& _matrice, int _language) {
vector<string>
v_digrames;
vector<string> v_trigrames;
vector<string> v_others;
string full_text;
vector<int>
v_delta;
vector<vector<int> > vv_numerate_digram;
vector<vector<int> > vv_numerate_trigram;
vector<vector<int>
> vv_numerate_other;
FILE * fp = fopen (_filename, "rb");
if (! fp) {
cerr << "++ ERROR: decypher: can't
open file : " << _filename << endl;
}
else {
ancilate_full_text (fp, full_text,
v_digrames, v_trigrames, v_others, v_delta);
display_ancilate_result (v_digrames, v_trigrames, v_others, full_text);
fclose
(fp);
int key_length = compute_key_length (v_delta);
cerr << "key_length = " <<
key_length << endl;
numerate_ngrams (full_text, key_length, vv_numerate_digram, vv_numerate_trigram,
vv_numerate_other);
cerr << "after numeration: " << vv_numerate_digram.size() << " digrams,
";
cerr << vv_numerate_trigram.size() << " trigrams, "<<
vv_numerate_other.size() << " others" << endl;
find_key_value (key_length, _matrice, v_digrames,
v_trigrames, vv_numerate_digram, vv_numerate_trigram, _language);
}
}
void analyze (const char * _filename) {
vector<int> v;
int ic;
FILE * fp =
fopen (_filename, "rb");
if (fp) {
ic = fgetc (fp);
while (ic != EOF) {
v.push_back
(ic);
ic = fgetc (fp);
}
fclose (fp);
}
int nb_elem = v.size();
int current = 1;
for (int i=0 ; i<nb_elem ; i++) {
if
((v[i] < 'A') || (v[i] > 'Z')) {
cerr << " ";
}
else {
cerr
<< current;
current++;
if (current == 4) {
current
= 1;
}
}
}
cerr << endl;
for (int i=0 ; i<nb_elem
; i++) {
if ((v[i] < 'A') || (v[i] > 'Z')) {
cerr << " ";
}
else
{
cerr << (char) v[i];
}
}
cerr << endl;
}
int main (int argc, char **argv) {
if (argc != 3) {
cerr << "usage: ./a.out <file2encrypt>
{-english | -french}" << endl;
exit (1);
}
int language;
if (strcmp(argv[2], "-english") == 0) {
language = ENGLISH_LANGUAGE;
}
else
if (strcmp(argv[2], "-french") == 0) {
language = FRENCH_LANGUAGE;
}
else {
cerr
<< "usage: ./a.out <file2encrypt> {-english | -french}" << endl;
exit(1);
}
struct timeval ta;
gettimeofday (&ta, NULL);
srand (ta.tv_usec);
int iret
= 0;
vector<vector<int> > matrice;
FILE * fp = fopen (argv[1], "rb");
if (! fp)
{
cerr << "ERROR: can't open file : " << argv[1] << endl;
iret = 2;
}
else
{
string dest_filename = argv[1];
dest_filename += ".encyphered";
FILE * fp_dest
= fopen (dest_filename.c_str(), "wb");
if (! fp_dest) {
cerr << "ERROR: can't open
destination file: " << dest_filename << endl;
iret = 3;
}
else
{
prepare_matrice (matrice);
vector<int> v_key;
generate_key
(5, 15, v_key);
int current_index = 0;
int ic = fgetc (fp);
while
(ic != EOF) {
if ((ic >= 'A') && (ic <= 'Z')) {
fputc
(encypher (ic, current_index, v_key, matrice), fp_dest);
current_index++;
}
else
{
fputc (ic, fp_dest);
}
ic =
fgetc (fp);
}
fclose (fp_dest);
}
fclose (fp);
decypher
(dest_filename.c_str(), matrice, language);
}
return (iret);
}