[OTR-users] Compression

Gregory Maxwell gmaxwell at gmail.com
Mon Jul 25 17:29:53 EDT 2005


This is a post I made a few months ago to sci.crypt. It was inspired
by my musings on OTR, but lacking a more solid mathematical analysis
or an implementation, I thought such dreaming was less suited for OTR
discussion and more suited for sci.crypt.

However, I guess they were all too busy dealing with factoring trolls
... or perhaps my musings were just too much conjecture and not
enough 'show me the code'. :) Or perhaps I hit everyone's spam
filters; in any case there were no follow-ups.

With talk of an OTR protocol change, I thought I'd bring it up here...
I don't think it's actionable today, but I think it is a neat idea
that would justify adding some bits into the protocol to allow later
versions of OTR to negotiate capabilities like compression.

-----------

Cryptography often finds use in textual messaging systems, for
example, email encryption and instant messenger communications.  In
this context, given most practical encryption systems, an attacker
will not know exactly what a given message contains, but he can
likely identify the vast majority of possible decryptions as
incorrect, because the vast majority of the messages will not be
remotely plausible (not valid text in a language shared by the
communicators).

Depending on the cipher used, its mode, its keysize, and the amount
of input encrypted with a key, it is quite possible that there is
only a single correct key which produces human-readable text. The
only ciphers I am aware of that provide an assurance of a large
quantity of plausible human text as potential decryptions are ones
where every key defines a unique bijective mapping from the input
space to the output space, and where the keysize is equal to the size
of the input message. We refer to such systems as one-time pads, and
they are usually achieved by simply XORing the input with the key.
Although other schemes are possible, they will all share the
requirement of needing too much key material for most uses.

This presents an annoying problem: it can often be difficult for a
sender to deny he sent a specific blob of encrypted output, unless he
has a secure channel (in which case, why does he need encryption?).
In some applications the encrypted transmission is very public: there
is no doubt who created the message.

An attacker may somehow be able to find a key which decodes the
message by using 'rubberhose' cryptography against the recipient, by
brute-forcing the probable space of user-provided passwords, by bugs
in the software causing poor key choice, or via other means.

With the practical encryption schemes in use today, once an attacker
has obtained a key which decodes the message, he can easily use that
key to extend whatever confidence there is that the sender sent the
encrypted message into confidence that the sender sent that
particular plaintext.

Additionally, if a sender is forced to give up his key or face death,
he will be forced to disclose the real key because no other key will
produce a remotely plausible message. Thus, there is a substantial
incentive for an attacker to use that approach.

Below I will propose a system which addresses these issues within a
limited domain.

Let us consider a communication system which exchanges complete
messages of human-readable text as encrypted fixed-size messages of N
bits.  Fixed-size messages are used in some private communication
systems because they are the only proven method of avoiding
statistical analysis of the message size. I am not sure if my idea
can be easily extended to non-fixed-size systems, so I will exclude
them here.  An example system using fixed-size messages is Mixminion
(http://www.mixminion.net/).

If we consider the set of all possible messages of N bits or less
(hereafter called set M), it follows that we can count each message
with a number representable in N bits or less in size.

We define function T to be a bijective mapping from the set M onto
set M such that messages are now ordered by their probability in the
context of the communication system (for example, if the system is
exchanging English text, "I went to the park." will be strictly
before "wr9ir4Pi;49 9irw9i3i$jRwe").  Because of the size of M it is
not possible to produce an optimal function T, but we can approximate
it. More below.
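
To make the idea concrete, here is a toy sketch (mine, in Python, and
not anything implemented) of what a T-like ranking looks like; the
tiny alphabet, the frequency table, and the length limit are all just
illustrative assumptions, since a real T would have to cover every
possible bit string:

from itertools import product
from math import log

# Assumed toy alphabet and crude letter frequencies.
ALPHABET = " etaoinsh"
FREQ = dict(zip(ALPHABET, [18, 10, 8, 7, 7, 6, 6, 5, 5]))
TOTAL = sum(FREQ.values())

def score(msg):
    # Crude log-probability of a message under a unigram model.
    return sum(log(FREQ[c] / TOTAL) for c in msg)

def build_T(max_len=3):
    # Enumerate every message up to max_len characters and rank them
    # so that more plausible strings get smaller indices.
    msgs = [""]
    for length in range(1, max_len + 1):
        msgs.extend("".join(p) for p in product(ALPHABET, repeat=length))
    msgs.sort(key=lambda m: (-score(m), m))
    T = {m: i for i, m in enumerate(msgs)}
    T_inv = {i: m for m, i in T.items()}
    return T, T_inv

if __name__ == "__main__":
    T, T_inv = build_T()
    print(T["the"], T["hhh"])  # the plausible string gets the smaller rank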

We define function S to be a surjective mapping on the size in bits
of the message, such that sizes of around perhaps 40 bits have much
shorter, and thus more probable, output values, and longer messages
have outputs of substantially lower probability. (Think of a Huffman
code where sizes around 40 bits are very likely.)
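
Purely as an illustration of such an S, here is one way it could be
built: a Huffman code over the possible lengths under an assumed
prior that peaks near 40 bits. The prior shape, the peak, and the
maximum length are only the guesses discussed here, not part of any
protocol:

import heapq

MAX_BITS = 256   # assumed maximum compressed-message size in bits
PEAK = 40        # lengths near this value are treated as most likely

def length_prior(n):
    # Assumed prior over message lengths, peaked at PEAK bits.
    return 1.0 / (1 + abs(n - PEAK))

def build_S():
    # Build a Huffman code over the possible lengths so that likely
    # lengths get short codewords and unlikely lengths get long ones.
    codes = {n: "" for n in range(MAX_BITS + 1)}
    heap = [(length_prior(n), n, [n]) for n in range(MAX_BITS + 1)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, t1, left = heapq.heappop(heap)
        w2, t2, right = heapq.heappop(heap)
        for n in left:
            codes[n] = "0" + codes[n]
        for n in right:
            codes[n] = "1" + codes[n]
        heapq.heappush(heap, (w1 + w2, min(t1, t2), left + right))
    return codes

if __name__ == "__main__":
    S = build_S()
    print(len(S[40]), len(S[200]))  # lengths near 40 bits get short codes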

Function E(m,k) encrypts message m with key k. The only requirement
is that the encryption function does not make it easy to build a
dictionary mapping (S output, key) to the first few bytes of
ciphertext. Since S's output should be fairly small, most block
ciphers, even in ECB mode, will suffice.

Function R(x) produces N - sizeof(x) random bits where N is the fixed
message size.

We now compute the encryption of message m as
E_m = E(S(m) + T(m) + R(S(m) + T(m)), k), where + is concatenation.

We see that this encryption process is somewhat expansive (depending
on the size of the S output). To account for this, we ensure that no
input message can be longer than N minus the maximum size of S's
output, and shrink T's output accordingly.
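
Putting the pieces together, a rough sketch of the construction might
look like the following. The SHA-256 counter-mode keystream is only
there to keep the example self-contained and stands in for whatever
cipher would actually be negotiated, and the S(m)/T(m) bit strings
are placeholders for the outputs of the functions above:

import hashlib
import os

N_BITS = 1024   # assumed fixed message size in bits

def keystream(key, nbytes):
    # Throwaway SHA-256 counter-mode keystream (stand-in cipher).
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:nbytes]

def R(bits):
    # R(x): produce N - sizeof(x) random bits to pad the fixed frame.
    need = N_BITS - len(bits)
    rand = os.urandom((need + 7) // 8)
    return "".join(format(b, "08b") for b in rand)[:need]

def E_m(s_bits, t_bits, key):
    # E_m = E(S(m) + T(m) + R(S(m) + T(m)), k), + being concatenation.
    body = s_bits + t_bits
    plain_bits = body + R(body)
    plain = int(plain_bits, 2).to_bytes(N_BITS // 8, "big")
    return bytes(p ^ k for p, k in zip(plain, keystream(key, len(plain))))

if __name__ == "__main__":
    ct = E_m("0110", "101", os.urandom(16))  # placeholder S(m), T(m) bits
    print(len(ct), "bytes of fixed-size ciphertext")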

If we decrypt E_m with an incorrect key, the resulting S will likely
indicate a fairly short message. Because of the nature of the T
mapping, the short messages will all be 'high probability' and thus
tend to be plausible English text.

Because of this property, it will be much more difficult to construct
an automated brute-forcing system, since the goal will be to look for
the key giving the least probable decoding, and the attacker will be
out of luck if the actual transmission happened to be a
high-probability one.

The sender has an increased ability to deny a specific key because
there are so many other keys which also produce English text. It
could be argued that a key is correct because it produced a fairly
low-probability message, but the issue is certainly more clouded.

Furthermore, after encrypting a message, a sender could test various
keys for decryption and find some decryption which provides a basic
level of plausibility. This key could be provided to an attacker under
duress.
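
Such a search might look roughly like the sketch below; decrypt() and
decode() are hypothetical stand-ins for undoing E and then S and T,
and the plausibility test is only a guess at what a basic level of
plausibility means:

import os

def looks_plausible(text):
    # Assumed plausibility test: short, mostly letters and spaces.
    if not text or len(text) > 60:
        return False
    ok = sum(c.isalpha() or c == " " for c in text)
    return ok / len(text) > 0.9

def find_cover_key(ciphertext, decrypt, decode, tries=100000):
    # Try random keys; keep any key whose decoding reads as a
    # believable message to hand over under duress.
    for _ in range(tries):
        key = os.urandom(16)
        text = decode(decrypt(ciphertext, key))
        if looks_plausible(text):
            return key, text
    return None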

Obviously any authentication must be on the encrypted output, not on
the plaintext.

The suggestion of 40 bits as the tuning size for S was only a guess;
S should be tuned so that its output is long enough to produce
messages of a believable length, but not so long that random
decodings yield implausible content. Tending towards shorter bad
decodings also helps the sender find reasonable cover messages.

Now let us consider the construction of the T function. The T function
is actually an idealized version of what compression software tries to
achieve. So finding T would also find us the perfect compression
algorithm, which is obviously not a trivial task.

However, T does not need to be optimal in order to be usable. Our
goal for T is that every possible input gives a valid decoding, that
smaller outputs tend to map to reasonable English text, and that it
is not too unlikely for short human-readable inputs to also map to
short outputs. A dictionary-based PPM compressor with the right
backend (one which ensures all inputs decode, and that the most
significant bits come first) will be sufficient.

So, is this idea useful or am I missing something obvious?



