[OTR-users] Stylometry?

Gregory Maxwell gmaxwell at gmail.com
Sat Jan 1 23:30:54 EST 2005


Not directly related to OTR, but is anyone on the OTR list aware of
any papers on techniques to disguise/obscure writing style?

Right now a potential hole in the protection that OTR provides is that
the remote party you are communicating with could record a log of the
conversation and then convince a third party that the text in his log
was written by you through a compelling analysis of writing style,
especially if he could show that he did not have access to the
analysis corpus until some time after he had possession of the log.
(For example, he has a timestamp service record the hash of the log,
then later a court order against AOL produces a log of your last 10
years of IM traffic to use as a basis for analysis.)
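
As a rough illustration of the commitment step in that example, the
log holder only needs a digest of the log to hand to the timestamp
service. This is a sketch, not a description of any particular
service: SHA-256 is an assumed hash choice, log_digest is a
hypothetical helper name, and the submission itself is not shown.

```python
import hashlib


def log_digest(log_text: str) -> str:
    """Return the hex SHA-256 digest of a conversation log.

    SHA-256 is an assumption; any collision-resistant hash would serve.
    """
    return hashlib.sha256(log_text.encode("utf-8")).hexdigest()


# Hypothetical two-line log, used only for illustration.
log = "alice: hello\nbob: hi there\n"
digest = log_digest(log)

# Only the digest would go to the timestamp service; the log itself
# stays private until its holder chooses to reveal it.
print(digest)
```

Because the digest fixes the log contents at the time of submission,
the holder can later argue that the log predates his access to the
analysis corpus.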

It might be interesting to provide a plugin that, like a spell
checker, helps you suppress text which may be strongly indicative of
the identity of the sender.  Because of the relatively low volume of
data provided by IMs, it's likely that a small amount of style masking
will be sufficient to prevent a convincing argument.  Further, the
mere availability of style masking and style morphing tools will
reduce the credibility of such arguments.
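
To make the spell-checker analogy concrete, here is one way such a
plugin might flag text, offered only as a sketch. It assumes the
sender has some of their own past messages and a reference sample of
other people's text available; flag_distinctive, the ratio threshold,
and min_count are all hypothetical choices, not an established method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def flag_distinctive(message, own_history, reference, ratio=5.0, min_count=2):
    """Return words in `message` the sender uses far more often than others do.

    `ratio` and `min_count` are arbitrary illustrative thresholds.
    """
    own = Counter(w for m in own_history for w in tokenize(m))
    ref = Counter(w for m in reference for w in tokenize(m))
    own_total = sum(own.values()) or 1
    ref_total = sum(ref.values()) or 1

    flagged = []
    for word in tokenize(message):
        if own[word] < min_count:
            continue  # too rare in the sender's history to be a habit
        own_rate = own[word] / own_total
        ref_rate = (ref[word] + 1) / (ref_total + 1)  # add-one smoothing
        if own_rate / ref_rate >= ratio and word not in flagged:
            flagged.append(word)
    return flagged


# Hypothetical data for illustration only.
own_history = ["indeed that is so", "indeed I agree", "indeed indeed"]
reference = ["that is so", "I agree with that", "so it is what it is"]
print(flag_distinctive("indeed that works", own_history, reference))
```

A real plugin would presumably look at more than word choice
(punctuation, capitalization, misspellings), but the interaction would
be the same: highlight the habit and let the sender decide whether to
rephrase.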

The intention of such a plugin would not be to convert all of my IM
text into "da im spk of a 13yr old LOL!" but rather to help me
reduce/confuse the few minor traits in my communication that separate
me from the large set of people who send well-formed English
sentences, and thereby make it easier for an attacker with access to
only a small amount of my text (a single conversation) to generate an
equal amount of plausible alternative text.  Such a plugin would also
increase the range of my output, making far more texts plausible.
 
Unfortunately, to develop such a plugin it would be useful (required?)
to have access to large corpora of IM text, and right now the only
groups with access to such data (AOL, etc.) have a substantial
interest in preventing privacy-improving technology.

If stylometry were a mature science, it would be interesting for each
user to begin every conversation with a stylometric vector that,
combined with a few messages, would be sufficient to help an attacker
form additional messages that appear, under stylometric analysis, to
be from that user (i.e. an efficient representation of the important
stylometric factors from a corpus of non-sensitive messages).  I
don't think that we are at that point yet.
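
For a sense of what such a vector could look like, here is a minimal
sketch that uses relative frequencies of a few common function words.
That feature choice is my assumption for illustration; the word list
and the style_vector name are hypothetical, and a useful vector would
need to capture far more than this.

```python
import re

# Hypothetical, deliberately tiny feature set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "but", "which")


def style_vector(messages):
    """Return the relative frequency of each function word across `messages`."""
    tokens = [w for m in messages for w in re.findall(r"[a-z']+", m.lower())]
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [tokens.count(w) / total for w in FUNCTION_WORDS]


# Computed from non-sensitive text and sent at the start of a conversation.
vector = style_vector(["the cat and the dog", "it is in the garden"])
print(vector)
```

Publishing the vector only helps deniability if others can actually
generate text that matches it, which is the part that is not yet
practical.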

A long-term approach for defense against stylometric analysis for IM
systems would be to frustrate any attempts to collect a corpus of your
IM writing for analysis.  This defense would likely be infeasible with
email (due to public message boards), but is more realistic for
real-time systems, which are often more private.  Pervasive use of OTR
could go a long way to furthering this goal, but it would not do much
for those of us who have already likely had a decade or more of IM
conversation (previous analysis of my own logs suggests that I may
transmit around 50k messages per year) archived in unknown places
beyond our control.


