[OTR-dev] html in otr messages

Scott Ellis mail at scottellis.com.au
Thu Mar 22 23:20:06 EDT 2007


First I'll explain how encryption works in Miranda, then I'll address a
couple of your points.

Miranda has a 'message chain'. When a message is received, it gets converted
on the network level by the protocol into a standard internal format. Then
it is passed through the message chain, and back to the protocol before
being displayed to the user. When sending, it also goes through the chain
but in reverse.

Any plugin can register as a 'filter' which means, for the contacts it
specifies, it will receive the messages that go through the message chain
and it can do with them what it wants. There are several predefined types of
filters - encryption, translation, etc - and the type controls the order of
the 'chain' - so that e.g. decryption is done before translation.

Any properly designed Miranda protocol plugin will do any processing of text
within received messages after the chain has been called, and within sent
messages before the chain is called - like AIM does. Protocols can optionally
install their own filters to do processing at other points in the chain.
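
As a rough illustration of the chain described above, here is a minimal
sketch in Python (chosen only for brevity; this is not Miranda's real API,
and every name in it - MessageChain, Filter, the type constants - is made up
for the example). The toy 'encryption' is just rot13. The point is only that
the filter type decides the order, and that sending walks the chain in
reverse.

```python
import codecs
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical filter types: lower values sit closer to the network.
ENCRYPTION = 10
TRANSLATION = 20


@dataclass
class Filter:
    name: str
    kind: int
    on_recv: Callable[[str], str]
    on_send: Callable[[str], str]


@dataclass
class MessageChain:
    filters: List[Filter] = field(default_factory=list)

    def register(self, f: Filter) -> None:
        self.filters.append(f)
        # The filter type, not the registration order, decides the position.
        self.filters.sort(key=lambda x: x.kind)

    def receive(self, msg: str) -> str:
        # Network -> user: decryption runs before translation.
        for f in self.filters:
            msg = f.on_recv(msg)
        return msg

    def send(self, msg: str) -> str:
        # User -> network: same chain, walked in reverse.
        for f in reversed(self.filters):
            msg = f.on_send(msg)
        return msg


trace: List[str] = []


def traced(label: str, fn: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap fn so that each call is recorded in trace."""
    def wrapper(msg: str) -> str:
        trace.append(label)
        return fn(msg)
    return wrapper


chain = MessageChain()
# Registered "out of order" on purpose.
chain.register(Filter("translate", TRANSLATION,
                      traced("translate.recv", str.upper),
                      traced("translate.send", str.lower)))
chain.register(Filter("crypto", ENCRYPTION,
                      traced("crypto.recv", lambda m: codecs.decode(m, "rot_13")),
                      traced("crypto.send", lambda m: codecs.encode(m, "rot_13"))))

wire = chain.send("hello")
received = chain.receive(wire)
print(wire, received, trace)
```

Neither toy filter knows anything about the other or about the protocol,
which is the property argued for in this message.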

IMO this is the correct way to do this - it allows for proper interoperation
of different plugins that affect the contents of messages - and it allows
plugins to concentrate on their job, e.g. encryption, without worrying about
any protocol issues. It allows for different encryption plugins to be used
with a standard interface - the protocol does not need to know anything
about the encryption, and vice versa. But it does mean that encryption
plugins are just that - and not protocols that can specify their own rules
about message content in the plaintext. The protocol has a right to expect
valid protocol messages after decryption. I could just pass the message on
to the user directly without continuing through the message chain, but then
the message would not be processed by other filters registered for that
contact.

Miranda doesn't do tag stripping or parsing, because only AIM supports tags
in messages. Alright, in a way Jabber does too - but the protocol itself,
not the miranda jabber plugin, provides a way to ignore them. Most protocols
do not support html in messages. As is proper, it is the protocol's job to
convert messages between the network encoding and the internal encoding.
Since this is being done already by protocols, and happens automatically in
miranda even for OTR encrypted messages, it really should not be
re-implemented in the OTR plugin.

Why do you allow markup in otr messages? OTR doesn't need it. I'm arguing
that the spec should be changed in this respect - I think OTR messages
should be allowed to contain anything valid on the underlying protocol, and
nothing else. That means OTR can stick to the job of encryption, in any
client.

I'm not arguing because I'm too lazy to make changes to the plugin - in fact
I'm planning a 'html stripper' plugin to get rid of this problem in miranda,
and as you say by pinching the code from AIM it's a very small job - but
that is not directly related to this discussion. I'm arguing because I think
the way the OTR plugin in miranda works now is correct, and OTR should, for
the sake of developers for any client, stick to encryption and keep things
simple, and not try to call itself a 'protocol' and deal with the encoding
issues that entails. Currently, if you look at the code, the miranda OTR
plugin and the gaim one are very similar in functionality - neither does
parsing of HTML tags, or HTML entity encoding. If the spec were changed to
remove the 'optional html markup', then all that needs to be done is for gaim
otr to do the encryption/decryption after/before the protocol has made the
message valid according to its own rules - all protocols in all clients must
by definition contain the code to do that already. If markup is allowed just
because this is difficult to do in gaim, then it's the fault of the way gaim
works, and neither the OTR spec nor any OTR plugins should have to shoulder
the burden of working around that.

On 3/23/07, Ian Goldberg <ian at cypherpunks.ca> wrote:
>
> On Fri, Mar 23, 2007 at 02:55:23AM +1100, Scott Ellis wrote:
> > From Wintermute on the Miranda forum:
> >
> > "Hm .. ok .. the friend made a statement and asked me to post it for him
> > (as he isn't registered here and too lazy to do so):
> >
> > "Quote:
> > I had a lengthy email conversation with Ian Goldberg, one of the authors
> > of OTR, libotr and the gaim-otr plugin. If you read the OTR spec
> > carefully, you will see that it specifies optional HTML-formatting in the
> > plaintext. As OTR-messages are merely encapsulated using Jabber (or
> > other) protocols, Ian thinks (as do I after thinking about it) that the
> > OTR specs supersede the standards of "lower level" protocols such as
> > XMPP. It is the job of an OTR plugin to process the HTML tags in the
> > plaintext, if they are not used by the client it's the plugin's job to
> > strip them.
> > If you don't see it that way, I suggest you contact the otr-dev
> > mailinglist."
> >
> > I think of it quite the opposite way - OTR simply encapsulates protocol
> > messages
> >
> > OTR would have quite a job to do trying to detect whether the client
> > supports HTML in Miranda - there are so many messaging modules etc. And
> > they're likely to change at any time - it would make maintenance
> > difficult.
> > I don't think tracking the client's capabilities is the plugin's job.
>
> You're right; I think that whatever calls the OTR plugin should be able
> to understand what the plugin outputs.  In gaim, there's no problem,
> since gaim understands the format of OTR plaintext messages already.
> Miranda hasn't (yet?) implemented tag-handling in most of its protocol
> plugins, fine.  So the Miranda AIM plugin, for example, has to
> explicitly strip the tags, instead of parsing them.  You could have the
> Miranda OTR plugin optionally do that instead; the protocol plugin could
> pass a parameter that says "strip the HTML tags from the plaintext
> before giving it back to me".  Or just have a function in the OTR plugin
> to do that (steal it from the AIM plugin) that the protocol plugins can
> call.


Miranda won't implement tag handling in most of its protocols, because the
specs do not allow for them. Yes, it's easy to copy the code from aim - but
I would like to avoid having two copies of the same code in the same
application (unless the user can choose to do so, or choose not to without
losing functionality - e.g. by installing an optional 'html stripper'
plugin) - AIM obviously still needs it for clients without OTR, so it can't
simply be moved. Having OTR-specific flags means the protocols need to know
something about the encryption that they otherwise would not. To keep the
interface standard, other encryption plugins would need to handle such flags
as well. And AIM does parse the tags - it will optionally convert them to
bbcodes. That approach is not scalable - to keep OTR message sessions
otherwise equivalent to non-otr ones, the OTR plugin would have to
re-implement all similar functionality from all protocol plugins, if and when
that functionality exists.

> > And I think the client should output the same *intended message* whether
> > or not OTR is installed - after decryption, OTR messages from gaim OTR
> > contain formatting html, whereas without OTR gaim outputs no formatting
> > html.
>
> Not true: see my otr-users message.  gaim sends *both* the formatted and
> non-formatted versions in Jabber.



> > This means transmitting more information than usual - which, although
> > very unlikely, may not be something that the user wants.
>
> There's no more information in the OTR version than in the original
> Jabber message.  Miranda was just ignoring the more information-rich
> part of the Jabber message.


Yes, I didn't realise this about Jabber. What about other protocols like ICQ
and Yahoo?

> > Removing the tags means I would have to reimplement something already
> > implemented by at least the AIM protocol plugin. Processing the codes
> > 'properly' for the client could involve conversion from e.g. html to
> > bbcodes, if the client supports those - which would mean differences in
> > the nature of OTR plugins for different clients. Or further, if the OTR
> > spec specifies handling of HTML, then the otr library should be able to
> > handle it - but again that's a reimplementation of stuff that the
> > protocol can already do.
>
> The OTR library gives you a plaintext message, in utf-8 encoding, that
> is allowed to have HTML tags in it.  If you need something different for
> your application, you'll need to convert it.  For example, if your
> application decides it needs to convert the HTML tags to bbcodes, either
> it, or its OTR plugin, will need to do that; libotr won't do it by
> default.  [Of course, if there's a really common conversion that lots of
> different clients need, we could consider just bundling it with the
> library.]


Yes, with the spec the way it is, that's true. But why is markup
specifically allowed? My main point is that if it were not, then I wouldn't
have to do that conversion. The secondary point here is that the OTR library
should encapsulate the protocol, if it's considered a protocol, in its
entirety - not everything except one little bit. If the OTR library just
does encryption/decryption, then, IMO, OTR would be better defined as an
encryption layer on top of other protocols.

> > Also - and I really don't mean to criticize - but the protocol specs for
> > things like jabber and other open protocols, with their RFCs and
> > committees etc etc, tend to be thought out pretty well. The reasons for
> > disallowing things like 'mixed content' (from the jabber RFC) are usually
> > pretty good. For example, if I want to send the text <font>blue</font> to
> > a friend, I can do so over jabber (because it is encoded and decoded by
> > the protocol so as to not create 'mixed content') and expect him to get
> > the message as I typed it, if the client performs to the spec. With OTR,
> > if it's removing tags on clients that do not support markup, how does it
> > tell the difference between formatting tags and what the user has typed?
> > At a minimum it would somehow have to encode tags in message text and
> > formatting tags differently - or you have restricted what messages users
> > can and can't send.
>
> OTR expects your plaintext input to be HTML-encoded.  Which means if
> there's a literal "<" in your plaintext message, you should convert it
> to "&lt;" before giving it to OTR.  That's certainly what Jabber does:
> even without OTR, "<"s in messages get converted to "&lt;" on the wire.
> So if you send "<font>blue</font>" (a message intending to start
> with a literal "<") to your buddy, your end will convert it to
> "&lt;font&gt;blue&lt;/font&gt;" (whether or not you're using OTR), then
> if you're using OTR, you'll pass *that* to the OTR plugin, which will
> encrypt it.  Your buddy's client will decrypt it back to
> "&lt;font&gt;blue&lt;/font&gt;" and display it to him as
> "<font>blue</font>".


Hm - I didn't see anything in the spec saying the plaintext must be HTML
encoded. I assume you mean that html special characters typed in by the user
are encoded as entities and formatting tags are not? 'Cause that's what gaim
otr is sending.
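
To make that distinction concrete, here is a small Python sketch of the
convention as I understand it from the quoted text: what the user typed is
entity-encoded, formatting tags are left raw. The function names are
invented for the example, and the tag stripper is a deliberately naive regex
rather than a real HTML parser. It only works if the sender really did
encode the typed text.

```python
import html
import re

# Naive pattern for a tag: "<", anything that is not ">", then ">".
TAG_RE = re.compile(r"<[^>]*>")


def to_otr_plaintext(typed_text: str, bold: bool = False) -> str:
    """Sender side: escape what the user typed, then add formatting markup."""
    escaped = html.escape(typed_text, quote=False)
    return f"<b>{escaped}</b>" if bold else escaped


def strip_for_plain_client(otr_plaintext: str) -> str:
    """Receiver side, for a client with no markup support:
    drop the raw tags first, then decode the entities."""
    without_tags = TAG_RE.sub("", otr_plaintext)
    return html.unescape(without_tags)


typed = "<font>blue</font>"          # the user literally typed these characters
plaintext = to_otr_plaintext(typed, bold=True)
shown = strip_for_plain_client(plaintext)
print(plaintext)
print(shown)
```

The formatting tag is removed and the typed text survives, but only because
the order is strip first, decode second. Swapping those two steps would
delete the user's text as well.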

If jabber is doing the entity encoding in gaim with OTR, then it does get two
goes at the message (entity encoding before encryption, and then again to
send the encrypted message), and so it should get two goes at received
messages too, if just for the sake of symmetry. Which means it's not
difficult to let protocols do the job of making otr messages conform to the
protocol rules under gaim. But I suspect that in gaim, the entity
encoding/decoding is actually done by the UI.

In miranda, this would mean having another block of code to do a job that is
already handled by the jabber protocol plugin. And, for properly designed
miranda protocol plugins that will do such text manipulation at the proper
point in the message chain, for e.g. compatibility with other encryption
plugins, it will mean that OTR messages would be entity-encoded twice - ugh -
or OTR would have to know which protocols already do it, and it would need
this knowledge to be maintained either by code changes or API changes.
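
A quick Python sketch of the double-encoding problem, under the assumption
that both the protocol plugin and the OTR plugin entity-encode outgoing
text, while the receiving side decodes only once. The two helper names are
hypothetical and stand in for whatever the real plugins do.

```python
import html


def protocol_encode(text: str) -> str:
    """Stand-in for the encoding a protocol plugin applies before the chain."""
    return html.escape(text, quote=False)


def otr_encode(text: str) -> str:
    """Stand-in for an OTR plugin that also encodes, unaware of the protocol."""
    return html.escape(text, quote=False)


typed = "<font>blue</font>"
once = protocol_encode(typed)        # what a single encoder produces
twice = otr_encode(once)             # what is encrypted if both encode
peer_sees = html.unescape(twice)     # the peer decodes once after decrypting
print(once)
print(twice)
print(peer_sees)
```

The peer ends up looking at entity text instead of what was typed. Avoiding
that means one side has to know what the other already did, which is the
coupling I would like to keep out of the plugin interface.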

Is it very hard to add a hook in gaim, so that OTR can affect messages just
before or just after they are sent/recv'd in the network layer?

Option 1 benefits (obviously my preferred option)  - remove optional html
markup from otr spec and change gaim architecture
-----
looser coupling/no connection between otr spec and client capabilities -
smaller scope of responsibilities for otr plugins - easier to implement in a
wider range of clients
no doubling-up of code
no continued maintenance of miranda otr plugin, possibly other client
plugins
gaim gets the benefit of added functionality that can be used across a
range of plugins
conformance to miranda's existing encryption plugin standards
no miranda changes required
after the work is done, no negative side-effects

...vs Option 2 benefits - make miranda OTR compliant with current OTR spec:
------
relatively easy to implement so that it works for now, by only changing the
miranda plugin
no gaim changes required

The third option of modifying all miranda's message window plugins (and
extensions such as ieview), the miranda messaging API, and probably all
protocols, to allow for html markup and to do entity encoding/decoding isn't
very practical - the benefit would only be to aim, otr, and jabber at
present.

Sorry for being so long-winded about this. We're obviously both going to be
biased by the environment in which we're working - but shouldn't the end
result, in terms of simplicity, 'purity', maintainability, and user
experience, be more important than the ease of getting there? As it stands
there's no way to implement OTR in Miranda without negative side effects -
but it's possible, with more effort, to make everything compliant with no
(lasting) negative side effects, and even gain a few positive ones.

Am I missing something?