[OTR-dev] Fwd: Ensuring transcript soundness in a multiparty chat setting

Tue Dec 10 16:25:26 EST 2013

On 10/12/13 21:13, Nathan Wilcox wrote:
> Either I'm missing some common background, or there's an unaddressed ambiguity in "message ordering" in this thread:
> 
> Chat UIs typically present a linear sequence of messages. This presents a distinct problem if we assume an underlying DAG agreement protocol, because DAGs are not linear. Consider a simple DAG with arrows pointing to predecessor messages:
> 
> B -> A <- C
> 
> There are two ways to "squash" this into a linear order: [A, B, C], or [A, C, B].  This ambiguity is crucial for human interaction; imagine A is "Vanilla icecream is the best!", and B is "Are you planning to do something illegal?" and C is "Yes, without a doubt!"
> 
> Suppose we try to ensure every participant sees a *consistent linear serialization* of the DAG order. We could do this, for example, by sorting branches such as {B, C} consistently such as by the lexical sorting of the message hashes. When we consider time, this leads to a concerning UI behavior: over time, new messages may not only be appended to the linearization of messages, but new messages may also be inserted.  For example, if B sorts consistently before C, then a user may at one time see [A, C], then later [A, B, C].
> 
> I believe all of these issues are already present in a two party chat, regardless of the underlying protocols ordering or reliability guarantees.  (For example, XMPP would have this problem, given TCP without packet loss and without malice once there are multiple federated servers interacting, I believe.)  Suppose Alice sends message A, and Bob receives A.  Now Alice composes and sends B before receiving C while Bob composes and sends C before receiving B. The "typical" chat UI will show Alice [A, B] and Bob [A, C] at various times, but later both clients will show all three messages in some order.
> 
> 
> I'm all on board for decentralized protocols that arrive at agreements about DAG structures, but let's keep an eye on the UX implications. Perhaps a better way to express my concern is to imagine a perfect DAG-order agreement protocol, then imagine an attacker which can influence only how the DAG is serialized in the UI, and they cannot alter message contents or the agreed partial ordering, only the ambiguous parts of the ordering. What attacks can they accomplish?
> 
> Maybe a good chat client would show the DAG structure directly, which I hope users could understand and which would remove the linearization requirement altogether which seems full of unavoidable trade-offs. Also, a good client must show messages which are not under agreement yet - at least the messages the user has entered - in some sort of pending state.
> 
> Regards,
> Nathan
> 
> ps: This isn't just abstract nit-picking. I once had a multi-hour emotional altercation with a significant other and we later realized it was due to our two respective SMS clients showing different message orders and that one of us was responding to one question whereas the other believed the response was for a different question.
> 
> 

That is definitely my favoured solution - avoid trying to linearise the DAG, and show the parents clearly in the UI, like in a git history graph. This information objectively portrays the subjective views of each participant, as opposed to subjectively trying to pseudo-objectify all of those views as a single coherent entity.

It may seem complex, but many messaging boards already do a similar thing - indenting replies to posts etc. This would not be much different, except that messages with >1 direct parent might be slightly awkward, but I believe a robust protocol which opportunistically syncs everyone (the "agreement" and "empty message" broadcast stuff we were talking about before) will make that situation a rarity, and the vast majority of sessions will be mostly-linear in the DAG.

Everything I said before was based on these implications, sorry if it wasn't clear.

> 
> 
> On Thu, Dec 5, 2013 at 12:59 PM, Ximin Luo <infinity0 at gmx.com <mailto:infinity0 at gmx.com>> wrote:
> 
>     On 03/12/13 18:12, George Kadianakis wrote:
>     >> On Mon, Dec 2, 2013 at 3:43 AM, Ximin Luo <infinity0 at gmx.com <http://gmx.com>> wrote:
>     >>> On 02/12/13 02:53, Trevor Perrin wrote:
>     >> [...]
>     >>>>
>     >>>> But I agree that committing to DAG predecessors would result in
>     >>>> smaller messages, so could be advantageous.
>     >>>>
>     >>>> OTOH, piggybacking all predecessors means the recipient learns
>     >>>> all
>     >>>> predecessors for a message upon receipt of that message.  That
>     >>>> has
>     >>>> advantages too:
>     >>>>
>     >>>>  * In an OldBlue-type protocol, the recipient could issue LostMsg
>     >>>> requests for all missing predecessors right away, instead of
>     >>>> needing
>     >>>> multiple rounds of waiting for lost messages to arrive before
>     >>>> discovering more predecessors.
>     >>>>
>     >>>
>     >>> Multiple rounds could be alleviated without resorting to "all
>     >>> previous
>     >>> messages", if they communicate their own Frontier in the LostMsg
>     >>> message. Then,
>     >>> people handling this LostMsg can infer that the grandparent(s) are
>     >>> also Lost,
>     >>> and retransmit those too.
>     >>
>     >> That makes sense.  I'll admit I'm not super interested in
>     >> LostMsg/Retransmit algorithms, as we've got existing chat/messaging
>     >> protocols that do an adequate job with reliability (I think?)
>     >>
>     >
>     > I'm also not super interested in LostMsg/Retransmit algrorithms. That
>     > said, I don't think that the current major chat protocols provide
>     > reliability (when the server is adversarial or when a user suddenly
>     > disconnects).
>     >
>     > For example, I don't think that XMPP provides reliability or
>     > in-order-delivery on its own (see XEP-0198 and XEP-0184). However,
>     > because it's used over TCP, its streams are both reliable and
>     > ordered. Still, communication between Alice and Bob over XMPP actually
>     > involves two streams: Alice <-> Server and Server <-> Bob.  While each
>     > of those streams is reliable and ordered, nothing tells us that the
>     > communication channel Alice <-> Bob is reliable or ordered if we
>     > consider the server as an adversary (i.e. the server can reorder or
>     > drop Alice's packets before forwarding them to Bob).
>     >
>     > FWIW, based on https://otr.cypherpunks.ca/Protocol-v3-4.0.0.html, OTR
>     > itself "assumes a network model which provides in-order delivery of
>     > messages, but that some messages may not get delivered at all (for
>     > example, if the user disconnects).". I think OTR provides its own
>     > reliability by using NAKs and doing retransmissions (at least
>     > according to section 5.1 of the otr-wpes.pdf paper).
>     >
>     > Slightly off-topic, but lately when I'm thinking of the network model
>     > of a secure multiparty chat, I've found it helpful to consider a
>     > multiparty chat over a centralized chat protocol (like IRC or XMPP,
>     > where clients send messages to the server and the server broadcasts
>     > them to the other clients). It seems helpful for two reasons:
>     > a) The server, in centralized protocols, is a powerful network
>     > adversary that can easily reorder, delay or drop messages (it's like
>     > the Global Active Adversary for the distributed case).  Similar
>     > attacks can also be performed by network links in a decentralized P2P
>     > model, but thinking of the centralized case seems easier.
>     > b) In the same protocol-agnostic fashion as OTR, it seems natural to
>     > me that the first instances of secure multiparty chats will use the
>     > popular chat protocols that are currently available. This probably
>     > means XMPP MUCs and IRC, which are both centralized.
>     >
>     > What I'm trying to say, is that even though providing reliability in a
>     > multiparty chat protocol doesn't seem like a huge priority, a robust
>     > such protocol should have a way to ensure reliable end-to-end delivery
>     > of its messages.
>     >
> 
>     Interesting that OTR assumes in-order delivery. If we assume a network-level adversary or untrusted central service, it could definitely deliver messages out-of-order. I think the DAG model already protects against this, though.
> 
>     >> You've talked about periodically broadcasting empty messages.  I
>     >> think
>     >> that's a good idea, particularly if parties can expect that of each
>     >> other, as it allows checking for timeliness and liveness of the
>     >> whole
>     >> group.  Tossing that in, here's my latest:
>     >>
>     >> --
>     >>
>     >> Each Message contains:
>     >>  - PREDECESSOR list containing the most-recently received message
>     >> number from each other participant (equivalently: the number of
>     >> messages received from each other Participant)
>     >>  - HASH over the predecessor messages (including the sender's
>     >>  previous message)
>     >>
> 
>     I still think hashing over all participants is redundant. You only need the direct predecessors, like in the git history DAG. From this, anyone can transitively infer your version of the PREDECESSOR list. A less dense DAG is also easier to analyse. We can run with your version for now, though.
> 
>     >> A Message's HASH can be verified once all predecessors have arrived.
>     >>
>     >> Each Participant maintains:
>     >>  - An UNVERIFIED pool of received messages whose hashes have not yet
>     >> been verified, and an expiration time for each.
>     >>  - Lists of RECEIVED messages from each Participant.  The HEAD
>     >>  message
>     >> in each list is the latest message from that Participant, and has an
>     >> expiration.  The tail message in each list is the oldest message
>     >>  that
>     >> is referred to by an UNVERIFIED message (older messages don't need
>     >>  to
>     >> be kept around).
> 
>     I'm not sure this possible in general. If the unverified message has multiple chains of lost messages preceding it, you don't know which "oldest message" is referred to. Using the example from my last email:
> 
>     1<--2---3---4---5---9
>                  \
>                   +-6---7---8
> 
>     (This would occur when, e.g. whoever sent 6,7,8 did not yet receive 5,9. This might be multiple people. Note that this is using the sparse DAG; you can fill in extra pointers for a dense DAG.)
> 
>     Say Alice sees all messages except 6. She knows about 6 itself, because she has 7 which points to it. However, she does not know 6's own precedessors, so she must keep all messages *in case* 6 points to 1.
> 
>     With the expiry mechanism, you can toss all messages that *would be expired* if it was the HEAD, though. That might be a better approach rather than discarding things via DAG-based metrics.
> 
>     >>
>     >> The expirations associated with UNVERIFIED and HEAD messages are for
>     >> timeliness checks:
>     >>  * If there's ever an expired UNVERIFIED message, that means you
>     >> haven't received some predecessor within a reasonable amount of
>     >> time.
>     >>  * If there's ever an expired HEAD message, that means you haven't
>     >> heard anything from some party within a reasonable amount of time.
>     >>
>     >> On receiving a message, do:
>     >>  - Check that the message has newer predecessors than the previous
>     >> message from this sender
> 
>     This is nice. In the sparse form of the DAG, this check would be equivalent to checking that a new message from a sender, say Bob, has their previous message as an ancestor through the DAG. (The same property for other participants follows transitively from this property, so you only need to check the sender. In the dense case, participants can lie about the extra redundant pointers, so we do need to check them.)
> 
>     However, this check is only possible if we have already verified all the predecessors, so I think it would be better to do this as part of "verifying". An example:
> 
>     1<--2   3<--4
> 
>     To simplify things, assume that Bob sends all messages. We receive 2 from Bob, then 4 from Bob, but we haven't received 3 yet. Doing this check here would fail, since we don't know what 3's parents are. Bob could be attacking us with a bogus message, or everything could work out when we later see that 3 has 2 as a parent.
> 
>     >>  - Add the message to the UNVERIFIED pool and the appropriate
>     >>  RECEIVED
>     >> list, and set expiration times
>     >>  - If any message in UNVERIFIED can now be verified, do so, and
>     >>  either
>     >> raise a security fault or delete it from UNVERIFIED
>     >>
>     >> If you haven't sent a message in X time, send an empty message.
>     >>
>     >> Thoughts?
>     >>
> 
>     Basic gist of it looks good, next we need more detail. I'll think about it as well. :)
> 
>     One comment (which might be redundant as I'm not up-to-date with all the contextual literature) is that this reduces scalability. This may or may not be a concern, but this would make e.g. a conversation with 30 people more costly. (But perhaps other parts of the protocol, like the handshake, already make this costly, so we will just restrict ourselves to a "small group".)
> 
>     >
>     > I like it. It seems like it would work.
>     >
>     > (Would be even better if we could sketch a quick proof for its
>     > soundness too.)
>     >
>     > Also, IMHO, messages should not be forwarded to the chat client while
>     > they are in the UNVERIFIED pool. Assuming normal (non-adversarial)
>     > network operation, receiving message predecessors should be quick and
>     > users shouldn't even notice the buffering.
>     >
>     > We can even make some "Large amount of unverified messages" or
>     > "Impossible to verify messages from participant X for a while"
>     > warnings that we can forward to the chat application instead.
>     >
>     > (I'm saying this, because forwarding unverified messages to the client
>     > with a big fat "UNVERIFIED MESSAGE" warning seems like a recipe for
>     > disaster. Similar to CA-SSL's "The site's security certificate is not
>     > trusted" usability/security nightmare.)
>     >
> 
>     I'm cool either way. It would also be easier to implement in the way that you say. My main motivation for suggesting the other approach was performance - to show the user *something* rather than having things block on one unreceived message. But maybe this is not important enough to outweigh the other security concerns.
> 
>     --
>     GPG: 4096R/1318EFAC5FBBDBCE
>     git://github.com/infinity0/pubkeys.git <http://github.com/infinity0/pubkeys.git>
> 
> 
>     _______________________________________________
>     OTR-dev mailing list
>     OTR-dev at lists.cypherpunks.ca <mailto:OTR-dev at lists.cypherpunks.ca>
>     http://lists.cypherpunks.ca/mailman/listinfo/otr-dev
> 
> 
> 
> 
> -- 
> Nathan Wilcox
> Least Authoritarian

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 897 bytes
Desc: OpenPGP digital signature
URL: <http://lists.cypherpunks.ca/pipermail/otr-dev/attachments/20131210/835e129d/attachment.pgp>