Home About MIT CC BY-SA 4.0 RSS

Operational Transform

For the past few months, I’ve been attempting to develop a collaborative code editor in the command line. What that entails essentially is a Google Doc-like interface, but using ncurses, allow the program to be used in the terminal. There are of course software out which allows editing collaboration away from the browser (a partial list can be found on Wikipedia), but a scant few offer this capability from the command line. What I want to do is build a vi-like editor which can be connected to from halfway around the world, as well as locally.

Needless to say, it’s been a lot harder than I anticipated.

Currently, I’m working on writing the final protocal needed to sync communication between an editing server with the remote client – how does the client efficiently send data to the server, and how does the server relay to the client the global state of the text document? I’ve made some headway, and I would like to document it for posterity.

Given a server S with read-write permission on a file F, and clients A, B both connected to S, we need a way to signal to S:

  • client::insert("some text")
  • client::delete(n)

The server in turn needs to be able to support:

  • server::update(client),

which will update the given client to the correct global state.

To solve this problem, I hatched out a crude versioning solution, which brought me very close to a final answer:

  • The server must keep track of a boolean client_is_synced on its end, one per client connected.
  • The server must keep track of state changes to each client – if client A sends a command to the server which will alter the state of client B, the server must update its internal client B’s state list.
  • There exists a SYNC message with data and a unique, increasing id.
  • The server shall send a SYNC message to synchronize a client state with the global state of the file.
  • The client keeps a version number property, of the last server SYNC message.
  • Before sending a SYNC message to a client, the server shall mark client.client_is_synced to false.
  • Upon receiving a SYNC message from the server, the client must send an ACK response to the server, with the message id in the SYNC packet.
  • If the message is received by the server, and the message id within the ACK response corresponds to the latest state version, the server marks client.client_is_synced = 1
  • If the message is not received by the server (dropped during either the original send or ack), the client continues to be marked unsynced by the server.
  • If the unsycned client sends some communique to the server in the future, the server must attempt to recover the client.

The recovery protocol is as follows:

  • Server sends DESYNC error to the client.
  • Client responds with SYNC request (not an acknowledgement), with its version number, the number of the last SYNC packet sent by the server.
  • The server then responds to this request by sending the client SYNC packets (or a merged SYNC packet) and with the last version number.
  • The client updates its state accordingly, and must ACK the server SYNC.
  • The client then attempts to resend the original request (which may be altered from the original request due to state changes), or drop the request and await further user instructions.

The main puzzle of this protocol is how to represent the state changes of the client and server. An obvious answer to this would be to use diff and patch magic to reconstruct the client state. The downside, however, is that each client state can be upwards of 10 - 20K in length (roughly the maximum size of an editing screen) – the problem of this is that diff starts slowing down dramatically at these sizes – we’ll need a different way to log the changes. A possible library to use in this scenario, dtl-cpp implemented an algorithm detailed in An O(NP) Sequence Comparison Algorithm (Wu et. al 1989); however, past 20% buffer replacement (something that happens during a scrolling event), times measured was at about .2s per diff calculation, which is prohibitively large.

Currently, I am considering the operational transformation route, which is what Google Docs uses to update client states. Essentially, rather than calculating the string diff as a whole, I will need to broadcast the microtransactions made by every client without a lot of merging edits: if clients A and B both are operating on the same relevant data, then effectively, a call to A.insert will result in client B receiving a command for it to also insert the appropriate data into its own state, sent by the server upon receiving the packet from A. Updates to follow.

Resources

These are meant to be an index of resources for myself to use in the future.

  1. Concurrency Control in Groupware Systems (Ellis and Gibbs 1989) This is the original paper on operational transform, with proposed implementation (“dOPT”).
  2. A Counterexample to the Distributed Operational Transform and a Corrected Algorithm for Point-to-point Communication (Cormack 1995) This cites an oversight to the original dOPT algorithm and proposes changes to correct this oversight.
  3. dOPT (Barnes 2005) Presentation in simple English describing the oversight.