Erebos protocol specification

Note

This specification is still being written, so its content is incomplete.

Introduction

Erebos is intended to be a fully decentralized communication and synchronization protocol. Here, “fully decentralized” means that, as opposed to federated services, erebos peers do not need to rely on any servers for their functionality. Even though some servers can help with certain tasks like peer discovery, a user or their identity is not bound to any such server. At the core of its design is a content-addressable filesystem using objects quite similar to those used in git, although git's usage, and thus its design, differs in some important aspects.

Erebos objects can be stored locally or sent among nodes, and are uniquely identified by the Blake2 hash of their canonical representation. Some object types can reference other objects using this hash, including, for example, references to a previous state when a modification is recorded (similar to how commits reference their parent in git). Whenever some objects represent a state that is shared and synchronized among multiple nodes, that state can be modified independently on different nodes. As synchronization can happen at an arbitrary time and typically in the background, these modifications must be mergeable automatically, so the merged state needs to be properly defined without requiring any user interaction (as opposed to git, where merges sometimes require manual resolution of conflicts).

Object representation

This section describes the canonical representation of an erebos object. Each object is uniquely identified by the hash of this representation, but it does not necessarily need to be stored or transmitted in this form; a more efficient one can be used where appropriate.

Even though most objects described here or given as examples have a textual representation, keep in mind that this is still a binary format and can contain arbitrary data.

The basic structure common to all types of objects is:

<type> 0x20 <data length> 0x0A <data>

There are currently two types of objects: blob (arbitrary binary data) and rec (record). <data length> is an ASCII-encoded decimal representation of the length of <data> in bytes. <data> is the actual object data, whose format depends on the object type.
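
The framing above can be sketched in a few lines of Python. This is only an illustration of the `<type> 0x20 <data length> 0x0A <data>` layout; the function names `serialize_object` and `parse_object` are hypothetical and not part of the specification.

```python
from typing import Tuple


def serialize_object(obj_type: bytes, data: bytes) -> bytes:
    """Build <type> 0x20 <data length> 0x0A <data> for the given payload."""
    return obj_type + b" " + str(len(data)).encode("ascii") + b"\n" + data


def parse_object(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a canonical representation into (type, data) and check the length."""
    header, sep, data = raw.partition(b"\n")
    if not sep:
        raise ValueError("missing 0x0A after the header")
    obj_type, sep, length = header.partition(b" ")
    if not sep or not length.isdigit():
        raise ValueError("malformed header")
    if int(length) != len(data):
        raise ValueError("data length does not match the header")
    return obj_type, data
```

Because the header carries an explicit length, the payload may itself contain newline bytes; only the first 0x0A ends the header.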

Blob

A blob just contains arbitrary data without any structure relevant to the erebos protocol. It is mainly intended for small files such as message attachments. For example, a blob object with hash 9331f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c:

blob 13
Hello world!
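
As a rough sketch of how such an identifier could be computed: the declared length of 13 suggests the data includes a trailing newline, since "Hello world!" alone is 12 bytes. The code below assumes the 64-character identifier is a BLAKE2b digest truncated to 32 bytes; the text only says "Blake2", so the exact variant is an assumption, and the result has not been checked against the hash quoted above. `object_digest` is a hypothetical helper name.

```python
import hashlib


def object_digest(obj_type: bytes, data: bytes) -> str:
    """Hash the canonical representation of an object and return it as hex."""
    canonical = obj_type + b" " + str(len(data)).encode("ascii") + b"\n" + data
    # ASSUMPTION: BLAKE2b with a 32-byte digest; the specification text above
    # does not name the Blake2 variant or parameters.
    return hashlib.blake2b(canonical, digest_size=32).hexdigest()


if __name__ == "__main__":
    print(object_digest(b"blob", b"Hello world!\n"))
```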

Record

Record is the type used for most erebos-relevant structures. The <type> value is "rec" and <data> consists of items in the form:

<name> ':' <type> ' ' <value> '\n'

<name> is the name of the item. Custom record types should use only lower-case letters and the '-' (dash) character. The core specification will define some names with uppercase letters. However, implementations should accept and work correctly with an arbitrary binary value as the item <name>, delimited by the first ':' character.

If <value> contains a '\n' (newline) character, it is encoded as the pair of bytes '\n' '\t'; that is, a tab character is appended to distinguish it from the newline ending the record item. This means that <name> cannot start with a '\t' character.
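
The item layout and the newline escaping can be illustrated with a small encoder and parser. This is a minimal sketch: items are modelled as (name, type, value) byte tuples, and `encode_record` and `parse_record` are hypothetical names, not part of the specification.

```python
from typing import List, Tuple

Item = Tuple[bytes, bytes, bytes]  # (name, type, value)


def encode_record(items: List[Item]) -> bytes:
    """Serialize items as <name> ':' <type> ' ' <value> '\\n'."""
    out = b""
    for name, type_, value in items:
        # A newline inside the value is followed by a tab.
        out += name + b":" + type_ + b" " + value.replace(b"\n", b"\n\t") + b"\n"
    return out


def parse_record(data: bytes) -> List[Item]:
    """Parse record data back into items, undoing the newline escaping."""
    items = []
    pos = 0
    while pos < len(data):
        search = pos
        while True:
            nl = data.index(b"\n", search)  # ValueError if unterminated
            if data[nl + 1:nl + 2] == b"\t":
                search = nl + 2  # escaped newline, the item continues
            else:
                break  # newline that ends the item
        name, rest = data[pos:nl].split(b":", 1)
        type_, value = rest.split(b" ", 1)
        items.append((name, type_, value.replace(b"\n\t", b"\n")))
        pos = nl + 1
    return items
```

Splitting on the first ':' matches the rule that <name> is delimited by the first ':' character, so the value itself may contain colons and spaces.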

Format of <value> depends on <type>:

<type>  description  <value> format
e       empty        none, the size should be zero
i       integer      ASCII-encoded decimal representation, optionally prefixed by '-' if negative
t       text         UTF-8 encoded string
b       binary data  hexadecimal representation of binary data
d       date         decimal representation of UNIX time, followed by ' ' (space) character, followed by time zone offset in the form [+-]HHMM
u       UUID         xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where each x is a hexadecimal character
r       reference    "blake2#" followed by hexadecimal representation of the hash of referenced object
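
The value formats in the table can be sketched as small formatting helpers. The function names are hypothetical, and two details are assumptions rather than statements of the specification: hexadecimal digits are emitted in lower case, and dates are written as whole seconds.

```python
import uuid
from datetime import datetime


def format_int(n: int) -> bytes:
    """i: decimal digits with an optional leading '-'."""
    return str(n).encode("ascii")


def format_binary(b: bytes) -> bytes:
    """b: hexadecimal representation (lower case assumed)."""
    return b.hex().encode("ascii")


def format_date(dt: datetime) -> bytes:
    """d: UNIX time, a space, then the zone offset as [+-]HHMM."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hh, mm = divmod(abs(minutes), 60)
    # ASSUMPTION: fractional seconds are dropped.
    return f"{int(dt.timestamp())} {sign}{hh:02d}{mm:02d}".encode("ascii")


def format_uuid(u: uuid.UUID) -> bytes:
    """u: canonical 8-4-4-4-12 hexadecimal form."""
    return str(u).encode("ascii")


def format_ref(digest: bytes) -> bytes:
    """r: 'blake2#' followed by the hexadecimal digest."""
    return b"blake2#" + digest.hex().encode("ascii")
```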

Data updates and history

Keys and signatures

Security of the protocol relies on public-key cryptography. For good security properties and small sizes of keys and signatures, the elliptic-curve Ed25519 scheme is used.

A public key is represented as a record with the following fields:

type:t "ed25519"
pubkey:b public key data

Ed25519 public keys are 256 bits (32 bytes) long, so the pubkey field value consists of 64 hexadecimal characters. The corresponding private key is not stored as an erebos object, but in separate storage, which can be, for example, a secure key store.

A signature is represented as a record with:
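
To make the sizes concrete, the following sketch builds the canonical form of a public key record from 32 raw key bytes. It assumes the quotes around "ed25519" in the field list are notation rather than part of the stored value, and it takes the key bytes as input instead of generating a real Ed25519 key; `public_key_record` is a hypothetical name.

```python
def public_key_record(pubkey: bytes) -> bytes:
    """Return the canonical 'rec' object for a 32-byte Ed25519 public key."""
    if len(pubkey) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes long")
    # ASSUMPTION: the type value is stored without surrounding quotes.
    data = b"type:t ed25519\n"
    data += b"pubkey:b " + pubkey.hex().encode("ascii") + b"\n"  # 64 hex chars
    return b"rec " + str(len(data)).encode("ascii") + b"\n" + data
```

With these assumptions the record data is always 89 bytes long: 15 for the type item and 74 for the pubkey item.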

key:r reference to public key corresponding to the key used for the signature
sig:b signature data

A signature in Ed25519 is 64 bytes long, so the sig field contains 128 hexadecimal characters. The signature here is always a signature of the unique identifier (the Blake2 hash) of some other erebos object. Finally, to put together the signature and the signed data, the following Signature structure is used:

SDATA:r reference to signed data
sig:r reference to signature object signing the SDATA digest; can be many

There can be multiple sig fields referencing signatures of the same data with different keys. The signed object is valid only when all given signatures can be correctly validated with the associated public keys.
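
The "all signatures must validate" rule can be sketched as follows. The Ed25519 check itself is passed in as a callback, since the Python standard library does not provide one. All names here (`Signature`, `signed_is_valid`, `verify`) are hypothetical. Two points are assumptions not stated above: the signed message is taken to be the raw digest bytes of the SDATA object, and an object with no signatures at all is treated as invalid.

```python
from typing import Callable, Dict, List, NamedTuple


class Signature(NamedTuple):
    key_ref: str  # value of the 'key' field, a reference to the public key
    sig: bytes    # 64 raw signature bytes from the 'sig' field


# (public key bytes, signature bytes, message) -> True if the signature is good
Verifier = Callable[[bytes, bytes, bytes], bool]


def signed_is_valid(sdata_digest: bytes,
                    signatures: List[Signature],
                    keys: Dict[str, bytes],
                    verify: Verifier) -> bool:
    """Return True only if every listed signature validates."""
    if not signatures:
        return False  # ASSUMPTION: unsigned data is not considered valid
    for s in signatures:
        pubkey = keys.get(s.key_ref)
        if pubkey is None or len(s.sig) != 64:
            return False
        if not verify(pubkey, s.sig, sdata_digest):
            return False
    return True
```

A single failing or unresolvable signature rejects the whole object, even when the other signatures are good.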

Identity

As the focus of the protocol is decentralization, there is no central authority that would certify the identity of users. Instead, a design similar to PGP's web of trust is used, with more focus on ease of use and with updates of identity information and keys being part of normal communication, not requiring explicit actions or key servers.

A single identity comprises a set of individual signed IdentityData objects with the following structure:

SPREV:r * reference to signed IdentityData representing previous version of the identity
name:t ? identity name, to be displayed in messages, contact lists, etc
owner:r ? reference to owner identity (see below)
key-id:r reference to the public identity key
key-msg:r ? reference to the public message key

Signed IdentityData means a Signature object whose SDATA member points to an IdentityData object. Note that the only required field is the identity key (key-id).
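
One way to model the IdentityData fields in code is shown below. The sketch reads the `*` marker as "zero or more" and the `?` marker as "optional"; that reading is an assumption, although it fits the remark that key-id is the only required field. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IdentityData:
    key_id: str                                    # key-id:r, required
    prev: List[str] = field(default_factory=list)  # SPREV:r, zero or more
    name: Optional[str] = None                     # name:t, optional
    owner: Optional[str] = None                    # owner:r, optional
    key_msg: Optional[str] = None                  # key-msg:r, optional

    def items(self) -> List[Tuple[str, str, str]]:
        """List the record items as (name, type, value), skipping unset fields."""
        out = [("SPREV", "r", p) for p in self.prev]
        if self.name is not None:
            out.append(("name", "t", self.name))
        if self.owner is not None:
            out.append(("owner", "r", self.owner))
        out.append(("key-id", "r", self.key_id))
        if self.key_msg is not None:
            out.append(("key-msg", "r", self.key_msg))
        return out
```

A minimal identity then carries a single item, for example `IdentityData(key_id="blake2#...")`, where the reference value is a placeholder.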

Validation

Signed IdentityData object is valid iff:

Creation and updates

Merging

Owner hierarchy

Network protocol

(version 0.1)

The erebos network protocol enables secure and reliable communication between two nodes. Each node is expected to possess an erebos identity, which is used for secure key exchange. Once secure communication is established, the protocol allows sending individual packets, e.g. with short text messages or status information, as well as using multiple independent streams for larger or continuous data.

Establishing connection

A connection between nodes starts with a 4-way handshake, during which the nodes exchange their identity information and derive a session key for secure communication. This phase uses plaintext packets, which start with a header consisting of an erebos record object, potentially followed by additional objects referenced from the header.

The plaintext header can contain the following fields:

ACK:r acknowledgement of received packet
REJ:r rejected packet, e.g. data or connection request
VER:t network protocol version
ANN:r announce own identity
INI:r connection initiation
CKS:b cookie set
CKE:b cookie echo
REQ:r request for data
RSP:r response for data request
CRQ:r secure channel request
CAC:r secure channel accepted
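
Since the header is an ordinary record, a receiver can check that known fields carry the expected type marker before acting on them. The sketch below covers only a subset of the fields listed above and ignores any field it does not know. `HEADER_FIELD_TYPES` and `mismatched_fields` are hypothetical names, and the specification does not say how a node should react to a mismatch.

```python
from typing import Dict, List, Tuple

# Subset of the plaintext header fields, mapped to their type markers.
HEADER_FIELD_TYPES: Dict[str, str] = {
    "ACK": "r", "REJ": "r", "VER": "t", "INI": "r",
    "CKS": "b", "CKE": "b", "REQ": "r", "RSP": "r",
    "CRQ": "r", "CAC": "r",
}


def mismatched_fields(items: List[Tuple[str, str, str]]) -> List[str]:
    """Return names of known header fields that carry an unexpected type."""
    return [name for name, type_, _value in items
            if name in HEADER_FIELD_TYPES and HEADER_FIELD_TYPES[name] != type_]
```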

Secure communication

Local discovery

Services

Storage

Local and shared state

Attaching devices

Attach service

UUID: 4995a5f9-2d4d-48e9-ad3b-0bf1c2a1be7f

Synchronization

Sync service

UUID: a4f538d0-4e50-4082-8e10-7e3ec2af175d

Contacts

Contact service

UUID: d9c37368-0da1-4280-93e9-d9bd9a198084

Contact state

UUID: 34fbb61e-6022-405f-b1b3-a5a1abecd25e

Direct messages

Direct message service

UUID: c702076c-4928-4415-8b6b-3e839eafcb0d

Direct message state

UUID: ee793681-5976-466a-b0f0-4e1907d3fade