Erebos protocol specification

Note

The specification is currently being written, so its content is incomplete.

Introduction

Erebos is intended to be a fully decentralized communication and synchronization protocol. Here, “fully decentralized” means that, as opposed to federated services, erebos peers do not need to rely on any servers for their functionality. Even though some servers can help with certain tasks like peer discovery, a user or their identity is not bound to any such server. At the core of its design is a content-addressable filesystem using objects quite similar to those used in git, although the usage, and thus the design, differs from git in some important aspects.

Erebos objects can be stored locally or sent among nodes and are uniquely identified by the Blake2 hash of their canonical representation. Some object types can reference other objects using this hash, including, for example, references to a previous state when some modification is recorded (similarly to how commits reference their parents in git). Whenever some objects represent a state that is shared and synchronized among multiple nodes, it can happen that such state is modified independently on different nodes. As synchronization can happen at an arbitrary time and typically in the background, it is necessary that these modifications can be merged automatically, so a merged state needs to be properly defined without requiring any user interaction (as opposed to git, where merges sometimes require manual resolution of conflicts).

Object representation

This section describes the canonical representation of erebos objects. Each object is uniquely identified by the hash of this representation, but it does not necessarily need to be stored or transmitted in this form; a more efficient one can be used as appropriate.

Even though most objects described here or given as examples have a textual representation, keep in mind that it is still a binary format and can contain arbitrary data.

The basic structure common to all types of objects is:

<type> 0x20 <data length> 0x0A <data>

The following object types are currently defined:
  • blob (arbitrary binary data),
  • rec (record),
  • ondemand (data loaded on demand),
  • chunked (binary data split into chunks),
  • dir (directory listing with file content and metadata).

Here, <data length> is an ASCII-encoded decimal representation of the length of <data>, and <data> is the actual object data, whose format depends on the object type.
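
As an illustration only, a minimal Python sketch of serializing and parsing this canonical form might look as follows; the function names are illustrative and not defined by this specification.

def serialize_object(obj_type: str, data: bytes) -> bytes:
    # <type> 0x20 <data length> 0x0A <data>
    return obj_type.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\n" + data

def parse_object(raw: bytes) -> tuple[str, bytes]:
    header, rest = raw.split(b"\n", 1)
    obj_type, length = header.split(b" ", 1)
    size = int(length.decode("ascii"))
    if len(rest) < size:
        raise ValueError("truncated object")
    return obj_type.decode("ascii"), rest[:size]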

Blob

A blob contains arbitrary data without any structure relevant for the erebos protocol. It is mainly intended for small files like message attachments and similar. For example, a blob object with the hash 9331f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c:

blob 13
Hello world!
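
For illustration, the identifier of this blob could be computed as sketched below. Note that using Blake2b with a 32-byte digest is an assumption made here, based on the 64-character hashes appearing in this document; the text above only says “Blake2”.

import hashlib

# canonical representation of the example blob object;
# the 13 bytes of data are "Hello world!" followed by a newline
canonical = b"blob 13\nHello world!\n"
digest = hashlib.blake2b(canonical, digest_size=32).hexdigest()
# if the assumption about the Blake2 variant holds, digest equals
# 9331f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c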

Record

Record is a type used for most erebos-relevant structures. The <type> value is "rec" and <data> consists of items in the form:

<name> ':' <type> ' ' <value> '\n'

<name> is the name of the item. Custom record types should use lower-case letters and the '-' (dash) character only. The core specification defines some values with uppercase letters. However, implementations should accept and work correctly with an arbitrary binary value as the item <name>, delimited by the first 0x3A (':') byte.

If <value> contains a 0x0A ('\n', newline) byte, it is encoded as the pair of bytes 0x0A 0x09 ('\n' '\t'), that is, a tab character is appended to distinguish it from the newline ending the record item. This means that <name> cannot start with a '\t' character.

The format of <value> depends on <type>:

<type>  description     <value> format
e       empty           none, the size should be zero
i       integer         ASCII-encoded decimal representation, optionally prefixed by '-' if negative
t       text            UTF-8 encoded string
b       binary data     hexadecimal representation of the binary data
d       date            decimal representation of UNIX time, followed by a ' ' (space) character, followed by the time zone offset in the form [+-]HHMM
u       UUID            xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx where each x is a hexadecimal character
r       reference       "blake2#" followed by the hexadecimal representation of the hash of the referenced object
w       weak reference  "blake2#" followed by the hexadecimal representation of the hash of the referenced object

The difference between a (normal) reference and a weak reference is that it must always be possible to follow the normal reference: if a record is stored in a storage, then all referenced objects must be stored there as well (otherwise the storage is considered broken); if a peer sends a record object, it must also be able to provide all referenced objects when requested (otherwise the whole exchange is rejected). No such requirements exist for weak references.
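
As a sketch of the item encoding, including the newline escaping described above, the following Python helper builds record <data> from already-encoded values; the helper and the example field names are illustrative only.

def encode_record(items: list[tuple[str, str, bytes]]) -> bytes:
    """items: (name, type character, textually encoded value) triples."""
    data = b""
    for name, item_type, value in items:
        # an embedded newline is followed by a tab so that it is not taken
        # as the end of the item
        escaped = value.replace(b"\n", b"\n\t")
        data += name.encode() + b":" + item_type.encode() + b" " + escaped + b"\n"
    return data

# e.g. encode_record([("name", "t", b"Alice"), ("age", "i", b"30")])
# yields b"name:t Alice\nage:i 30\n"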

On-demand

This object type is used to reference data that may be requested from other peers based on user action or settings. Which peers are applicable for this depends on the particular context, for example in which service or shared type this object is used. Similarly to a weak reference, the referenced object does not need to be stored in the same storage as the referring ondemand object, nor be available from a peer sending it.

The <type> value is "ondemand" and <data> structure is:

<component size> '\n' <ref> '\n'

The <component size> is the combined size of the object referenced by <ref> and, transitively, of the objects referenced by record items of the reference (r) type. So if the type of the referenced object is anything other than a record, <component size> is just the size of that object. If that object is a record, then <component size> also includes all objects referenced via its items of type r, and so on if any of those objects are records. If any object in the component is referenced via multiple paths, it is counted only once. When an ondemand object (or any other object that is not a record) is encountered, it is counted in the <component size>, but no references from it are followed; only reference items of records are followed.

The size of an object is the size of its whole canonical representation, including its <type> and <data length> header (so it is greater than the <data length> value).
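
A possible way to compute <component size> is sketched below; load_object is a hypothetical helper returning the object's type, the size of its whole canonical representation, and its record items (an empty list for non-record objects).

def component_size(ref, load_object) -> int:
    seen, total, queue = set(), 0, [ref]
    while queue:
        r = queue.pop()
        if r in seen:
            continue              # objects reachable via multiple paths count once
        seen.add(r)
        obj_type, size, items = load_object(r)
        total += size             # size of the whole canonical representation
        if obj_type == "rec":
            # only reference (r) items of records are followed
            queue.extend(value for _, item_type, value in items if item_type == "r")
    return total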

Chunked

An alternative to blob for bigger data, which is split into chunks that can be fetched separately. Each chunk is either a blob or another chunked object. Similarly to the ondemand object type, a chunked object can be stored or sent without the referenced chunks being available.

The <type> value is "chunked" and <data> structure is:

<combined length> '\n'
<ref 1> '\n'
...
<ref n> '\n'

Where each <ref …> is a reference to either a blob or another chunked object. The chunked object represents the binary data of the referenced objects concatenated in order. The <combined length> is then the sum of the <data length> (for blob) or <combined length> (for chunked) values of the referenced objects, i.e. the total length of the binary data represented by this chunked object.
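
Reassembling the represented data could then look like the following sketch, where load_object is a hypothetical helper returning the object type together with either the blob data or the list of chunk references.

def read_chunked(ref, load_object) -> bytes:
    obj_type, content = load_object(ref)
    if obj_type == "blob":
        return content
    # chunked: concatenate the data of the referenced chunks in order
    return b"".join(read_chunked(r, load_object) for r in content)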

Directory

Represents a directory in a filesystem.

The <type> value is "dir" and <data> consists of items in the form:

<data ref> ' ' <metadata ref> ' ' <filename> '\n'

Where:

  • <data ref> is a reference to:
  • <metadata ref> is a reference to a record containing metadata (information about permissions, owner, etc.); see below.
  • <filename> is the name of the given directory entry. Similarly to items in a record, an embedded newline (0x0A) byte is represented as the '\n' '\t' (0x0A 0x09) sequence. Note that this object format by itself does not impose any restrictions on the contents of <filename>, not even regarding otherwise special characters like /.
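
A sketch of encoding a single directory entry, assuming both references are already in their textual "blake2#…" form, might look like this (the helper name is illustrative only):

def encode_dir_entry(data_ref: bytes, metadata_ref: bytes, filename: bytes) -> bytes:
    # embedded newlines in the filename use the same escaping as record items
    escaped = filename.replace(b"\n", b"\n\t")
    return data_ref + b" " + metadata_ref + b" " + escaped + b"\n"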

Metadata

Metadata information is a record object containing these fields:

TYPE:u ? UUID of the metadata type
PREV:r ? Reference to a fallback or additional metadata

Further fields and their meaning depend on the value of TYPE:u. A metadata record without the TYPE:u field denotes empty metadata, so for a directory entry without any metadata, <metadata ref> should point to an empty record object.

POSIX file attributes

TYPE:u 890f7e45-eda5-474e-a1e8-b9acc05d9491
mode:i ? File mode (owner/group/others permissions and special bits)
uid:i ? User ID of owner
gid:i ? Group ID of owner
user:t ? User name of owner
group:t ? Group name of owner
atime:d ? Last access time
btime:d ? Creation time
ctime:d ? Last status change time
mtime:d ? Last modification time

Executable bit

TYPE:u e01ac911-aabe-4b01-8a70-9f8512de811d
executable:e ? If present, the file is executable (i.e. should have the executable bit set), otherwise not

Data updates

Keys and signatures

Security of the protocol relies on public-key cryptography. For good security properties and small sizes of keys and signatures, the elliptic-curve Ed25519 scheme is used.

A public key is represented as a record with the following fields:

type:t "ed25519"
pubkey:b public key data

Ed25519 public keys are 256 bits (32 bytes) long, so the pubkey field value consists of 64 hexadecimal characters. The corresponding private key is not stored as an erebos object, but in separate storage, which can be, for example, some secure key store.

A signature is represented as a record with:

key:r reference to public key corresponding to the key used for the signature
sig:b signature data

An Ed25519 signature is 64 bytes long, so the sig field contains 128 hexadecimal characters. The signature here is always a signature of the unique identifier (the Blake2 hash) of some other erebos object. Finally, to put together the signature and the signed data, the following Signature structure is used:

SDATA:r reference to signed data
sig:r reference to signature object signing the SDATA digest; can be many

There can be multiple sig fields referencing signatures of the same data with different keys. The signed object is valid only when all given signatures can be correctly validated with associated public keys.
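
A minimal verification sketch for a single sig item is shown below, using the Python cryptography library as an example. Treating the signed message as the raw 32-byte Blake2 digest of the SDATA object is an assumption of this sketch; the exact message encoding is not spelled out above.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_signature(pubkey_hex: str, sig_hex: str, sdata_digest_hex: str) -> bool:
    """Check one sig item of a Signature object against its public key."""
    public_key = Ed25519PublicKey.from_public_bytes(bytes.fromhex(pubkey_hex))
    try:
        public_key.verify(bytes.fromhex(sig_hex), bytes.fromhex(sdata_digest_hex))
        return True
    except InvalidSignature:
        return False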

Identity

As the focus of the protocol is decentralization, there is no central authority that would certify some kind of identity of users. Instead, a design similar to PGP's web of trust is used, just with more focus on ease of use, and with updates of identity information and keys being part of normal communication, not requiring explicit actions or key servers.

A single identity comprises a set of individual signed IdentityData objects with the following structure:

SPREV:r * reference to signed IdentityData representing previous version of the identity
name:t ? identity name, to be displayed in messages, contact lists, etc
owner:r ? reference to owner identity (see below)
key-id:r reference to the public identity key
key-msg:r ? reference to the public message key

Signed IdentityData means a Signature object whose SDATA member points to an IdentityData object. Note that the only required field is the identity key (key-id).

Validation

A signed IdentityData object is valid iff:

Creation and updates

Merging

Owner hierarchy

Network protocol

(version 0.1)

The erebos network protocol enables secure and reliable communication between two nodes. Each node is expected to possess an erebos identity, which is used for the secure key exchange. Once secure communication is established, the protocol allows sending individual packets, e.g. with short text messages or status information, as well as using multiple independent streams for bigger or continuous data.

Establishing connection

A connection between nodes starts with a 4-way handshake, during which the nodes exchange their identity information and derive a session key for secure communication. This phase uses plaintext packets, which start with a header consisting of an erebos record object, potentially followed by additional objects referenced from the header.

The plaintext header can contain the following fields:

ACK:r acknowledgement of received packet
REJ:r rejected packet, e.g. data or connection request
VER:t network protocol version
ANN:r announce own identity
INI:r connection initiation
CKS:b cookie set
CKE:b cookie echo
REQ:r request for data
RSP:r response for data request
CRQ:r secure channel request
CAC:r secure channel accepted

Secure communication

Local discovery

Services

Storage

(version 0.1)

The storage format is intended to persistently store erebos objects in such a way that the storage can be used concurrently by multiple processes with minimal locking. This allows different programs to handle tasks like network communication, notifications or user interaction, while sharing the storage directory and using it as the communication interface.

Note

To provide the necessary guarantees in case of concurrent access, it is assumed that the filesystem used by the storage provides:

  1. an atomic way to open files in exclusive mode (the operation fails when the file already exists, e.g. O_EXCL on unix), and
  2. an atomic rename operation.

An invariant that must always hold in an uncorrupted storage is that all objects referenced from present records, that is, targets of items of the r type, are also available in the storage. As a consequence, when storing any record object, all the referenced objects must be stored before that record.

In this version, the format is defined only for “loose” objects, that is, where each object is stored in a separate file. Garbage collection (deletion of unreferenced objects) is also not described here, so for this storage version it can be done only “offline”, i.e. when no process is using the storage directory.

Heads

Heads are references to objects that can be updated by applications, where a particular head is identified by its UUID (which would typically be randomly generated upon its creation). Each head also has a type (also identified by a UUID), which should indicate which kind of data is referenced by the head. While only one head type is defined here (Local state, see below), any application is free to define its own head type with a custom UUID and use it for any purpose.

Disk format

The following structure is used within the erebos storage directory:

erebos-storage file
Contains a single line with the “0.1” string.
objects/blake2 directory
Contains the zlib-compressed contents of objects, in files named after the object’s blake2 digest. The first byte (two hexadecimal characters) is used for an extra subdirectory instead of being directly part of the filename. So e.g. the “Hello world” object with hash 9331f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c would be stored in the objects/blake2/93/31f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c file.
heads/<head-type>/<head-id> files

where <head-type> and <head-id> are UUIDs identifying the head type and the specific head, respectively. The contents of the head file is a single line with the full digest of the object the head points to; for example, a head file referring to the “Hello world” object would contain:

blake2#9331f492583a8f47f9bf21e50ad298e9b395aa4dfb989257e26c15109526ca3c

Writing objects

When an application is going to write a new erebos object to the storage, it first needs to compute the object's digest to determine the target file path, denoted <path> here. Then, to avoid potential collisions with other processes writing the same file, the following steps are used to write the object file:

  1. Open file <path>.lock in exclusive mode (e.g. with O_EXCL on unix). If the opening fails because the lock file already exists, some other process is already writing the same object. In that case, wait for a short interval and check whether the target object has been created or the lock file removed. If that does not happen within some timeout, the application may assume the lock is stale and remove the lock file before retrying.
  2. Write zlib-compressed contents of the object data to the <path>.lock file.
  3. Rename <path>.lock to <path>.

This process ensures that the final object file always contains the whole object data. So any application reading an object can just try to open the object file, and if it exists, simply read it.
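
The steps above could be implemented roughly as in the following sketch; the function name, the directory-creation handling and the timeout value are illustrative only.

import os, time, zlib

def store_object(root: str, digest_hex: str, content: bytes, timeout: float = 5.0) -> None:
    path = os.path.join(root, "objects", "blake2", digest_hex[:2], digest_hex[2:])
    lock = path + ".lock"
    os.makedirs(os.path.dirname(path), exist_ok=True)
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL)   # step 1
            break
        except FileExistsError:
            if os.path.exists(path):
                return                  # another process already stored the object
            if time.monotonic() > deadline:
                os.unlink(lock)         # assume the lock is stale
            else:
                time.sleep(0.1)
    with os.fdopen(fd, "wb") as f:
        f.write(zlib.compress(content))  # step 2: zlib-compressed canonical form
    os.rename(lock, path)                # step 3: atomic rename to the final path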

Updating heads

Similarly to writing objects, a lock file is used to avoid conflicts when updating a head file. However, since the file may already exist and its contents vary over time, the process is somewhat more complicated. To update a head file <head>, the application must:

  1. Determine whether the head file is expected to already exist and if so, what is the expected current content, and also determine the intended new content of the head, before touching the head or taking the lock file.
  2. Ensure that the object referenced by the new head value is stored in the storage.
  3. Open file <head>.lock in exclusive mode. If that fails due to the file already existing, retry after a short interval, as another process is currently modifying the head. If the lock file still exists after some timeout, and the application is able to determine that it is still the same lock file (i.e. not one from a different transaction started later), it may consider the lock to be stale and remove the lock file before retrying.
  4. Check that the current content of the head is the expected one (or, if it is not expected to exist, that it indeed does not exist). If there is a mismatch, remove the lock file, go back to step 1 and determine the updated content based on the new value of the head.
  5. Write the new content to <head>.lock.
  6. Rename <head>.lock to <head>.

Once the head file is written and moved to its final destination, that same file is never again opened for writing, only replaced with a new file on a head update. This means that reading head content does not need any special handling: just open the head file and read it; even if that happens concurrently with a head update, the application will always read valid head content.
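
A rough sketch of this update loop is shown below; update is a hypothetical caller-supplied function mapping the current head value (a digest string or None) to the new value, and step 2 (ensuring the referenced object is stored) is omitted.

import os, time

def read_head(head: str):
    try:
        with open(head) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None

def update_head(root: str, head_type: str, head_id: str, update) -> str:
    head = os.path.join(root, "heads", head_type, head_id)
    lock = head + ".lock"
    while True:
        expected = read_head(head)                # step 1: expected current value
        new_value = update(expected)              # step 1: intended new content
        try:
            fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL)   # step 3
        except FileExistsError:
            time.sleep(0.1)                       # another process holds the lock
            continue
        if read_head(head) != expected:           # step 4: concurrent update detected
            os.close(fd)
            os.unlink(lock)
            continue
        with os.fdopen(fd, "w") as f:
            f.write(new_value + "\n")             # step 5
        os.rename(lock, head)                     # step 6
        return new_value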

Local state

Head type: 1d7491a9-7bcb-4eaa-8f13-c8c4c4087e4e

Local state is a head type pointing to an object with the following format:

id:r reference to local device identity
shared:r * reference(s) to tip(s) of shared state
PREV:w ? optional weak reference to previous state

Shared state

Attaching devices

Attach service

UUID: 4995a5f9-2d4d-48e9-ad3b-0bf1c2a1be7f

Synchronization

Sync service

UUID: a4f538d0-4e50-4082-8e10-7e3ec2af175d

Contacts

Contact service

UUID: d9c37368-0da1-4280-93e9-d9bd9a198084

Contact state

UUID: 34fbb61e-6022-405f-b1b3-a5a1abecd25e

Direct messages

Direct message service

UUID: c702076c-4928-4415-8b6b-3e839eafcb0d

Direct message state

UUID: ee793681-5976-466a-b0f0-4e1907d3fade