Distributed Mutable Containers

This document specifies Distributed Mutable Containers (DMC). Distributed Mutable Containers are distributed data structures that can hold references to content while allowing replicas of the data structures to diverge and merge without conflict by using Commutative Replicated Data Types (CRDTs). DMC enables consistent referencing of mutable content or mutable collections of content, allowing a wide-range of applications such as decentralized geographical information systems.

1. Introduction

Distributed Mutable Containers (DMC) are distributed data structures that allow shared control over a container of values. DMC provides two basic container types: a set and a register. A set can hold many member elements whereas a register only holds a single current value.

These two basic container types can be used to build many interesting applications. For example a user profile can be implemented on a register or a map that consists of waypoints and points of interest can be implemented as a set of elements.

Distributed Mutable Containers enable decentralized applications. Containers are distributed and can be replicated over many different transport mechanisms. Container replicas can be mutated locally and their state may diverge. DMC does not impose any restrictions on how local replicas may be mutated and does not requires any out-of-band coordination. Replicas can always be merged back to a consistent state.

This conflict-free merge-ability is possible by using Commutative Replicated Data Types (CRDTs). A down-side is that DMC only provides two container types (set and register) with very specific semantics. However, we argue that these two basic container types are sufficient for building interesting applications, especially when using a graph-based data model.

Using CRDTs also eliminates the necessity of specifying any dependencies between mutating operations or revisions of container states. This enables DMC to forget operations that do no longer contribute to the current state, thereby improving efficiency and allowing removed content to be irrevocably deleted (the right to be forgotten).

By using the Encoding for Robust Immutable Storage (ERIS), most of the data that constitutes a DMC container can be transported over insecure mediums without loosing confidentiality or censorship resistance. Only a small amount of data needs to be transmitted securely for a peer to be able to reconstruct the state of a container.

DMC also allows control over containers to be delegated by attaching additional authorized keys. This allows collective control over a container.

In this document we explain how DMC works, specify the exact semantics, illustrate the required prerequisites and highlight integrations with other systems as well as applications.

1.1. Terminology

Some terms used in this document:

container: A distributed data-structure. We use this term to refer to the logical container that may have many replicas with many different states.
container identifier: An URI of a container that can be used to reference the dynamic state of the container.
container state: The dynamic state that is computed from the replica object graph (see Section 4.8).
object: A container definition, operation or signature (see Section 4.1).
operation: A description of a mutation of a container (see Section 4.4).
out-of-band: Transport of container state or objects over a protocol or medium that is not explicitly designed to use DMC in an ad-hoc manner (e.g. as an e-mail attachment, see Section 6.4).
predicate: There is a clash of the term predicate in this document. In the context of RDF predicates are URNs that appear in a certain position of an RDF triple (see Section 3.1). In the context of Datalog predicates are named facts that can be accessed from a database or inferred from rules.
replica: An instance of a container.
replica object graph: The RDF graph of all the objects that are known to a replica (see Section 4.1).
root public key: The initial public key of a container that can be used to mutate the container and add new authorized keys to a container (see Section 4.6).

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

1.2. Overview

In Section 2 we give an overview of related work and how DMC relates to and differentiates from respective projects.

The necessary prerequisites are briefly discussed in Section 3. This includes the Resource Description Framework (RDF), content-addressing with the Encoding for Robust Immutable Storage (ERIS), Commutative Replicated Data Types (CRDTs) as well as Datalog as specification language for the computation of container state. References to further reading are provided.

In Section 4 we describe the basic concepts and working of DMC containers. The concrete specification of the two container types (set and register) is given in Section 5.

In Section 6 we describe the necessary state local replicas must hold and how replicas can be synchronized over various transport layers. A CBOR serialization is defined in Section 6.4, which can be used for out-of-band transport of DMC containers.

We conclude with some ideas on how applications might use DMC (Section 7) and with some closing remarks on future work (Section 8).

2. Related Work

2.1. Hypercore and Secure ScuttleBut

Hypercore and Secure ScuttleBut (SSB) are protocols for building decentralized applications. Both use an append-only log to keep track of history. This has two major drawbacks:

It is not possible to delete content. This leads to performance as well as privacy issues.
The state of the container may not fork (i.e. the state must always be a proper linked list with a single head). This practically prevents shared control of a container by multiple parties as it would require a high degree of out-of-band coordination to prevent a fork.

DMC solves both issues by using CRDTs, which do not impose any order on mutating operations.

With DMC content can be completely removed (or forgotten) without affecting the current or future state of the container. We believe that this is an essential functionality that enables the right to be forgotten. Note however that deletion can not be enforced on remote replicas. See Section 6.2.1 for more details.

The state of two DMC replicas are permitted to diverge as they can be merged back without conflict at any time. This is essential for allowing truly collective control of containers without any out-of-band coordination.

2.2. Git

Git is a widely-used distributed version control system. Unlike systems based on append-only logs the state of Git repository may diverge and be merged. However, merging is not conflict-free and requires manual intervention as well as out-of-band coordination.

DMC does not require manual merging: CRDTs are guaranteed to always merge without conflict. The draw-back is that containers are not as general as Git repositories. A Git repository can hold an Unix file system, which can be used to implement a register or a set (and many other data structures). The semantics of a DMC container is hard-coded into the container and is not as flexible as a Git repository.

The semantic restriction allows DMC containers to diverge and merge without conflict and, as shown in Section 7, is enough to still implement many interesting applications.

2.3. GNU Name System (GNS)

The GNU Name System (GNS) is a distributed name database being developed by the GNUnet project as a replacement for DNS.

Similar to DMC, GNS provides stable names for mutable content. In GNS names are human-meaningful and can be delegated to form hierarchichal names similar to DNS. DMC, on the other hand, does not provide human-meaningful names (see Section 4.3), but a much more flexible model of mutation and replication.

We believe that GNS and DMC are highly complementary in the sense that GNS might be used for human-meaningful names for DMC containers (see also Section 8.1).

2.4. InterPlanetary File System (IPFS)

The InterPlanetary File System (IPFS) is a protocol for peer-to-peer storing and sharing of data. IPFS, at its core, only provides immutable storage. IPFS together with exentsions such as the InterPlanetary Name System (IPNS) are closer to the goals of DMC.

IPNS seems to be conceptually similar to a Last-Writer-Win Register (see Section 5.2). An IPNS identifier is the hash of a public key. The corresponding secret key can be used to update what the IPNS identifier resolves to. Unlike DMC, IPNS does not explicitly allow state to diverge or authority over an IPNS identifier to be delegated. We believe that this is important for enabling collective control over data structures.

IPFS does not provide confidentiality and only limited censorship resistance. Content stored on IPFS is by default unencrypted and can thus be trivially blocked by individual hosts. DMC improves by always encrypting content with ERIS (see [_content-addressing]).

IPFS can, however, be securely used as an underlying block storage for DMC (see Section 6.2) by only storing encrypted blocks in IPFS [ERIS].

2.5. Vegvisir

Vegvisir is a partition-tolerant distributed data-structure that is based on CRDTs [vegvisir] and served as an inspiration for DMC.

Similarly to DMC, Vegvisir allows delegation of authority over a container. Unlike DMC, Vegvisir encodes a partial ordering of operations on the container. While this allows features such as key revocation (Section 4.6.1), it prevents the right to be forgotten (Section 6.2.1).

2.6. Riak

Riak is a distributed key-value data store for enterprise applications. Similar to DMC, CRDTs are used to ensure consistency over database replicas.

Unlike Riak, DMC is designed for outside of the enterprise datacenter.

3. Prerequisites

3.1. Resource Description Framework (RDF)

The Resource Description Framework (RDF) [RDF] provides a concrete graph data model for highly-interlinked content. It is a concrete data model for Linked Data.

RDF defines a very simple graph model for describing content. The basic unit is a triple consisting of a subject, predicate and object. Subjects and predicates are identified with IRIs (internationalized URIs), while objects are either IRIs or literal values.

DMC uses RDF to describe container definitions, operations and container state. DMC is especially well-suited for applications that also use RDF, but not limited to such.

A basic understanding of RDF is expected to understand this document.

3.2. Content-addressing

Identifying content by the cryptographic hash of the content itself is called content-addressing. This allows content to be easily replicated, enabling decentralized systems.

DMC groups RDF triples into content-addressable fragment graphs [content-addressable-rdf].

Furthermore, DMC uses the Encoding for Robust Immutable Storage (ERIS) [ERIS], which enables secure and censorship resistant content-addressing. ERIS defines how arbirtary content can be encoded into uniformly sized encrypted blocks. We will use this when discussing local state of replicas and synchronization between replicas (Section 6).

3.3. Public-Key Signatures

To ensure authenticity of operations and verify that operations are authorized we use the Ed25519 algorithm for cryptographic signatures and the RDF Signify vocabulary for describing signatures [rdf-signify].

3.4. Commutative Replicated Data Types (CRDTs)

Conflict-free replicated data types are a class of data structures that can be replicated over hosts, updated independently and are guaranteed to be mergeable without conflict.

There are two sub-classes of conflict-free replicated data types: Convergent Replicated Data Types (state based) and Commutative Replicated Data Types (operation based). We will use the latter as it seems to work nicer with content-addressing - operations are content-addressed, immutable and individually identifiable entitites.

Commutative Replicated Data Types rely on the fact that operations that mutate the data structures commute, i.e., the order in which the operations are applied does not change the outcome. This guarantees that distributed replicas of a CRDT converge to the same state as soon as the same set of operations is applied at both replicas (regardless of order). This property is called strong eventual consistency.

We will use two types of CRDTs: An Observed-Remove Set (section Section 5.1) and a Last-Writer-Win Register (section Section 5.2) and will introduce both in the respective sections.

The reader is referred to Shapiro et. al. [crdt] for a formal treatment and an overview of the various types of CRDTs.

3.5. Datalog

Datalog is a declarative logic programming language. It has a very simple syntax and semantics that makes it useful as a query language for interacting with large amounts of data.

We will use Datalog as a specification language for describing the semantics of container state. This allows a very concise description that can be directly used in existing, efficient implementations of Datalog. Furthermore, certain aspects of Datalog semantics (monotonicity) make it perfectly adapted for reasoning about distributed systems [calm].

Datalog can also be used as an application level query language (e.g. for answering queries like "all posts about cats posted by people that I have sent a message to in the last year"). The query can be combined with container state resolution queries to a single query. This enables a DMC implementation that uses Datalog to not only provide primitive containers but a rich interface for efficiently interacting with data.

We require three extensions of Datalog which are well studied in the literature: Built-in predicates (for computing cryptographic functions), stratified negation, and monotonic aggregation. All required extensions are available in the open-source Datalog implementation Soufflé, which was used to develop and test the work presented.

For a formal introduction to Datalog the reader is referred to the overview paper by Ceri et. al. "What you always wanted to know about Datalog (and never dared to ask)" [datalog].

4. Distributed Mutable Containers

In this section we describe the framework in which Distributed Mutable Containers work. Exact specification of container semantics is deferred to Section 5. How replicas of containers manage state and synchronization is described in section Section 6.

4.1. Objects and Graph

DMC distiguishes between three different types of objects:

Container definition: An object that defines a container (see Section 4.2)
Operations: Operations that mutate a container (see Section 4.4)
Signatures: Public-key cryptography signatures of containers (see Section 4.5)

Objects are immutable RDF fragment graphs [content-addressable-rdf] that are content-addressed using ERIS [ERIS] (see Section 3.2). Objects are referenced using the respective ERIS read capability (an URN).

Objects reference each other and together form a graph that we call the replica object graph. The replica object graph is an RDF graph consisting of RDF triples. Every replica of a container maintains its own graph which corresponds to the local state of the container.

The dynamic container state is computed from the replica object graph. The dynamic state can be referenced by using the container identifier (see Section 4.3).

In the following the Datalog predicate graph(Subject, Predicate, Object) is used to access RDF triples in the replica object graph.

4.2. Container Definition

A container is created by creating a container definition.

A container definition MUST include:

Type declaration using the rdf:type predicate to a sub-class of dmc:ContainerDefinition.
A root public key using the dmc:rootPublicKey predicate. The root public key MUST be encoded using the URI scheme defined by RDF Signify [rdf-signify].

The root public key is a Ed25519 public key. The associated secret key of the root public key can be used to mutate the container by signing operations or adding new public keys to the list of authorized keys for the container (see Section 4.6 and Section 4.7).

A container definition MAY contain further triples (predicates and objects) to describe further meta-data (e.g., creator, an ActivityStream inbox, etc.). It is RECOMMENDED to use the Dublin Core Metadata Terms [dcm-terms] for generic meta-data.

For example a container definition for a set (defined in section Section 5.1):

@prefix dmc: <http://purl.org/dmc/ns#> .

<urn:erisx2:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM>
    a dmc:SetDefinition ;
    dmc:rootPublicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> .

Warning

The URN of ERIS encoded content is not yet finalized. See Section 8.1.

Note	A container definition is immutable. References to the container definition ERIS read capability (URN) always resolve the same content. To resolve the dynamic and mutable state of a container we use the container identifier.

4.3. Container Identifier

To reference the dynamic and mutable state of a container we use the special URN scheme dmc:CONTAINER_DEFINITION_ERIS_CAPABILITY where CONTAINER_DEFINITION_ERIS_CAPABILITY is the encoded ERIS read capability.

For example the identifier for the container defined in the previous sections is:

dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM

4.4. Operations

Operations mutate container state. They are represented as RDF graphs that are content-addressed using ERIS.

Operations are either:

Container type specific operations that mutate the container (i.e. dmc:Add and dmc:Remove for the set container and dmc:Update for the register)
Operation to add an additional authorized key to a container (dmc:AddKey see Section 4.6)

All operations are sub-classes of dmc:Operation.

All operations MUST contain a single dmc:container predicate to specify the container the operation acts on.

For example the operation for adding an element to the set defined above looks like this:

@prefix dmc: <http://purl.org/dmc/ns#> .

<urn:erisx2:AAACG7NGKQZ4ZSR4D4VS3MFQNT5DHHOS7QYNSH5MCQR6HTYO6GTSLB43CFDWNBPNZ5FTG2LFW5QM735GNT6Y44QHCSEAGSRIMYCNJ3ND44>
    a dmc:Add ;
    dmc:container <dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM> ;
    rdf:value <urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ> .

4.5. Signatures

Public-key cryptographic signatures are used to ensure that operations are issued by authorized entitites.

The RDF Signify vocabulary [rdf-signify] is used to describe signatures as RDF. This means:

Signatures MUST have type signify:Signature
The signed operations MUST be referenced using the signify:message predicate. There MUST NOT be multiple signify:message predicates.
The Ed25519 signature is encoded using the rdf:value predicate. There MUST NOT be multiple rdf:value predicates.

We introduce the following signed helper Datalog predicate to decide if there is a valid signature of Message that can be verfied by the public key Key:

signed(Message, Key) :-
    graph(Signature, a, signify:Signature),
    graph(Signature, signify:message, Message),
    graph(Signature, rdf:value, SignatureValue),
    ed25519Verify(SignatureValue, Message, Key).

This uses the special ed25519Verify(Signature, Message, Key) Datalog predicate which performs the validation of the Ed25519 signature.

An example signature:

@prefix signify: <http://purl.org/signify/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:erisx2:AAAHZ5E6BYXD7Y3LU625HGB5OXRBLZJQ5TXURD5S5PWCIHEVI7IKKIUF7B5PQRIO52VIEARK5Y2WT4L5BEKF2AXJHN5H3M34XLYJO6HKF4>
    a signify:Signature ;
    signify:message <urn:erisx2:AAACG7NGKQZ4ZSR4D4VS3MFQNT5DHHOS7QYNSH5MCQR6HTYO6GTSLB43CFDWNBPNZ5FTG2LFW5QM735GNT6Y44QHCSEAGSRIMYCNJ3ND44> ;
    rdf:value "HOmPzPF8lbUyv/p9Nho/43aFFsLI9y9TiL0vvAHgemesmBt3smNV+2KY/AuZqE756a0jeLkw1dxZPRiu9zrTCg=="^^xsd:base64Binary ;
    signify:publicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> .

4.6. Authorized Keys

An authorized key of a container is an Ed25519 public key whose associated secret key can be used to sign operations.

An authorized key is either:

A root public key of a container
A key that has been added to the container using a dmc:AddKey operation that is signed with a root public key

The following authorizedKey Datalog predicate captures these two possibilities:

authorizedKey(Container, Key) :-
    graph(Container, dmc:rootPublicKey, Key).

authorizedKey(Container, Key) :-
    graph(Container, dmc:rootPublicKey, RootPublicKey),
    graph(AddKey, a, dmc:AddKey),
    graph(AddKey, dmc:container, Container),
    graph(AddKey, rdf:value, Key),
    signed(AddKey, RootPublicKey).

A key Key is an authorized key for a container Container if and only if the authorizedKey(Container, Key) holds.

4.6.1. Revocation

Currently DMC does not allow authorized keys to be revoked. Once a key has been added, it can not be revoked.

Adding the ability to revoke a key is tricky as it can break the commutativity of operations on a container. Even when the set of authorized keys is modelled as an CRDT (i.e. an OR-Set, see Section 5.1) the composition with the elements of a container does not commute.

For example:

A container (a set) is defined.
An additional key key1 is authorized for the container using the root public key.
There are two replicas of the container that become seperated.
At replica 1 an element apple is added using key1.
At replica 2 the key key1 is revoked.

There are now three ways that the divergent state of replica 1 and 2 can be merged:

apple is an element of the container because key1 has been revoked after the addition of apple (the view of replica 1).
apple is not an element of the container because key1 has been revoked before the addition of apple (the view of replica 2).
apple is not an element of the container as key revocation also revokes all operations that where signed with the key.

Options 1 and 2 seem more natural as key revocation should not necessarily revoke past operations. However, the order of operations and revocation of the key is important - mutating operations and key revocations do not commute.

Options 3 preserves commutativity of operations, key authorizations and key revocations. However, the semantics do not seem to fit applications we have in mind.

Further work is required to specify and implement key revocation in DMC.

Note that Vegvisir (see Section 2.5) solves this by imposing a (partial) ordering of mutating operations and key authorizations/revocations. This allows an exact definition of when option 1 or option 2 should be used when merging. However, this creates permanent links to past operations that can no longer be forgotten. This prevents the right to be forgotten and does not seem like a viable solution for DMC (Section 6.2.1).

4.7. Authorized Operations

An operation is authorized on a container if and only if it is signed by an authorized key of the container.

authorizedOperation(Container, Operation) :-
   graph(Operation, dmc:container, Container),
   authorizedKey(Container, Key),
   signed(Operation, Key).

An operation Operation is a valid operation on the container Container when authorizedOperation(Container, Operation) holds.

4.8. Container State

The container state is computed from an replica object graph. Implementations MUST compute the container state when the container identifier is dereferenced.

Container state is represented as RDF using a sub class of the dmc:Container class (either dmc:Set or dmc:Register). The exact representation of the container state as well as the semantics is defined in section Section 5 for the different container types.

For example the state of the set defined above would be resolved as follows after two elements have been added:

@prefix dmc: <http://purl.org/dmc/ns#> .

<dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM>
    a dmc:Set ;
    dmc:rootPublicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> ;
    dmc:member <urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ> ;
    dmc:member <urn:erisx2:AAAF5Q7CXYPZH63IQ3EF4YVMBOEC6YFQH6JQ7YQMG2QYMKDJLWM5LHXRA6XBHJGKMCJL7QTWAH4PFCTDVXIIKP7HOTKGVVIOOAUAXKFNUE> .

The state depends on the available operations and signatures of operations (objects, see Section 4.1). See Section 6 on how the state at local replicas is managed and can be synchronized with other replicas.

5. Container types

5.1. Set

Sets are unordered collections that hold distinct members. They are a useful base data structure for building complex applications.

Adding and removing elements from a set are, in general, non-commutative operations:

\${x} vv {x} \\ {x} != {x} \\ {x} vv {x}.\$

We will use a little trick to solve this problem. The data structure that implements this trick is called an Observed-Remove Set (OR-Set).

5.1.1. Observed-Remove Set

Instead of keeping track of only the elements in the set, we keep track of tuples of elements along with a unique identifier of the operation that was used to add the element:

Applying the operations \$o_1\$ which adds the element \$a\$ to the an empty set container results in \${(o_1,a)}\$.
We again add the element \$a\$ with the operation \$o_2\$ resulting in \${(o_1, a), (o_2,a)}\$

This set of tuples is just an internal representation. The exposed set is \${a}\$.

When removing an element we must specify the element and operation that added the element: After removing the element \$a\$ that was added with operation \$o_1\$ the internal representation of the set is \${o_2, a}\$. The exposed set remains \${a}\$.

It can be shown that the operations (add and remove) on an OR-Set commute. The reader is referred to Shapiro et. al. [crdt] for a formal treatment of the Observed-Remove Set.

5.1.2. Definition

A set is defined by creating a container definition of type dmc:SetDefinition (see example in section Section 4.2).

5.1.3. Operations

There are two operations that mutate a set container:

dmc:Add: Adds elements to the set. The element is referred to by the rdf:value property.
dmc:Remove: Removes elements from the set. Instead of referring to the element to be removed directly, the operation refers to the dmc:Add operation that added the elements to be removed with the property dmc:operation (see Section 5.1.1).

Set operations MUST reference the element (when dmc:Add) or the operations (when dmc:Remove) using the rdf:value predicate. Set operations MAY have multiple rdf:value predicates. This allows the adding or removing of multiple elements with a single operation.

5.1.4. State

The following Datalog program and the predicate setMember(Container,Member) defines the current members of the set given operations in the graph (using the Datalog predicate graph):

setOperationRemoved(Container,Operation) :-
   graph(Operation, a, dmc:Add),
   graph(Operation, dmc:container, Container),
   graph(RemoveOperation, dmc:operation, Operation),
   graph(RemoveOperation, a, dmc:Remove),
   authorizedOperation(Container, RemoveOperation).

setMember(Container, Member) :-
    graph(Container, a, dmc:Set),
    graph(Operation, a, dmc:Add),
    graph(Operation, rdf:value, Member),
    authorizedOperation(Container, Operation),
    ! setOperationRemoved(Container, Operation).

The helper predicate setOperationRemoved checks if the operation has been "removed".

The state is represented as an dmc:Set RDF graph where members are added using the dmc:member predicate.

5.2. Register

A register is a data structure that holds at most a single element. This is useful for singleton objects that can be updated such as user profiles.

The DMC register is based on a Last-Writer-Wins register.

5.2.1. Last-Writer-Wins Register

A Last-Writer-Wins Register (LWW-Register) [crdt] creates an order on updates that change the value by adding a timestamp. The current value is the value of the update with the greatest timestamp.

Note that this requires a form of weak-coordination, namely having timestamps that are consistent with causal order. It also permits a denial-of-service attack as a malicious agent who is authorized to update the register may use a extremely high timestamp and block updates from other authorized agents (until they have adapted to using an even higher timestamp). The DMC register is not suited for potentially malicious agents that share control over the register. Nevertheless, we believe it is a valuable data structure for containers where control is kept very tight and authorized agents are trusted to a high degree. A concrete application is user profiles where only users are authorized to mutate the profile.

5.2.2. Definition

A register is defined by creating a container definition of type dmc:RegisterDefinition.

5.2.3. Operations

There is only one operation to mutate a register: dmc:Update.

A timestamp MUST be included with the property dmc:timestamp. A reference to the value to which the register shall be set MUST be defined with the rdf:value predicate. There MUST NOT be multiple rdf:value predicates.

5.2.4. State

The following Datalog program and the predicate registerValue` defines the current value of the register given the operations in a graph:

registerUpdateTimestampValue(Register, Timestamp, Value) :-
    graph(Operation, a, dmc:Update),
    graph(Operation, dmc:container, Register),
    graph(Operation, dmc:timestamp, Timestamp),
    graph(Operation, rdf:value, Value),
    authorizedOperation(Register, Operation).

registerValue(Register, Value) :-
    _ = max Timestamp : registerUpdateTimestampValue(Register, Timestamp, Value).

The registerUpdateTimestampValue is a helper predicate to get all authorized updates with timestamps and values.

Note	Soufflé has only recently added the aggregation functionality required in the `registerValue` predicate (not included in version 2.0.2, see pull request #1693).

The state is encoded with type dmc:Register and the current value is indicated with the rdf:value property.

Note	The register may be in a state where it does not hold any value if there has been no update operation.

6. Replica State and Synchronization

In Section 4 we have seen how a replica of a DMC container consists of objects that together form a graph. In this section we will discuss how objects are stored locally and can be synchronized over various transport layers.

6.1. Objects and Blocks

An implementation of DMC that is capable of computing the state of a local replica MUST maintain two data structures:

replica objects: Set of ERIS read capabilities of all objects the replica knows about (see Section 4.1).
replica block references: Set of references to blocks required to decode the replica objects.

Implementations MAY include references to other blocks in the replica block references. For example, it might make sense to add references to blocks required to decode ERIS encoded content that was added to a container (in addition to the operations and signatures).

The replica block references can be shared with caching peers - peers that can help distribute the container state without being able to compute container state.

The necessary data structures can be implemented on a persistent key-value store, a relational database or other persistent storage mechanisms.

6.2. Block Storage

The replica objects and replica block references only store read capabilities and references to blocks. A DMC implementation MUST also be able to store and retrieve blocks of size 1KiB or 32KiB (permitted ERIS block sizes). References by which blocks can be retrieved are the Blake2b hash of block content and are stored in the set of block references.

External system such as IPFS or GNUNet can be used as block storage (see examples in the ERIS specification [ERIS]).

6.2.1. Garbage Collection and the Right to be Forgotten

When an object is removed from replica objects, corresponding block references SHOULD be removed from the replica block references. This MAY be implemented as a garbage collection procedure that is run occasionally.

Implementations MAY also maintain a set of garbage collected block references. This set of garbage collected block references can be used to prevent previously deleted content to reappear in a local replica.

6.3. Transport Layers

Replicas of DMC containers can be synchronized over many transport layers. The general synchronization procedure is:

Synchronize the replica objects and replica block references. This is a problem known as set reconciliation.
Synchronize blocks so that replicas hold blocks in local block storage.

This synchronization procedure can be implemented over protocols such as HTTP or CoAP. Exact descriptions of this procedure over concrete transport layers will follow with implementations (see Section 8.1).

In Section 6.4 we define a CBOR serialization of replica state. This serialization can be used for out-of-band transport over protocols like Email, FTP or Sneakernet.

Futhermore it is possible to use publish-subscribe systems (such as MQTT or XMPP) to keep replicas synchronized. Replicas subscribe to a publish-subscribe topic and publish modifications of replica objects and replica block references. However, it seems necessary to still be able to run the synchronization procedure as described above when replicas disconnect and reconnect from the publish-subscribe system.

As part of the DREAM project we are researching a peer-to-peer publish-subscribe system, UPSYCLE, which will also be usable to synchronize replicas of DMC containers.

6.4. CBOR serialization

In this section we define a CBOR serialization of replica state suitable for out-of-band transport of replica state (e.g. as an e-mail attachment or on a USB disk). This serialization is not suitable as working state representation. Implementations should use more efficient state storage such as key-value stores.

The replica state is a CBOR array with elements:

Identifier: The container identifier (see Section 4.3) as binary encoded ERIS read capability.
Objects: An optional CBOR array of the ERIS read capability of objects.
Blocks: An optional map of block references (byte string) to block content (byte string).

Objects and blocks are optional for the cases where a CBOR serialization of the replica state is created for caching peers that are not able to decode objects and compute container state or when an external block storage is used.

The specification of the serialization is given as Concise Data Definition Language (CDDL) [RFC8610]:

ReplicaState = [
    Identifier,
    ? Objects,
    ? Blocks
]

ERISReadCapability = #6.276(bstr)

Identifier = ERISReadCapability

Objects = [ * ERISReadCapability ]
Blocks = { * bstr => bstr }

7. Applications

DMC provides two container types (set and register) that can be used as underlying data structure for many applications.

We highlight to applications that serve as motivation for DMC and are being developed in the context of the openEngiadina and INCOMMON projects.

7.1. Geographical Information Systems (GIS)

Geogrpahical Information Systems (GIS) are systems for modelling, analysis and management of geographically related resources.

We believe that DMC can serve as a foundation for decentralized Geographical Information Systems as it addresses many challenges in building a GIS [smallworld]:

Accomodate existing data: By using RDF many exising data sources can be included seamlessly. This includes large data sets by public institutions as well as from projects such as OpenStreetMap (see LinkedGeoData).
Version Management: DMC intrinsically provides version managment and allows changes to be tracked and attributed.
Query Language: We believe that Datalog is an extremly well-suited language for querying geographically related resources.
Large Data Volumes: We hope that DMC can be implemented to efficiently handle large sets of data. Further work is required to validate this.

7.2. ActivityPub

ActivityPub is a protocol for federated social media that uses the ActivityStreams [ActivityStreams] vocabulary to describe social interactions. As ActivityStreams is simply an RDF vocabulary it can be used to describe social interactions over many different data sets (e.g. the creation of a node on a map).

DMC is very well suited for handling RDF and thus also ActivityStreams. This allows interaction between DMC systems and ActivityPub services. Furthermore AcitivtyPub can be implemented using DMC containers.

8. Conclusion

It’s Tricky, Tricky, Tricky

— Run-D.M.C.

8.1. Future Work

In the context of the DREAM project we will develop an OCaml implementation of DMC. We expect refinements to the specification from insight gained while implementing the core specification as well as applications that use DMC.

DMC relies on novel specifications such as ERIS, RDF Signify and Content-addressable RDF. Further work is required to mature and finalize these ideas.

It would be interesting to explore usage of other complementary systems such as the GNU Name System (GNS), GNUNet, IPFS, Named Data Networks with DMC.

8.2. Acknowledgments

We would like to acknowledge the value of Sci-Hub and Library Genesis for this work. Independent research such as this would not be possible without liberated access to knowledge. Water the flowers - clean the volcanoes.

An initial version of this document was developed as part of the openEngiadina project and supported by the NLNet Foundation trough the NGI0 Discovery Fund. Further work has been done as part of the DREAM project and is supported within the framework of the NGI-POINTER project.

Appendix A: RDF Vocabulary

The RDF vocabulary is also available at http://purl.org/dmc/ns.

@prefix dmc: <http://purl.org/dmc/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://purl.org/dmc/ns#>
    a owl:Ontology ;
    rdf:label "Distributed Mutable Containers" ;
    rdfs:comment "Vocabulary for describing state and operations of Distributed Mutable Containers" .

# Container (asbtract class)

dmc:Container
    a rdfs:Class ;
    rdf:label "DMC container" ;
    rdfs:comment "A DMC container." .

# Container Definition

dmc:ContainerDefinition
   rdf:label "ContainerDefinition" ;
   rdfs:comment "Definition of a Distributed Mutable Container" .

dmc:rootPublicKey
    a rdfs:Property ;
    rdfs:domain dmc:Container ;
    rdf:label "root public-key" ;
    rdfs:comment "The root public-key of the container" .

# General predicates

dmc:container
    a rdfs:Property ;
    rdfs:range dmc:Container ;
    rdf:label "container" ;
    rdfs:comment "The associated container." .

dmc:operation
    a rdfs:Property ;
    rdfs:range dmc:Operation ;
    rdf:label "operation" ;
    rdfs:comment "The associated operation" .

# Operation (abstract class)

dmc:Operation
    a rdfs:Class ;
    rdf:label "Operation" ;
    rdfs:comment "An operation on a Distributed Mutable Container" .

# Operation to add an authorized key

dmc:AddKey
    rdfs:subClassOf dmc:Operation ;
    rdf:label "AddKey" ;
    rdfs:comment "Operation that adds an authorized key to the container" .

# Set

dmc:SetDefinition
    rdfs:subClassOf dmc:ContainerDefinition ;
    rdf:label "SetDefinition" ;
    rdfs:comment "Definition of a DMC set" .

dmc:Set
    rdfs:subClassOf dmc:Container ;
    rdf:label "Set" ;
    rdf:comment "A DMC set" .

dmc:member
    a rdfs:Property ;
    rdf:label "member" ;
    rdfs:comment "Member element of a DMC set" ;
    rdfs:domain dmc:Set .

dmc:Add
    rdfs:subClassOf dmc:Operation ;
    rdf:label "Add" ;
    rdfs:comment "Operation that adds an element to a DMC set" .

dmc:Remove
    rdfs:subClassOf dmc:Operation ;
    rdf:label "Remove" ;
    rdfs:comment "Operation that removes an elemnt of a DMC set" .


# Register

dmc:RegisterDefinition
    rdfs:subClassOf dmc:ContainerDefinition ;
    rdf:label "RegisterDefinition" ;
    rdfs:comment "Definition of a DMC register" .

dmc:Register
    rdfs:subClassOf dmc:Container .

dmc:Update
    a rdfs:Class ;
    rdfs:subClassOf dmc:Operation .

dmc:timestamp
    a rdfs:Property ;
    rdfs:domain dmc:Update .

Changelog

v0.2.0 (2021-03-30)

Major revision.

Moved vocabulary namespace from http://purl.org/dmc to http://purl.org/dmc/ns
Added extended introduction with comparison to other projects
Added ability to add additional authorized keys
Rewrite of most sections

v0.1.0 (2020-10-19)

Initial version.

Copyright

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

References

Normative References

[ERIS] pukkamustard. Encoding for Robust Immutable Storage. 2020
[content-addressable-rdf] pukkamustard. Content-addressable RDF. 2020
[rdf-signify] pukkamustard. RDF Signify. 2020
[RDF] Graham Klyne, Jeremy J. Carroll, Brian McBride. RDF 1.1 Concepts and Abstract Syntax, 2014.
[TURTLE] David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, Gavin Carothers. RDF 1.1 Turtle, 2014
[RFC2119] S. Bradner. Key words for use in RFCs to Indicate Requirement Levels, 1997.
[RFC8610] H. Birkholz, C. Vigano, C. Bormann. Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures, 2019.
[RFC8949] C. Bormann & P. Hoffman. Concise Binary Object Representation (CBOR), 2020.

Informative References

[ActivityStreams] Snell and Prodromou, Activity Streams 2.0, 2017.
[calm] Hellerstein, Joseph M., and Peter Alvaro. "Keeping CALM: when distributed consistency is easy." arXiv preprint arXiv:1901.01930 (2019).
[crdt] Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski. A comprehensive study of Convergent and Commutative Replicated Data Types. [Research Report] RR-7506, Inria – Centre Paris-Rocquencourt; INRIA. 2011, pp.50.
[datalog] Ceri, Stefano, Georg Gottlob, and Letizia Tanca. "What you always wanted to know about Datalog(and never dared to ask)." IEEE transactions on knowledge and data engineering 1.1 (1989): 146-166.
[dcm-terms] DCMI Usage Board. DCMI Metadata Terms. 2020.
[lsd0001] M. Schanzenbach, C. Grothoff, B. Fix. The GNU Name System
[smallworld] Richard G. Newell, David G. Theriault. Smallworld Technical Paper No. 1 - Ten Difficult Problems in Building a GIS. 2011.
[vegvisir] Kolbeinn Karlsson, Danny Adams, Edwin Ma, Weitao Jiang, Robbert van Renesse, Hakim Weatherspoon, and Stephen Wicker Vegvisir: A Partition-Tolerant Blockchain for the Internet-of-Things In the 2018 IEEE 38th International Conference on Distrbuted Computing Systems(ICDCS), Vienna, Austria, July 2018.