Distributed Mutable Containers

Distributed Mutable Containers are distributed data structures that are capable of holding references to arbitrary content while allowing replicas of the data structures to diverge out conflict. We present a concrete implementation of a set (holding multiple values) and a register (holding a single value). Containers are based on Commutative Replicated Data Types (CRDTs). Compared to distributed data structures based on append-only logs (e.g. Hypercore or Secure ScuttleBut) these CRDT based containers allow collective and shared control where authorization to mutate containers can be delegated. Furthermore they do not require the entire history to be known to compute current state which improves efficiency and allows "the right to be forgotten". The containers are specified independently of any transport or storage layer and can be used as underlying data structures for systems implementing ActivityPub, Linked Data Platform over conventional HTTP or systems built on more decentralized technology and network topologies. We present a RDF vocabulary for describing containers, state and operations as well as the semantics for computing container state.

1. Introduction

In previous work we have illustrated how content-addressing can be used to make system more robust and secure [eris]. However, content-addressed data is immutable. A reference to a piece of content will always refer to the exactly same content. This is very desirable for many applications and the reason why content-addressing allows more robust systems.

Nevertheless, there are applications where one wants to reference mutable content. Examples include user profiles or dynamic collections of content. Making small changes to a user profile or adding a piece of content to a collection should preserve the identifiers.

Conventional URLs, i.e. networks addresses of a host that make content available, allow mutable content: the host simply returns the updated content. However, this requires trusting the host to always return the right content as well as relying on the host to be available to provide content.

Public-key cryptography can be used to reduce trust and reliance on a single host while allowing mutable content. Systems like Hypercore (previously Dat) and Secure ScuttleBut (SSB) allow content to be referenced by a public key where holders of the associated secret key can publish updates. There is no need to trust or rely on a single host as the content can be verified to have been published by a holder of the secret key and the content does not need to be stored on a single host.

Hypercore and SSB (and other similar systems) use an append-only log to keep track of history. This has two major drawbacks:

It is not possible to delete content. This leads to performance as well as privacy issues.
The state of the container may not fork (i.e. the state must always be a proper linked list with a single head). This practically prevents shared control of a container by multiple parties as it would require a high degree of out-of-band coordination to prevent a fork.

In the following we specify two mutable containers that improve on append-only logs by using Commutative Replicated Data Types (CRDTs) as underlying data structures:

A set that can hold multiple references to content.
A register that at any given moment holds at most a single reference to content.

By using CRDTs the state of containers is allowed to diverge and is guaranteed to converge on a consistent state eventually. Furthermore, the container history can be garbage collected, allowing content that has been removed from the container to be forgotten.

2. Prerequisites

2.1. Content-addressable RDF and ERIS

We use the scheme for content-addressing as described in previous work [content-addressable-rdf] as well as the ERIS [eris] encoding. This allows us to refer to operations as well as state using an identifier that is determined by the content of the operations as well as enable secure transport and storage.

Example RDF data will be shown in the RDF/Turtle serialization format - a human readable format.

2.2. RDF Signfiy

To ensure authenticity of operations and verify that operations are authorized we use the Ed25519 algorithm for cryptographic signatures and the RDF Signify vocabulary for describing signatures [rdf-signify].

2.3. Commutative Replicated Data Types

Conflict-free replicated data types is a class of data structures that may replicated over hosts, updated independantly and guaranteed to be mergeable without conflict.

There are two sub-classes of conflict-free replicated data types: Convergent Replicated Data Types (state based) and Commutative Replicated Data Types (operation based). We will use Commutative Replicated Data Types.

Commutative Replicated Data Types rely on the fact that operations that mutate the data structures commute, i.e. the order in which the operations are applied does not change the outcome. This guarantees that distributed replicas of a CRDT converge to the same state as soon as the same set of operations are applied at both replicas (regardless of order). This property is called strong eventual consistency.

We will use two types of CRDTs: An Observed-Remove Set (section Set) and a Last-Writer-Win Register (section Register) and will introduce both in the respective sections.

The reader is referred to Shapiro et. al. [crdt] for a formal treatment and an overview of the various types of CRDTs.

2.4. Datalog

Datalog is a declarative logic programming language. It has a very simple syntax and exactly defined semantics that make it useful as a query language for databases.

We will use Datalog as a specification language for describing semantics of containers. This allows a very concise description that can be directly used in existing, efficient implementations of Datalog. Furthermore, certain aspects of Datalog semantics (monotonicity) make it perfectly adapted for reasoning about distributed systems [calm].

Datalog can also be used as an application level query language (e.g. for answering queries like "all posts about cats posted by people that I have sent a message to in the last year"). The query can be combined with container state resolution queries to a single query. This unified approach of querying data seems very appealing and we hope to investigate it further work.

We require two extensions of Datalog which are well studied in the literature: Built-in predicates (for computing cryptographic functions) and stratified negation. For the register container we also require aggregation. The author needs to do more work to understand how aggregation works in Datalog. All required extensions are available in the open-source Datalog implementation Soufflé, which was used to develop and test the work presented. A runable implementation of container semantics and examples are available in the DMC repository.

The predicate graph is used for accessing the RDF graph that contains all triples to be considered (in the Datalog literature this is called an extensional database). We assume that everything in the graph is content-addressed with ERIS [content-addressable-rdf].

Following built-in predicate are used:

ed25519Verify: Verifies a Ed25519 signature given a public-key and a signature, i.e. ed25519Verify(signature, message, publicKey) if signature is a valid Ed25519 signature of message with public-key publicKey.

The Datalog syntax used is very similar to that of Soufflé with the difference that we always use identifiers starting with upper-case to denote variables and lower-case to indicate symbols (instead of delimiting them in quotation marks). We use a shorthand for IRI constants, e.g. dmc:Set expands to http://purl.org/dmc#Set. Additionally we use the shorthand a for rdf:type.

For a formal introduction to Datalog the reader is referred to the overview paper by Ceri et. al. "What you always wanted to know about Datalog (and never dared to ask)" [datalog].

Alternative Datalog implementations that can be used include AbcDatalog. However, AbcDatalog can only be used for the Set container as it lacks aggregation. A Guile Datalog implementation that will be usable for both containers presented is being developed (guile-schemantic).

3. Distributed Mutable Containers

In a nutshell a Distributed Mutable Container works as follows:

A Distributed Mutable Container is defined as an immutable RDF graph (section Definition).
A container identifier is derived from the immutable reference of the container definition that can be used to resolve the dynamic state of a container (section Identifier).
Operations are created that mutate a container (section Operations).
Resolving a container identifier returns the current container state based on available operations. An overview of this process is given in section [_operations_and_resolving_container_state]. The exact semantics of how to resolve the current state is defined for the different container types in section Container types.

3.1. Definition

A container is created by creating a container definition, which is a content-addressed RDF graph (using ERIS) describing the type of the container and the root public key.

The root public key is the public key of the associated secret key that can be used to mutate the container. The property is formally defined in the RDF vocabulary (section RDF Vocabulary).

Note	The root public key is called so because we intend the list of keys that can mutate the container to be extendable in the future (whereas the root public key stays fixed).

For example a container definition for a set (defined in section Set):

@prefix dmc: <http://purl.org/dmc#> .

<urn:erisx2:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM>
    a dmc:SetDefinition ;
    dmc:rootPublicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> .

Note	The root public key is encoded as specified by RDF Signify [rdf-signify].

Warning

The URN of ERIS encoded content is not yet finalized.

The container definition is immutable. References to the container definition URN always resolve the same content. To resolve the dynamic and mutable stae of a container we use the container identifier.

A container definition has the type (rdf:type property) dmc:SetDefinition or dmc:RegisterDefinition. Both dmc:SetDefinition and dmc:RegisterDefinition are defined as sub-classes of dmc:ContainerDefinition using the rdfs:subClassOf property.

3.2. Identifier

To reference the dynamic and mutable state of a container we use the special URN scheme dmc:CONTAINER_DEFINITION_ERIS_CAPABILITY where CONTAINER_DEFINITION_ERIS_CAPABILITY is the encoded ERIS read capability.

For example the identifier for the container defined in the previous sections is:

dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM

3.3. Operations

Operations mutate container state. They are represented as RDF graphs that are content-addressed using ERIS.

The container types define specific operations (i.e. dmc:Add and dmc:Remove for the set container and dmc:Update for the register). All operations are sub-classes of dmc:Operation.

The property dmc:container is used to refer to the container the operation acts on.

For example the operation for adding an element to the set defined above looks like this:

@prefix dmc: <http://purl.org/dmc#> .

<urn:erisx2:AAACG7NGKQZ4ZSR4D4VS3MFQNT5DHHOS7QYNSH5MCQR6HTYO6GTSLB43CFDWNBPNZ5FTG2LFW5QM735GNT6Y44QHCSEAGSRIMYCNJ3ND44>
    a dmc:Add ;
    dmc:container <dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM> ;
    rdf:value <urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ> .

To ensure that an operation is authorized a signature of the operation by the secret key associated with the root public key of the container is required. For example:

@prefix rdf-signify: <http://purl.org/rdf-signify#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<urn:erisx2:AAAHZ5E6BYXD7Y3LU625HGB5OXRBLZJQ5TXURD5S5PWCIHEVI7IKKIUF7B5PQRIO52VIEARK5Y2WT4L5BEKF2AXJHN5H3M34XLYJO6HKF4>
    a rdf-signify:Signature ;
    rdf-signify:message <urn:erisx2:AAACG7NGKQZ4ZSR4D4VS3MFQNT5DHHOS7QYNSH5MCQR6HTYO6GTSLB43CFDWNBPNZ5FTG2LFW5QM735GNT6Y44QHCSEAGSRIMYCNJ3ND44> ;
    rdf:value "HOmPzPF8lbUyv/p9Nho/43aFFsLI9y9TiL0vvAHgemesmBt3smNV+2KY/AuZqE756a0jeLkw1dxZPRiu9zrTCg=="^^xsd:base64Binary ;
    rdf-signify:publicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> .

Note	Both operation signature are content-addressed using ERIS and immutable.

We introduce a helper predicate that can decide if an operation is authorized for a container:

operationAuthorized(Container, Operation) :-
    graph(Container, dmc:rootPublicKey, RootPublicKey),
    graph(Operation, dmc:container, Container),
    graph(Signature, a, rdf-signify:Signature),
    graph(Signature, rdf-signify:message, Operation),
    graph(Signature, rdf:value, SignatureValue),
    ed25519Verify(SignatureValue, Operation, RootPublicKey).

The predicate operationAuthorized(Container,Operation) holds if operation Operation is an authorized operation on container Container.

3.4. State

A container identifier is resolved to a container state that is a description of the current state of the container. The exact semantics of how the container state is resolved is defined in section Container types for the different container types.

For example the state of the set defined above would be resolved as follows after two elements have been added:

@prefix dmc: <http://purl.org/dmc#> .

<dmc:AAAELYRT24S2SACAPSMGTNYYHCCQHBDGQH3SC3K3Y35PI73SLTA6JTO7S7XTA66LZQ2OKSAZZI4BWXJCGGJLZ74WIWEMXZ2SXDC4JVAFKM>
    a dmc:Set ;
    dmc:rootPublicKey <urn:ed25519:pk:O7RZC7XVKCRU4LFT4ACSBP3GQ55PPGBZHCBQJMJ46VKJN63HOMSA> ;
    dmc:member <urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ> ;
    dmc:member <urn:erisx2:AAAF5Q7CXYPZH63IQ3EF4YVMBOEC6YFQH6JQ7YQMG2QYMKDJLWM5LHXRA6XBHJGKMCJL7QTWAH4PFCTDVXIIKP7HOTKGVVIOOAUAXKFNUE> .

Note that the state depends on available operations on the container. Operation transport and synchronization are beyond the scope of this document, we assume that the available operations are given in an RDF graph. Some ideas on how operation transport and synchronization might work are illustrated in section [_applications].

4. Container types

4.1. Set

Sets are unordered collections that hold distinct members. They are a useful base data structure for building complex applications.

Adding and removing elements from a set are, in general, non-commutative operations:

\${x} vv {x} \\ {x} != {x} \\ {x} vv {x}.\$

We will use a little trick to solve this problem. The data structure that implements this trick is called an Observed-Remove Set (OR-Set).

4.1.1. Observed-Remove Set

Instead of keeping track of only the elements in the set, we keep track of tuples of elements along with an unique identifier of the operation that was used to add the element:

Applying the operations \$o_1\$ which adds the element \$a\$ to the an empty set container results in \${(o_1,a)}\$.
We again add the element \$a\$ with the operation \$o_2\$ resulting in \${(o_1, a), (o_2,a)}\$

This set of tuples is just an internal representation. The exposed set is \${a}\$.

When removing an element we must specify the element and operation that added the element: After removing the element \$a\$ that was added with operation \$o_1\$ the internal representation of the set is \${o_2, a}\$. The exposed set remains \${a}\$.

It can be shown that the operations (add and remove) on an OR-Set commute.

The reader is referred to Shapiro et. al. [crdt] for a formal treatment of the Observed-Remove Set.

4.1.2. Definition

A set is defined by creating a container definition of type dmc:SetDefinition (see example in section Definition).

4.1.3. Operations

There are two operations that mutate a set container:

dmc:Add: Adds an element to the set. The element is referred to by the rdf:value property.
dmc:Remove: Removes an element from the set. Instead of referring to the element to be removed directly, the operation refers to the dmc:Add operation that added the elements to be removed with the property dmc:operation (see Observed-Remove Set).

4.1.4. State

The following Datalog program and the predicate setMember(Container,Member) defines the current members of the set given operations in the graph (predicate graph):

setOperationRemoved(Container,Operation) :-
   graph(Operation, a, dmc:Add),
   graph(Operation, dmc:container, Container),
   graph(RemoveOperation, dmc:operation, Operation),
   graph(RemoveOperation, a, dmc:Remove),
   operationAuthorized(Container, RemoveOperation).

setMember(Container, Member) :-
    graph(Container, a, dmc:Set),
    graph(Operation, a, dmc:Add),
    graph(Operation, rdf:value, Member),
    operationAuthorized(Container, Operation),
    ! setOperationRemoved(Container, Operation).

The helper predicate setOperationRemoved checks if the operation has been "removed".

Members of the set are added to the container state using the dmc:member property (see example in section State).

4.2. Register

A register is a datastructure that holds at most a single element. This is useful for singleton objects that can be updated such as user profiles.

The DMC register is based on a Last-Writer-Wins register.

4.2.1. Last-Writer-Wins Register

A Last-Writer-Wins Register (LWW-Register) [crdt] creates an order on updates that change the value by adding a timestamp. The current value is the value of the update with the greatest timestamp.

Note that this requires a form of weak-coordination, namely having timestamps that are consistent wit causal order. It also permits a denial-of-service attack as a malicious agent who is authorized to update the register may use a extremely high timestamp and block updates from other authorized agents (until they have adapted to using an even higher timestamp). The DMC register is not suited for potentially malicious agents that share control over the register. Nevertheless, we believe it is a valuable datastructure for containers where control is kept very tight and authorized agents are trusted to a high degree. A concrete application is user profiles.

4.2.2. Definition

A register is defined by creating a container definition of type dmc:RegisterDefinition.

4.2.3. Operations

There is only one operation to mutate a register:

dmc:Update: Updates the value of the register. A timestamp must be included with the property dmc:timestamp. The value to which the register to be updated to is defined with the rdf:value property.

4.2.4. State

The following Datalog program and the predicate registerValue` defines the current value of the register given the operations in a graph:

registerUpdateTimestampValue(Register, Timestamp, Value) :-
    graph(Operation, a, dmc:Update),
    graph(Operation, dmc:container, Register),
    graph(Operation, dmc:timestamp, Timestamp),
    graph(Operation, rdf:value, Value),
    operationAuthorized(Register, Operation).

registerValue(Register, Value) :-
    _ = max Timestamp : registerUpdateTimestampValue(Register, Timestamp, Value).

The registerUpdateTimestampValue is a helper predicate to get all authorized updates with timestamps and values.

Note	Soufflé has only recently added the aggregation functionality required in the `registerValue` predicate (not included in version 2.0.2).

The state is encoded with type dmc:Register and the current value is indicated with the rdf:value property.

Note	The register may be in a state where it does not hold any value if there has been no update operation.

5. Further work

This initial version is a proof-of-concept intended to start community discussion and get feedback. Following work is planned:

Implementation as underlying containers for an ActivityPub service. This will be done as next milestone in the openEngiadina project. We hope to gain more experience and refine this write-up. A Guile Datalog implementation is also being prepared that will allow easier experimentation with the containers presented.
Delegated authorization: Currently there is only a single root public-key for every container. We would like to investigate how authorization can be delegated in a more fine-grained manner without sharing the root secret-key. Previous related work includes the Vegvisir project [vegivisir].
Garbage collection: Further work is required to describe how operations and previous states can be forgotten.
More efficient serialization. For storage and network transport we currently use the S-Expression based serialization presented for content-addressing RDF. This serialization is quite verbose and can not be directly queried. We would like to investigate formats such as HDT [hdt].
This specification relies on previous work that still requires refinement and finalization (ERIS, Content-addressable RDF and RDF Signify).

Implementation in an ActivityPub service and more practical aspects will be worked on in the context of the openEngiadina project (supported by NGI0 via NLNet). More technical aspects (e.g. delegated authorization) as well as peer-to-peer transport layers will be researched in the context of the DREAM project (supported by NGI Pointer).

6. Acknowledgments

Many thanks to the DREAM team for support and valuable discussions as well as to Christian Weilbach for sparking a deep interest in Datalog.

I would like to acknowledge the value of Sci-Hub and Library Genesis for this work. Independent research such as this would not be possible without liberated access to knowledge. Water the flowers - clean the volcanoes.

This work has been supported by the NLNet Foundation trough the NGI0 Discovery Fund.

Appendix A: RDF Vocabulary

@prefix dmc: <http://purl.org/dmc#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://purl.org/dmc#>
    a owl:Ontology ;
    rdfs:label "Distributed Mutable Containers" ;
    rdfs:comment "Vocabulary for describing state and operations of Distributed Mutable Containers" .

dmc:Container
    a rdfs:Class ;
    rdfs:label "Distributed Mutable Container" .

dmc:rootPublicKey
    a rdfs:Property ;
    rdfs:domain dmc:Container ;
    rdf:label "root public-key" ;
    rdfs:comment "The root public-key of the container" .

dmc:container
    a rdfs:Property ;
    rdfs:range dmc:Container ;
    rdfs:label "container" ;
    rdfs:comment "The associated container." .

dmc:ContainerDefinition
   rdfs:subClassOf dmc:Container ;
   rdfs:label "contaner definition" ;
   rdfs:comment "Definition of a Distributed Mutable Container" .

dmc:State
   rdfs:subClassOf dmc:Container ;
   rdfs:label "container state" ;
   rdfs:comment "State of a Distributed Mutable Container" .

dmc:Operation
    a rdfs:Class ;
    rdf:label "operation" ;
    rdfs:comment "An operation on a Distributed Mutable Container" .

dmc:SetDefinition
    rdfs:subClassOf dmc:ContainerDefinition ;
    rdfs:label "set definition" ;
    rdfs:comment "Definition of a Set" .

dmc:Set
    rdfs:subClassOf dmc:State .

dmc:member
    a rdfs:Property ;
    rdfs:domain dmc:Set .

dmc:Add
    a rdfs:Class ;
    rdfs:subClassOf dmc:Operation .

dmc:Remove
    a rdfs:Class ;
    rdfs:subClassOf dmc:Operation .

dmc:operation
    a rdfs:Property ;
    rdfs:domain dmc:Remove;
    rdfs:range dmc:Add;
    rdfs:label "operation" .

dmc:RegisterDefinition
    rdfs:subClassOf dmc:ContainerDefinition .

dmc:Register
    rdfs:subClassOf dmc:state .

dmc:Update
    a rdfs:Class ;
    rdfs:subClassOf dmc:Operation .

dmc:timestamp
    a rdfs:Property ;
    rdfs:domain dmc:Update .

The vocabulary is also provided as Turtle file.

Appendix B: Copyright

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

References

[crdt] Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski. A comprehensive study of Convergent and Commutative Replicated Data Types. [Research Report] RR-7506, Inria – Centre Paris-Rocquencourt; INRIA. 2011, pp.50.
[content-addressable-rdf] openEngiadina. Content-addressable RDF. 2020
[eris] openEngiadina. Encoding for Robust Immutable Storage. 2020
[rdf-signify] openEngiadina. RDF Signify. 2020
[vegvisir] Kolbeinn Karlsson, Danny Adams, Edwin Ma, Weitao Jiang, Robbert van Renesse, Hakim Weatherspoon, and Stephen Wicker Vegvisir: A Partition-Tolerant Blockchain for the Internet-of-Things In the 2018 IEEE 38th International Conference on Distrbuted Computing Systems(ICDCS), Vienna, Austria, July 2018.
[calm] Hellerstein, Joseph M., and Peter Alvaro. "Keeping CALM: when distributed consistency is easy." arXiv preprint arXiv:1901.01930 (2019).
[datalog] Ceri, Stefano, Georg Gottlob, and Letizia Tanca. "What you always wanted to know about Datalog(and never dared to ask)." IEEE transactions on knowledge and data engineering 1.1 (1989): 146-166.
[hdt] Fernández, Javier D., et al. "Binary RDF representation for publication and exchange (HDT)." Journal of Web Semantics 19 (2013): 22-41.