= Encoding for Robust Immutable Storage (ERIS)
pukkamustard <pukkamustard@posteo.net>
0.2.0-draft
:toc: left
:xrefstyle: short
:sectnums:
:sectanchors:

[abstract]
The Encoding for Robust Immutable Storage (ERIS) is an encoding of arbitrary content into a set of uniformly sized, encrypted and content-addressed blocks as well as a short identifier, the _read capability_. The content can be reassembled from the encrypted blocks only with the read capability. The encoding is defined independently of any storage or transport layer. Together with content-addressable storage, ERIS can be used as a building block for robust and decentralized applications.

== Introduction

Unavailability of content on computer networks is a major cause for reduced reliability of networked services <<Polleres2020>>.

Availability can be increased by caching content on multiple peers. However, most content on the Internet is identified by its location. Caching location-addressed content is complicated because every cached copy receives a new location.

An alternative to identifying content by its location is to identify content by the content itself. This is called content-addressing. The hash of some content is computed and used as a unique identifier for the content.

Content-addressed content is much easier to cache as the content is completely decoupled from any physical location. It is much easier to ensure availability of content-addressed content than it is for location-addressed content.

Authenticity of content is automatically ensured with content-addressing (when using a cryptographic hash) as the identifier of the content can be computed and be checked to match the requested identifier.

However, naive content-addressing has certain drawbacks:

- Large content is stored as a large blob. In order to optimize storage and network operations it is better to split up content into smaller uniformly sized blocks and reassemble blocks when needed.
- Confidentiality: Content is readable by all peers involved in transporting, caching and storing content.

ERIS is an encoding that addresses these issues by splitting content into small uniformly sized blocks and encrypting the blocks.

=== Objectives

The objectives of ERIS are:

Availability :: Content encoded with ERIS can be easily replicated and cached.
Authenticity :: Authenticity of content can be verified efficiently.
URN reference :: ERIS encoded content can be referenced with a single URN.
Storage efficiency :: ERIS can be used to encode small content (< 1 kibibyte) as well as large content (many gibibytes) with reasonable storage overhead.
Simplicity :: The encoding should be as simple as possible in order to allow correct implementation on various platforms and in various languages.


=== Scope

ERIS describes how arbitrary content (sequence of bytes) can be encoded into a set of uniformly sized blocks and an identifier with which the content can be decoded from the set of blocks.

ERIS does not prescribe how the blocks should be stored or transported over network. The only requirement is that a block can be referenced and accessed (if available) by the hash value of the contents of the block. In section <<_storage_and_transport_layers>> we show how existing technology (including IPFS) can be used to store and transport blocks.

There is also no support for grouping content or mutating content. In section <<_namespaces>> we describe how such functionality can be implemented on top of ERIS.

The lack of certain functionalities is intentional. ERIS is an attempt to find a minimal common basis on which higher functionality can be built. Lacking functionality in ERIS is an acknowledgment that there are many ways of implementing such functionality at a different layer that may be optimized for certain use-cases.

=== Previous work

ERIS is inspired by and based on the Encoding for Censorship-Resistant Sharing (ECRS) <<ECRS>>, the encoding used in the file-sharing application of https://gnunet.org/[GNUNet].

ERIS differs from ECRS in the following points:

Cryptographic primitives :: ECRS itself does not specify any cryptographic primitives but the GNUNet implementation uses the SHA-512 hash and AES cipher. ERIS uses the Blake2b-256 cryptographic hash <<RFC7693>> and the ChaCha20 stream cipher <<RFC8439>>. This improves performance, storage efficiency (as hash references are smaller) and allows a convergence secret to be used (via Blake2b keyed hashing; see <<_convergence_secret>>).
Block size :: ECRS uses a fixed block size of 32 KiB. This is inefficient when encoding small content. ERIS allows a block size of 1 KiB or 32 KiB, allowing efficient encoding of small and large content (see <<_block_size>>).
URN :: ECRS does not specify a URN for referring to encoded content (this is specified as part of the GNUNet file-sharing application). ERIS specifies a URN for encoded content regardless of encoding application or storage and transport layer.
Namespaces :: ECRS defines two mechanisms for grouping and discovering encoded content (SBlock and KBlock). ERIS does not specify any such mechanisms (see <<_namespaces>>).

Other related projects include Tahoe-LAFS and Freenet. The reader is referred to the ECRS paper <<ECRS>> for an in-depth explanation and comparison of related projects.

=== Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 <<RFC2119>>.

We use binary prefixes for multiples of bytes, i.e. 1024 bytes is 1 kibibyte (KiB), 1024 kibibytes is 1 mebibyte (MiB) and 1024 mebibytes is 1 gibibyte (GiB).

TODO a glossary of terms used.

== Specification of ERIS

=== Cryptographic Primitives

The cryptographic primitives used by ERIS are a cryptographic hash function, a symmetric key cipher and a padding algorithm. The hash function and cipher are readily available in open-source libraries such as https://github.com/jedisct1/libsodium[libsodium] or https://monocypher.org/[Monocypher]. The padding algorithm can be implemented with reasonable effort.

==== Cryptographic Hash Function

Blake2b <<RFC7693>> with output size of 256 bit (32 byte). We use the keying feature and refer to the key used for keying Blake2b as the _hashing key_.

Provides the functions `Blake2b-256(INPUT,HASHING-KEY)` for keyed hashing and `Blake2b-256(INPUT)` for unkeyed hashing.
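
As a minimal sketch, both hash functions can be expressed with `hashlib.blake2b` from the Python standard library. The helper name `blake2b_256` and the sample input are illustrative and not part of the specification.

```python
import hashlib

def blake2b_256(data: bytes, hashing_key: bytes = b"") -> bytes:
    """Blake2b with a 32 byte digest; keyed when hashing_key is non-empty."""
    return hashlib.blake2b(data, digest_size=32, key=hashing_key).digest()

unkeyed = blake2b_256(b"Hail ERIS!")
keyed = blake2b_256(b"Hail ERIS!", bytes(32))  # 32 zero bytes used as hashing key

# Keyed and unkeyed hashing produce different digests of the same length,
# even when the hashing key consists only of zero bytes.
print(len(unkeyed), len(keyed), unkeyed != keyed)
```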

==== Symmetric Key Cipher
ChaCha20 (IETF variant) <<RFC8439>>. Provides `ChaCha20(INPUT, KEY)`, where `INPUT` is an arbitrary length byte sequence and `KEY` is the 256 bit encryption key. The output is the encrypted byte sequence.

The 32 bit initial counter as well as the 96 bit nonce are set to 0. We can safely use the zero nonce because the key is derived from the content of the block being encrypted (see <<_encrypt_block_and_compute_reference_key_pair>>), so a key is never used to encrypt two different inputs.

Decryption is done with the same function where `INPUT` is the encrypted byte sequence.
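
The Python standard library has no ChaCha20, so the following is an illustrative pure-Python sketch of `ChaCha20(INPUT, KEY)` with zero nonce and zero initial counter, written from the description in <<RFC8439>>. It is meant for experimentation only; real implementations should use a vetted library such as libsodium. All function names are assumptions of this sketch.

```python
import struct

MASK = 0xFFFFFFFF

def _rotl32(value: int, count: int) -> int:
    return ((value << count) & MASK) | (value >> (32 - count))

def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl32(s[b] ^ s[c], 7)

def _chacha20_block(key: bytes, counter: int) -> bytes:
    """One 64 byte key stream block for the given counter and a zero nonce."""
    initial = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    initial += struct.unpack("<8L", key)   # 256 bit key as eight little-endian words
    initial += [counter, 0, 0, 0]          # 32 bit counter, 96 bit zero nonce
    s = list(initial)
    for _ in range(10):                    # 20 rounds: 10 column + 10 diagonal
        _quarter_round(s, 0, 4, 8, 12)
        _quarter_round(s, 1, 5, 9, 13)
        _quarter_round(s, 2, 6, 10, 14)
        _quarter_round(s, 3, 7, 11, 15)
        _quarter_round(s, 0, 5, 10, 15)
        _quarter_round(s, 1, 6, 11, 12)
        _quarter_round(s, 2, 7, 8, 13)
        _quarter_round(s, 3, 4, 9, 14)
    return struct.pack("<16L", *((x + y) & MASK for x, y in zip(s, initial)))

def chacha20(data: bytes, key: bytes) -> bytes:
    """XOR data with the ChaCha20 key stream; encryption and decryption are identical."""
    out = bytearray()
    for offset in range(0, len(data), 64):
        stream = _chacha20_block(key, offset // 64)
        out.extend(b ^ k for b, k in zip(data[offset:offset + 64], stream))
    return bytes(out)

key = bytes(range(32))
ciphertext = chacha20(b"Hail ERIS!", key)
assert chacha20(ciphertext, key) == b"Hail ERIS!"
```

Applying the function twice with the same key returns the original input, which is why no separate decryption function is needed.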

==== Padding Algorithm

We use a byte padding scheme to ensure that the input content size is a multiple of the block size. It provides the following functions:

`PAD(INPUT,BLOCK-SIZE)` :: For `INPUT` of size `n`, appends a mandatory byte valued `0x80` (hexadecimal) to `INPUT`, followed by `m < BLOCK-SIZE` bytes valued `0x00` such that `n + m + 1` is the smallest multiple of `BLOCK-SIZE` larger than `n`.
`UNPAD(INPUT,BLOCK-SIZE)` :: Reads bytes from the end of `INPUT` until a `0x80` is read and then returns the bytes of `INPUT` before that `0x80`. Throws an error if a value other than `0x00` is read before reading `0x80` or if no `0x80` is found within the last `BLOCK-SIZE` bytes.

This is the padding algorithm implemented in https://libsodium.gitbook.io/doc/padding[libsodium]footnote:[This padding algorithm is apparently also specified in ISO/IEC 7816-4. However, the specification is not openly available.].
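
The two padding functions can be sketched in a few lines of Python. This is an illustration under the assumption that the whole input is held in memory; the names `pad` and `unpad` simply mirror `PAD` and `UNPAD`.

```python
def pad(data: bytes, block_size: int) -> bytes:
    """Append 0x80 and then the fewest 0x00 bytes that reach a multiple of block_size."""
    zeros = (block_size - (len(data) + 1) % block_size) % block_size
    return data + b"\x80" + bytes(zeros)

def unpad(data: bytes, block_size: int) -> bytes:
    """Strip the padding, inspecting at most block_size bytes from the end."""
    lowest = max(len(data) - block_size, 0)
    for index in range(len(data) - 1, lowest - 1, -1):
        if data[index] == 0x80:
            return data[:index]
        if data[index] != 0x00:
            raise ValueError("invalid padding: unexpected byte before 0x80")
    raise ValueError("invalid padding: no 0x80 marker found")

padded = pad(b"Hail ERIS!", 1024)
assert len(padded) == 1024
assert unpad(padded, 1024) == b"Hail ERIS!"
```

Because the `0x80` byte is mandatory, input whose size is already a multiple of the block size grows by one full block.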

=== Block Size

ERIS uses two block sizes: 1KiB (1024 bytes) and 32KiB (32768 bytes). The block size must be specified when encoding content.

Both block sizes can be used to encode content of arbitrary size. The block size of 1KiB is an optimization towards smaller content.

Content smaller than TODO SHOULD be encoded with block size 1KiB, content larger than TODO SHOULD be encoded with block size 32KiB.

The block size is encoded in the read capability and the decoding process is capable of handling both cases.

[NOTE]
====
When using block size 32KiB to encode content smaller than 1KiB, the content will be encoded in a 32KiB block. This is a storage overhead of over 3100%. When encoding very many pieces of small content (e.g. short messages or cartographic nodes) this overhead is not acceptable.

On the other hand, using small block sizes increases the number of internal nodes that must be used to encode the content (see <<_collect_reference_key_pairs_in_nodes>>). When encoding larger content it is more efficient to use a block size of 32KiB.
====
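
The trade-off can be made concrete with a small calculation. The sketch below estimates the number of bytes stored for a given content size, assuming the padding and node layout described in this document; the function names are illustrative.

```python
def stored_bytes(content_size: int, block_size: int) -> int:
    """Total size of all blocks (content blocks plus internal nodes)."""
    arity = block_size // 64                 # reference-key pairs per node
    width = content_size // block_size + 1   # padding always adds at least one byte
    total_blocks = width
    while width > 1:                         # add one layer of nodes per level
        width = -(-width // arity)           # ceiling division
        total_blocks += width
    return total_blocks * block_size

def overhead_percent(content_size: int, block_size: int) -> float:
    return 100 * (stored_bytes(content_size, block_size) - content_size) / content_size

for size in (1023, 1024 * 1024):
    for block_size in (1024, 32768):
        print(size, block_size, stored_bytes(size, block_size),
              round(overhead_percent(size, block_size), 1))
```

For content just under 1KiB the larger block size stores a full 32KiB block, while for 1MiB of content the smaller block size needs three additional levels of nodes.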

=== Convergence Secret

Using the hash of the content as key is called _convergent encryption_.

Because the hash of the content is deterministically computed from the content, the key will be the same when the same content is encoded twice. This results in de-duplication of content. Convergent encryption suffers from two known attacks: The Confirmation Of A File Attack and The Learn-The-Remaining-Information Attack <<Zooko2008>>. A defense against both attacks is to use a _convergence secret_. Encoding the same content with different convergence secrets results in different encodings.

If no convergence secret is specified a null convergence secret is used (32 bytes of zeroes).

The convergence secret is implemented as the keying feature of the Blake2 cryptographic hash <<RFC7693>>.
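
The effect of the convergence secret on key derivation can be illustrated as follows. The helper `derive_key` and the two example secrets are hypothetical; in practice a convergence secret would be chosen at random and kept private.

```python
import hashlib

NULL_CONVERGENCE_SECRET = bytes(32)

def derive_key(block: bytes, convergence_secret: bytes = NULL_CONVERGENCE_SECRET) -> bytes:
    """Key for a block: keyed Blake2b-256 of the block content."""
    return hashlib.blake2b(block, digest_size=32, key=convergence_secret).digest()

block = b"same content".ljust(1024, b"\x00")

# Hypothetical secrets, derived from labels only to keep the example reproducible.
secret_a = hashlib.blake2b(b"group A", digest_size=32).digest()
secret_b = hashlib.blake2b(b"group B", digest_size=32).digest()

# Same content and same secret: same key, so encoded blocks de-duplicate.
assert derive_key(block, secret_a) == derive_key(block, secret_a)

# Same content but different secrets: different keys and therefore different blocks.
assert derive_key(block, secret_a) != derive_key(block, secret_b)
assert derive_key(block, secret_a) != derive_key(block)
```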

=== Encoding

Inputs to the encoding process are:

`CONTENT` :: An arbitrary length byte sequence of content to be encoded.
`CONVERGENCE-SECRET` :: A 256 bit (32 byte) byte sequence (see <<_convergence_secret>>).
`BLOCK-SIZE` :: The block size in bytes used for encoding; either 1024 (1KiB) or 32768 (32KiB) (see <<_block_size>>).

Content is encoded by first splitting into uniformly sized blocks, encrypting the blocks and computing references to the blocks. If there are multiple references to blocks they are collected in nodes that have the same size as content blocks. The nodes are encrypted and references to the nodes are computed. This process is repeated until there is a single root reference.

References to nodes and blocks of content consist of a reference to an encrypted block and a key to decrypt the block - a _reference-key pair_. The process of encrypting a block and computing a reference-key pair is explained in <<_encrypt_block_and_compute_reference_key_pair>>.

The encoding process constructs a tree of reference-key pairs that reference nodes that hold references to nodes of a lower level or to content.

The number of reference-key pairs collected into a node is called the _arity_ of the tree and depends on the block size. For block size 1KiB the arity of the tree is 16, for block size 32KiB the arity is 512.

An encoding of a content that is split into eight blocks is depicted in <<figure_merkle_tree>>. For illustration purposes the tree is of arity 2 (instead of 16 or 512).

[[figure_merkle_tree]]
.Encoding of content as tree. Solid edges are concatenations of reference-key pairs as described in <<_collect_reference_key_pairs_in_nodes>>. Dotted edges are encryption and computation of reference-key pairs as described in <<_encrypt_block_and_compute_reference_key_pair>>.
image::eris-merkle-tree.svg[Merkle Tree,opts=inline]
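
The shape of the tree follows from the number of content blocks and the arity alone. As a small sketch (the function name `level_widths` is illustrative), the number of blocks at each level can be computed by repeated ceiling division:

```python
def level_widths(content_blocks: int, arity: int) -> list:
    """Number of blocks at level 0, 1, ... up to the level holding the single root."""
    widths = [content_blocks]
    while widths[-1] > 1:
        widths.append(-(-widths[-1] // arity))  # ceiling division
    return widths

# The arity 2 tree from the figure: eight content blocks, root at level 3.
widths = level_widths(8, 2)
print(widths)            # [8, 4, 2, 1]
print(len(widths) - 1)   # 3
```

The level of the root reference is the length of the returned list minus one. With arity 16 and 102401 content blocks (100MiB of content plus padding at block size 1KiB) this gives level 5, consistent with the large content test listed in the appendix.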

The block-size, the level of the root reference and the root reference-key pair itself are the necessary pieces of information required to decode content. The tuple consisting of block size, level, root reference and key is called the _read capability_.

The encrypted blocks and the read capability are the outputs of the encoding process.

A pseudo-code implementation of the encoding process:

[source,pseudocode]
----
ERIS-Encode(CONTENT, CONVERGENCE-SECRET, BLOCK-SIZE):
    // initialize empty list of blocks to be output
    BLOCKS := []

    // initialize level to 0
    LEVEL := 0

    // split the input content into uniformly sized blocks and encode
    LEVEL-0-BLOCKS, RK-PAIRS := Split-Content(CONTENT, CONVERGENCE-SECRET, BLOCK-SIZE)

    // add blocks from level 0 to blocks to be output
    BLOCKS := BLOCKS ++ LEVEL-0-BLOCKS

    // loop until there is a single root reference
    WHILE Length(RK-PAIRS) > 1:
        LEVEL-BLOCKS, RK-PAIRS := Collect-RK-Pairs(RK-PAIRS, CONVERGENCE-SECRET, BLOCK-SIZE)

        // add blocks to blocks to be output and increase the level counter
        BLOCKS := BLOCKS ++ LEVEL-BLOCKS
        LEVEL := LEVEL + 1

    // extract the root reference-key pair
    ROOT-RK-PAIR := RK-PAIRS[0]
    ROOT-REFERENCE, ROOT-KEY := ROOT-RK-PAIR

    // return blocks and read-capability
    RETURN BLOCKS, BLOCK-SIZE, LEVEL, ROOT-REFERENCE, ROOT-KEY
----

The sub-processes `Split-Content` and `Collect-RK-Pairs` are explained in the following sections.

==== Splitting Input Content into Blocks

Input content is split into blocks of size at most block size such that only the last content block may be smaller than block size.

The last content block is always padded according to the padding algorithm to block size. If the size of the padded last block is larger than block size it is split into content blocks of block size.

A pseudo-code implementation:

[source,pseudocode]
----
Split-Content(CONTENT,CONVERGENCE-SECRET,BLOCK-SIZE):
    // initialize list of blocks and reference-key pairs to output
    BLOCKS := []
    RK-PAIRS := []

    // read blocks of size BLOCK-SIZE from CONTENT
    WHILE CONTENT-BLOCK, LAST? := READ(CONTENT, BLOCK-SIZE):

        IF LAST?:
            // pad block if it is the last
            PADDED := PAD(CONTENT-BLOCK, BLOCK-SIZE)

            IF Length(PADDED) > BLOCK-SIZE:
                PADDED-0, PADDED-1 := SPLIT(PADDED, BLOCK-SIZE)
                ENCRYPTED-BLOCK-0, RK-PAIR-0 := Encrypt-Block(PADDED-0, CONVERGENCE-SECRET)
                ENCRYPTED-BLOCK-1, RK-PAIR-1 := Encrypt-Block(PADDED-1, CONVERGENCE-SECRET)
                BLOCKS := BLOCKS ++ [ENCRYPTED-BLOCK-0, ENCRYPTED-BLOCK-1]
                RK-PAIRS := RK-PAIRS ++ [RK-PAIR-0, RK-PAIR-1]
            ELSE:
                ENCRYPTED-BLOCK, RK-PAIR := Encrypt-Block(PADDED, CONVERGENCE-SECRET)
                BLOCKS := BLOCKS ++ [ENCRYPTED-BLOCK]
                RK-PAIRS := RK-PAIRS ++ [RK-PAIR]

        ELSE:
            ENCRYPTED-BLOCK, RK-PAIR := Encrypt-Block(CONTENT-BLOCK, CONVERGENCE-SECRET)
            BLOCKS := BLOCKS ++ [ENCRYPTED-BLOCK]
            RK-PAIRS := RK-PAIRS ++ [RK-PAIR]

    RETURN BLOCKS, RK-PAIRS
----

NOTE: If the length of the last content block is exactly block size, then padding will result in a padded block that is double the block size and must be split.
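
The splitting step on its own, without encryption, can be sketched as below. The sketch assumes the whole content is held in memory and pads it in one go, which yields the same blocks as padding only the last block; `split_content` is an illustrative name and does not compute reference-key pairs.

```python
def split_content(content: bytes, block_size: int) -> list:
    """Pad content and cut it into unencrypted blocks of exactly block_size bytes."""
    zeros = (block_size - (len(content) + 1) % block_size) % block_size
    padded = content + b"\x80" + bytes(zeros)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]

# Content of exactly one block size results in two blocks:
# the content itself and a block that holds only padding.
blocks = split_content(b"a" * 1024, 1024)
assert len(blocks) == 2
assert blocks[0] == b"a" * 1024
assert blocks[1] == b"\x80" + bytes(1023)
```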

==== Encrypt Block and Compute Reference-Key Pair

A _reference-key pair_ is a pair consisting of a reference to an encrypted block and the key to decrypt the block. Reference and key are both 32 bytes long. The concatenation of a reference-key pair is 64 bytes long (512 bits).

The `Encrypt-Block` function encrypts a block and returns the encrypted block along with the reference-key pair:

[source,pseudocode]
----
Encrypt-Block(INPUT, CONVERGENCE-SECRET):
    KEY := Blake2b-256(INPUT,CONVERGENCE-SECRET)
    ENCRYPTED-BLOCK := ChaCha20(INPUT,KEY)
    REFERENCE := Blake2b-256(ENCRYPTED-BLOCK)
    RETURN ENCRYPTED-BLOCK, (REFERENCE, KEY)
----

The convergence-secret MUST NOT be used to compute the reference to the encrypted block.
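
A runnable sketch of `Encrypt-Block` in Python follows. To stay self-contained it includes a compact pure-Python ChaCha20 (zero nonce, zero initial counter) as a stand-in for a proper library; the names `chacha20` and `encrypt_block` are assumptions of this sketch.

```python
import hashlib
import struct

def chacha20(data: bytes, key: bytes) -> bytes:
    """Compact ChaCha20 (RFC 8439) with zero nonce and initial counter 0."""
    def rotl(v, n):
        return ((v << n) & 0xFFFFFFFF) | (v >> (32 - n))

    def quarter_round(s, a, b, c, d):
        for x, y, z, n in ((a, b, d, 16), (c, d, b, 12), (a, b, d, 8), (c, d, b, 7)):
            s[x] = (s[x] + s[y]) & 0xFFFFFFFF
            s[z] = rotl(s[z] ^ s[x], n)

    rounds = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
              (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))
    out = bytearray()
    for offset in range(0, len(data), 64):
        initial = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,
                   *struct.unpack("<8L", key), offset // 64, 0, 0, 0]
        s = list(initial)
        for _ in range(10):
            for indices in rounds:
                quarter_round(s, *indices)
        stream = struct.pack("<16L", *((x + y) & 0xFFFFFFFF for x, y in zip(s, initial)))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 64], stream))
    return bytes(out)

def encrypt_block(block: bytes, convergence_secret: bytes = bytes(32)):
    """Return the encrypted block and its (reference, key) pair."""
    key = hashlib.blake2b(block, digest_size=32, key=convergence_secret).digest()
    encrypted = chacha20(block, key)
    # The reference is an unkeyed hash: the convergence secret is not involved.
    reference = hashlib.blake2b(encrypted, digest_size=32).digest()
    return encrypted, (reference, key)

block = b"Hail ERIS!" + b"\x80" + bytes(1013)   # one padded 1KiB block
encrypted, (reference, key) = encrypt_block(block)
assert chacha20(encrypted, key) == block         # decryption restores the block
```

Since the reference is an unkeyed hash of the encrypted block, a storage layer can verify a block against its reference without knowing the key or the convergence secret.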

==== Collect Reference-Key Pairs in Nodes

Reference-key pairs are collected into nodes of size block size by concatenating reference-key pairs. The node is encrypted, and a reference-key pair to the node is computed. This results in a sequence of reference-key pairs that refer to nodes containing reference-key pairs at a lower level - a tree.

If there are fewer than arity reference-key pairs to collect in a node, then the node is filled with the missing number of _null reference-key pairs_ - 64 bytes of zeros. The size of a node is always equal to the block size (implemented with the `FILL-WITH-NULL-RK-PAIRS` function).

A pseudo-code implementation of `Collect-RK-Pairs`:

[source,pseudocode]
----
Collect-RK-Pairs(INPUT-RK-PAIRS, CONVERGENCE-SECRET, BLOCK-SIZE):
    // number of reference-key pairs in a node
    ARITY := BLOCK-SIZE / 64

    // initialize blocks and reference-key pairs to output
    BLOCKS := []
    OUTPUT-RK-PAIRS := []

    // take ARITY reference-key pairs from INPUT-RK-PAIRS at a time
    WHILE RK-PAIRS-FOR-NODE := TAKE(INPUT-RK-PAIRS, ARITY):
        // make sure there are exactly ARITY reference-key pairs in node
        RK-PAIRS-FOR-NODE := FILL-WITH-NULL-RK-PAIRS(RK-PAIRS-FOR-NODE, ARITY)

        // concat reference-key pairs to node
        NODE := CONCAT(RK-PAIRS-FOR-NODE)

        // encrypt node and compute reference-key pair
        BLOCK, RK-TO-NODE := Encrypt-Block(NODE, CONVERGENCE-SECRET)

        // add node to output
        BLOCKS := BLOCKS ++ [BLOCK]
        OUTPUT-RK-PAIRS := OUTPUT-RK-PAIRS ++ [RK-TO-NODE]

    RETURN BLOCKS, OUTPUT-RK-PAIRS
----
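
The grouping and filling step can be sketched in isolation. The function below only builds the unencrypted nodes; encrypting each node and computing its reference-key pair is left out, and the name `build_nodes` is illustrative.

```python
NULL_RK_PAIR = bytes(64)  # null reference-key pair: 64 bytes of zeros

def build_nodes(rk_pairs: list, block_size: int) -> list:
    """Concatenate 64 byte reference-key pairs into nodes of exactly block_size bytes."""
    arity = block_size // 64
    nodes = []
    for start in range(0, len(rk_pairs), arity):
        group = rk_pairs[start:start + arity]
        # Fill the last node with null reference-key pairs if it is not full.
        group = group + [NULL_RK_PAIR] * (arity - len(group))
        nodes.append(b"".join(group))
    return nodes

# 17 dummy reference-key pairs at block size 1KiB (arity 16) need two nodes.
pairs = [bytes([i]) * 64 for i in range(1, 18)]
nodes = build_nodes(pairs, 1024)
assert len(nodes) == 2
assert nodes[1] == pairs[16] + bytes(960)
```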

==== Streaming

The encoding process can be implemented to encode a stream of content while immediately outputting encrypted blocks when ready and eagerly collecting reference-key pairs to nodes. This allows the encoding of larger-than-memory content.

For an example, see https://gitlab.com/openengiadina/eris/-/raw/main/eris/encode.scm[the reference Guile implementation].

=== Decoding

Given an ERIS read capability and access to blocks via a block-storage, the content can be decoded.

[source, pseudocode]
----
ERIS-Decode-Recurse(LEVEL, REFERENCE, KEY):
    IF LEVEL == 0:
        // content block: fetch and decrypt
        ENCRYPTED-CONTENT-BLOCK := Block-Storage-Get(REFERENCE)
        RETURN ChaCha20(ENCRYPTED-CONTENT-BLOCK, KEY)
    ELSE:
        // node: fetch, decrypt and decode all referenced sub-trees
        ENCRYPTED-NODE := Block-Storage-Get(REFERENCE)
        NODE := ChaCha20(ENCRYPTED-NODE, KEY)
        OUTPUT := []
        // Read-RK-Pair-From-Node is assumed to stop at the first null reference-key pair
        WHILE SUB-REFERENCE, SUB-KEY := Read-RK-Pair-From-Node(NODE):
            OUTPUT := OUTPUT ++ [ERIS-Decode-Recurse(LEVEL - 1, SUB-REFERENCE, SUB-KEY)]
        RETURN CONCAT(OUTPUT)

ERIS-Decode(BLOCK-SIZE, LEVEL, ROOT-REFERENCE, ROOT-KEY):
    PADDED := ERIS-Decode-Recurse(LEVEL, ROOT-REFERENCE, ROOT-KEY)
    RETURN UNPAD(PADDED, BLOCK-SIZE)
----

Where the block-storage can be accessed as follows:

`Block-Storage-Get(REFERENCE)` :: Returns a block such that `Blake2b-256(Block-Storage-Get(REFERENCE)) == REFERENCE` or throws an error.

A streaming decoding procedure can be implemented where the content can be output block wise and does not need to be kept in memory for unpadding. For an example, see https://gitlab.com/openengiadina/eris/-/raw/main/eris/decode.scm[the reference Guile implementation].

Random access is possible by only decoding selected sub-trees.
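
To illustrate random access, the sketch below computes which reference-key pair to follow in each node in order to reach the content block that holds a given byte offset. It assumes the tree is filled from left to right as described in the encoding section; the function name `locate` is hypothetical.

```python
def locate(offset: int, block_size: int, level: int):
    """Return (child indices from the root down to level 1, offset inside the content block)."""
    arity = block_size // 64
    block_index, offset_in_block = divmod(offset, block_size)
    path = [(block_index // arity ** (depth - 1)) % arity
            for depth in range(level, 0, -1)]
    return path, offset_in_block

# Byte 5000 of content encoded with block size 1KiB and root level 2:
# follow pair 0 in the root node, then pair 4 in the level 1 node.
path, offset_in_block = locate(5000, 1024, 2)
print(path, offset_in_block)   # [0, 4] 904
```

Only the nodes along this path and the single content block need to be fetched and decrypted.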

=== Binary Encoding of Read Capability

The read capability consists of the block size, the level of the root reference-key pair and the root reference-key pair itself. Together these are the pieces of information required to decode content.

We specify a binary encoding of the read capability that is 66 bytes long:

|===
|Byte offset | Content | Length (in bytes)

| 0 | block size (`0x00` for block size 1KiB and `0x01` for block size 32KiB)| 1
| 1 | level of root reference-key pair as unsigned integer | 1
| 2 | root reference | 32
| 34 | root key | 32
|===

The initial field (block size) also encodes the ERIS version. Future versions of ERIS MUST use different codes to encode block sizes.
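
Parsing the 66 byte encoding can be sketched with the `struct` module from the Python standard library. The function name and the error handling are assumptions of this sketch.

```python
import struct

BLOCK_SIZES = {0x00: 1024, 0x01: 32768}  # codes from the table above

def decode_read_capability(binary: bytes):
    """Split a binary read capability into (block size, level, reference, key)."""
    if len(binary) != 66:
        raise ValueError("a read capability is exactly 66 bytes long")
    code, level, root_reference, root_key = struct.unpack("<BB32s32s", binary)
    if code not in BLOCK_SIZES:
        raise ValueError("unknown block size or ERIS version code")
    return BLOCK_SIZES[code], level, root_reference, root_key

# Dummy reference (0xAA bytes) and key (0xBB bytes), block size 32KiB, level 2.
example = bytes([0x01, 2]) + b"\xaa" * 32 + b"\xbb" * 32
print(decode_read_capability(example)[:2])   # (32768, 2)
```

Rejecting unknown codes in the first byte keeps the parser safe against read capabilities from future versions.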

TODO using 1 byte to encode level limits size of content that can be encoded. Add a comment on that.

=== URN

A read capability can be encoded as a URN: `urn:erisx2:BASE32-READ-CAPABILITY`, where `BASE32-READ-CAPABILITY` is the unpadded Base32 <<RFC4648>> encoding of the binary read capability (see <<_binary_encoding_of_read_capability>>).

For example, the ERIS URN of the UTF-8 encoded string "Hail ERIS!" (with block size 1KiB and null convergence secret) is:

`urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ`
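
The construction of the URN can be checked against the example above. The sketch below takes the root reference and root key from the test vector `eris-test-vector-00.json` in the appendix and uses the `urn:erisx2:` prefix that appears in the example URNs of this document; the helper names are illustrative.

```python
import base64

BLOCK_SIZE_CODES = {1024: 0x00, 32768: 0x01}

def b32decode_unpadded(text: str) -> bytes:
    """Decode Base32 that was written without trailing '=' padding."""
    return base64.b32decode(text + "=" * (-len(text) % 8))

def read_capability_to_urn(block_size: int, level: int,
                           root_reference: bytes, root_key: bytes) -> str:
    binary = bytes([BLOCK_SIZE_CODES[block_size], level]) + root_reference + root_key
    return "urn:erisx2:" + base64.b32encode(binary).decode("ascii").rstrip("=")

root_reference = b32decode_unpadded("BLY4QKPNR57OKIOA5BLPI5NHQHKN3GJR5F4RNY6TZFAXKSSLYYQQ")
root_key = b32decode_unpadded("JUNNKQN544MJJMGRABGDOTJAQCGF4U4Q6YW3VVHD4FGQYUSJ4YOA")

urn = read_capability_to_urn(1024, 0, root_reference, root_key)
print(urn)
```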

== Applications

=== Storage and Transport Layers

=== Namespaces

== Implementations

A reference implementation is available in Guile: https://gitlab.com/openengiadina/eris/

== Acknowledgments

[appendix]
== Test Vectors

=== Machine Readable

A set of test vectors is provided in the https://gitlab.com/openengiadina/eris/-/tree/main/test-vectors[ERIS repository]. Implementations of the ERIS encoding MUST be able to satisfy the test vectors.

The test vectors are given as machine-readable JSON files. For example, the test vector `eris-test-vector-00.json`:

[source,json]
----
{
  "id": 0,
  "name": "short string (block size 1KiB)",
  "description": "Encode the UTF-8 encoding of the string \"Hail ERIS!\" with block-size 1KiB and null convergence-secret.",
  "content": "JBQWS3BAIVJESUZB",
  "convergence-secret": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  "block-size": 1024,
  "read-capability": {
    "block-size": 1024,
    "level": 0,
    "root-reference": "BLY4QKPNR57OKIOA5BLPI5NHQHKN3GJR5F4RNY6TZFAXKSSLYYQQ",
    "root-key": "JUNNKQN544MJJMGRABGDOTJAQCGF4U4Q6YW3VVHD4FGQYUSJ4YOA"
  },
  "urn": "urn:erisx2:AAAAV4OIFHWY67XFEHAOQVXUOWTYDVG5TEY6S6IW4PJ4SQLVJJF4MIKNDLKUDPPHDCKLBUIAJQ3U2IEARRPFHEHWFW5NJY7BJUGFESPGDQ",
  "blocks": {
    "BLY4QKPNR57OKIOA5BLPI5NHQHKN3GJR5F4RNY6TZFAXKSSLYYQQ": "Y5JKORBMSAMYSSNW5CJN3GPNMY6G6MJ3EWNEBE5W2KT6PTN3XETYGQ6N47ZK7NSE2VQZPH7RQOF53UATSC7HO7AK6K3K3VI7A2CQOIQIFVO7F2JTBA7EQ3U7TXND5EUNX2GIVARNJ4DC5UBADDDJKZ3ILMVMW3NT5H6LTWD4QTRLYHXZBRK3GUM5XSYEDJ2MTYHHB4DWFKVR5IN62ZUWTEVFMAEHIF655PL4DUQ4T6XAHBMAXTJOOMIS2BIRD3SLPRQZABTWUTFZXMQF2JBUYUZ45SXJIFX2AOFGIHSMH5DY274W4BJXCZULZUMAQQ3JKHQIUJX5NGDZF4HPTHEHI4S6ZJWRJ53SQ767OT63OESXBVR66ZEBPGSEIP2LBXAWWL2F6YHB4R2OOMQWWFVK62GMRZ22DPHHS5ZOQTYU5VHK44Y4OIFVI4SVST6NOQGMWRRS7HAIBF6QP463A5J36RP2XLC2EVSN6FOX4R5G4JMT6TAFIJQMRD4GUNATZ7YDDNNJUGJK27ILNFQLJYD6A7JV7R5ZAL6G6MPT6S3LNNID6ACS6BYE43OQHQ6OUFOQPTGAOWT3BFXEUHPYDSI3CP72S7FVH2BKWLMIBKNSTM272CY7XFONZFUZS6IRPKT3MTI3LOVSJRJK3YUR2TVLSK3EFWJAZRMP2PFLRBEQT6Q2RTCIJI2QB5PBZTSJDTU37J3VHMI6F275AFPMJOOQTLCBTAWD5W2GITDM545CVFCGZ6LYK65GRQPZEISY2HA764K3ST6UUDXFW5LL2VKK2RHA3MNPIIMZPM22BGSZRQEQPGQXEQU3B5LYC42EQMOHCGLRIL6VUAODWYYGASLF2F6LQAE22AXJHPU76HSFGT7DICZ7O6WMUWEMYDOCAP5AB5OFWYRJ7JKR574RB6UE55XPNKXPQP32VYBSGMMGSSGFFQU25AQSNIZYSMRLDIPUY4L35ZMFX4LVS2JTXC4K6NPOH7F3QWE6WT463OETVGIJSYRVU4O4F6YPSBJDN5QC3AFAU5HKJRNOSDAVLNJ7V3FWBUHDL5VY3PEXECOKI6BKGUE27NDKC4DJRRBOTU22CIDTYX4YCGIPGFO4UG2AK5EI54DSU75BCQ5SSFWSKB3JY2SLAA6EJQR7TDSGEOZK7F74Q7UHWVV4YS6GWMOCSCFQF7TMLDA5BB7MZYTX7QSUYNCRUKHOBONPHYSFBIUZON5XBWRBXIIHJGDJYX5BBBCQZWWHJZXGPC7G45VXKT7Y6RUR254FF2OPJGQEHWNKJ47QR3EXQQIA3MYK542XBOAWLOIHSJCP447WF74DDVFK7LM2DRFHW2RVX3EX3C5VXQT6XHJLPED3XJ5EW75323XN2YLVKKNUP45P7GYOORBYMSNB4MM24H3UY7JAMYTIYOBGTMVV2W6PMVFWNPPSE3AXFPS6GNBTQTAUUPJDXFREUMZOZGLD5VGO7PWRSL2GQP7MDY2IPHQCMTCOFYKSPGMI6OFUELITQPFTZUM7V2IZOMPIXKHLTBSRZUFLABZXSPHWZ626OFJ6WI7KUW4AT3KMMEO33FH3QGTOKCAACPQYM7UCFHN5QGQCVB7SEH6WXISV6AJB7LCDLKRM6JCANOBLDD57Z44GHLIYEJI"
  }
}
----

The fields of JSON test vectors are:

`id` :: Numeric identifier of the test vector.
`name` :: Short human readable name.
`description` :: Human readable description of the test.
`content` :: The binary content to be encoded as Base32 (unpadded) string.
`convergence-secret` :: The convergence secret to be used as Base32 string.
`block-size` :: Block size that should be used for encoding in bytes (either 1024 or 32768).
`read-capability` :: JSON map containing the components of the read capability. This is not used in tests but is here as a help for developers.
`urn` :: The ERIS URN of the content.
`blocks` :: A JSON map of blocks required to decode the content given the URN. Keys (block references) and values (encrypted blocks) are encoded as Base32 strings.

Implementations MUST verify that the content encodes to the URN given the specified block size and convergence secret and verify that given the URN and blocks the content can be decoded.
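
As a partial illustration of working with the test vectors, the sketch below checks only the storage-level property of a loaded vector: every key in `blocks` must be the Blake2b-256 hash of the block it maps to, and every block must have the stated block size. It does not perform the full encode and decode checks required above, and the helper names are hypothetical.

```python
import base64
import hashlib
import json

def b32decode_unpadded(text: str) -> bytes:
    return base64.b32decode(text + "=" * (-len(text) % 8))

def b32encode_unpadded(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").rstrip("=")

def check_blocks(vector: dict) -> bool:
    """True if all blocks have the right size and are stored under their own hash."""
    for reference, encoded_block in vector["blocks"].items():
        block = b32decode_unpadded(encoded_block)
        if len(block) != vector["block-size"]:
            return False
        digest = hashlib.blake2b(block, digest_size=32).digest()
        if b32encode_unpadded(digest) != reference:
            return False
    return True

# Synthetic stand-in for a test vector file; a real vector would be read with
# json.load() from one of the files in the ERIS repository.
block = bytes(range(256)) * 4
reference = b32encode_unpadded(hashlib.blake2b(block, digest_size=32).digest())
vector = json.loads(json.dumps({"block-size": 1024,
                                "blocks": {reference: b32encode_unpadded(block)}}))
assert check_blocks(vector)
```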

=== Large content


In order to verify implementations that encode content by streaming (see <<_streaming>>), URNs are provided for large contents that are generated in a specified way:

|===
|Test name | Content size | Block size | URN | Level of root reference
| 100MiB (block size 1KiB) | 100MiB |  1KiB | `urn:erisx2:AACXPZNDNXFLO4IOMF6VIV2ZETGUJEUU7GN4AHPWNKEN6KJMCNP6YNUMVW2SCGZUJ4L3FHIXVECRZQ3QSBOTYPGXHN2WRBMB27NXDTAP24` | 5
| 1GiB (block size 32KiB) | 1GiB | 32KiB | `urn:erisx2:AEBFG37LU5BM5N3LXNPNMGAOQPZ5QTJAV22XEMX3EMSAMTP7EWOSD2I7AGEEQCTEKDQX7WCKGM6KQ5ALY5XJC4LMOYQPB2ZAFTBNDB6FAA` | 2
| 256GiB (block size 32KiB) | 256GiB | 32KiB | `urn:erisx2:AEBZHI55XJYINGLXWKJKZHBIXN6RSNDU233CY3ELFSTQNSVITBSVXGVGBKBCS4P4M5VSAUOZSMVAEC2VDFQTI5SEYVX4DN53FTJENWX4KU` | 3
|===

The content is the ChaCha20 stream generated with a null nonce and a key that is the Blake2b-256 hash of the UTF-8 encoded test name (e.g. `KEY := Blake2b-256("100MiB (block size 1KiB)")`).

[appendix]
== Changelog

[discrete]
=== link:eris-v0.1.html[v0.1.0 (11. June 2020)]

Initial version.

[discrete]
=== http://purl.org/eris[v0.2.0-draft (UNRELEASED)]

Major update of encoding that removes the _verification capability_ - ability to verify integrity of content without reading content.


[appendix]
== Copyright

This work is licensed under a http://creativecommons.org/licenses/by-sa/4.0/[Creative Commons Attribution-ShareAlike 4.0 International License].

[bibliography]
== References

- [[[content-addressable-rdf]]] openEngiadina. https://openengiadina.net/papers/content-addressable-rdf.html[Content-addressable RDF]. 2020
- [[[rdf-signify]]] openEngiadina. https://openengiadina.net/papers/rdf-signify.html[RDF Signify]. 2020
- [[[Polleres2020]]] Polleres, Kamdar, Fernández, Javier David, Tudorache & Musen. https://epub.wu.ac.at/6371/1/IPM_workingpaper_02_2018.pdf[A more decentralized vision for Linked Data]. 2020
- [[[ECRS]]] Grothoff, Grothoff, Horozov, & Lindgren. https://grothoff.org/christian/ecrs.pdf[An encoding for censorship-resistant sharing]. 2003
- [[[RFC2119]]] S. Bradner. https://tools.ietf.org/html/rfc2119[Key words for use in RFCs to Indicate Requirement Levels]. 1997
- [[[RFC4648]]] S. Josefsson. https://tools.ietf.org/html/rfc4648[The Base16, Base32, and Base64 Data Encodings]. 2006
- [[[RFC7049]]] C. Bormann & P. Hoffman. https://tools.ietf.org/html/rfc7049[Concise Binary Object Representation (CBOR)]. 2013
- [[[RFC7693]]] M-J. Saarinen & J-P. Aumasson. https://tools.ietf.org/html/rfc7693[The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC)]. 2015
- [[[RFC8439]]] Nir & Langley. https://tools.ietf.org/html/rfc8439[ChaCha20 and Poly1305 for IETF Protocols]. 2018
- [[[Zooko2008]]] Zooko Wilcox-O'Hearn. https://tahoe-lafs.org/hacktahoelafs/drew_perttula.html[Drew Perttula and Attacks on Convergent Encryption]. 2008