aboutsummaryrefslogtreecommitdiff
path: root/examples/dedup-fs/Readme.org
blob: c480efc0a5dc3c4c73cf84b7bdeb62385c9ca384 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
#+TITLE: De-duplicated File-Systems with ERIS
#+PROPERTY: header-args:scheme :session *eris-repl* :eval never-export

#+BEGIN_ABSTRACT
This is an experiment to investigate how ERIS can be used to encode file-systems so that the same content appearing in two file-systems is de-duplicated.
#+END_ABSTRACT

* ERIS for De-duplication

ERIS (Encoding for Robust Immutable Storage) defines how any content can be encoded into a set of uniformly sized, encrypted blocks. The original content can be decoded from the blocks only with a special key - a read capability, which can be encoded as an URN.

For example the ERIS URN of the string ~"Hello World!"~ is:

#+BEGIN_SRC scheme :exports both
(use-modules (eris)
             (eris read-capability)
             (ice-9 receive))

(receive (read-capability blocks)
    (eris-encode "Hello World!")
  (read-capability->string read-capability))
#+END_SRC

#+RESULTS:
: urn:erisx2:AAAASDIKPSHNFCH7TZKPNTTSA4AXEPOMMRB4LCA6VC6YXL2YPATZGTBBUWKF7SN555Y7MLVVIPQPFUHUKPE7Y4GVPCETLN5ZMXJME45TKA

Without this URN the blocks are just uniformly sized pieces of gibberish (1KiB or 32KiB).

In order to decode the content we need to be able to de-reference individual blocks by their hash (content-addressing). A simple key-value store suffices, but one can also use systems such as IPFS, GNUNet, Named Data Networking (I think!) or a simple HTTP service to store and transport blocks. ERIS de-couples content from underlying storage and transport networks.

ERIS encodes content by reading fixed-size blocks of content, encrypting them and collecting them together in a tree (see the [[http://purl.org/eris][spec]] for more information).

If the same content is read twice it is encoded into the same block.

We will attempt to encode different formats of archives and images of Guix packages to see how well ERIS de-duplicates shared content.

** Intermezzo: Storing blocks in VHashes

By default the Guile ERIS implementation stores blocks in an alist. We can tell the encoder to store the blocks elsewhere by providing an SRFI-171 reducer that takes care of the blocks.

In this example we will store blocks in a VHash (with blocks as values and the references to the blocks - the Blake2b hash of the block - as keys). This will allow us to do more simpler merging and counting of blocks (merging a VHash is easier than merging an alist).

#+BEGIN_SRC scheme
(use-modules (ice-9 vlist)
             (rnrs bytevectors))

;; special care must be given to use a appropriate hash function...
(define (ref-hash key size)
  (modulo
   (bytevector-uint-ref key 0 (endianness big) 32)
   size))

(define (vhash-block-ref blocks ref)
  (cond
   ((vhash-assoc ref blocks bytevector=? ref-hash) => cdr)
   (else #f)))

(define (vhash-block-cons ref block blocks)
  (if (vhash-block-ref blocks ref)
      blocks
      (vhash-cons ref block blocks ref-hash)))

(define vhash-block-reducer
  (case-lambda

    ;; initialize empty VHash
    (() vlist-null)

    ;; return blocks without post-processing
    ((blocks) blocks)

    ;; add a new block
    ((blocks ref-block)
     (vhash-block-cons (car ref-block) (cdr ref-block) blocks))))

(define (vhash-block-merge vh1 vh2)
  (vhash-fold vhash-block-cons vh1 vh2))
#+END_SRC

#+RESULTS:
: #<unspecified>

** Measuring de-duplication

Some helpers to count de-duplicated blocks:

#+BEGIN_SRC scheme
(use-modules (eris)
             (ice-9 receive)
             (ice-9 format))


(define (eris-encode->vhash-blocks file block-size)
  "Encode file and return the blocks in a VHash"
  (receive (_ blocks)
      (call-with-input-file file
        (lambda (port) (eris-encode port
                                    #:block-size block-size
                                    #:block-reducer vhash-block-reducer))
        #:binary #t)
    blocks))

(define* (measure-eris-de-duplicaiton file-1 file-2 #:key (block-size eris-block-size-large))
  (let* ((blocks-1 (eris-encode->vhash-blocks file-1 block-size))
         (blocks-2 (eris-encode->vhash-blocks file-2 block-size))
         (block-count-1 (vlist-length blocks-1))
         (block-count-2 (vlist-length blocks-2))
         (block-count-total (vlist-length (vhash-block-merge blocks-1 blocks-2)))
         (de-duped (- (+ block-count-1 block-count-2)
                      block-count-total))
         (de-duped-fraction (/ de-duped (min block-count-1 block-count-2))))
    (format #f "De-duplicated ~d blocks (~,2f%), block counts: ~d/~d, block-size: ~d" de-duped (* 100 de-duped-fraction) block-count-1 block-count-2 block-size)))
#+END_SRC

#+RESULTS:
: #<unspecified>

We can use this to measure how much ERIS can de-duplicate.

A simple example:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton "Readme.org" "Readme.org")
#+END_SRC

#+RESULTS:
: De-duplicated 1 blocks (100.00%), block counts: 1/1, block-size: 32768

ERIS can de-duplicate the same file by 100%.

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton "Readme.org" "../ipfs.org")
#+END_SRC

#+RESULTS:
: De-duplicated 0 blocks (0.00%), block counts: 1/1, block-size: 32768

But nothing between this file and the [[../ipfs.org][IPFS demo]].

* A Tale of Two Guix Packages

Recently Ludovic Courtès has investigated the similarity between subsequent package revisions (see [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00258.html][post to guix-devel mailing-list]]) and shown that this can be quite high. For subsequent revisions of the Emacs package ~85% of the total size is the same!

Downloading packages can be made much more efficient if this similarity is used and only differences need to be downloaded.

As example we will use two revisions of the Guix Emacs package.

The current version (emacs-27.1):

#+BEGIN_SRC shell :exports both
guix build emacs --no-offload
#+END_SRC

#+RESULTS:
: /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1

And an older version (emacs-26.3):

#+BEGIN_SRC shell :exports both
guix time-machine -C emacs-26.3-channels.scm --no-offload -- build emacs --no-offload
#+END_SRC

#+RESULTS:
: /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3

The similarity between these two packages is ~85%. This is our theoretical upper-limit.

* Archives, Images and Block Alignment

Currently pre-built Guix packages (substitutes) are provided as archives (in a format similar to tar).

We will test de-duplication with ERIS with EROFS, SquashFS, uncompressed tar and a zstd compressed tar.

** EROFS - Enhanced Read-Only File-System

EROFS is a fairly new read-only, compressed file-system developed at Huawei for Android smartphones. It is in mainline Linux Kernel since v5.4.

EROFS was developed as response to performance issues with SquashFS. SquashFS does not align content in any way. When the SquashFS image is stored on a (physical) block device this results in "I/O amplification" as many blocks are retrieved when only small parts of them are used. EROFS improves by aligning content to a block size of 4KiB (see also the [[https://www.usenix.org/system/files/atc19-gao.pdf][EROFS paper]]).

This alignment will help de-duplication, as ERIS reads content in fixed-sized blocks.

*** Creating Images

We can create EROFS images using the ~mkfs.erofs~ tool from erofs-utils.

Note that we fix the file-system UUID and set all time-stamps to Unix 0 so that images are reproducible.

#+BEGIN_SRC shell :exports both
mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -x -1 \
    emacs-27.1.erofs-uncompressed.img \
    /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1

mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -zlz4hc \
    emacs-27.1.erofs-lz4hc.img \
    /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1

mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16  -T 0 -x -1 \
    emacs-26.3.erofs-uncompressed.img \
    /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3

mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16  -T 0 -zlz4hc \
    emacs-26.3.erofs-lz4hc.img \
    /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3

ls -lh emacs-*.erofs-*.img
#+END_SRC

#+RESULTS:
| -rw-r--r-- | 1 | pm | users | 63M  | Dec | 29 | 11:28 | emacs-26.3.erofs-lz4hc.img        |
| -rw-r--r-- | 1 | pm | users | 128M | Dec | 29 | 11:28 | emacs-26.3.erofs-uncompressed.img |
| -rw-r--r-- | 1 | pm | users | 63M  | Dec | 29 | 11:28 | emacs-27.1.erofs-lz4hc.img        |
| -rw-r--r-- | 1 | pm | users | 106M | Dec | 29 | 11:27 | emacs-27.1.erofs-uncompressed.img |

*** De-duping

Let's try and de-duplicate with ERIS:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-uncompressed.img"
 "emacs-27.1.erofs-uncompressed.img")
#+END_SRC

#+RESULTS:
: De-duplicated 67 blocks (2.04%), block counts: 3364/3290, block-size: 32768

Not so good. We used a block size of 32KiB. ERIS also supports a block-size of 1KiB:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-uncompressed.img"
 "emacs-27.1.erofs-uncompressed.img"
 #:block-size eris-block-size-small)
#+END_SRC

#+RESULTS:
: De-duplicated 19707 blocks (17.75%), block counts: 113746/111016, block-size: 1024

Much better! These are the two block-sizes that are allowed by the ERIS specification. The Guile implementation has some tricks up it's sleeves and can allows any block size that is a power of two. Let's try with 4KiB which is also the block size used by EROFS:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-uncompressed.img"
 "emacs-27.1.erofs-uncompressed.img"
 #:block-size 4096)
#+END_SRC

#+RESULTS:
: De-duplicated 4212 blocks (15.85%), block counts: 27167/26571, block-size: 4096

Slightly worse...probably the alignment along the 4KiB blocks is not perfect and by using a even smaller block size (1KiB) we can de-duplicate more.

Let's try with the LZ4 compressed versions:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-lz4hc.img"
 "emacs-27.1.erofs-lz4hc.img")
#+END_SRC

#+RESULTS:
: De-duplicated 5 blocks (0.25%), block counts: 2002/1999, block-size: 32768

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-lz4hc.img"
 "emacs-27.1.erofs-lz4hc.img"
 #:block-size eris-block-size-small)
#+END_SRC

#+RESULTS:
: De-duplicated 9215 blocks (13.88%), block counts: 66465/66411, block-size: 1024

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-lz4hc.img"
 "emacs-27.1.erofs-lz4hc.img"
 #:block-size 4096)
#+END_SRC

#+RESULTS:
: De-duplicated 1944 blocks (12.04%), block counts: 16155/16143, block-size: 4096

Again for small block size we have higher levels of de-duplication.

A quick check if we use an ERIS block size larger than the EROFS block size. We would expect the de-duplication level to decrease sharply:

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.erofs-lz4hc.img"
 "emacs-27.1.erofs-lz4hc.img"
 #:block-size (* 8 1024))
#+END_SRC

#+RESULTS:
: De-duplicated 558 blocks (6.94%), block counts: 8045/8038, block-size: 8192

Almost halved...

** SquashFS

#+BEGIN_SRC shell :exports both
mksquashfs /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1 emacs-27.1.squashfs.img
mksquashfs /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3 emacs-26.3.squashfs.img
ls -lh emacs-*.squashfs.img
#+END_SRC

#+RESULTS:
| -rw-r--r-- | 1 | pm | users | 47M | Dec | 29 | 18:41 | emacs-26.3.squashfs.img |
| -rw-r--r-- | 1 | pm | users | 45M | Dec | 29 | 18:41 | emacs-27.1.squashfs.img |


#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.squashfs.img"
 "emacs-27.1.squashfs.img"
 #:block-size eris-block-size-large)
#+END_SRC

#+RESULTS:
: De-duplicated 0 blocks (0.00%), block counts: 1482/1440, block-size: 32768

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.squashfs.img"
 "emacs-27.1.squashfs.img"
 #:block-size eris-block-size-small)
#+END_SRC

#+RESULTS:
: De-duplicated 2 blocks (0.00%), block counts: 50434/48995, block-size: 1024

ERIS can not de-duplicate anything as SquashFS does not align content.

** Tar

#+BEGIN_SRC shell :exports both
tar cf emacs-27.1.tar /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1
tar cf emacs-26.3.tar /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3

ls -lh emacs-*.tar
#+END_SRC

#+RESULTS:
| -rw-r--r-- | 1 | pm | users | 131M | Dec | 29 | 18:43 | emacs-26.3.tar |
| -rw-r--r-- | 1 | pm | users | 109M | Dec | 29 | 18:43 | emacs-27.1.tar |

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.tar"
 "emacs-27.1.tar"
 #:block-size eris-block-size-large)
#+END_SRC

#+RESULTS:
: De-duplicated 1 blocks (0.03%), block counts: 3470/3395, block-size: 32768

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.tar"
 "emacs-27.1.tar"
 #:block-size eris-block-size-small)
#+END_SRC

#+RESULTS:
: De-duplicated 11932 blocks (10.41%), block counts: 117368/114634, block-size: 1024

Not so bad. When using a small block-size of 1KiB ERIS can decode 10%. But the archive is pretty large. All this savings are meaningless as just sending a compressed archive would probably be cheaper.

** Tar.Zstd

#+BEGIN_SRC shell :exports both
zstd emacs-27.1.tar
zstd emacs-26.3.tar

ls -lh emacs-*.tar.zst
#+END_SRC

#+RESULTS:
| -rw-r--r-- | 1 | pm | users | 46M | Dec | 29 | 17:21 | emacs-26.3.tar.zst |
| -rw-r--r-- | 1 | pm | users | 45M | Dec | 29 | 17:11 | emacs-27.1.tar.zst |

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.tar.zst"
 "emacs-27.1.tar.zst"
 #:block-size eris-block-size-large)
#+END_SRC

#+RESULTS:
: De-duplicated 0 blocks (0.00%), block counts: 1455/1443, block-size: 32768

#+BEGIN_SRC scheme :exports both
(measure-eris-de-duplicaiton
 "emacs-26.3.tar.zst"
 "emacs-27.1.tar.zst"
 #:block-size eris-block-size-small)
#+END_SRC

#+RESULTS:
: De-duplicated 0 blocks (0.00%), block counts: 49499/49100, block-size: 1024

Nada de-duplication...

* Conclusion

By using EROFS we were able to de-duplicate ~12% of the content between two very similar revisions of Emacs packages. There is still much room to improve towards the ~85% similarity shown by Ludovic.

Still, maybe nice to see what can be achieved with "off-the-shelve" technology. EROFS is in mainline Linux and you can simply mount an image with ~mount -t erofs -o loop emacs-26.1.erofs-lz4hc.img /mnt/emacs~.

For Guix this experiment is at the intersection of three topics that I find very interesting:

- Peer-to-peer distribution of substitutes
- Minimizing network usage
- Mounting packages instead of unpacking (a la [[https://distr1.org/][distri]])

The approach might be interesting for things other than Guix. For example a file-sharing application in GNUNet has very similar requirements.

Some technical questions that remain:

- Does it make sense to hand-roll a EROFS/Squashfs like file-system that is optimized for de-duplication?
- ERIS currently allows block-size of 1KiB and 32KiB maybe for de-duplicaiton something in-between is better?

Happy hacking!

-pukkamustard

* See also

Further reading and watching.

** [[https://www.usenix.org/system/files/atc19-gao.pdf][EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices]]

The EROFS paper presented at USENIX 2019 that illustrates the motivations behind EROFS and compares performance to other filesystems (including Squashfs).

** [[https://media.ccc.de/v/arch-conf-online-2020-6387-distri-researching-fast-linux-package-management][distri: researching fast Linux package management]]

Talk about the distri package manager at Arch Conf 2020.

Distri does not distribute packages as archives but as Squashfs images. Packages can just be mounted instead of unarchived.

See also the toot and discussion on the Fediverse: https://toot.aquilenet.fr/@civodul/105304162332789048

** [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html][When substitute download + decompression is CPU-bound]]

Post on the guix-devel mailing-list on how installing a Guix package can be CPU-bound (due to decompression).

This can be solved by not decompressing packages, but just mounting them (like distri).

** [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00258.html][Identical files across subsequent package revisions]]

Post on the guix-devel mailing-list showing how similar different versions of package can be.