aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorpukkamustard <pukkamustard@posteo.net>2020-12-29 19:17:07 +0100
committerpukkamustard <pukkamustard@posteo.net>2020-12-29 19:17:07 +0100
commit2c65c117e74b1b4e2d4fd2133fddd40e8b99e39d (patch)
tree5aec4b6871eef8f1ecd77ec50aba7e17ae4e3d2d
parenta8f3982b3ea0ec8ccaa0c10db529ae6c24dd0e03 (diff)
examples/dedup-fs: init
-rw-r--r--examples/dedup-fs/Readme.org452
1 files changed, 452 insertions, 0 deletions
diff --git a/examples/dedup-fs/Readme.org b/examples/dedup-fs/Readme.org
new file mode 100644
index 0000000..c480efc
--- /dev/null
+++ b/examples/dedup-fs/Readme.org
@@ -0,0 +1,452 @@
+#+TITLE: De-duplicated File-Systems with ERIS
+#+PROPERTY: header-args:scheme :session *eris-repl* :eval never-export
+
+#+BEGIN_ABSTRACT
+This is an experiment to investigate how ERIS can be used to encode file-systems so that the same content appearing in two file-systems is de-duplicated.
+#+END_ABSTRACT
+
+* ERIS for De-duplication
+
+ERIS (Encoding for Robust Immutable Storage) defines how any content can be encoded into a set of uniformly sized, encrypted blocks. The original content can be decoded from the blocks only with a special key - a read capability, which can be encoded as an URN.
+
+For example the ERIS URN of the string ~"Hello World!"~ is:
+
+#+BEGIN_SRC scheme :exports both
+(use-modules (eris)
+ (eris read-capability)
+ (ice-9 receive))
+
+(receive (read-capability blocks)
+ (eris-encode "Hello World!")
+ (read-capability->string read-capability))
+#+END_SRC
+
+#+RESULTS:
+: urn:erisx2:AAAASDIKPSHNFCH7TZKPNTTSA4AXEPOMMRB4LCA6VC6YXL2YPATZGTBBUWKF7SN555Y7MLVVIPQPFUHUKPE7Y4GVPCETLN5ZMXJME45TKA
+
+Without this URN the blocks are just uniformly sized pieces of gibberish (1KiB or 32KiB).
+
+In order to decode the content we need to be able to de-reference individual blocks by their hash (content-addressing). A simple key-value store suffices, but one can also use systems such as IPFS, GNUNet, Named Data Networking (I think!) or a simple HTTP service to store and transport blocks. ERIS de-couples content from underlying storage and transport networks.
+
+ERIS encodes content by reading fixed-size blocks of content, encrypting them and collecting them together in a tree (see the [[http://purl.org/eris][spec]] for more information).
+
+If the same content is read twice it is encoded into the same block.
+
+We will attempt to encode different formats of archives and images of Guix packages to see how well ERIS de-duplicates shared content.
+
+** Intermezzo: Storing blocks in VHashes
+
+By default the Guile ERIS implementation stores blocks in an alist. We can tell the encoder to store the blocks elsewhere by providing an SRFI-171 reducer that takes care of the blocks.
+
+In this example we will store blocks in a VHash (with blocks as values and the references to the blocks - the Blake2b hash of the block - as keys). This will allow us to do more simpler merging and counting of blocks (merging a VHash is easier than merging an alist).
+
+#+BEGIN_SRC scheme
+(use-modules (ice-9 vlist)
+ (rnrs bytevectors))
+
+;; special care must be given to use a appropriate hash function...
+(define (ref-hash key size)
+ (modulo
+ (bytevector-uint-ref key 0 (endianness big) 32)
+ size))
+
+(define (vhash-block-ref blocks ref)
+ (cond
+ ((vhash-assoc ref blocks bytevector=? ref-hash) => cdr)
+ (else #f)))
+
+(define (vhash-block-cons ref block blocks)
+ (if (vhash-block-ref blocks ref)
+ blocks
+ (vhash-cons ref block blocks ref-hash)))
+
+(define vhash-block-reducer
+ (case-lambda
+
+ ;; initialize empty VHash
+ (() vlist-null)
+
+ ;; return blocks without post-processing
+ ((blocks) blocks)
+
+ ;; add a new block
+ ((blocks ref-block)
+ (vhash-block-cons (car ref-block) (cdr ref-block) blocks))))
+
+(define (vhash-block-merge vh1 vh2)
+ (vhash-fold vhash-block-cons vh1 vh2))
+#+END_SRC
+
+#+RESULTS:
+: #<unspecified>
+
+** Measuring de-duplication
+
+Some helpers to count de-duplicated blocks:
+
+#+BEGIN_SRC scheme
+(use-modules (eris)
+ (ice-9 receive)
+ (ice-9 format))
+
+
+(define (eris-encode->vhash-blocks file block-size)
+ "Encode file and return the blocks in a VHash"
+ (receive (_ blocks)
+ (call-with-input-file file
+ (lambda (port) (eris-encode port
+ #:block-size block-size
+ #:block-reducer vhash-block-reducer))
+ #:binary #t)
+ blocks))
+
+(define* (measure-eris-de-duplicaiton file-1 file-2 #:key (block-size eris-block-size-large))
+ (let* ((blocks-1 (eris-encode->vhash-blocks file-1 block-size))
+ (blocks-2 (eris-encode->vhash-blocks file-2 block-size))
+ (block-count-1 (vlist-length blocks-1))
+ (block-count-2 (vlist-length blocks-2))
+ (block-count-total (vlist-length (vhash-block-merge blocks-1 blocks-2)))
+ (de-duped (- (+ block-count-1 block-count-2)
+ block-count-total))
+ (de-duped-fraction (/ de-duped (min block-count-1 block-count-2))))
+ (format #f "De-duplicated ~d blocks (~,2f%), block counts: ~d/~d, block-size: ~d" de-duped (* 100 de-duped-fraction) block-count-1 block-count-2 block-size)))
+#+END_SRC
+
+#+RESULTS:
+: #<unspecified>
+
+We can use this to measure how much ERIS can de-duplicate.
+
+A simple example:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton "Readme.org" "Readme.org")
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 1 blocks (100.00%), block counts: 1/1, block-size: 32768
+
+ERIS can de-duplicate the same file by 100%.
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton "Readme.org" "../ipfs.org")
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 0 blocks (0.00%), block counts: 1/1, block-size: 32768
+
+But nothing between this file and the [[../ipfs.org][IPFS demo]].
+
+* A Tale of Two Guix Packages
+
+Recently Ludovic Court├Ęs has investigated the similarity between subsequent package revisions (see [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00258.html][post to guix-devel mailing-list]]) and shown that this can be quite high. For subsequent revisions of the Emacs package ~85% of the total size is the same!
+
+Downloading packages can be made much more efficient if this similarity is used and only differences need to be downloaded.
+
+As example we will use two revisions of the Guix Emacs package.
+
+The current version (emacs-27.1):
+
+#+BEGIN_SRC shell :exports both
+guix build emacs --no-offload
+#+END_SRC
+
+#+RESULTS:
+: /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1
+
+And an older version (emacs-26.3):
+
+#+BEGIN_SRC shell :exports both
+guix time-machine -C emacs-26.3-channels.scm --no-offload -- build emacs --no-offload
+#+END_SRC
+
+#+RESULTS:
+: /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3
+
+The similarity between these two packages is ~85%. This is our theoretical upper-limit.
+
+* Archives, Images and Block Alignment
+
+Currently pre-built Guix packages (substitutes) are provided as archives (in a format similar to tar).
+
+We will test de-duplication with ERIS with EROFS, SquashFS, uncompressed tar and a zstd compressed tar.
+
+** EROFS - Enhanced Read-Only File-System
+
+EROFS is a fairly new read-only, compressed file-system developed at Huawei for Android smartphones. It is in mainline Linux Kernel since v5.4.
+
+EROFS was developed as response to performance issues with SquashFS. SquashFS does not align content in any way. When the SquashFS image is stored on a (physical) block device this results in "I/O amplification" as many blocks are retrieved when only small parts of them are used. EROFS improves by aligning content to a block size of 4KiB (see also the [[https://www.usenix.org/system/files/atc19-gao.pdf][EROFS paper]]).
+
+This alignment will help de-duplication, as ERIS reads content in fixed-sized blocks.
+
+*** Creating Images
+
+We can create EROFS images using the ~mkfs.erofs~ tool from erofs-utils.
+
+Note that we fix the file-system UUID and set all time-stamps to Unix 0 so that images are reproducible.
+
+#+BEGIN_SRC shell :exports both
+mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -x -1 \
+ emacs-27.1.erofs-uncompressed.img \
+ /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1
+
+mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -zlz4hc \
+ emacs-27.1.erofs-lz4hc.img \
+ /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1
+
+mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -x -1 \
+ emacs-26.3.erofs-uncompressed.img \
+ /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3
+
+mkfs.erofs -U c1b9d5a2-f162-11cf-9ece-0020afc76f16 -T 0 -zlz4hc \
+ emacs-26.3.erofs-lz4hc.img \
+ /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3
+
+ls -lh emacs-*.erofs-*.img
+#+END_SRC
+
+#+RESULTS:
+| -rw-r--r-- | 1 | pm | users | 63M | Dec | 29 | 11:28 | emacs-26.3.erofs-lz4hc.img |
+| -rw-r--r-- | 1 | pm | users | 128M | Dec | 29 | 11:28 | emacs-26.3.erofs-uncompressed.img |
+| -rw-r--r-- | 1 | pm | users | 63M | Dec | 29 | 11:28 | emacs-27.1.erofs-lz4hc.img |
+| -rw-r--r-- | 1 | pm | users | 106M | Dec | 29 | 11:27 | emacs-27.1.erofs-uncompressed.img |
+
+*** De-duping
+
+Let's try and de-duplicate with ERIS:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-uncompressed.img"
+ "emacs-27.1.erofs-uncompressed.img")
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 67 blocks (2.04%), block counts: 3364/3290, block-size: 32768
+
+Not so good. We used a block size of 32KiB. ERIS also supports a block-size of 1KiB:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-uncompressed.img"
+ "emacs-27.1.erofs-uncompressed.img"
+ #:block-size eris-block-size-small)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 19707 blocks (17.75%), block counts: 113746/111016, block-size: 1024
+
+Much better! These are the two block-sizes that are allowed by the ERIS specification. The Guile implementation has some tricks up it's sleeves and can allows any block size that is a power of two. Let's try with 4KiB which is also the block size used by EROFS:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-uncompressed.img"
+ "emacs-27.1.erofs-uncompressed.img"
+ #:block-size 4096)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 4212 blocks (15.85%), block counts: 27167/26571, block-size: 4096
+
+Slightly worse...probably the alignment along the 4KiB blocks is not perfect and by using a even smaller block size (1KiB) we can de-duplicate more.
+
+Let's try with the LZ4 compressed versions:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-lz4hc.img"
+ "emacs-27.1.erofs-lz4hc.img")
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 5 blocks (0.25%), block counts: 2002/1999, block-size: 32768
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-lz4hc.img"
+ "emacs-27.1.erofs-lz4hc.img"
+ #:block-size eris-block-size-small)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 9215 blocks (13.88%), block counts: 66465/66411, block-size: 1024
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-lz4hc.img"
+ "emacs-27.1.erofs-lz4hc.img"
+ #:block-size 4096)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 1944 blocks (12.04%), block counts: 16155/16143, block-size: 4096
+
+Again for small block size we have higher levels of de-duplication.
+
+A quick check if we use an ERIS block size larger than the EROFS block size. We would expect the de-duplication level to decrease sharply:
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.erofs-lz4hc.img"
+ "emacs-27.1.erofs-lz4hc.img"
+ #:block-size (* 8 1024))
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 558 blocks (6.94%), block counts: 8045/8038, block-size: 8192
+
+Almost halved...
+
+** SquashFS
+
+#+BEGIN_SRC shell :exports both
+mksquashfs /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1 emacs-27.1.squashfs.img
+mksquashfs /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3 emacs-26.3.squashfs.img
+ls -lh emacs-*.squashfs.img
+#+END_SRC
+
+#+RESULTS:
+| -rw-r--r-- | 1 | pm | users | 47M | Dec | 29 | 18:41 | emacs-26.3.squashfs.img |
+| -rw-r--r-- | 1 | pm | users | 45M | Dec | 29 | 18:41 | emacs-27.1.squashfs.img |
+
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.squashfs.img"
+ "emacs-27.1.squashfs.img"
+ #:block-size eris-block-size-large)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 0 blocks (0.00%), block counts: 1482/1440, block-size: 32768
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.squashfs.img"
+ "emacs-27.1.squashfs.img"
+ #:block-size eris-block-size-small)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 2 blocks (0.00%), block counts: 50434/48995, block-size: 1024
+
+ERIS can not de-duplicate anything as SquashFS does not align content.
+
+** Tar
+
+#+BEGIN_SRC shell :exports both
+tar cf emacs-27.1.tar /gnu/store/lhw3zwhzra0w5l8a4jw8fvm58i75xyl8-emacs-27.1
+tar cf emacs-26.3.tar /gnu/store/wsv3m2q0r8sbyyzn1wr7vv2fi51mpfkx-emacs-26.3
+
+ls -lh emacs-*.tar
+#+END_SRC
+
+#+RESULTS:
+| -rw-r--r-- | 1 | pm | users | 131M | Dec | 29 | 18:43 | emacs-26.3.tar |
+| -rw-r--r-- | 1 | pm | users | 109M | Dec | 29 | 18:43 | emacs-27.1.tar |
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.tar"
+ "emacs-27.1.tar"
+ #:block-size eris-block-size-large)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 1 blocks (0.03%), block counts: 3470/3395, block-size: 32768
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.tar"
+ "emacs-27.1.tar"
+ #:block-size eris-block-size-small)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 11932 blocks (10.41%), block counts: 117368/114634, block-size: 1024
+
+Not so bad. When using a small block-size of 1KiB ERIS can decode 10%. But the archive is pretty large. All this savings are meaningless as just sending a compressed archive would probably be cheaper.
+
+** Tar.Zstd
+
+#+BEGIN_SRC shell :exports both
+zstd emacs-27.1.tar
+zstd emacs-26.3.tar
+
+ls -lh emacs-*.tar.zst
+#+END_SRC
+
+#+RESULTS:
+| -rw-r--r-- | 1 | pm | users | 46M | Dec | 29 | 17:21 | emacs-26.3.tar.zst |
+| -rw-r--r-- | 1 | pm | users | 45M | Dec | 29 | 17:11 | emacs-27.1.tar.zst |
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.tar.zst"
+ "emacs-27.1.tar.zst"
+ #:block-size eris-block-size-large)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 0 blocks (0.00%), block counts: 1455/1443, block-size: 32768
+
+#+BEGIN_SRC scheme :exports both
+(measure-eris-de-duplicaiton
+ "emacs-26.3.tar.zst"
+ "emacs-27.1.tar.zst"
+ #:block-size eris-block-size-small)
+#+END_SRC
+
+#+RESULTS:
+: De-duplicated 0 blocks (0.00%), block counts: 49499/49100, block-size: 1024
+
+Nada de-duplication...
+
+* Conclusion
+
+By using EROFS we were able to de-duplicate ~12% of the content between two very similar revisions of Emacs packages. There is still much room to improve towards the ~85% similarity shown by Ludovic.
+
+Still, maybe nice to see what can be achieved with "off-the-shelve" technology. EROFS is in mainline Linux and you can simply mount an image with ~mount -t erofs -o loop emacs-26.1.erofs-lz4hc.img /mnt/emacs~.
+
+For Guix this experiment is at the intersection of three topics that I find very interesting:
+
+- Peer-to-peer distribution of substitutes
+- Minimizing network usage
+- Mounting packages instead of unpacking (a la [[https://distr1.org/][distri]])
+
+The approach might be interesting for things other than Guix. For example a file-sharing application in GNUNet has very similar requirements.
+
+Some technical questions that remain:
+
+- Does it make sense to hand-roll a EROFS/Squashfs like file-system that is optimized for de-duplication?
+- ERIS currently allows block-size of 1KiB and 32KiB maybe for de-duplicaiton something in-between is better?
+
+Happy hacking!
+
+-pukkamustard
+
+* See also
+
+Further reading and watching.
+
+** [[https://www.usenix.org/system/files/atc19-gao.pdf][EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices]]
+
+The EROFS paper presented at USENIX 2019 that illustrates the motivations behind EROFS and compares performance to other filesystems (including Squashfs).
+
+** [[https://media.ccc.de/v/arch-conf-online-2020-6387-distri-researching-fast-linux-package-management][distri: researching fast Linux package management]]
+
+Talk about the distri package manager at Arch Conf 2020.
+
+Distri does not distribute packages as archives but as Squashfs images. Packages can just be mounted instead of unarchived.
+
+See also the toot and discussion on the Fediverse: https://toot.aquilenet.fr/@civodul/105304162332789048
+
+** [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00177.html][When substitute download + decompression is CPU-bound]]
+
+Post on the guix-devel mailing-list on how installing a Guix package can be CPU-bound (due to decompression).
+
+This can be solved by not decompressing packages, but just mounting them (like distri).
+
+** [[https://lists.gnu.org/archive/html/guix-devel/2020-12/msg00258.html][Identical files across subsequent package revisions]]
+
+Post on the guix-devel mailing-list showing how similar different versions of package can be.