I managed multicore compression & decompression of a directory with tar and pzstd.
These days, I work with large codebases such as Android or RDK. Sometimes, I need to archive those codebases.
My PC has an 8C/16T CPU (Ryzen 5700G), 64 Gbyte of DDR4 RAM and a 2 TB Samsung 970 EVO Plus NVMe M.2 SSD, and I want this archiving operation to be as fast as possible.
In the past, I used tools like pigz, but these days I love to use zstd, and a multithreaded version of it is available: pzstd.
Generic usage for compression:
tar --use-compress-program pzstd -TAR_ARGUMENTS FINAL_TARBALL_NAME DIRECTORY_TO_ARCHIVE
Example: I cloned the AOSP-13 source via the “repo” tool, and it gave me a directory of more than 100 Gbyte. Let’s assume the directory is named “AOSP-13”.
tar --use-compress-program pzstd -cvf AOSP-13.tar.zstd AOSP-13/
Or even simpler:
tar -I pzstd -cvf AOSP-13.tar.zstd AOSP-13/
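The commands above run pzstd with its defaults. A minimal sketch of how a thread count and compression level could be passed, assuming a GNU tar that accepts a quoted command with arguments after -I, and assuming pzstd's -p (threads) and -# (level) options as listed in its help text. The helper name pzstd_cmd is hypothetical and only builds the command string:

```shell
# Hypothetical helper: build the compressor command string passed to tar -I.
#   $1 = number of threads (default: all CPUs reported by nproc)
#   $2 = compression level (default: 3, assumed to match pzstd's own default)
pzstd_cmd() {
    printf 'pzstd -p %s -%s' "${1:-$(nproc)}" "${2:-3}"
}

# Example (not executed here): 8 threads, level 10
# tar -I "$(pzstd_cmd 8 10)" -cf AOSP-13.tar.zstd AOSP-13/
```

Higher levels trade speed for ratio, so whether this helps depends on the data and the disk.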
The AOSP-13 directory above is 171 Gbyte in size, with hundreds of thousands of text and binary files in it:
stulluk /media/WORK/ANDROID/temp/AOSP-13 $ du -sh
171G .
stulluk /media/WORK/ANDROID/temp/AOSP-13 $ find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn
239067 h
119647 c
107300 java
72684 xml
48560 cpp
34947 html
33718 txt
32929 py
29148 png
24600 cc
21136 sha1
19403 md5
18299 go
14180 jar
12855 ll
12477 so
12390 rs
9636 lsdump
8975 bp
8500 aidl
7968 rst
7319 pbtxt
7068 sh
6582 pom
6091 json
5890 md
5566 hpp
5511 dts
5452 a
5405 S
5101 kt
3897 dtsi
3577 yaml
3546 out
3443 s
3198 te
3049 test
2949 mk
2779 smali
2675 in
2626 gitignore
2336 git/info/exclude
2336 git/HEAD
2336 git/description
2336 git/config
2007 cfg
1877 proto
1837 aar
1783 m
1739 gradle
1714 o
1700 bin
1573 jpeg
1572 ttf
1539 frag
1500 dump
1435 inc
1404 pem
1259 3
1207 cs
1202 amber
1193 1
1173 git/refs/remotes/m/t-tv-dev
1173 git/logs/refs/remotes/m/t-tv-dev
1173 git/index
1173 git/FETCH_HEAD
1156 git/packed-refs
1144 rlib
1125 idx
1123 pack
1118 def
1090 properties
1047 js
1045 hal
1044 td
1007 dat
974 otf
973 cmake
947 jpg
938 rscript
901 ko
896 gn
882 gz
871 groovy
828 asm
814 asc
803 apk
797 mm
797 bat
772 expected
738 zip
727 rc
723 sha256
720 crt
710 tmpl
693 yml
684 glsl
672 data
650 svg
648 toml
636 ogg
634 bazel
633 sha512
632 pcap
616 go2
611 ini
593 vert
591 keep
590 err
577 0
569 jmod
564 bzl
562 gif
544 expect
520 bz2
518 g
508 mlir
488 ttx
472 8
465 patch
465 mp4
461 input
456 syms
455 css
448 sksl
444 m4
443 asciipb
432 conf
431 ts
431 d
427 p12
427 2
417 pl
417 4
413 csv
.....
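Entries such as git/HEAD or git/config show up in the listing because the sed expression strips everything up to the last dot in the whole path, so a file without an extension under a .git directory is counted by the rest of its path. A small sketch that looks only at the file name, assuming GNU find (for -printf). The function name count_extensions is hypothetical:

```shell
# count_extensions [DIR]
# Count files per extension, using only the base name of each file.
# Files whose name contains no dot are skipped.
count_extensions() {
    find "${1:-.}" -type f -name '*.*' -printf '%f\n' \
        | sed 's/.*\.//' \
        | sort | uniq -c | sort -rn
}

# Example (not executed here):
# count_extensions /media/WORK/ANDROID/temp/AOSP-13
```

Dotfiles such as .gitignore are still counted as "gitignore", the same as in the original one-liner.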
Compressing it with tar & pzstd (with default parameters) takes less than 4 minutes:
stulluk /media/WORK/ANDROID/temp $ /usr/bin/time -v tar -I pzstd -cf AOSP-13.tar.zstd AOSP-13/
Command being timed: "tar -I pzstd -cf AOSP-13.tar.zstd AOSP-13/"
User time (seconds): 719.49
System time (seconds): 224.74
Percent of CPU this job got: 409%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:50.83
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 532216
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 1236154
Voluntary context switches: 5994338
Involuntary context switches: 627436
Swaps: 0
File system inputs: 356223840
File system outputs: 233924608
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
stulluk /media/WORK/ANDROID/temp $
Note that I did not use the bash built-in “time” keyword, because by default it does not report CPU usage (the “Percent of CPU” line above).
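For reference, the “Percent of CPU” figure can be reproduced from the other numbers in the report, assuming it is computed as (user + system) / elapsed, which is how the GNU time documentation describes it. A minimal sketch with a hypothetical helper named cpu_percent:

```shell
# cpu_percent USER_SECONDS SYSTEM_SECONDS ELAPSED_SECONDS
# Prints (user + system) / elapsed as a whole-number percentage.
cpu_percent() {
    LC_ALL=C awk -v u="$1" -v s="$2" -v e="$3" \
        'BEGIN { printf "%d\n", (u + s) / e * 100 }'
}

# Compression run above: 3:50.83 elapsed = 230.83 seconds
cpu_percent 719.49 224.74 230.83    # prints 409
```

A value of 409% means that, on average, about four of the 16 hardware threads were busy.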
Generic usage for decompression:
tar --use-compress-program pzstd -TAR_ARGUMENTS TARBALL_NAME
Example: I have the AOSP-13.tar.zstd file from above and I want to decompress it with multiple cores:
tar --use-compress-program pzstd -xvf AOSP-13.tar.zstd
Or even simpler:
tar -I pzstd -xvf AOSP-13.tar.zstd
Decompressing the archive takes less than 3.5 minutes:
stulluk /media/WORK/ANDROID/temp $ /usr/bin/time -v tar -I pzstd -xf AOSP-13.tar.zstd
Command being timed: "tar -I pzstd -xf AOSP-13.tar.zstd"
User time (seconds): 141.14
System time (seconds): 203.48
Percent of CPU this job got: 168%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:24.63
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 392308
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 2371705
Voluntary context switches: 17620483
Involuntary context switches: 25312
Swaps: 0
File system inputs: 233926600
File system outputs: 356182424
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
stulluk /media/WORK/ANDROID/temp $
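The “File system inputs” and “File system outputs” lines give a rough idea of how much data moved through the disk. A sketch of the conversion, assuming these counters are in 512-byte blocks (which is how Linux reports them through getrusage). The helper name blocks_to_gib is hypothetical:

```shell
# blocks_to_gib BLOCKS
# Convert a count of 512-byte blocks (assumed unit) to GiB, one decimal place.
blocks_to_gib() {
    LC_ALL=C awk -v b="$1" \
        'BEGIN { printf "%.1f\n", b * 512 / (1024 * 1024 * 1024) }'
}

# Decompression run above
blocks_to_gib 233926600    # archive read from disk, prints 111.5
blocks_to_gib 356182424    # data written while extracting, prints 169.8
```

If the 512-byte assumption holds, the archive is roughly 112 GiB for about 170 GiB of extracted data, a compression ratio of about 1.5.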
Amazing, isn’t it :)
I added the following line to my ${HOME}/.bashrc out of laziness:
alias tpz='tar -I pzstd '
After sourcing this .bashrc, I can simply do:
tpz -cvf AOSP-13.tar.zstd AOSP-13
or
tpz -xvf AOSP-13.tar.zstd
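An alias is the shortest form. As an alternative sketch, two small shell functions can derive the archive name from the directory and extract into a chosen target. The names tpzc, tpzx and TPZ_COMPRESSOR are hypothetical, and GNU tar is assumed for -I:

```shell
# Hypothetical variable: compressor command handed to tar -I.
# Defaults to pzstd; any program that understands -d for decompression works.
: "${TPZ_COMPRESSOR:=pzstd}"

# tpzc DIR  ->  creates DIR.tar.zstd in the current directory
tpzc() {
    dir="${1%/}"                                  # drop a trailing slash, if any
    tar -I "$TPZ_COMPRESSOR" -cf "$(basename "$dir").tar.zstd" \
        -C "$(dirname "$dir")" "$(basename "$dir")"
}

# tpzx ARCHIVE [TARGET_DIR]  ->  extracts into TARGET_DIR (default: current directory)
tpzx() {
    tar -I "$TPZ_COMPRESSOR" -xf "$1" -C "${2:-.}"
}

# Examples (not executed here):
# tpzc AOSP-13/
# tpzx AOSP-13.tar.zstd /some/other/place
```

The -C option makes tar store paths relative to the parent directory, so the archive always contains a single top-level directory.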
Adding “v” to the tar arguments slightly reduces the speed in my environment, by around 10-15%. I think this is due to the speed of my terminal (I use & love terminator).
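If the terminal really is the bottleneck, one way to keep the file list without printing it is to redirect it to a file. A minimal sketch, assuming GNU tar, which writes the verbose listing to standard output when the archive goes to a file. The function name tar_logged is hypothetical:

```shell
# tar_logged COMPRESSOR ARCHIVE DIR
# Same as "tar -I COMPRESSOR -cvf ARCHIVE DIR", but the file list is written
# to ARCHIVE.filelist instead of the terminal.
tar_logged() {
    tar -I "$1" -cvf "$2" "$3" > "$2.filelist"
}

# Example (not executed here):
# tar_logged pzstd AOSP-13.tar.zstd AOSP-13/
```

Timing this variant against the plain -cf run would show whether the slowdown comes from the terminal or from tar itself.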
Of course, there are better compressors than zstd, but the critical point for me is speed.
zstd gives a good balance between compression / decompression speed and compression ratio.
Most modern kernels support zstd out of the box. Thanks to Facebook/Meta for this beautiful software.