
Broccoli: Syncing faster by syncing less



SOURCE: https://dropbox.tech/infrastructure/-broccoli--syncing-faster-by-syncing-less
Summary

Thus, compressing files before syncing them means less data on the wire (decreased bandwidth usage and latency for the end user!) and storing smaller pieces in the back end (increased storage cost savings!). To enable these advancements, we measured several common lossless compression algorithms on the incoming data stream at Dropbox, including 7zip, zstd, zlib, and Brotli, and we chose a slightly modified Brotli encoder, which we call Broccoli, to compress files before syncing.

Today, we will dive into the Broccoli encoding, provide an overview of the block sync protocol, and explain how we are using them in conjunction to optimize block data transfers. While Dropbox had done research into generic file compression algorithms as well as Lepton, a novel image recompression algorithm, these techniques were not suited to operating at network speeds on client machines.

Our initial research into Brotli was promising, and we identified several key advantages. None of the other options we tested checked all of those boxes.

We codenamed the Brotli compressor in Rust “Broccoli” because of its capability to make Brotli files concatenate with one another (brot-cat-li). We decided on the broccoli package because it is:

To unlock multithreaded compression (and concatenate-ability), we would need to compress chunks of the file and generate a valid Brotli output.

The client sync protocol consists of two sub-protocols: one to sync file metadata, for example filename and size, and one to sync blocks, which are aligned chunks of up to 4 mebibytes (MiB) of a file.

The possibility of compression being a bottleneck was not obvious to us when we started thinking about the problem, and it served as a perfect reminder to constantly challenge our assumptions. While our overall percentage savings was down from ~33% to ~30%, we managed to speed up large-file upload bandwidth from ~35Mbps to ~50Mbps (at peak), increasing upload link throughput.
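The idea of compressing chunks independently and concatenating the results into one valid stream can be illustrated with a rough sketch. This is not Broccoli or Brotli: it uses gzip from the Python standard library as a stand-in, because concatenated gzip members also form a valid gzip stream. The function names, the worker count, and the chunk size used for compression are assumptions made for illustration only.

```python
import gzip
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, the maximum block size mentioned in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into aligned chunks of at most block_size bytes (hypothetical helper)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_concatenable(data: bytes, chunk_size: int, workers: int = 4) -> bytes:
    """Compress chunks independently, in parallel, and join the outputs.

    gzip is only a stand-in for Broccoli here: each chunk becomes a complete
    gzip member, and members joined back to back decode as a single stream.
    """
    chunks = split_into_blocks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so the members line up
        # with the original chunk order.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = os.urandom(1000) + b"a" * 5000
    stream = compress_concatenable(payload, chunk_size=1024)
    # A standard decoder reads the concatenated members as one stream.
    assert gzip.decompress(stream) == payload
```

Compressing each chunk on its own gives up some compression ratio, since no chunk can reference data in another, in exchange for parallelism. That is the same kind of trade-off as the one described above, where savings dropped slightly while upload throughput rose.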
The other benefit of having the server control the value sent to the client is that we can, with the data available only on the server, decide if compressing is actually the most efficient way to send down the block. It is important to note that in cases where the data is incompressible, Brotli adds extra bytes, making the compressed block larger than its uncompressed version.

We would also like to acknowledge the emeritus members of the Sync and Storage teams for their contributions in this area.

Brotli compressed data consists of a header and a series of meta-blocks. Each meta-block internally contains its own header (which describes the representation of the compressed part) and the compressed data.
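The incompressible-data case noted above suggests a simple guard: compare sizes and send the block uncompressed when compression does not help. The sketch below is a minimal illustration, again with zlib from the Python standard library standing in for Broccoli. The format labels "compressed" and "raw" and both function names are hypothetical, and the actual decision logic on the Dropbox servers is not described in this summary.

```python
import os
import zlib


def encode_block(block: bytes) -> tuple[str, bytes]:
    """Return (format, payload), falling back to the raw bytes when
    compression does not make the block smaller (hypothetical helper)."""
    compressed = zlib.compress(block)  # zlib is a stand-in for Broccoli
    if len(compressed) < len(block):
        return "compressed", compressed
    return "raw", block


def decode_block(fmt: str, payload: bytes) -> bytes:
    """Reverse encode_block using the format label sent with the payload."""
    return zlib.decompress(payload) if fmt == "compressed" else payload


if __name__ == "__main__":
    text_block = b"sync faster by syncing less " * 1000
    random_block = os.urandom(64 * 1024)  # effectively incompressible
    for block in (text_block, random_block):
        fmt, payload = encode_block(block)
        assert decode_block(fmt, payload) == block
        print(fmt, len(block), "->", len(payload))
```

Carrying the format label next to the payload means the receiver never has to guess whether a block was compressed.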

As said here by Rishabh Jain