=========
=CONTENT=
=========

Quick & dirty speed comparison between boost::iostreams buffered input filters
for stream decompression, as well as raw reading into a buffered input filter
without any decompression. Decompressed input is stored in a std::stringstream.

A comparison to system decompression tools piping to /dev/null was added to
ensure the code is sane and not skewing results (bzip and gzip files were
slightly different, so the absolute times aren't really comparable. Input was
highly compressible text in the form of a repeatedly concatenated e-book).

BZIP2:
======

boost::iostreams (bzip2_decompressor() filter)
  Uncompressed (584.347 MiB) : 21.9679 MiB/s
  Compressed (132.395 MiB) : 4.97724 MiB/s
  Took 26.6 seconds

time bzcat a.txt.gz > /dev/null
  real 0m29.128s
  user 0m28.310s
  sys 0m0.780s
  => Uncompressed: 20.0613 MiB/s
  => Compressed: 4.5452 MiB/s

GZIP:
=====

boost::iostreams (gzip_decompressor() filter)
  Uncompressed (559.996 MiB) : 75.1673 MiB/s
  Compressed (212.405 MiB) : 28.5108 MiB/s
  Took 7.45 seconds

time zcat b.txt.gz > /dev/null
  real 0m5.351s
  user 0m4.420s
  sys 0m0.740s
  => Uncompressed: 104.6526 MiB/s
  => Compressed: 39.6944 MiB/s

(multithreaded)
time pigz -d -c b.txt.gz > /dev/null
  real 0m3.135s
  user 0m3.370s
  sys 0m1.260s
  => Uncompressed: 178.6271 MiB/s
  => Compressed: 67.7528 MiB/s

RAW:
====

When the file has already been read (i.e. it is still cached): around 660 MiB/s ;-)

...after a reboot to clear caches:
  Uncompressed (559.996 MiB) : 152.173 MiB/s
  Compressed (559.996 MiB) : 152.173 MiB/s
  Took 3.68 seconds

Conclusion:
===========

I think we have a winner here ;-) While the gzip filter could apparently be
optimized a bit (zcat is noticeably faster on the same file), reading the data
uncompressed is the way to go. Not only is it faster, it also does not tie up a
whole core for decompression that could otherwise be used for processing. This
tradeoff may change, though, if the processing turns out to be so cheap, or the
hard disk so slow, that we become I/O bound.
In that case, multithreaded decompression might become an option, though it is
still very expensive in CPU time if it is supposed to beat the raw read rate.

Appendix:
=========

Full code for stream decompression app

    #include <fstream>
    #include <iostream>
    #include <sstream>

    #include <boost/iostreams/copy.hpp>
    #include <boost/iostreams/filter/bzip2.hpp>
    #include <boost/iostreams/filter/gzip.hpp>
    #include <boost/iostreams/filtering_streambuf.hpp>
    #include <boost/timer.hpp>

    using namespace std;
    namespace io = boost::iostreams;

    // Returns the number of bytes between the current read position and
    // the end of the stream, restoring the read position afterwards.
    template <typename T>
    streamsize fileSize(T &file)
    {
        const streampos start = file.tellg();
        file.seekg(0, ios::end);
        const streampos end = file.tellg();
        file.seekg(start);
        return end - start;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2)
            return 1;

        ifstream file(argv[1], ios_base::in | ios_base::binary);
        // A failed open sets failbit, not badbit, so test the stream itself.
        if (!file) {
            cerr << "Failed to open " << argv[1] << endl;
            return 1;
        }

        const streamsize fsize = fileSize(file);

        // Filter chain: optional decompressor first, then the file as source.
        io::filtering_streambuf<io::input> in;
        //in.push(io::gzip_decompressor());  // For gzip decompression
        //in.push(io::bzip2_decompressor()); // For bzip2 decompression
        in.push(file);

        stringstream target;

        // Time only the copy from the filter chain into the stringstream.
        boost::timer timer;
        io::copy(in, target);
        const double elapsed = timer.elapsed();

        const size_t ucsize = target.str().length();

        const double mib_uncompressed = static_cast<double>(ucsize) / 1024 / 1024;
        const double mib_compressed = static_cast<double>(fsize) / 1024 / 1024;
        const double mibs_uncompressed = mib_uncompressed / elapsed;
        const double mibs_compressed = mib_compressed / elapsed;

        cout << "Uncompressed (" << mib_uncompressed << " MiB) : "
             << mibs_uncompressed << " MiB/s" << endl;
        cout << "Compressed (" << mib_compressed << " MiB) : "
             << mibs_compressed << " MiB/s" << endl;
        cout << "Took " << elapsed << " seconds" << endl;

        return 0;
    }