
Pcompress

Pcompress is an archiver that compresses, decompresses, and deduplicates data in parallel by splitting the input into chunks. It has a modular structure and supports multiple algorithms, including LZMA, Bzip2, PPMD, and LZ4, with KECCAK, BLAKE2, SHA-256, or SHA-512 chunk checksums. The bundled LZMA includes SSE optimizations. Pcompress also implements chunk-level content-aware deduplication and delta compression based on a polynomial fingerprinting scheme. Metadata overhead is low, and I/O is overlapped with compression to maximize parallelism. AES encryption is available, with Scrypt from Tarsnap used to derive unique per-session keys from passwords. Pcompress can work in pipe mode, reading from stdin and writing to stdout. It also provides adaptive compression modes in which a suitable algorithm is chosen per chunk based on heuristics.
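The chunked, per-chunk-adaptive design described above can be illustrated with a minimal Python sketch. This is not Pcompress's code or file format: the chunk size, the record layout, and the function names are assumptions made for illustration, and the "heuristic" is replaced by simply trying every codec from the Python standard library and keeping the smallest result.

```python
import bz2
import hashlib
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # hypothetical 1 MiB chunk size, not Pcompress's default

# Candidate codecs: name -> (compress, decompress)
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_chunk(chunk):
    """Compress one chunk with every codec and keep the smallest output.

    Brute-force trial stands in for Pcompress's heuristics (an assumption).
    Returns (codec name, SHA-256 digest of the raw chunk, compressed bytes).
    """
    digest = hashlib.sha256(chunk).digest()
    name, packed = min(
        ((n, comp(chunk)) for n, (comp, _) in CODECS.items()),
        key=lambda item: len(item[1]),
    )
    return name, digest, packed


def compress_parallel(data, chunk_size=CHUNK_SIZE, workers=4):
    """Split data into fixed-size chunks and compress them concurrently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_chunk, chunks))  # order is preserved


def decompress_all(records):
    """Decompress each record and verify its checksum before joining."""
    out = []
    for name, digest, packed in records:
        chunk = CODECS[name][1](packed)
        if hashlib.sha256(chunk).digest() != digest:
            raise ValueError("chunk checksum mismatch")
        out.append(chunk)
    return b"".join(out)
```

Because each chunk carries its own codec name and checksum, chunks can be compressed, verified, and decompressed independently of one another, which is what makes the work easy to spread across threads.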


Recent releases

  •  02 Mar 2014 18:09

Release Notes: This release fixes several important bugs in the 3.0 version, making it more stable. In addition, automatic scaling of compression buffers based on compression level has been added. This improves compression ratio for the simple usage modes.

  •  12 Jan 2014 23:05

Release Notes: This is a major release featuring complete archiving capability using Libarchive and advanced data type-based compression. Appropriate algorithms for the file type are used along with various filters like PackJPG, Dispack, Delta2, etc. Files are sorted to cluster similar content and improve compression. Data split boundaries are determined from file type and rolling hash changes. Other improvements include performance tweaks, reduced memory consumption, and simplified usage.

  •  06 Sep 2013 02:40

Release Notes: This release fixes several issues, including some corner-case crashes and a couple of buffer overflows. Data deduplication can now be done using blocks as small as 2 KB, providing a much higher dedupe ratio than virtually any other deduplication software. Similarity-based deduplication performance has been improved. Free memory detection accuracy has also been improved.
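As a rough illustration of small-block, content-defined deduplication, the sketch below cuts chunk boundaries with a polynomial rolling hash and stores each distinct chunk once. It is a simplified stand-in, not Pcompress's fingerprinting scheme: the window size, hash constants, chunk-size limits, and function names are all assumptions chosen so that the average chunk is on the order of 2 KB.

```python
import hashlib

WINDOW = 16            # bytes in the rolling window (hypothetical)
BASE = 257             # polynomial base (hypothetical)
MOD = (1 << 31) - 1    # modulus for the rolling hash (hypothetical)
MASK = (1 << 11) - 1   # cut when the low 11 bits are zero: roughly 2 KB apart
MIN_CHUNK = 512
MAX_CHUNK = 8192


def split_chunks(data):
    """Yield content-defined chunks of data using a polynomial rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start  # bytes of the current chunk seen before this one
        if pos >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]


def dedupe(chunks):
    """Store each distinct chunk once; return (store, recipe of digests)."""
    store = {}
    recipe = []
    for chunk in chunks:
        key = hashlib.sha256(chunk).digest()
        store.setdefault(key, chunk)
        recipe.append(key)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[key] for key in recipe)
```

Smaller average chunks tend to expose more duplicate data but make the recipe and index larger, which is the trade-off behind choosing a block size.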

  •  10 Aug 2013 07:24

Release Notes: This release fixes a few bugs and provides several improvements in efficiency and performance. Similarity detection for similarity-based near-exact deduplication is now more effective, while the memory requirements for the index have been reduced. Accuracy of data partitioning between threads has been improved. Chunking and indexing performance have been improved, and the KMV Sketch computation is now more accurate. This release moves all the core functionality into a shared library in preparation for an API that will be introduced in future releases.
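For readers unfamiliar with KMV (k minimum values) sketches, here is a minimal generic sketch of the idea in Python. It does not reproduce how Pcompress builds or uses its sketches: the hash function, the value of k, and the function names are assumptions. Each set of items is summarized by its k smallest hash values, and the similarity of two sets is estimated from those summaries alone.

```python
import hashlib
import heapq


def hash64(item):
    """Map a bytes item to a 64-bit integer (BLAKE2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")


def kmv_sketch(items, k=64):
    """Return the k smallest distinct hash values of items, in sorted order."""
    return heapq.nsmallest(k, {hash64(x) for x in items})


def estimate_jaccard(sketch_a, sketch_b, k=64):
    """Estimate the Jaccard similarity of two sets from their KMV sketches.

    The k smallest values of the merged sketches act as a uniform sample of
    the union; the fraction present in both sketches estimates the overlap.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    union = heapq.nsmallest(k, set_a | set_b)
    if not union:
        return 0.0
    shared = sum(1 for h in union if h in set_a and h in set_b)
    return shared / len(union)
```

A sketch has a fixed size regardless of how many items it summarizes, which is why this kind of structure suits a memory-constrained similarity index.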

  •  28 May 2013 21:30

Release Notes: This is primarily a bug-fix release. It fixes some crashes with invalid input and build problems on Debian 6 and older non-SSE4 processors. The min-heap based similarity matching for delta encoding has been made faster and more accurate. Accuracy of scalable Segmented Global Deduplication has been further improved to be greater than 95%. More test cases have been added.
