Block Sorting treats a body of text as a large block of strings. The strings can be sorted using a reversible algorithm of some kind, yielding a more compressible dataset. The best known example of this technique is the Burrows-Wheeler Transform, or BWT.
The Burrows-Wheeler Transform refers to a reversible transformation that can make a given text more compressible. Once this transform has been applied, the text can be compressed with a combination of Move to Front encoding followed by an entropy encoder.
Contains the code and tells about BWTS which does not need the index. It's what BWT should have been to be length preserving.
Created: 10/01/2008
More...
Tuck is a portable, lossless data compression tool based on the
Burrows-Wheeler transform, thus ensuring good compression ratios.
It currently runs quite slowly (slower than the efficient bzip2 ),
mainly because of the block-sorting algorithms involved.
As of March 2006 (initial release, v0.10) tuck is distributed as a
Mach-O PowerPC command-line program only.
The C source code will be released. A library interface similar to that
of zlib or and libbz2 will be exposed
Created: 10/04/2006
by Vadim YoockinMore...
PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 (ie: anything compressed with pbzip2 can be decompressed with bzip2). Source code available.
Created: 03/03/2006
by Jeff GilchristMore...
SBC is a freeware Burrows Wheeler Transform based file archiver. It's one of the fastest BWT implementations with high compression ratio. SBC also introduces a high security options with a number of strong encryption algorithms. Command-line interface only. Comment from a DCL user: Cool archiver; the author has made great effors to make it really fast and also added a lot of heuristics to behave well on multimedia data.
Created: 24/04/2001
by Mark NelsonMore...
An experimental archiver that uses a BWT algorithm to achieve superior compression. With Zzlib, you can also use Zzip as a library (dll) in one of your program. Source code of Zzip/Zzlib is released under the GNU LGPL.
Created: 19/03/2002
by Mark NelsonMore...
Szip is a freeware portable general purpose lossless compression program. It has a high speed and compression, but high memory demands (up to 20MB) too. The compression is done using a variant of blocksorting, which explains its rather high memory requirements.
Update: Michael Schindler has at long last posted the source code for szip.
Created: 31/10/2002
by Mark NelsonMore...
Arturo Campos provides an easy-to-read explanation of the BWT transformation algorithm. This algorithm achieves very good compression ratios at relatively high speeds.
Created: 17/11/1999
by Mark NelsonMore...
"Universal Data Compression Based on the Burrows and Wheeler Transformation: Theory and Practice" by Bernhard Balkenhol, Stefan Kurtz, 1998. Yet another analysis of the algorithm. Any exciting conclusions?
Created: 01/12/1999
by Mark NelsonMore...
by Balkenhol, Bernhard; Kurtz, Stefan; Shtarkov, Yuri M. The authors discuss some statistical qualities of BWT output and improve on the algorithm's performance. This paper was presented at the 1999 Data Compression Conference.
Created: 05/08/2000
by Mark NelsonMore...
The home page for Julian Seward's great BWT projects: a compression program and a library. A prelease of bzip2 1.0 was released 4/15/00
Created: 15/04/2000
by Mark NelsonMore...
Bernhard have written quite a few papers on data compression. A few of them are in German, a few in English. All in ps.gz format. Bernhard has links to published papers and preprints on this page, be sure to check them all.
Created: 02/04/2000
by Mark NelsonMore...
by Balkenhol, Kurtz, and Shtarkov. A paper that was published at DCC 1999. Some thoughts about BWT and the context tree model, alphabet modification, modification of MTF, and Grouping of symbols.
NOTE:The file type would indicate that this is a compressed file, but it actually appears to be unencoded PS format.
Created: 01/12/1999
by Mark NelsonMore...
A paper by Brent Chapin and Steve Tate describing some improvements in BWT compression. The abstract says that compression can be improved by alphabet ordering and reflecting the sorted binary strings. This version is in PDF format, a PS format is available as well.
Created: 27/11/1999
by Mark NelsonMore...
The MTF algorithm is not very exciting, but it does a really nice job of compressing streams that have been put through the BWT transform. Arturo Campos gives an explanation of how to implement it in this paper.
Created: 17/11/1999
by Mark NelsonMore...
David Scott occupies a unique niche in the world of data compression. He is very interested in how one can adapt existing compression technologies to be bijective, a term you can find defined on his web page. The stated motivation for this is to increase the difficulty of breaking an encrypted version of the file. On this page, David makes a first pass at modifying BWT to be bijective. He doesn't quite get there, but feels that he has taken some good steps towards his goal.
Created: 21/09/2000
by Mark NelsonMore...
Dr. Peter Fenwick's home page. Fenwick has links to several of his compression papers on this page, including several recent papers discussing BWT algorithms.
Created: 01/01/2000
by Mark NelsonMore...
An article by Mark Nelson that appeared in the September 1996 issue of Dr. Dobb's Journal. At the time it appeared, the BWT was relatively unknown among compression enthusiasts. This article includes source code that implements a simple test program that demonstrates BWT compression.
Created: 10/12/1998
by Mark NelsonMore...
GRZipII - is a high-performance file compressor based on Burrows-Wheeler Transform, Schindler Transform, Move-To-Front and Weighted Frequency Counting. It uses The Block-Sorting Lossless Data Compression Algorithm, which has received considerable attention over recent years for both its simplicity and effectiveness. This implementation has compression rate of 2.234 bps on the Calgary Corpus(14 files) without preprocessing filters. There are Windows, Linux, and Dos executables along with the sources.
Created: 23/05/2005
by Ilya GrebnovMore...
Burrows, M. and Wheeler, D.J., A Block-sorting Lossless Data Compression Algorithm, Digital Systems Research Center Research Report 124, May 1994. The original paper on BWT compression. This very effective compression technique was first described in this paper.
Michael Schindler reviewed it thus:Even through it is the first paper about blocksort it is still excellent literature for introduction to this method and its general ideas..
Note: after many years, the DEC web site hosting this paper finally disappeared, not surprisingly. The new link is via the CiteSeer search engine, which I hope is a permanent solution.
Created: 09/06/2002
by Mark NelsonMore...
bwtzip is an ongoing project, distributed under the GNU General Public License, to implement a Burrows-Wheeler compressor in standard, portable C++. It is research-grade in that it is highly modularized and abstracted, so that it is simple to swap out parts of the compressor without affecting anything else. This makes it easy to experiment with different algorithms at different stages of compression.
Looks like Steven T. Lavavej released a new version of bwtzip in early February, 2003. A wide variety of improvements, most of them in implementation - not visible to the end user. A description of recent changes is found here
Created: 11/12/2002
by Mark NelsonMore...
Libarchive is a programming library that can create and read several different streaming archive formats, including most popular tar variants and the POSIX cpio format. The bsdtar program is an implementation of tar(1) that is built on top of libarchive. It started as a test harness, but is quickly moving toward becoming a candidate system tar for FreeBSD
Created: 22/05/2004
by Mark NelsonMore...
This package includes 12Zip and 12Zip2. The first version uses Zip compatible compression, and the second uses a BWT variant.
Version 7.0 of the package is shipping as of May, 2004
Created: 14/05/2004
by Mark NelsonMore...
Tom has posted his source code for embedded BWT compression. Basically, he's trying to pull it off with low amounts of RAM.
Created: 04/05/2004
by Mark NelsonMore...
BAR Archiver is a free, cross-platform file compression and archiving tool written in Java. It outperforms most of the popular file
compression tools available as commercial utilities.
Compression is based on Burrows Wheeler Transformation and modified Huffman entropy coder. The compression ratio is in par
with most of the commercial grade compression utilities.
Created: 11/04/2004
by Mark NelsonMore...
In this article, I examine an indexing method that lets you find any character sequence in the source text in time only proportional to the sequence length using a structure that can compress the entire source text and index into less space than the text alone. This technique is exceptionally fast at detecting and counting occurrences of any string in the source text.
Created: 20/03/2004
by Mark NelsonMore...
WinBZip2 is an small freeware utility for working with files produced by bzip2 utility (usually it's files with .bz2 extension) and also create such files. Main features include the compressing, decompressing and testing the integrity of bzip2 files.
Created: 24/11/2003
by Mark NelsonMore...
Transform is a BWT compressor written by Michael Bone. It supports a number of very interesting features, such as automatic Base 64 representation and image output. Shareware.
Version 1.02 shipped in October, 2003.
Created: 08/11/2003
by Mark NelsonMore...
This article describes STL-compliant iostream implementations that compress and decompress using the deflate and bzip2 algorithms. It makes it really easy to use compressed streams in your C++ app.
Updated July, 2003, to fix a gzip header problem.
Updated August, 2003 to fix a couple of minor problems.
Created: 28/09/2003
by Mark NelsonMore...
UHBC is a blocksorting compressor optimized for high compression ratios. The program uses recent research results by S. Deorowicz and J. Abel for improved second stage processing after the BWT. Some extensions and sophisticated modeling provide top compression ratios, giving the best known result for Calgary Corpus (2.208 bps average). UHBC is free for non-commercial use. No source available.
Reader Grebnov I. had this to say: Compression is very good, but too slow + no source code..
Created: 19/07/2003
by Mark NelsonMore...
by Jurgen Abel and William Teahan. This paper looks at a few different techniques for preprocessing data before performing text compression, and compares the gains achieved when combining the preprocessors with PPM, BWT, and LZ algorithms.
Created: 19/07/2003
by Mark NelsonMore...
The Xceed Streaming Compression Library is a high-performance "raw" compression library. It offers the ability to compress and decompress streaming data, buffers, strings or single files and supports multiple compressed data formats. Unlike the Xceed Zip Compression Library, this ultralight library doesn't offer Zip file handling capabilities.
Created: 21/06/2003
by Mark NelsonMore...
UFUP advertises itself as a "decompression utility that handles common Unix package formats such as bzip2, gzip, and tar."
Created: 30/05/2003
by Mark NelsonMore...
ABC is a free data compression program based on the Burrows-Wheeler transformation. The source code is free for academic, research and educational use as depicted in the Abel Public License (APL). The program is developed in DELPHI as a command line program just like GZIP.
Update: Jurgen has released the source code for ABC at long last! The Delphi source is available for download from the web site and can be used under his own APL.
Created: 25/04/2003
by Mark NelsonMore...
This is another German preprint of Jurgen Abel describing the principles of the Burrows-Wheeler Compression Algorithm. An
implementation of a BWT based compressor with a compression rate of 2.25 bps on the Calgary Corpus is presented. The paper will be
published in the German journal "Informatik - Forschung und Entwicklung", Springer-Verlag Heidelberg, Association for Informatics
(Gesellschaft fur Informatik eV.)
Created: 14/04/2003
by Mark NelsonMore...
A comprehensive set of compression pointers. Unfortunately, Michael is using some sort of software that makes bookmarking into his index impossible. So instead, you must link to the main page, shown here, and locate the links to "Data Compression". Under that he has links to General Resources, Software, and Theory.
Created: 23/03/2003
by Mark NelsonMore...
This preprint of Jurgen Abel gives a short introduction into the BWCA field and proposes several improvements for the WFC stage and the IF stage.
It further introduces a new RLE scheme for bypassing the run length
information around the WFC stage. The paper is the basis of the
BWCA program ABC and the revealed approach achieves a
compression rate of 2.238 bps, which is the best result for a pure
BWCA without any preprocessing before the BWT to date (March 2003).
Created: 22/03/2003
by Mark NelsonMore...
This is a preprint of a paper by Jurgen Abel describing the functionality
of a basic but quite fast Burrows-Wheeler compressor. Jurgen reports that this will be published in PIK, a German-language journal on Communications and Information Theory.
Created: 21/03/2003
by Mark NelsonMore...
Jurgen Abel has done an enormous amount of research on the Burrows-Wheeler Transform, and has published the results on his web site. On this page you will find:
A summary of this compression technique.
Links to over 70 online papers.
Links to at least that many people involved in BWT research or development.
Extensive links to BWT source code.
This web page may now be the definitive source of information for this field.
Created: 18/03/2003
by Mark NelsonMore...
Compressia is a BWT-based archive utility with particularly high compression ratios. On Calgary, Canterbury, ACT-Text and ACT-Exe it surpasses all other BWT utilities. On Canterbury corpus it also surpasses all PPM utilities. Beta version 1.0 is available on the web site as of February, 2003.
DataCompression.info reader Juan L. said It is by far the best I've tried.
Created: 17/02/2003
by Mark NelsonMore...
Aftex Software makes a Java version of Bzip2. This includes input and output stream classes, which can be used in other Java applications. The program has both a GUI and command line interface.
Created: 03/02/2003
by Mark NelsonMore...
Jurgen is the proprietor of
www.data-compression.info,
an excellent resource for developers and researchers. Jurgen has a good supply of links to papers, conferences, books, etc. on the site, as well as executables and source for ABC, a freeware BWT compressor he wote in Delphi.
Created: 10/12/2002
by Mark NelsonMore...
This is a preliminary shot at creating an open source BWT compression engine. Things look very preliminary at this point with just a couple of files available for download and not much message traffic.
Created: 09/12/2002
by Mark NelsonMore...
Karl has created a complete BWT package, and has posted the source on this site. He also has an adaptation of N. Jesper Larsson's
Burrows-Wheeler Suffix Sorting for your perusal.
Created: 30/10/2002
by Mark NelsonMore...
Sax.net Streaming Compression helps you keep your data small and fast. Use high-performance compression and data compression code, using a class library that was designed from the ground up for integration with the Microsoft .NET framework.
In addition to being able to specify whether to prefer speed over size, Sax.net Compression offers you a choice of two compression algorithms: Industry-standard Deflate (ZIP) compression, and the newer Burrows-Wheeler (BZip2) transform, which generates especially great results when compressing XML data.
Created: 30/10/2002
by Mark NelsonMore...
The UPL Compression Library is a high-performance professional compression library. It offers the ability to compress and decompress data, buffers, strings or single files and features the latest innovations in data compression. The library offers eight extremely powerful compression algorithms. Dynamic Huffman, Arithmetic, BWT, Ppm and several Lempel Ziv flavors.
DataCompression.info user John G. had this to say: I was looking for adding a better compression to my Visual Basic project and it worked like a charm. The compression ratio is really good, better than Zip!
Created: 27/08/2002
by Mark NelsonMore...
WinImp has been re-released as freeware. This archiver can create Zip files, and extract from the usual list of Zip, ARJ, RAR, and so on. It includes a couple of proprietary (BWT-based?) algorithms that purport to do well on text files.
DataCompression.info user Mike notes that It includes 2 very good
compression methods and only the new versions of Winrar, Winace or Rk can compete with the compression ratios of Winimp.
Created: 23/08/2002
by Mark NelsonMore...
A feature-rich COM/ActiveX and DLL component for adding Zip and Unzip to your Windows apps with minimum effort. Includes full documentation with many samples and examples for VB, VC++, Delphi and C++Builder to get you started quickly. Supports the latest
zip file format updates including Deflate64 compression and Zip64 headers. Also includes BZip2 compression which is excellent for compressing XML data.
Created: 31/07/2002
by Mark NelsonMore...
YBS is a high-performance archiver based on Burrows-Wheeler Transform and distance coding modelling. It's quite fast even on skew data and achives high compression ratio. Versions are posted here for DOS and Win32. It appears that the last update was in 2000.
Created: 11/07/2002
by Mark NelsonMore...
A post by Edgar Binder discussing his distance coding algorithm. Distance coding is used in place of the MTF stage in BWT compression.
Created: 03/06/2002
by Mark NelsonMore...
This is an academic project. A library and a sample program will be developed, that will implement the Burrows-Wheeler compression algorithm, using C++ and templates. This is the same algorithm for BZip.
Created: 18/01/2002
by Mark NelsonMore...
A version of libzip2 in source format for WinCE, along with demo code and project files. Ciprian Miclaus created this port along with one of zlib, and has made them available for all manking. Thanks Ciprian!
Created: 14/01/2002
by Mark NelsonMore...
This page has links to online versions of Hirosuke Yamamoto's papers on data compression. Papers here on block sorting, coding, and more. The papers are all published in English.
Created: 01/01/2002
by Mark NelsonMore...
This article describes a variant on BWT that doesn't use the cyclical rotations of strings used for BWT, but a different scheme. DCL reader points to a Burrows paper showing that this scheme is suboptimal.
Created: 03/09/2001
by Mark NelsonMore...
A port of Julan Seward's bzip2 program to the Mac. It's free and full source is included if you are the adventurous type.
Created: 25/07/2001
by Mark NelsonMore...
Another paper by Michael Maniscalo discussing a scheme for Run Length Encoding of data that's been put through the blocksort transform. His first paper uses fixed length codes, this uses variable lenghts.
Created: 21/06/2001
by Mark NelsonMore...
This page is devoted to a new compressor called M99. The author says that M99 is a new type of statistical compressor that has speeds rivaling the fastest Huffman coders with ratios of the best statistical modeling programs. Good!
Created: 02/01/2001
by Mark NelsonMore...
A paper by Sebastian Deorowicz. Implementing the BWT transform is nice and simple, but what you do with the transformed data is where all the action is. Traditionally, we use Move To Front followed by an entropy encoder. Sebastian talks about some alternatives that help compression.
Created: 29/11/2000
by Mark NelsonMore...
Paolo Ferragina's research in data structures and string matching naturally lends itself to Data Compression and to the problem of indexing compressed data. See the link to his recent papers on indexing BWT compressed files. He is currently an Associate Professor of Computer Science at the University of Pisa.
Created: 09/11/2000
by Mark NelsonMore...
Michael Schindler's page describing a sorting algorithm that was presented in a poster session at DCC '97. Links to his source code, plus links to the paper and poster in postscript format. Update: Michael has made additional source code available.
Created: 28/09/2000
by Mark NelsonMore...
Giovanni Manzini has published papers covering a few different topics in Data Compression, including several recent works on Burrows-Wheeler algorithms. He is currently an Associate Professor of Computer Science at the Universita degli Studi del Piemonte Orientale in the the most northern reaches of Italy.
Created: 25/09/2000
by Mark NelsonMore...
A short paper discussing various preprocessing techniques that can be used in BWT-based compression programs. RTF format.
Created: 06/09/2000
by Mark NelsonMore...
A paper by Sebastian Deorowicz appears here, which claims to make improvements in efficiency to BWT compression, giving the best ratios of any BWT program on the Calgary Corpus.
Created: 09/07/2000
by Mark NelsonMore...
David Fetter wrote a Linux howto telling you how to compile, install, and use bzip2, the BWT-based freeware archiver.
Created: 16/05/2000
by Mark NelsonMore...
by D. Baron and Y. Bresler. Postscript paper describing a technique that uses BWT to preprocess a sequence of symbols, then mung.
Created: 28/03/2000
by Mark NelsonMore...
This page discusses the various compression schemes used by EGI, Shannon-Fano, BWT, and RLE. EGI is a player of animation sequences.
Created: 14/11/1999
by Mark NelsonMore...
The site bills itself as an animation, but in fact it is just a couple of screen captures that demonstrate an animation.
Created: 22/12/1998
by Mark NelsonMore...
A paper discussing BWT text compression in Proceedings of the 19th Australasian Computer Science Conference, Melbourne, Australia. Jan 31 - Feb 2, 1996.
Created: 08/12/1998
by Mark NelsonMore...
Contains his implementation of the BWT algorithm, in the program bred. Along with this are some notes and papers on his implementation of BWT
Current list of files:
bexp.c
bexp3.c
bred.c
bred.ps
bred.ps.Z
bred2
bred3.c
bred3.ps
exp.c
huff.ps
mintext.ps
mintext.tex
red.c
tea.ps
tub.ps
wake.ps
xtea.ps
xxtea.ps
Created: 30/11/1998
by Mark NelsonMore...