Archive: Better compression


Hi everybody,

After many tests, it seems that BZIP2 compresses much better than ZLIB (especially for text files). Is it possible to include BZIP2 compression in NSIS? Are there any problems in doing this? If so, maybe I can help (a bit, because I'm not a hardcore coder ;).

Thanks for the great software ;)


I'm not really an expert on this but you have to keep in mind:
- how big is the decompression routine? NSIS tries to be only 36 KB. Maybe the bzip2 routine is more complex?
- Is Bzip2 open-source/unpatented? It may be open-source, but the advantage of ZLib is that it doesn't contain any patented stuff, thus it can't infringe them or cause legal problems.

Then again, only Justin himself knows why he chose ZLib (I've been using it for about a year now and I like it - fast and nice compression).


-The man at http://sources.redhat.com/bzip2/ (homepage of bzip2) says it's patent free "as far as [he] knows"...
-It's open source.

For the routine size, I don't really know but I can make some tests.


I said I was no expert ;)
Really, only Justin can answer this one. And my reasons were only suggestions of course.


From what I know of bzip2 (I haven't tried it recently), it is much slower at compression and decompression than zlib, though it compresses better.
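For anyone who wants to check the size/speed trade-off on their own files, here is a minimal sketch using Python's standard zlib and bz2 modules. It is not how NSIS does anything; the compare() helper and the sample input are made up for illustration, and the timings will vary a lot by machine and by data.

```python
import bz2
import time
import zlib


def compare(data: bytes) -> dict:
    """Compress `data` with zlib and bz2 at level 9; report size and timings."""
    results = {}
    for name, compress, decompress in (
        ("zlib", lambda d: zlib.compress(d, 9), zlib.decompress),
        ("bz2", lambda d: bz2.compress(d, 9), bz2.decompress),
    ):
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        unpacked = decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # round-trip sanity check
        results[name] = {
            "size": len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


if __name__ == "__main__":
    # Hypothetical sample input; substitute the files you actually ship.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    for name, stats in compare(sample).items():
        print(name, stats)
```

Run it against real installer payloads (text, DLLs, images) rather than the toy sample, since the winner depends heavily on the kind of data.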

It is open-source but not GPL (exact same license as zlib), and I suppose it could be optimized for size to end up as small as or smaller than the current zlib implementation.

I don't know how high the priority for this sort of thing is, since Justin keeps promising a reboot-flagging-and-prompting system (I'm still waiting :)). But it should be do-able.


I actually made a bz2 using nsis 1.1 a while back. The exe header was marginally larger. bz2 appears to be patent free, and is open source (under a similar license to zlib, I believe), but here are the drawbacks:



-Justin

zlib, bzip2?
I like zlib/gzip and bzip2. But I think one option has been overlooked: MS compress/cabarc + the standard Data Decompression library (LZInit/LZRead/LZCopy/LZClose). It is preinstalled on all Win32 platforms.

Pros:
1) Virtually 0 (zero) overhead for installer. You don’t need to carry zlib or bzip2 library.
2) It compresses better than zlib.

Cons:
1) There is no Data Compression library. You have to run the "compress" or "cabarc" utilities from NSIS using the command line.
2) These utilities are free but not part of any OS. Developers have to download them from MS. (Is it possible to redistribute them?)

More on Pro#2. In order to automatically distribute my stuff I use an NSIS installer packed in a .cab. Initially I created a compressed installer with a compressed header and cabbed it with no compression (it would not compress any further anyway). But I found that if I use an _uncompressed_ installer and cab it with the highest compression (LZX:21), it is 20% smaller. I ran several tests on my files and “compress” was significantly better than zlib.

I think this alternative is worth adding to NSIS. It reduces installer size on both fronts.


I can say only 1 thing: It's a Microsoft compression Lib. Now, how come there aren't any compression libs for it? Because it isn't open-source and it contains loads of patents no doubt.

If you really want the best compression, you'd have to use WinRAR to create self-extracting things, or maybe WinACE. But they add 32 KB of overhead themselves (minimum!).


Are we still talking about a Win32 installer? ;)

I don't advocate using the LZ* stuff for all-purpose apps. In my opinion it is a smart choice for a Win32 installer.

We don't care about the load of patents because the installer will not carry any proprietary code, only calls to the standard Win32 API. It uses the Win32 API now and nobody got sued. :)

NSIS (the compiler) will not contain any proprietary code either, only calls to external utilities. It already does this now to compress the header. :)

I don't see any problem in this particular case. Possibly WinRAR will compress better, but we are concerned with overall installer size, aren't we? That means that if we are to judge compressors, we have to take into account "compressed data" + "decompression library overhead". The latter is 0 (zero) for the LZ* stuff. And it compresses better than the currently used zlib. In any case, a small decompression library will be crucial for small components.
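To make the "compressed data" + "decompression library overhead" comparison concrete, here is a small sketch in Python. The function names are made up for illustration, and any ratios or stub sizes you feed in are your own measurements or guesses, not figures for zlib, bzip2 or the LZ* library.

```python
def total_installer_size(payload_bytes: int, ratio: float, stub_bytes: int) -> int:
    """Overall size = compressed payload + decompressor overhead.

    `ratio` is compressed size divided by original size (0.5 means "halved").
    `stub_bytes` is what the decompression code adds to the installer.
    """
    return round(payload_bytes * ratio) + stub_bytes


def break_even_payload(ratio_a: float, stub_a: int, ratio_b: float, stub_b: int) -> float:
    """Payload size at which compressors A and B give the same total size.

    Assumes B compresses better (ratio_b < ratio_a) but has the bigger stub.
    Below this payload size A wins; above it B wins.
    """
    if ratio_b >= ratio_a:
        raise ValueError("B must have the better (smaller) ratio")
    return (stub_b - stub_a) / (ratio_a - ratio_b)


if __name__ == "__main__":
    # Hypothetical numbers only: a 1,000,000-byte payload, one compressor
    # with no stub and a 0.45 ratio, another with a 20,000-byte stub and 0.40.
    print(total_installer_size(1_000_000, 0.45, 0))
    print(total_installer_size(1_000_000, 0.40, 20_000))
    print(break_even_payload(0.45, 0, 0.40, 20_000))
```

The point of the break-even figure is the one made above: for small components the stub dominates, so a decompressor that costs nothing can beat one that compresses better.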


I know I've seen code floating around for doing
Microsoft LZ compression like the original COMPRESS.EXE
and EXPAND.EXE suite from Windows 3.x.

LZEXPAND.DLL comes with every version of Windows at this
point so it could be used with almost zero overhead. I
wonder if the trade off is worth it.
How big is the zlib code? Would this really buy back
enough space to make it worthwhile?

The new COMPRESS.EXE includes an "mszip" compression that
is not backwards compatible with all versions of Windows.
I've not seen code for generating those files yet, and since
it doesn't exist everywhere it is probably not useful.

I'm not fond of CAB file format and the new MSI crap
but I'll probably have to learn it too.

NSIS is going to continue getting a workout for my
installs at least for the near future.