Archive: VPatch 3 wish list


I am currently rewriting VPatch (the patch generator at least) from scratch again in C++. I want to add/fix the following issues:
- the Delphi GenPat cannot handle files larger than 500 MB or so. I may even go as far as no longer keeping the files completely in memory... but that will hurt performance.
- add storage of target file date (metadata) and restore it on the target file
- use MD5 instead of CRC32 to decrease probability of clashes. I have had some reports of problems with CRC32... but it will make the NSIS plugin slightly bigger probably.

And the rest stays the same.

All this means I have to break compatibility of the current VPatch format. Are there any other feature requests?
What do you think about breaking the format in general?


I'd like to know: what is the problem with CRC32?


Some people had a problem with the source files in a single patch having the same CRC, and/or the source and target file having the same CRC.
Since file identification is done on just the CRC, that can break things. It would be 'better' to use either MD5 or CRC+filesize.
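As a rough illustration of the CRC+filesize idea, here is a minimal sketch. It assumes the standard reflected CRC-32 (polynomial 0xEDB88320); the `FileId` struct and the `identify` and `sameFile` helpers are hypothetical names made up for this example and are not taken from the VPatch sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit.
std::uint32_t crc32(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical identity record: CRC plus file size.
struct FileId {
    std::uint32_t crc;
    std::uint64_t size;
};

// Builds the identity of a file whose contents are already in memory.
FileId identify(const std::string& contents) {
    return { crc32(reinterpret_cast<const unsigned char*>(contents.data()),
                   contents.size()),
             static_cast<std::uint64_t>(contents.size()) };
}

// Two files are treated as the same only if both CRC and size agree.
bool sameFile(const FileId& a, const FileId& b) {
    return a.crc == b.crc && a.size == b.size;
}
```

With this, two files that happen to share a CRC are still told apart whenever their sizes differ. Files with the same size and the same CRC would still clash, which is where a longer hash such as MD5 would help.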

I'm starting to wonder, however, whether I should somehow remain backwards compatible with the old format. Breaking everything might not be worth it, and the features needed can be squeezed in as well.

Can anyone tell me if patching files bigger than 2GB is an issue for anyone?

Also, VPatch does not support 'direct patching' where there is just a byte patch and the rest of the file remains the same. For big files, that can be handy, but then there is no way to verify the checksum or undo anything if the patch fails...


GREAT to hear VPatch will be updated. I think it's a very useful and powerful patching feature.

Also, VPatch does not support 'direct patching' where there is just a byte patch and the rest of the file remains the same.
This would be useful indeed, as it may decrease the patch's file size.

Direct patching does not lead to smaller patch files, as it is really just a more limited version of what the current patching can already do. The difference in patch file size would be about 28 bytes, and that saving does not make up for the larger NSIS plugin it would require. Also, I do not know how a failed patch should be handled in direct patching: you have modified the original file, but it did not end up the way you wanted it to...

Current version target is now 2.5 instead of 3 because compatibility will remain. New in 2.5 will be: low memory usage! The memory usage is some fixed size (say 16MB or so) plus a certain percentage of the original file, depending on the block size you select. Smaller blocks lead to smaller patches, but use more memory. I do not yet know how this affects performance.
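To make the block size trade-off concrete, here is a back-of-the-envelope sketch. It assumes the variable part of the memory is one index entry per block of the original file; the 16-byte entry size and the names `kFixedBytes`, `kBytesPerBlockEntry` and `estimateMemory` are assumptions for illustration only, not measured values or real VPatch internals.

```cpp
#include <cstdint>

// Assumed constants, chosen only for illustration.
constexpr std::uint64_t kFixedBytes = 16ull * 1024 * 1024; // "say 16MB or so"
constexpr std::uint64_t kBytesPerBlockEntry = 16;          // hypothetical index entry size

// Rough estimate: fixed overhead plus one index entry per source block.
std::uint64_t estimateMemory(std::uint64_t sourceSize, std::uint64_t blockSize) {
    std::uint64_t blocks = (sourceSize + blockSize - 1) / blockSize; // round up
    return kFixedBytes + blocks * kBytesPerBlockEntry;
}
```

Under these assumptions, halving the block size doubles the variable part of the estimate, which matches the "smaller blocks use more memory" behaviour described above.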

I have found a case where VPatch will 'break down' and become very slow (the algorithm suddenly becomes quadratic instead of N log N). This happens a lot with big files.

Does anybody know a good checksum function which can convert a block of a certain size (always a power of 2) to a 'unique' checksum which is very different for small changes? MD5 is good, I know, but how fast is it to compute? I need to do it millions of times so it has to be fast.


The Delphi version used a slightly modified variant of Adler32 (undocumented), the C++ version now uses true Adler32 and the first 8 bytes of a block for block identification.
This greatly reduces the 'quadratic failure' I had before, and I have now fixed what remained of it by limiting the number of blocks that can match (to, say, 100).
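The block identification and the match limit described above could look roughly like the sketch below. The Adler-32 routine follows the usual definition (RFC 1950); everything else (`BlockKey`, `makeKey`, `BlockIndex` and the way the limit is applied) is a hypothetical reconstruction for illustration and is not the actual VPatch code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

// Plain Adler-32 as defined in RFC 1950.
std::uint32_t adler32(const unsigned char* data, std::size_t len) {
    const std::uint32_t MOD = 65521;
    std::uint32_t a = 1, b = 0;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }
    return (b << 16) | a;
}

// Block key: the checksum plus the first 8 bytes of the block.
using BlockKey = std::pair<std::uint32_t, std::uint64_t>;

BlockKey makeKey(const unsigned char* block, std::size_t blockSize) {
    std::uint64_t prefix = 0;
    std::memcpy(&prefix, block, blockSize < 8 ? blockSize : 8);
    return { adler32(block, blockSize), prefix };
}

// Hypothetical index from block key to source offsets. At most
// maxMatches offsets are kept per key, so highly repetitive input
// cannot produce an unbounded candidate list for a single block.
class BlockIndex {
public:
    explicit BlockIndex(std::size_t maxMatches) : maxMatches_(maxMatches) {}

    // Returns false when the key already holds maxMatches offsets.
    bool add(const BlockKey& key, std::size_t offset) {
        std::vector<std::size_t>& offsets = index_[key];
        if (offsets.size() >= maxMatches_) return false;
        offsets.push_back(offset);
        return true;
    }

    // Number of stored candidate offsets for this key.
    std::size_t candidates(const BlockKey& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? 0 : it->second.size();
    }

private:
    std::size_t maxMatches_;
    std::map<BlockKey, std::vector<std::size_t>> index_;
};
```

In this sketch, a file consisting of thousands of identical blocks (long runs of zeros, for example) contributes at most `maxMatches` candidates per key, so the matching work per target block stays bounded instead of growing with the number of duplicate blocks.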

To get a speed impression: patching a 37 MB file to a 19 MB file takes around 8 seconds with 2.5 and 107 seconds with 2.1. :D


Target file date storage will mean that the NSIS plugin needs to be extended (a little bit).

I have attached the change needed to the CVS vpatchdll.cpp file to support the upcoming new C++ patch generator. Compatibility with previous versions is maintained, but this updated DLL will be mandatory since now file dates are stored by default in the upcoming patch generator.

Can someone with CVS access upload this file?


I'll upload it when it's all ready.

BTW, why is targetModifiedTime global if it's only used in DoPatch and is always initialized?


Good point. It doesn't have to be. I would have declared it 'inline' together with the initialization, but C doesn't like declaring variables there, only at the start of the function. I kind of forgot that.

I am working on a new distribution with everything (including much better docs). Also, I am going to have a VPatchLib.nsh with a macro which allows easy use of VPatch. Can that go into the Include folder when it's ready?


Originally posted by Koen van de Sande
... can that go into the Include folder when it's ready?
Sure.

I have uploaded a release candidate of VPatch 3. This version is feature-complete and is ready for testing.

You can download it from http://www.tibed.net/files/vpatch3releasecandidate.zip

What's new in v3-RC:
- new patch generator in C++, much faster and lower memory usage
- dates of updated files are preserved (the patched file gets the date the target file had on the build system)
- updated NSIS example using new VPatchLib, which handles error checking for you. Also, it should be clearer than the old example.
- more verbose error messages in the EXE runtime
- new DLL runtime
- fixed incorrect documentation
- GUI now uses the command-line patch generator instead of a built-in version, making results the same (and reducing complexity).

Note: this version has its version number set to 3.0 and will be released as 3.0 if there are no problems. So please remember, you are using a release candidate which might be updated later.