I'm now able to compile (ANSI) with MinGW; get the latest source from SVN and add these changes:
And you need to rename Call.S to Call.sx if you want to compile system.dll.
I did not add the substart change to SVN since it is a bit of a hack; it would be better if we actually fixed the SCons stuff.
Not sure what to do about the Call.S issue; it might be SCons/MinGW version-specific.
Archive: Unicode
jimpark
21st September 2011 11:33 UTC
Let's be fair MSG. If the merging of my code into the trunk is really what's holding the new NSIS up, then it's done. It was done four years ago when I presented my work. The reality is they are reinventing the wheel. They are redoing my work, their way.
And frankly, in the last four years, there hasn't been much work being done on NSIS at all. The biggest being the Unicode port I did four years ago. Just check the code changes here. Apart from the copyright year updates, I never had to struggle very much to update my codebase to reflect the trunk's. Usually, the changes per release were just a few lines here and there.
And really what does it matter now anyway? I personally don't know a major project that uses the ANSI version. Do you? All the products I use, if they use NSIS, use my Unicode version. The only motivation I can think of to use the ANSI version is if I needed to support Windows 98/ME customers. But what are these people connecting to the internet with abandoned OSes anyway? They don't even get security updates from Microsoft!
But you might be right about me being able to do it alone. Perhaps I should ask for help from the people who have been offering. But so far, it hasn't been that much of a workload. That may of course change with the coming Windows 8 support.
MSG
21st September 2011 12:01 UTC
I'm not a programmer, so I can't comment on why work in your branch is or isn't or should or shouldn't be compatible with work in the trunk. The fact remains that the progress in trunk is stalled because of unicode. It's not guaranteed that progress will continue if the stalling factor is removed, but it's pretty certain that progress will not continue as long as it's stalled...
As for the trunk unicodification, if they're redoing your work their way, then wouldn't it be beneficial to all of us if you would help in redoing it? You have the experience, after all. The alternative is either to wait till trunk unicode is finished (unlikely) or to drop trunk development completely.
The latter would be, in my opinion, a bad move. Your work, while cool and very very usable (kudos), is almost source-closing NSIS because it doesn't use the existing NSIS development infrastructure. Fragmentation destroys human & technical resources.
jimpark
21st September 2011 13:55 UTC
Well, open source also has a way of moving forward when the product gets neglected, stalled, etc. It gets forked and the original dies. I don't know if this is what's happening but with the dearth of activity (releases), it's a possibility.
If you are asking me to redo my work, then no. It could be that my approach of how it should be done is different from what they want to do. And in that case, they have to take the time to actually implement it. If you are asking me to help them do their approach when I believe the approach I took is better, again, no. There is no incompatibility with my approach since I've done it and it works. And I think the fact that it's taking more than two years to do what I've done four years ago should speak for itself.
Anders
21st September 2011 20:41 UTC
The official NSIS needs to support other compilers (and POSIX support), not just VS2005+. We cannot rely on the CRT to provide stdio ANSI/UTF8/UTF16 conversion like the fork does...
jimpark
21st September 2011 20:54 UTC
The CRT does not provide UTF-8, UTF-16, UTF-32, and ANSI conversions. Hence, the NSIS Unicode does not rely on them. What you may be facing is the fact that wchar_t is defined as different widths for the different platforms. The CRT in Linux is going to be 32-bits. The CRT on Windows is going to be 16-bits. If you are trying to do cross-platform building of win32 programs, then you will need to make sure that the CRT that you link to will also be for UTF-16LE. Otherwise, all your wide char string functions will not work right because they want wchar_t to be 32-bits. Or you will have to link in ICU which is a huge library and then convert everything to the "native" wchar_t.
I say forget building NSIS on other platforms. Why bother? NSIS is only a Windows installer solution. Build it on Windows. As for other compilers on Windows, I'm sure with a little work people can get my NSIS Unicode to build on gcc, et al. Is this what you guys are stuck with? I don't have any expertise in cross-platform building and I don't have much interest (no need yet) so I won't be much of a help there.
Anders
21st September 2011 23:00 UTC
Originally posted by jimpark
The CRT does not provide UTF-8, UTF-16, UTF-32, and ANSI conversions. Hence, the NSIS Unicode does not rely on them.
Then why are you passing "ccs=<encoding>" to fopen?
jimpark
22nd September 2011 01:03 UTC
You are right, Anders. I do remember using that MS extension to fopen. It was very convenient at the time. Are you guys adamant about not linking to MS CRT? But you aren't adamant about calling Win32 API, right?
Otherwise, you might have to wrap a new class for Unicode text file and simply pass that object around instead of using CRT IO API on the FILE*. That is a more extensive change. But is that all you are stuck on or are there other issues?
LoRd_MuldeR
22nd September 2011 01:21 UTC
Originally posted by jimpark
The CRT does not provide UTF-8, UTF-16, UTF-32, and ANSI conversions. Hence, the NSIS Unicode does not rely on them. What you may be facing is the fact that wchar_t is defined as different widths for the different platforms. The CRT in Linux is going to be 32-bits. The CRT on Windows is going to be 16-bits. If you are trying to do cross-platform building of win32 programs, then you will need to make sure that the CRT that you link to will also be for UTF-16LE. Otherwise, all your wide char string functions will not work right because they want wchar_t to be 32-bits. Or you will have to link in ICU which is a huge library and then convert everything to the "native" wchar_t.
Actually on Linux you would not use wchar_t at all, because on Linux all the API functions support Unicode via UTF-8 encoding, so you can use char all the way. On Windows this doesn't work. The ANSI functions in the Win32 API, which are defined with the char type, do not support UTF-8 (and thus do not handle Unicode strings). And the Unicode-enabled functions in the Win32 API are defined with the wchar_t type and require UTF-16 encoding. Everything could have been so easy, if M$ had decided to use UTF-8 in their API!
The general solution for cross-platform applications that build on Windows as well as on Linux is using macros. Instead of char or wchar_t, you use a TCHAR macro. This macro will expand to wchar_t on Windows and to char on Linux. Needless to say, all string manipulation functions need to be replaced with macros too when going this route. It needs a lot of code change.
The other solution is: use char with UTF-8 all the way - even on Windows. With that solution you can use the normal char-based string functions and don't need any macro "magic". However, on the Windows platform, you will need to convert from UTF-8 to UTF-16 before certain API functions. For example, on Windows, you can't pass a UTF-8 string to fopen(), but have to write your own fopen_utf8() that internally converts to UTF-16 and then calls _wfopen(). That is what I did with a lot of code that had been ported from Linux to Windows, but did not support Unicode yet. The Win32 API provides the needed functions to convert between UTF-8 and UTF-16.
This is the code I use to enable Unicode support on Windows in applications that use char all the way*:
http://pastie.org/private/hftbnzujnzfresihpnpa
(* as is the case with most code ported from Linux or written by people who don't care/know about Unicode support)
All fopen()'s must be replaced by fopen_utf8(). Also the "main" function has to be wrapped like this:
http://pastie.org/private/8fdvykqceqekz002bs9xza
Actually I saw that idea in the LAME code first :)
jimpark
22nd September 2011 13:36 UTC
When I was referring to wchar_t width differences, I was referring to string functions found in <wchar.h>. http://pubs.opengroup.org/onlinepubs...h/wchar.h.html
I do think that _wfopen() not existing in Linux is an oversight, but one that can be easily reimplemented. Just convert your wide strings to UTF-8. Converting from UTF-8 to UTF-16 and vice versa is not hard. You can link to ICU, iconv or (if you have access to Windows API) use WideCharToMultiByte / MultiByteToWideChar using CP_UTF8 as your target. Or you could write your own from the Unicode specs which is also not hard -- I've done it before.
But since we are targeting Windows as the working platform, keep all the strings as UTF-16LE. We aren't really porting a Unix program into Windows, it's really the other way around. Also, if we restrict building of NSIS to Windows, then I don't think you have many problems. I think MinGW needs to link to MS CRT (from what I've heard), so you have all of Microsoft's extensions anyway.
I still don't really understand the motivation of trying to build NSIS on Linux. It's an exercise in cross-platform building and it might be interesting to some. But since NSIS itself needs to run on Windows, I don't really see how anyone would benefit from the exercise.
LoRd_MuldeR
22nd September 2011 13:50 UTC
Originally posted by jimpark
I still don't really understand the motivation of trying to build NSIS on Linux. It's an exercise in cross-platform building and it might be interesting to some. But since NSIS itself needs to run on Windows, I don't really see how anyone would benefit from the exercise.
There are quite a few OpenSource projects that originate from the Linux world and the main developers don't care much about Windows. These people often support Windows ports, as long as it doesn't make too much trouble and as long as it can be done from their usual Linux build environment. So cross-compiling Win32 binaries from Linux is okay, having to set-up and use a Windows machine is not. That's why it may be convenient to have a "native" Linux port of MakeNSIS, even though the final installer will never run under Linux (outside of Wine).
Still I think when compiling the EXE headers on Linux one would have to use a cross-compiler anyway (because you need them to be compiled as Win32 binaries). And that cross-compiler would use 16-Bit wchar_t's, just like any C/C++ compiler on Windows...
jimpark
22nd September 2011 14:49 UTC
Excellent points, Lord Mulder. But is that what we are aiming to provide, a MakeNSIS that runs natively on Linux? I've yet to see it. And if that's what we are aiming to do, I think that's admirable. But if we are just trying to build an NSIS (including MakeNSIS) that only runs on Windows but is built from Linux, who are we doing this work for? Ourselves, the builders of NSIS? But why? I have a Windows box. I need to actually run NSIS to build my setup and test my Windows products.
LoRd_MuldeR
22nd September 2011 15:04 UTC
I agree that being able to cross-compile Win32 binaries of MakeNSIS from Linux isn't very useful. It certainly would be a nice addition, but IMHO it should be "low priority" feature. But then again you would be using the cross-compiler (not the native Linux compiler) to compile MakeNSIS for Win32 from Linux anyway. And again this eliminates most portability issues, I think. So I guess most problems you will be facing will be "MinGW/GCC -vs- MSVC" quirks rather than "Win32 -vs- Linux" issues...
(If something compiles with the MinGW/MSYS compiler on Win32, it probably compiles with the cross-compiler on Linux too)
DrO
22nd September 2011 15:31 UTC
i think the main point is that makensis is already provided with the ability to cross-compile, so dropping that probably isn't going to be liked by those who do rely on it. yes it doesn't quite make sense but it could be from part of an automated build system pulling things from a repository and building from there on a linux machine for example. obviously there was a reason why it was added as support so there must have been the demand.
as for the whole this build vs 'official' talk, i was under the impression when wizou was porting things back that it was generally a like for like thing. though that all seems to have stopped / lost any focus which probably isn't helping things. but i can see why people want to maintain some sort of legacy support as tbh, a unicode build doesn't offer that much advantage when you're only providing something for people where ansi will fit - i don't disagree that unicode is useful but there are reasons why some of us do still prefer to use the 'official' builds compared to the unicode ones.
-daz
LoRd_MuldeR
22nd September 2011 16:09 UTC
First of all, if written properly, the very same code can be compiled as ANSI or as Unicode. So if you still want to support an ANSI version while offering Unicode support, this is doable. It's not hard to do for new code, though it takes some work to "upgrade" existing code. But if you have to rewrite your code for Unicode support anyway, making it support Unicode and ANSI at the same time is minimal extra work.
But why would you want an ANSI version of NSIS nowadays? Every Windows system in use today supports Unicode. Even good old Windows 2000 (which hasn't received important security updates for more than a year now!) supported Unicode just fine. I know that there are still a few people using Windows 2000, although nobody can seriously recommend this, but even older Windows versions, like 9x and NT 4.0, are definitely a lost cause. Not worth bothering. Really! At the same time the lack of Unicode support causes serious problems on non-archaic systems: can't install to directories that contain Unicode characters (and these may exist on every system!), can't access or create files whose names contain Unicode characters, can't display text that contains characters not available in the user's current ANSI codepage (a huge limitation for NSIS' otherwise great translation system). And so on...
Why not make the cut:
Go for NSIS 3.xx to be fully Unicode and keep the 2.xx branch for backward compatibility (for those who really care). It's not unusual at all in the software world to drop support for outdated OSes in newer versions and maintain the old version for backward compatibility for a while.
(BTW: If current NSIS already supports cross-compiling, how will adding Unicode support destroy this?)
Anders
22nd September 2011 17:25 UTC
Originally posted by jimpark
You are right, Anders. I do remember using that MS extension to fopen. It was very convenient at the time. Are you guys adamant about not linking to MS CRT? But you aren't adamant about calling Win32 API, right?
Well, to maintain POSIX support we try not to use too much Win32 stuff in makensis.exe since it has to be in a #ifdef _WIN32 block with a #else for the POSIX analog.
Using the basic CRT stuff is not a problem (fopen etc), but using compiler extensions can be a problem.
So the only solution I see is some sort of wrapper around FILE* that reads ANSI/UTF8/UTF16 and spits out UTF16 on the other side. (I was hoping to add UTF8 support to the ANSI version but I'm not sure if that is going to happen (I already have the code for .nlf loading; the normal source files are the problem))
LoRd_MuldeR
22nd September 2011 18:06 UTC
Now I am confused :confused:
Are we still talking about cross-compiling MakeNSIS.exe from a Linux system or are you talking about a native Linux port of MakeNSIS?
Anders
22nd September 2011 20:44 UTC
Originally posted by LoRd_MuldeR
Are we still talking about cross-compiling MakeNSIS.exe from a Linux system or are you talking about a native Linux port of MakeNSIS?
What would be the point of cross-compiling makensis and then having to use wine or whatever to run it? Even if you don't have a cross-compiler you can compile makensis on POSIX but you need to grab the stubs and plugins from somewhere else... (See "G.3 Building on POSIX" in the helpfile)
LoRd_MuldeR
22nd September 2011 21:21 UTC
Originally posted by Anders
What would be the point of cross-compiling makensis and then having to use wine or whatever to run it?
Exactly my point :)
And that's why I was a bit confused about jimpark's statement "MakeNSIS that runs natively on Linux? I've yet to see it".
Still it is perfectly possible to write code that supports Unicode and compiles on POSIX (Linux, MacOS X) as well as on Windows. Two common approaches have been mentioned here:
http://forums.winamp.com/showpost.ph...&postcount=488
(So while retaining cross-platform compatibility makes things a bit more complicated, it shouldn't be a showstopper for Unicode NSIS)
jimpark
22nd September 2011 21:57 UTC
Sorry, I did not mean to confuse but to really show the lack of utility in having MakeNSIS build on Linux. I think the only use is if someone really wanted to build their own toolchain, including NSIS, on Linux as some sort of a build process as DrO mentioned. I still don't know why anyone would do that. And if someone does do that, does he also cross-build GCC or some other compiler for Windows on Linux and then decide to use that on their Windows builds? The fact is, in order to create a Windows installer, they need to run MakeNSIS on Windows.
Why can't we just forget catering to these hypothetical people. It's not worth stalling the project over. They are getting a Windows setup making toolchain that needs to be built on Windows. They will still need to have a Windows box as part of their resources to actually run MakeNSIS and make their installer. And if a few of these hypothetical people exist, do they matter so much to stall the whole development?
Again, I would not be making this argument if MakeNSIS did run natively on Linux, making NSIS a cross-platform setup builder. But that is not the case.
LoRd_MuldeR
22nd September 2011 22:14 UTC
Originally posted by jimpark
Again, I would not be making this argument if MakeNSIS did run natively on Linux, making NSIS a cross-platform setup builder. But that is not the case.
If I understand Anders correctly, then MakeNSIS already can be built as a native Linux application (only the EXE headers can't, for obvious reasons) and so it does run natively on Linux. And that, indeed, makes it possible to build Windows applications on Linux (using the cross-compiler) and then package them in an NSIS installer with MakeNSIS for distribution. No Windows machine required.
But once again: that is not what prevents MakeNSIS from supporting Unicode. It is possible to write code that builds natively on Windows, builds natively on Linux, and supports Unicode on both platforms. See the link in my previous post, for example.
Anders
22nd September 2011 22:18 UTC
Originally posted by jimpark
And if a few of these hypothetical people exist, do they matter so much to stall the whole development?
It is in the Debian repository IIRC.
Also, we got a patch from the Fedora/RedHat downstream this month, so they are not hypothetical.
Even if we dropped POSIX support, VC6, VS2003 and MinGW would still need fixing... (I know you don't care about VC6/VS2k3, but fixing MinGW would probably also fix the older VC's)
Borland C++ and Open Watcom C/C++ don't have official support and I don't know how long it has been since anyone tried those, but they might also have similar issues...
The point is, POSIX is not stalling anything; the usage of MS extensions is the problem.
jimpark
22nd September 2011 22:37 UTC
To help me understand: can they not build the exehead on Linux at all, or do they do it using a cross-compiler and somehow with stubbed libs? If they need to build their setups on Linux, can they do it under Wine?
jimpark
22nd September 2011 23:04 UTC
Anders, also not just MS extensions but also Win32 API calls, as well. Has anyone looked into using iconv or ICU for Unicode support? The standard runtime does not provide enough support to do what we want.
LoRd_MuldeR
22nd September 2011 23:07 UTC
Originally posted by jimpark
If they need to build their setups on Linux, can they do it under Wine?
They could. But that's probably not what they want :D :rolleyes:
(Also Wine is limited to x86 systems by the way it works. Linux also runs on many other architectures)
Originally posted by jimpark
Anders, also not just MS extensions but also Win32 API calls, as well. Has anyone looked into using iconv or ICU for Unicode support? The standard runtime does not provide enough support to do what we want.
What exactly is missing?
kichik
23rd September 2011 00:15 UTC
makensis has been fully portable to POSIX since version 2.01. The stubs are either cross compiled with MinGW or copied from the Windows build. Since I've added this feature, I've seen it ported to numerous platforms, adopted by some Linux distributions, ported to Darwin and pop-up on one of the *BSD trees. I received help requests to get it working on AIX and other non-mainstream platforms. The Debian and Redhat ports in particular saw caring maintenance with patches, fixes and even upstream patches.
I never stopped to ask why. People just wanted it. I don't have statistics, so I can't give out exact numbers. But they exist. I think the strongest selling point is for projects mainly aimed at Linux. By giving them the option to build everything on Linux, we make the Windows porters job easier. They don't have to maintain their own systems to build the application and can streamline the Windows port into the main project build cycle.
In the past, MinGW was also the only decent free solution for building on Windows. Visual Studio Express didn't exist or wasn't good enough.
As for Win32 API being used in makensis, any of them that still exist in the code are implemented for non-Win32 platforms. I think the LZMA compression module still has some threading functions implemented using pthreads.
--
All this is just technical details. Regarding the future, I do agree Unicode is required.
jimpark
23rd September 2011 03:10 UTC
Thanks Kichik for the info. As to Lord Mulder's question, the standard library does not have a way to convert from one form of Unicode encoding to another, not to mention the normalization of Unicode. For example, Windows wants UTF-16 little endian and likes precomposed characters. Linux's wide chars are UTF-32 little endian if running Intel but possibly big endian on other processors. Mac OS X uses UTF-8 fully decomposed characters. So now you see the extent of the problem with supporting Unicode and multiple platforms. If we want to use the wchar library in the C runtime, we have to encode the Unicode strings to what they want to see in that platform. If the strings are going to be stored in a binary format that will be used on Windows, such as the strings in the setup, they need to be encoded to UTF-16LE.
The new C++11 standard provides more Unicode support, including expressing Unicode string literals as 8-bit, 16-bit or 32-bit, but we will still need to write some code for byte ordering. But using the new C++ standard also seems to be out of the question if we need to support legacy compilers.
No wonder the project is stalled. It's very difficult to move forward with so many restrictions. If this is what needs to be done, then keep the strings in the preferred encoding on the platform so that all the wchar string functions available since C++03 standard can be used during the NSI script processing. Then when saving the string, save them as UTF-16LE so that the exehead doesn't have to convert the strings. So string saved to disk is in the native encoding of Windows, but the string in memory is what's expected by the wchar functions.
The other option is to stick to a single encoding even in memory but that would mean not being able to use the wchar functions in the C runtime and hence having to reimplement them on platforms if the chosen Unicode encoding isn't what the platform CRT likes.
MSG
23rd September 2011 05:58 UTC
(Call me an uninformed smartypants, but I see Jim say that the problem is very complicated, Mulder says that it's already been solved, and Anders states that the real problem lies elsewhere entirely (namely at ANSI .nsi reading). So while POSIX makensis has been cleared up, I'm wondering if there's still some confusion on what exactly needs to be figured out / agreed upon?)
LoRd_MuldeR
23rd September 2011 10:53 UTC
Originally posted by jimpark
Thanks Kichik for the info. As to Lord Mulder's question, the standard library does not have a way to convert from one form of Unicode encoding to another, not to mention the normalization of Unicode. For example, Windows wants UTF-16 little endian and likes precomposed characters. Linux's wide chars are UTF-32 little endian if running Intel but possibly big endian on other processors. Mac OS X uses UTF-8 fully decomposed characters. So now you see the extent of the problem with supporting Unicode and multiple platforms. If we want to use the wchar library in the C runtime, we have to encode the Unicode strings to what they want to see in that platform. If the strings are going to be stored in a binary format that will be used on Windows, such as the strings in the setup, they need to be encoded to UTF-16LE.
I see that converting between different Unicode encodings might be a problem. On Windows there is a Win32 API function to convert UTF-16 to UTF-8 or some ANSI Codepage (and vice versa). I'm not sure if there is an equivalent on POSIX/Unix systems.
But: I still don't understand why these conversions are needed at all :confused:
On Linux (and other POSIX systems too?) all the API/CRT functions use UTF-8. The NSIS source code files (.nsi) should be encoded in UTF-8 too - Unicode text files almost always are UTF-8 encoded. So on these systems we should be able to use the 'char' type with UTF-8 encoding all the way. And, as UTF-8 is a sequence of individual bytes, there should be no "endianness" problems either. No need to use the 'wchar_t' type with UTF-16 or UTF-32. And no conversions needed anywhere.
Now, on Windows, we can either stick with the 'char' type and use UTF-8, or we can use 'wchar_t' with UTF-16. In the former case we need minimal code change compared to the 'POSIX version', but before each Win32 API call we need to convert the argument from UTF-8 to UTF-16 (not really a problem, because the Win32 API also has the required conversion function); it basically needs a simple UTF-8 wrapper around each Win32 API function (at least around those that deal with strings). In the latter case we would have to replace every 'char' or 'wchar_t' in the code with a TCHAR macro that expands to 'char' on Linux/POSIX and to 'wchar_t' on Windows. Some string manipulation functions would have to be replaced by macros too. But generally you can simply put the required macros in a single header file (with a big "#ifdef _WIN32 .... #else .... #endif" to switch between the different OSes) once and include it everywhere...
jimpark
23rd September 2011 11:12 UTC
Now remember that makensis also runs on Windows. In fact, most of the time it does. :) And because all Unicode text must go through UTF-16LE conversion to display correctly, we have this problem. Well, maybe this isn't that bad if we say that makensis is UTF-8 and the Windows GUI shell simply converts UTF-8 to UTF-16LE. The only thing about UTF-8 is that it is so variable in length. MS tries to keep one character per 16 bits, which is why it prefers completely precomposed characters, i.e. one code point that has both base and diacritic characters. But UTF-8 is very variable. Each character could be one, two, three, or four bytes long, or longer in the case of diacritics. And then exehead would have to do the same conversion from UTF-8 to UTF-16. I don't like that the exehead would have to do extra work to deal with all the strings. But this is an option that might then push all the conversion stuff out of the POSIX code and into the Windows code. Anders, this is what you were thinking, right? What did you find?
LoRd_MuldeR
23rd September 2011 11:41 UTC
Originally posted by jimpark
Now remember that makensis also runs on Windows. In fact, most of the time it does. :) And because all Unicode text must go through UTF-16LE conversion to display correctly, we have this problem.
What problem?
On Windows a simple MultiByteToWideChar(CP_UTF8, 0, input, -1, Buffer, BuffSize) will do the conversion.
Originally posted by jimpark
Well, maybe this isn't that bad if we say that makensis is UTF-8 and the Windows GUI shell simply converts UTF-8 to UTF-16LE.
I think you would do the UTF-8 -> UTF-16 conversion yourself before each Win32 API call. The Shell API doesn't handle UTF-8.
Originally posted by jimpark
The only thing about UTF-8 is that it is so variable in length. MS tries to keep one character per 16 bits which is why it prefers completely precomposed characters, i.e. one code point that has both base and diacritic characters. But UTF-8 is very variable. Each character could be one, two, three, or four bytes longs, or longer in the case with diacritics.
Yes, in UTF-8 characters have variable length (in bytes). But that's not a problem. Each byte still "looks" like a normal ANSI character and at the end there always is a NULL terminator. So any function that was designed to handle ASCII characters will handle UTF-8 too. All ASCII characters are even identical between ASCII and UTF-8. The only problem I see with UTF-8 is that we need to allocate some "extra" space (if you want to ensure that n characters fit in the buffer, 4*n bytes are required). But on the other hand: if you allocate an n-byte buffer, with UTF-16 you can store a maximum of n/2 characters. With UTF-8 you can store up to n characters, maybe fewer...
BTW: UTF-16 may also use pairs of consecutive 16-bit words (a "surrogate pair"), because 2^16 is not enough for all Unicode characters!
Originally posted by jimpark
And then exehead would have to do the same conversion from UTF-8 to UTF-16. I don't like that the exehead would have to do extra work to deal with all the strings. But this is an option that might then push all the conversion stuff out of the POSIX code and into the Windows code. Anders, this is what you were thinking, right? What did you find?
Now, if we talk about the EXE header again: here we do not need to care about POSIX at all. It will never run anywhere else but on Windows (Wine). It will be compiled either on Windows or by a cross-compiler. Thus in the EXE header it is safe to write Windows-specific code, i.e. use 'wchar_t' with UTF-16 all the way and call Win32 API functions directly. The only conversion that will be needed is: strings that have been generated by MakeNSIS and are stored in the installer EXE's "data" section need to be converted from UTF-8 to UTF-16 once. But I doubt that this causes too much performance overhead. And again MultiByteToWideChar() will do the job...
jimpark
23rd September 2011 14:36 UTC
Maybe "problem" was a strong word.
Correct me if I'm wrong but here's what I've gathered so far:
1. makensis should only call the standard library (no Win32 calls if possible).
2. If Win32/POSIX-specific code is needed, then we will need #ifdefs for Win32 and POSIX.
3. makensis should be able to read various Unicode-encoded files: NSI, NLF, NSH, etc. These files may be UTF-8, UTF-16LE/BE or UTF-32LE/BE, and we should ideally support them all.
4. makensisw (Windows GUI) should be able to read output from makensis and display the output text to the user.
5. exehead will only use Win32 and will not link to any external libraries, including the C runtime, to reduce size.
6. Plugins will need to handle Unicode strings.
Point 3 is why I think we should consider linking to iconv or ICU. Point 4 I think we can deal with using MultiByteToWideChar, since we should only ever see UTF-8 coming out of makensis. And once we have iconv or ICU, I think we should convert the strings to UTF-16LE when storing them in the data section so that exehead will see UTF-16LE and not have to convert. This also means that for point 6, plugin writers will only have to deal with the native Windows Unicode encoding.
Also, as a side note, I think we should throw out the possibility of building ANSI while doing this work. There really isn't a good argument for keeping ANSI support and it's a simplification that can help everyone conceptually get their heads around the problems.
LoRd_MuldeR
23rd September 2011 14:55 UTC
To #2
If not done already, it may be wise to refactor OS-specific stuff into some "OS Support" class/file (i.e. have a "os_support.h" with the function declarations and have several "os_support_win32.cpp", "os_support_linux.cpp", etc. files with their OS-specific implementations), rather than have a lot of #ifdef's all over the place...
To #3
Why is that? Why not say that all text files (NSI, NLF, NSH) have to be UTF-8 and that's it? If required, any encoding can be converted to UTF-8 easily with a command-line utility, like iconv. Anyway, I can't remember ever seeing a "plain text" Unicode file in the wild with a different encoding than UTF-8. Is this really used? And do we really need to support it ???
To #4
Does makensisw need to compile natively under POSIX too? I don't think so. People that use Linux & Co generally know how to use a console! Especially because NSIS is targeted at developers! Anyway, the one and only way that I am aware of to properly output Unicode characters to the Windows console is outputting UTF-8 to STDOUT. However, you must also call SetConsoleOutputCP() and enable UTF-8 output. There is no UTF-16 mode for SetConsoleOutputCP()! Last but not least, you must not use printf() and friends, because they screw up UTF-8 strings with their "translation" functions. Use WriteFile() instead and write to the handle you got via GetStdHandle(). This of course means that a GUI application that redirects the console application's output has to treat the text as UTF-8 too.
(BTW: You may think that you can write UTF-16 strings to the Windows console by using wprintf(), but that's not the case. It internally converts to the ANSI Codepage and replaces all characters by '?' that are not available in the current ANSI Codepage *d'oh*)
jimpark
23rd September 2011 15:02 UTC
Yes, we need #3 because UTF-16LE is the default encoding for anything Unicode on Windows. When you open a resource file in Developer Studio, you will find that it is encoded in UTF-16LE by default. If you save your text file as Unicode in Windows Notepad, it is UTF-16LE by default.
Also, we may need to access data from other Unicode resources like OpenType fonts. By default, for the Windows platform, the encoding of strings in the names section is UTF-16BE. (In fact, everything in an OpenType font file is big endian.) So if we need to get some data from other sources, likely, we will need to convert.
LoRd_MuldeR
23rd September 2011 15:09 UTC
Originally posted by jimpark
Yes, we need #3 because UTF-16LE is the default encoding for anything Unicode on Windows. When you opened up a resource file in Developer Studio, you will find that it is encoded in UTF-16LE by default. If you save your text file as Unicode in Windows notepad, it is UTF-16LE by default.
We should think about this. Any halfway decent Text/Code editor can save as UTF-8. Even the Windows Notepad can ;)
And, as said before, it is easy to convert between the different encodings
before handing the file to MakeNSIS.
Also, if you don't restrict NSI/NLF/NSH files to a specific encoding, how do you detect a file's actual encoding at runtime ???
(I know that you may be able to guess the encoding via BOM character, but the BOM may be missing)
Originally posted by jimpark
Also, we may need to access data from other Unicode resources like OpenType fonts. By default, for the Windows platform, the encoding of strings in the names section is UTF-16BE. (In fact, everything in an OpenType font file is big endian.) So if we need to get some data from other sources, likely, we will need to convert.
That could be an issue :(
BTW: Why does MakeNSIS have to deal with OpenType font files?
[EDIT]
After a quick Google search I found this:
http://utfcpp.svn.sourceforge.net/vi...?revision=HEAD
jimpark
23rd September 2011 16:19 UTC
Frankly, NSIS doesn't really need to handle OpenType files. But it would be nice to. I've gotten requests to get version information and font name information so that a font can be updated for install. It's a perfectly valid thing to install fonts. And so I've provided those capabilities in the last release of Unicode NSIS.
And all the existing users of Unicode NSIS probably have their NSI files as UTF-16LE.
Anyway, I have no qualms about linking to new libraries. I don't want to write Unicode conversion code myself, especially if I have to support Unicode normalization. If makensis runs on Mac OS X, for example, the OS decomposes all the Unicode strings, so even if you convert the strings to UTF-16LE, the decomposed characters may not render correctly on Windows and will probably fail string comparisons against those entered by the user, since Windows prefers composed characters. So normalization is also an issue. This is a consequence of Unicode support plus multi-platform support.
LoRd_MuldeR
24th September 2011 00:00 UTC
Back to another topic for a moment:
Originally posted by jimpark
I just released a beta version of 2.46.3. I've still compiled it using MSVC 2010 and I did not need the EncoderPointer.lib since I do not link to the standard library in exehead. So NSIS itself still requires Windows XP+ to build installers but the installer it generates should be able to run under Windows 2000. Lord Mulder, can you check this when you have the time? I've also added GetFontName and GetFontNameLocal to round out the font commands.
Can you please also re-compile the plug-ins? I noticed the nsExec plug-in fails on Win2k.
By looking at the file dates, it seems the plug-ins included in v2.46.3 are the same as in v2.46.2.
I reverted to the 'nsExec.dll' from your v2.46.1 release and the problem is gone...
[EDIT]
Okay, it seems that 'nsExec.dll' creates a temporary executable that does the actual job.
While I don't understand why that is done, it explains the problem:
The temporary EXE won't run on Windows 2000 because its OperatingSystemVersion field is 5.1 (...because it has been compiled by the Visual Studio 2010 compiler).
jimpark
24th September 2011 11:43 UTC
Thanks for the report, Lord Mulder. I thought I built everything with the correct SUBSYSTEM setting. I will check on that again. And I will clean out the objs. But you will have to wait until Monday for the new build.
jimpark
28th September 2011 13:44 UTC
I've updated the build and uploaded 2.46.3 Beta 2. Please let me know if you experience any problems.
Anders
28th September 2011 23:40 UTC
The next version of VS will drop XP support as well: http://connect.microsoft.com/VisualS...details/690617
LoRd_MuldeR
29th September 2011 10:59 UTC
Originally posted by jimpark
I've updated the build and uploaded 2.46.3 Beta 2. Please let me know if you experience any problems.
Thanks! I will give it a try, as soon as I have some spare time...
[EDIT] Just a quick note: the 'uninst' stub still has a file date from 2002. [/EDIT]
Originally posted by Anders
The next version of VS will drop XP support as well: http://connect.microsoft.com/VisualS...details/690617
:igor:
Too bad. This would make the next VS useless for most developers for a long time. While the market share of Win7 is growing, XP still has around 40% (and Vista never got any noteworthy market share). Unless, of course, there will be a workaround to restore XP compatibility.
jimpark
29th September 2011 15:19 UTC
That's too bad. Visual Studio 2011 is supposed to support std::atomic and much of the standard threading library. I was looking forward to that. No XP support would effectively kill it for us as well. We need to support WinXP. Time to write to Microsoft.
mrjohn
4th October 2011 09:21 UTC
Originally posted by LoRd_MuldeR
There was a new Unicode NSIS release recently:
http://code.google.com/p/unsis/downloads/list
Unicode Setup from this location is detected as Adware : :(
http://www.virustotal.com/file-scan/report.html?id=82b3056fbbcf76cc6e177a22f48d0f48aa46e769039495ca2651ac29aa5e8c0b-1317715970
mrjohn
4th October 2011 13:39 UTC
This is Avira response :
Dear Sir or Madam, thank you for your email to Avira's virus lab.
Tracking number: INC00846451.
We received the following archive files:
File ID Filename Size (Byte) Result
26326890 suspect_FALSE.zip 1.69 MB OK
A listing of files contained inside archives alongside their results can be found below:
File ID Filename Size (Byte) Result
26326891 nsis-2.46.3-Unico...up.exe 1.71 MB MALWARE
Please find a detailed report concerning each individual sample below:
Filename Result
nsis-2.46.3-Unico...up.exe MALWARE
The file 'nsis-2.46.3-Unicode-setup.exe' has been determined to be 'MALWARE'. Our analysts named the threat ADWARE/Adware.Gen. This file is detected by a special detection routine from the engine module.
Please note that Avira's proactive heuristic detection module AHeAD detected this threat up front without the latest VDF update as: ADWARE/Adware.Gen.
jimpark
5th October 2011 19:23 UTC
Originally posted by mrjohn
Unicode Setup from this location is detected as Adware : :(
http://www.virustotal.com/file-scan/report.html?id=82b3056fbbcf76cc6e177a22f48d0f48aa46e769039495ca2651ac29aa5e8c0b-1317715970
I've checked the link and saw that they had listed nsis-2.46.3-Unicode-setup.zip, which is not the name of any file I've uploaded. The MD5 digest it has listed does not match any file I've uploaded either. So I can only conclude that whatever file they've tested is not mine.
LoRd_MuldeR
5th October 2011 20:27 UTC
I'd write a mail to virus_malware@avira.com in order to clarify that.
In my experience they are quite responsive...
jimpark
11th October 2011 12:47 UTC
I wrote to avira and they verified that the Unicode NSIS files are showing as being clean.
Zinthose
12th October 2011 16:25 UTC
Can we get a direct link? I can't find the download link on the site.
Yathosho
12th October 2011 16:28 UTC
Originally posted by Zinthose
Can we get a direct link? I can't find the download link on the site.
http://code.google.com/p/unsis/downloads/list
Zinthose
12th October 2011 16:41 UTC
Originally posted by Yathosho
http://code.google.com/p/unsis/downloads/list
.... OOPS.... :stare:
I meant to post this in another topic... CURSE you multi-tabbed browsing!!
Yathosho
12th October 2011 17:56 UTC
it seems obvious, but is that ANSI build fully compatible with the official nsis? just asking, cause i'd prefer to install it over my current installation and not in a separate folder.
jimpark
12th October 2011 18:17 UTC
Yes, the ANSI build should be a superset of the official NSIS build.
fhkd
18th October 2011 11:43 UTC
GetVersion.exe is not a valid Win32 application.
Hello,
I compiled the following code with Unicode NSIS 2.46.3:
!define File "program.exe"
OutFile "GetVersion.exe"
Function .onInit
## Get file version
GetDllVersion "${File}" $R0 $R1
IntOp $R2 $R0 / 0x00010000
IntOp $R3 $R0 & 0x0000FFFF
IntOp $R4 $R1 / 0x00010000
IntOp $R5 $R1 & 0x0000FFFF
StrCpy $R1 "$R2.$R3.$R4.$R5"
## Write it to a !define for use in main script
FileOpen $R0 "DefineValues.txt" w
FileWrite $R0 '!define PRODUCT_VERSION "$R1" $\n'
FileClose $R0
Abort
FunctionEnd
Section
SectionEnd
I get a GetVersion.exe, but if I try to start it from the Windows Explorer, I get the message
GetVersion.exe is not a valid Win32 application.
If I use NSIS 2.46, it works fine and I get DefineValues.txt with the entry !define PRODUCT_VERSION ....
I know the post
Originally posted by vcoder
This script work well on ANSI version of NSIS and failed on Unicode version: ...
I use Windows 7 64-bit, and other, more complex setup scripts compile fine with Unicode NSIS.
LoRd_MuldeR
18th October 2011 12:08 UTC
I saw this once. Try to include at least one FILE in your installer. Can be any non-empty dummy file.
@jimpark: Any ideas?
fhkd
18th October 2011 14:34 UTC
Originally posted by LoRd_MuldeR
I saw this once. Try to include at least one FILE in your installer. Can be any non-empty dummy file.
@jimpark: Any ideas?
Thanks for the answer.
I first tried the following solution:
Section
!tempfile DUMMYFILE
!appendfile "${DUMMYFILE}" "${DUMMYFILE}"
File "${DUMMYFILE}"
!delfile "${DUMMYFILE}"
!undef DUMMYFILE
SectionEnd
It didn't help, even though the dummy file was packed.
But after I tried
Section
File "program.exe"
SectionEnd
it works, even though it is not nice.
Thanks for the help!
jimpark
18th October 2011 15:36 UTC
Interesting. I will look into it.
fhkd
19th October 2011 15:15 UTC
Hello again!
While I tried to implement the solution with File "program.exe", I noticed the command !define /product_version in the NSIS User Manual.
Solutions like
Originally posted by vcoder
OutFile "GetVersion.exe"
with GetDLLVersion or GetDLLVersionLocal are for getting the version from a file on the building machine as a compiler constant during compile time, I think.
With the command !define /product_version it's much easier.
So I wrote a little NSIS header (Attachment 49224) with the macros GetFileVersionLocal and GetProductVersionLocal.
Now I can get a constant with the version number without making a dummy setup.
!insertmacro GetProductVersionLocal "$%windir%\system32\kernel32.dll" version
!echo "${version_0}.${version_1}.${version_2}.${version_3}"
!echo "${version}"
For a description see the header file.
Yathosho
3rd November 2011 00:19 UTC
when trying to compile a script on windows 2003 server, i get this error:
"The procedure entry point EncodePointer could not be located in the dynamic link library KERNEL32.dll"
some seconds later a second message pops up:
"Unable to initialize MakeNSIS. Please verify that makensis.exe is in the same directory as makensisw.exe"
(it is in the same directory)
Afrow UK
3rd November 2011 00:21 UTC
SP1 installed?
Minimum supported server
Windows Server 2008, Windows Server 2003 with SP1
Stu
LoRd_MuldeR
3rd November 2011 00:54 UTC
Originally posted by Yathosho
when trying to compile a script on windows 2003 server, i get this error:
"The procedure entry point EncodePointer could not be located in the dynamic link library KERNEL32.dll"
some seconds later a second message pops up:
"Unable to initialize MakeNSIS. Please verify that makensis.exe is in the same directory as makensisw.exe"
(it is in the same directory)
Please see my post here:
http://forums.winamp.com/showpost.ph...&postcount=474
In short: Binaries compiled with VS2010 don't run on systems prior to WinXP with SP-2, unless countermeasures are taken.
(Probably not a big deal for MakeNSIS, which runs on developer machine only, but important for the EXE stubs)
Yathosho
3rd November 2011 12:29 UTC
Originally posted by Afrow UK
Minimum supported server
Windows Server 2008, Windows Server 2003 with SP1
i wonder where you even found that, such things should be mentioned on the website. anyway, i'm only using win 2003 because i have no legit copy of windows xp. so there should be no troubles when using xp (sp3)?
Afrow UK
3rd November 2011 12:37 UTC
Originally posted by Yathosho
i wonder where you even found that, such things should be mentioned on the website. anyway, i'm only using win 2003 because i have no legit copy of windows xp. so there should be no troubles when using xp (sp3)?
EncodePointer function's MSDN page.
Stu
LoRd_MuldeR
3rd November 2011 12:50 UTC
Originally posted by Yathosho
i wonder where you even found that, such things should be mentioned on the website. anyway, i'm only using win 2003 because i have no legit copy of windows xp. so there should be no troubles when using xp (sp3)?
This is not a limitation of NSIS or Unicode NSIS in general. It's just a limitation of the Visual C++ 2010 CRT libraries. And, as jimpark switched to VS2010 for his latest builds, these builds now require Windows XP with SP2 or later. To make it clear again: this only applies to MakeNSIS, not to the resulting installer EXE. The created installer EXE even runs on Windows 2000...
(Just be sure you really use the latest Unicode NSIS. There was a version that is broken with Win2k!)
Anders
17th November 2011 22:48 UTC
Stdin handling is broken:
makensis - < Examples\example1.nsi
Larsen
19th January 2012 09:12 UTC
@Jim: Thanks a lot for the Unicode version! It made supporting other languages in my installer a lot easier.
I didn't read through all of the previous 13 pages: is there already a date when Unicode will be available in the vanilla NSIS version? I read something about MakeNSIS v2.50...
MSG
19th January 2012 10:44 UTC
There is no date, nor is there any progress to speak of concerning the unicodification of NSIS trunk. If it will ever happen, it won't be any time soon.
Larsen
19th January 2012 11:41 UTC
That's a pity =(
DrO
19th January 2012 11:45 UTC
as there's no proper focus on getting things done, different people want it done in different ways (i.e. some just want to go all unicode and leave it at that, others want to sort out what is basically a final ansi version, etc.), and there's no consensus on how it should be done, which makes what MSG said pretty reasonable for there being no eta.
-daz
Larsen
19th January 2012 11:59 UTC
What about branching? NSIS 3 would be unicode and NSIS 2 could still get bug fixes.
DrO
19th January 2012 12:17 UTC
Originally posted by DrO
as there's no proper focus on getting things done
that is the biggest issue. and branching won't help - in fact i think that it's gotten us into this mess to begin with as people are using the unofficial unicode version yet expect support for it, etc. doing another branch with all of the issues i mentioned is just going to be even more of a pain for everyone concerned, ignoring the fact of who is going to do it?
-daz
Yathosho
19th January 2012 12:53 UTC
Originally posted by DrO
as there's no proper focus on getting things done, different people want it to be done in different ways [...] there's no consensus on how it should be done
even open-source projects need decisions and people who make them. so why not let the community decide which direction nsis will go or who will be the decision-makers? that only leaves the question who should be able to vote.
DrO
19th January 2012 13:02 UTC
i'm not disagreeing with that point and things can or should be decided on, it still requires people to do things and that seems to be the biggest issue with getting things actually put into place and moving.
hell i'd love to help out (like i did many years back) but this pesky thing known as work gets in the way.
no one is disagreeing that a proper unicode version needs to be done. but how that is to be done (which is more about the implementation than what the end user actually sees) is the biggest sticking point it seems. and to your point, no one is going to be able to do that if there's no proper structure on who is in control of things.
between jim's version, the stuff wizou started to do, what anders has been doing, we've got 3 instances of things being all over the shop without any true focus, not quite doing things the same and just overall causing more confusion - look at the number of posts from people who cannot get plug-ins to work as they think this unicode one is official and don't know what is / isn't correct.
it's really just a big cockup when you look at it all overall which is more detrimental to the community in the long run it seems (going on what i've seen from people's posts over the last few years).
-daz
Anders
19th January 2012 21:16 UTC
The major problem is that the unicode fork uses MS 2005+ CRT specific stuff and the official NSIS supports compiling (and building) on VC6, MinGW and POSIX. (The unicode fork is broken in certain places (scroll up to my previous post) since it just relies on the MS code to do all conversion for it) The current code also mixes WCHAR and wchar_t which is a major no-no on POSIX, on Windows they are both UTF16LE but wchar_t can be any type on other platforms...
Except for the UTF8 langfile support in the ANSI build, all the other stuff I have been doing is generic and should benefit both.
LoRd_MuldeR
10th March 2012 12:23 UTC
@jimpark:
I am running into a little problem with latest Unicode NSIS (v2.46.4) release:
My code looks like this:
!searchreplace PRODUCT_VERSION_DATE "${LAMEXP_DATE}" "-" "."
LAMEXP_DATE contains something like "2012-03-10"
The expected output in PRODUCT_VERSION_DATE is "2012.03.10" (right?), but I only get "2012." :confused:
Went back to Makensis.exe from the Unicode NSIS v2.46.3 release and the issue is gone.
Can you look into this? Thanks! :)
georgik
12th May 2013 06:17 UTC
Unicode Python plugin for Unicode NSIS
Hi!
I patched the Python plugin for NSIS. Now it also supports Unicode NSIS.
Here is more information about that: http://georgik.sinusgear.com/2013/05...-unicode-nsis/
You can find the project on GitHub - nsPythonUnicode
It was tested on Windows XP, Vista, 7, 8, 2003, 2008, 2012.
Happy Python coding with Unicode NSIS. ;-)
MaGoth
25th July 2013 15:54 UTC
Hi all, :)
On the official website there is a new version, NSIS 3.0a1 (released July 14, 2013).
Are there plans to update Unicode NSIS to this version? :rolleyes:
redxii
26th July 2013 05:30 UTC
3.0 has unicode support already.
roytam1
29th October 2013 04:14 UTC
Unicode NSIS 2.46.5 fails with packhdr again?
I:\>upx --best --force "C:\DOCUME~1\User\LOCALS~1\Temp\exehead.tmp"
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2013
UPX 3.09w Markus Oberhumer, Laszlo Molnar & John Reiser Feb 18th 2013
File size Ratio Format Name
-------------------- ------ ----------- -----------
upx: C:\DOCUME~1\User\LOCALS~1\Temp\exehead.tmp: CantPackException: superfluous data between sections
Packed 1 file: 0 ok, 1 error.