Archive: Plugin for Unicode files conversion


Plugin for Unicode files conversion
  Features:
-Convert file from Unicode to ANSI
-Convert file from ANSI to Unicode
   Conversions supported:
   "UTF-8"    <-> ANSI
   "UTF-16LE" <-> ANSI
   "UTF-16BE" <-> ANSI

-Get file unicode type:
   "NONE"              - Not Unicode
   "UTF-8"             - 8-bit Variable Width (Web)
   "UTF-16LE|UCS-2LE"  - 16-bit Little Endian (Default for Windows)
   "UTF-16BE|UCS-2BE"  - 16-bit Big Endian (Default for Linux)
   "UTF-32LE|UCS-4LE"  - 32-bit Little Endian
   "UTF-32BE|UCS-4BE"  - 32-bit Big Endian
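
The type names above map naturally onto byte order marks (BOMs). How the plugin itself detects the type is not stated here, so the following is only a minimal Python sketch of BOM-based detection; `detect_bom` and `detect_file` are hypothetical helper names, not part of the plugin.

```python
# Hypothetical helpers (not part of the plugin): classify data by its BOM.
# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so the longer marks are checked first.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
]


def detect_bom(data: bytes) -> str:
    """Return the encoding name for the leading BOM, or "NONE"."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "NONE"


def detect_file(path: str) -> str:
    """Read at most four bytes of the file and classify them."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

A file without a BOM is reported as "NONE" even if it is valid UTF-8, and a UTF-16LE file whose first character is U+0000 would be mistaken for UTF-32LE, so this check is a heuristic only.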

"unicode" v1.0


Many thanks for this. I needed exactly this one.


Re:Plugin for Unicode files conversion
  Hi,

I used this plugin to convert an .inf file from Unicode to ANSI to search for some value in the .inf file using the standard (non-Unicode) NSIS version.

It worked fine on the systems we tested here (German, English), but a customer reported that the application stops with an exception on a Chinese Windows system (with an IDENTICAL .inf file).

Testing showed that Japanese systems were affected too.

I started debugging, built a debug version of the DLL and a small test program. Debugging on the Japanese and Chinese systems showed that the call to kernel32::WideCharToMultiByte raises the exception.
According to the Microsoft documentation of that function, the call made in unicode.dll is correct (using CP_ACP and no user-defined replacement character).
When I limited the conversion to the initial part of the file, which contains NO language-specific Unicode characters, the function succeeded. IMHO there must be some problem in the function with the ANSI code page on these systems.

Playing around with some option flags, I could avoid the exception, but then the conversion did NOT take place (0 chars converted).

In my case a working function FileUnicode2UTF8 could have helped. I just couldn't figure out how to dimension the buffer, as a single character may take up multiple bytes in UTF-8.
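
On the buffer question: a UTF-16 code unit never needs more than 3 bytes in UTF-8 (a surrogate pair is 2 code units and becomes 4 bytes), so 3 bytes per code unit plus one byte for the terminator is a safe upper bound. On Windows, WideCharToMultiByte also returns the required size when it is called with an output buffer size of 0. The Python sketch below only illustrates the upper bound; `utf16_units` and `max_utf8_bytes` are hypothetical names, not functions of the plugin.

```python
def utf16_units(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def max_utf8_bytes(units: int) -> int:
    """Worst-case UTF-8 size for that many UTF-16 code units,
    plus one byte for the terminating NUL of a C buffer."""
    return 3 * units + 1


for sample in ["ASCII only", "Folder=セちさ", "\U0001F600"]:
    units = utf16_units(sample)
    needed = len(sample.encode("utf-8")) + 1  # actual size incl. NUL
    print(f"{units} units: need {needed} bytes, bound {max_utf8_bytes(units)}")
```

The bound wastes memory for mostly-ASCII text, which is why asking the conversion function for the exact size first is usually preferable.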

IMHO: This Unicode conversion plugin will become more important as the Unicode version of NSIS becomes more widely used.
(Or does the NSIS Unicode branch supply its own conversion tools/keywords?)


Is there a way to convert from UTF-16LE to UTF-8?


Originally posted by akopts
Is there a way to convert from UTF-16LE to UTF-8?
See the comment in the NSIS Unicode thread.

Updated version V1.1 available
 

Originally posted by AxelMock
See the comment in the NSIS Unicode thread.
The updated version V1.1 is available.
Wiki updated too.
New function: FileUnicode2UTF8

Bug: the following line fails:
unicode::FileUnicode2Ansi "$EXEDIR\UTF-16LE.txt" "$EXEDIR\Temp.txt" "UTF-16LE"

unicode.dll v1.0 : adds a question-mark to the beginning of 'Temp.txt'
unicode.dll v1.1 : just creates an empty 'Temp.txt' file, but returns 0

Here 'UTF-16LE.txt' is the file from the example script, but it seems to happen with all UTF-16LE files!
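
One plausible explanation for the leading question mark in v1.0, which is an assumption and not confirmed anywhere in this thread, is that the BOM character U+FEFF is passed through the ANSI conversion like ordinary text. It has no ANSI equivalent, so it is replaced by "?". The Python sketch below reproduces that effect with code page 1252 standing in for the ANSI code page.

```python
import codecs

# A UTF-16LE file with BOM, as raw bytes.
raw = codecs.BOM_UTF16_LE + "Hello".encode("utf-16-le")

# The "utf-16-le" codec does not strip the BOM; it stays as U+FEFF.
text = raw.decode("utf-16-le")

# Converting everything, BOM included: U+FEFF is not in cp1252,
# so the replacement character "?" appears at the start.
naive = text.encode("cp1252", errors="replace")

# Skipping the BOM before converting avoids the stray "?".
body = text[1:] if text.startswith("\ufeff") else text
cleaned = body.encode("cp1252", errors="replace")

print(naive)
print(cleaned)
```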


FileUnicode2UTF8 fails in NSIS Unicode
  Hi,

I am working on an installer for the NSIS Unicode build. I wanted to use FileUnicode2UTF8 to convert a text file from UCS-2 LE to UTF-8. The background is I want to write a user selected folder path into a configuration file that uses UTF-8 encoding.

However, I just can't get the plug-in to convert the file, I always get error code 2. Is this a problem of NSIS Unicode or is my script file wrong (see attached file)?


unicode plug-in does not work in Unicode Nsis !
  As there were a few UTF-16 functions missing in TextFunc.nsh, I updated it one day and included some extras for ${FileRecode}.

As well, I made a couple of minor adjustments to your script:


Section ""
; ${FileRecode} comes from the updated TextFunc.nsh mentioned above
; Write Settings.ini:
FileOpen $0 "$EXEDIR\Settings.ini" w
StrCpy $R0 "Folder=セちさ" ; Sample Unicode string with 3 Japanese characters
FileWriteWord $0 0xFEFF ; write the BOM
FileWriteUTF16LE $0 $R0
FileClose $0
; Convert file from Unicode to UTF-8
StrCpy $0 "$EXEDIR\Settings.ini"
StrCpy $1 "$EXEDIR\Settings2.ini"
StrCpy $2 "ToUTF8"
CopyFiles /SILENT $0 $1
${FileRecode} $1 $2
; Print some information
DetailPrint 'FileRecode to UTF8 "$0" "$1" $2'
DetailPrint "$3"
DetailPrint ""
SectionEnd

BTW: I remember I used to get the occasional crash whenever I did repeated re-coding. Although I found out how to fix it, I have not got around to it yet. If you're experiencing any problems with it, I will put it higher on my priority list.

Thanks for the quick reply :). Your script is a big help. However, as you said, there still seems to be a bug in the conversion.

When I run the script, a few random characters are added to the end of the converted text file. The number of random characters differs from run to run (usually 3), sometimes no characters are added and the converted file is fine.


Hmm....
  I never experienced random characters at the end of the file.

Just to make sure, you are not talking about the BOM for UTF-8 (ï»¿)?
Which re-coding are you doing? From UTF-16LE to UTF-8?

Anyway, I will look into it these days... (shouldn't be that much work)

Edit: Just to remind you, your Japanese characters take 6 bytes in utf-16, but 9 bytes in utf-8.
utf-8:
EF BB BF 46 6F 6C 64 65 72 3D E3 82 BB E3 81 A1 E3 81 95 = Folder=セちさ

utf-16LE:
FF FE 46 00 6F 00 6C 00 64 00 65 00 72 00 3D 00 BB 30 61 30 55 30 = ÿþF.o.l.d.e.r.=.»0a0U0
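
The two dumps above can be reproduced with a short Python check. This is only an illustration of the byte counts and is independent of the plugin and of ${FileRecode}; `hexdump` is a hypothetical helper.

```python
def hexdump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


text = "Folder=セちさ"

# BOM followed by the encoded text, as a file on disk would contain it.
utf8_file = b"\xef\xbb\xbf" + text.encode("utf-8")
utf16le_file = b"\xff\xfe" + text.encode("utf-16-le")

print(hexdump(utf8_file))
print(hexdump(utf16le_file))

# The three Japanese characters alone: 2 bytes each in UTF-16, 3 in UTF-8.
print(len("セちさ".encode("utf-16-le")), len("セちさ".encode("utf-8")))
```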


No, I don't think it's a byte order mark, just a few random bytes (and the BOM at the beginning of Settings2.ini is correct). Notepad++ also shows them as additional characters.

I modified the script so that the same string is written, with a line terminator, three times into Settings.ini, which seems to provoke this bug more often. See the attached script and output files (I compiled the script with NSIS Unicode 2.46 and launched the exe on Windows 7 and XP). I hope this helps to track down the error. Thanks for taking a look at this issue :)


Hmm... I see what you mean now.

So much for my testing :(

I first have to finish the project I'm working on these days, then by the weekend I will have time to look at this issue.

Thanks for reporting it.


Fixed ?
  I was not allocating any space for the terminating null byte before ReadFile(), so the string could end in random characters.

Not sure how I could have missed that before :confused:

Anyway, give it a go (the script you sent me is working fine for me).
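
For readers wondering why a missing terminator produces a varying number of extra characters: C string handling reads until the first null byte, so without a terminator it runs on into whatever happens to follow the buffer in memory. The Python sketch below only simulates that behaviour with made-up bytes; it is not the plugin's code, and `c_string` is a hypothetical helper.

```python
def c_string(memory: bytes) -> bytes:
    """Mimic C string handling: everything up to the first NUL byte."""
    end = memory.find(b"\x00")
    return memory if end < 0 else memory[:end]


payload = b"Folder=abc"          # bytes read from the file
leftover = b"\x7fQ\x13\x00\x42"  # made-up stand-in for adjacent memory

# Buggy layout: buffer exactly as large as the file, no terminator,
# so the string runs on until a NUL happens to appear in the leftovers.
buggy = c_string(payload + leftover)

# Fixed layout: one extra byte allocated and set to NUL after the data.
fixed = c_string(payload + b"\x00" + leftover)

print(buggy)
print(fixed)
```

How many stray bytes appear depends on where the next null byte happens to be, which matches the report of usually 3 extra characters and sometimes none.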


Cool, thanks a lot! Now the installer of my plotter app can write a user-selected path into the settings file even when it's Chinese. You really helped me out :)


Thanks a lot gringoloco023!