Plugin for Unicode files conversion

Instructor
12th June 2005 14:50 UTC

Plugin for Unicode files conversion
Features:
-Convert file from Unicode to ANSI
-Convert file from ANSI to Unicode
   Conversions supported:
   "UTF-8"     <-> ANSI
   "UTF-16LE" <-> ANSI
   "UTF-16BE" <-> ANSI

-Get file Unicode type:
   "NONE"             - Not Unicode
   "UTF-8"            - 8-bit Variable Width (Web)
   "UTF-16LE|UCS-2LE" - 16-bit Little Endian (Default for Windows)
   "UTF-16BE|UCS-2BE" - 16-bit Big Endian (Default for Linux)
   "UTF-32LE|UCS-4LE" - 32-bit Little Endian
   "UTF-32BE|UCS-4BE" - 32-bit Big Endian

"unicode" v1.0

tpr
18th November 2008 12:30 UTC

Many thanks for this. I needed exactly this one.

AxelMock
26th February 2009 15:04 UTC

Re: Plugin for Unicode files conversion
Hi,

I used this plugin to convert an .inf file from Unicode to ANSI to search for some value in the .inf file using the standard (non-Unicode) NSIS version.

It worked fine on the systems we tested here (German, English), but a customer reported that the application would stop with an exception on a Chinese Windows system (IDENTICAL .inf file).

Testing showed that Japanese systems were affected too.

I started debugging and built a debug version of the DLL and a small test program. Debugging on the Japanese and Chinese systems, I found that the call to kernel32::WideCharToMultiByte raises the exception.
According to Microsoft's documentation of that function, the call made in unicode.dll is correct (using CP_ACP and no user-defined replacement character).
When I limited the conversion to the starting part of the file, which contains NO language-specific Unicode characters, the function succeeded. IMHO there must be some problem in the function with the ANSI code page on these systems.

Playing around with some option flags, I could avoid the exception, but then the conversion did NOT take place (0 chars converted).
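
One possible explanation, not confirmed anywhere in this thread, is output buffer size: on Western code pages every ANSI character is one byte, but on Japanese (cp932) and Chinese (cp936) systems the ANSI code page is multi-byte, so a buffer sized at one byte per UTF-16 character would be too small. The sketch below only illustrates that size difference; `ansi_size` is a hypothetical helper and the sample strings are assumptions.

```python
def ansi_size(text: str, codepage: str) -> int:
    """Number of bytes the text occupies in the given ANSI code page."""
    return len(text.encode(codepage))


# 10 characters, 10 bytes on a Western system (cp1252).
western = ansi_size("Folder=abc", "cp1252")

# 10 characters, but 13 bytes on a Japanese system (cp932):
# each of the three kana needs two bytes.
japanese = ansi_size("Folder=セちさ", "cp932")

# 2 characters, 4 bytes on a Simplified Chinese system (cp936).
chinese = ansi_size("中文", "cp936")

print(western, japanese, chinese)
```

If this is the cause, sizing the ANSI buffer at two bytes per UTF-16 code unit, or asking the conversion function for the required size first, would avoid the overrun.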

In my case a working FileUnicode2UTF8 function could have helped. I just couldn't figure out how to dimension the buffer, as a single character might take up multiple bytes in UTF-8.
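
On the buffer question: a UTF-16 code unit never needs more than 3 bytes in UTF-8 (a surrogate pair is two code units and becomes 4 bytes), so three bytes per code unit is a safe upper bound. In Win32, WideCharToMultiByte also returns the required size when it is called with an output buffer size of 0. The sketch below shows both the exact count and the worst case; the function names are hypothetical and this is not the plugin's code.

```python
import struct


def utf8_size_of_utf16le(data: bytes) -> int:
    """Exact number of UTF-8 bytes needed for BOM-less UTF-16LE data."""
    count = len(data) // 2
    units = struct.unpack("<%dH" % count, data[: count * 2])
    size = 0
    for unit in units:
        if unit < 0x80:
            size += 1          # ASCII
        elif unit < 0x800:
            size += 2
        elif 0xD800 <= unit <= 0xDFFF:
            size += 2          # half of a surrogate pair (4 bytes per pair)
        else:
            size += 3          # rest of the Basic Multilingual Plane
    return size


def utf8_worst_case(data: bytes) -> int:
    """Upper bound: 3 UTF-8 bytes per UTF-16 code unit."""
    return 3 * (len(data) // 2)


raw = "Folder=セちさ".encode("utf-16-le")
print(utf8_size_of_utf16le(raw), utf8_worst_case(raw))
```

Allocating the worst case wastes some memory but avoids a second pass over the input.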

IMHO: This Unicode conversion plugin will become more important as the Unicode version of NSIS becomes more widely used.
(Or does the NSIS Unicode branch supply its own conversion tools/keywords?)

akopts
27th February 2009 16:21 UTC

Is there a way to convert from UTF-16LE to UTF-8?

AxelMock
2nd March 2009 16:20 UTC

Originally posted by akopts
Is there a way to convert from UTF-16LE to UTF-8?

See
Comment in NSIS Unicode Thread

AxelMock
2nd March 2009 17:24 UTC

Updated version V1.1 available

Originally posted by AxelMock
See
Comment in NSIS Unicode Thread

The updated version V1.1 is available.
Wiki updated too.
New function: FileUnicode2UTF8

gringoloco023
29th July 2010 22:38 UTC

Bugs, the following line:
unicode::FileUnicode2Ansi "$EXEDIR\UTF-16LE.txt" "$EXEDIR\Temp.txt" "UTF-16LE"

unicode.dll v1.0 : adds a question-mark to the beginning of 'Temp.txt'
unicode.dll v1.1 : just creates an empty 'Temp.txt' file, but returns 0

Here 'UTF-16LE.txt' is the file from the example script, but it seems to happen with all UTF-16LE files!
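
A plausible cause of the leading question mark in v1.0 (an assumption, not confirmed in the thread) is that the BOM is converted along with the text: U+FEFF has no ANSI equivalent, so it is replaced by '?'. The sketch below reproduces that effect and shows the BOM being skipped before conversion; `utf16le_to_ansi` is a hypothetical name and cp1252 stands in for the system ANSI code page. It does not explain the empty file produced by v1.1.

```python
import codecs


def utf16le_to_ansi(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert UTF-16LE bytes to an ANSI code page, skipping a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    # Characters missing from the code page become '?'.
    return data.decode("utf-16-le").encode(codepage, errors="replace")


raw = codecs.BOM_UTF16_LE + "abc".encode("utf-16-le")

# Converting the BOM together with the text yields the stray '?'.
naive = raw.decode("utf-16-le").encode("cp1252", errors="replace")

print(naive)                 # BOM turned into '?'
print(utf16le_to_ansi(raw))  # BOM skipped
```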

koelzk
1st September 2010 16:42 UTC

FileUnicode2UTF8 fails in NSIS Unicode
Hi,

I am working on an installer for the NSIS Unicode build. I wanted to use FileUnicode2UTF8 to convert a text file from UCS-2 LE to UTF-8. The background is I want to write a user selected folder path into a configuration file that uses UTF-8 encoding.

However, I just can't get the plug-in to convert the file, I always get error code 2. Is this a problem of NSIS Unicode or is my script file wrong (see attached file)?

gringoloco023
1st September 2010 17:57 UTC

The unicode plug-in does not work in Unicode NSIS!
As there were a few UTF-16 functions missing from TextFunc.nsh, I updated it a while ago and included some extras for ${FileRecode}.

I also made a couple of minor adjustments to your script:


Section ""
   ; Write Settings.ini:
   FileOpen $0 "$EXEDIR\Settings.ini" w
   StrCpy $R0 "Folder=セちさ" ; Sample Unicode string with 3 Japanese characters
   FileWriteWord $0 0xFEFF ; write the BOM
   FileWriteUTF16LE $0 $R0
   FileClose $0

   ; Convert file from Unicode to UTF-8
   StrCpy $0 "$EXEDIR\Settings.ini"
   StrCpy $1 "$EXEDIR\Settings2.ini"
   StrCpy $2 "ToUTF8"
   CopyFiles /SILENT $0 $1
   ${FileRecode} $1 $2

   ; Print some information
   DetailPrint 'FileRecode to UTF8 "$0" "$1" $2'
   DetailPrint "$3"
   DetailPrint ""
SectionEnd

BTW: I remember I used to get the occasional crash whenever I did repeated re-coding. Although I found out how to fix it, I have not gotten around to it yet. If you're experiencing any problems with it, I will put it higher on my priority list.

koelzk
1st September 2010 20:06 UTC

Thanks for the quick reply :). Your script is a big help. However, as you said, there still seems to be a bug in the conversion.

When I run the script, a few random characters are added to the end of the converted text file. The number of random characters differs from run to run (usually 3), sometimes no characters are added and the converted file is fine.

gringoloco023
1st September 2010 21:08 UTC

Hmm....
I never experienced random characters on the end of the file.

Just to make sure, you are not talking about the BOM for UTF-8 ( ï»¿ )?
Which re-coding are you doing? From UTF-16LE to UTF-8?

Anyway, I will look into it these days... (shouldn't be that much work)

Edit: Just to remind you, your Japanese characters take 6 bytes in utf-16, but 9 bytes in utf-8.
utf-8:
EF BB BF 46 6F 6C 64 65 72 3D E3 82 BB E3 81 A1 E3 81 95 = ï»¿Folder=ã‚»ã¡ã•

utf-16LE:
FF FE 46 00 6F 00 6C 00 64 00 65 00 72 00 3D 00 BB 30 61 30 55 30 = ÿþF.o.l.d.e.r.=.»0a0U0
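
The two byte dumps above can be reproduced with a short sketch like the following, which encodes the sample string from the script with a BOM in each encoding. `hex_dump` is a hypothetical helper.

```python
import codecs


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated upper-case hex pairs."""
    return data.hex(" ").upper()


text = "Folder=セちさ"

utf8 = codecs.BOM_UTF8 + text.encode("utf-8")
utf16le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(len(utf8), hex_dump(utf8))
print(len(utf16le), hex_dump(utf16le))
```

The three kana account for 9 of the 19 UTF-8 bytes and 6 of the 22 UTF-16LE bytes.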

koelzk
2nd September 2010 17:12 UTC

No, I don't think it's a byte order mark, just a few random bytes (and the BOM at the beginning of Settings2.ini is correct). Notepad++ also shows them as additional characters.

I modified the script so the same string is written with a line terminator three times into Settings.ini, which seems to provoke this bug more often. See the attached script and output files (I compiled the script with NSIS Unicode 2.46 and launched the exe on Windows 7 and XP). I hope this helps to track down the error. Thanks for taking a look at this issue :)

gringoloco023
3rd September 2010 12:45 UTC

Hmm... I see what you mean now.

So much for my testing :(

I first have to finish the project I'm working on these days, then by the weekend I will have time to look at this issue.

Thanks for reporting it.

gringoloco023
4th September 2010 21:55 UTC

Fixed ?
I was not allocating space for the terminating null byte before ReadFile(), so the string ended in random characters.
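
The effect of a missing terminator can be simulated roughly as follows. This is only an illustration in Python of what happens to a C string, not the actual TextFunc.nsh code: `GARBAGE` stands in for whatever happened to be in memory, and all names are hypothetical.

```python
# Pretend memory contents left over from earlier use; a zero byte
# happens to appear at every fourth position.
GARBAGE = b"\xa7\x13\x5c\x00" * 8


def c_string(memory: bytes) -> bytes:
    """Read bytes up to (not including) the first null byte, like C does."""
    end = memory.find(b"\x00")
    return memory if end < 0 else memory[:end]


def read_buggy(data: bytes) -> bytes:
    """Copy the file data in, but never write a terminator after it."""
    memory = bytearray(GARBAGE)
    memory[: len(data)] = data
    return c_string(bytes(memory))


def read_fixed(data: bytes) -> bytes:
    """Reserve one extra byte and set it to zero after the data."""
    memory = bytearray(GARBAGE)
    memory[: len(data)] = data
    memory[len(data)] = 0
    return c_string(bytes(memory))


data = b"Folder=abc"
print(read_buggy(data))  # data followed by leftover bytes
print(read_fixed(data))  # data only
```

How many stray bytes appear depends on where the next zero byte happens to be, which fits the report of a varying number of extra characters and occasionally none.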

Not sure how I could have missed that before :confused:

Anyway, give it a go (the script you sent me is working fine for me).

koelzk
5th September 2010 20:44 UTC

Cool, thanks a lot! Now the installer of my plotter app can write a user-selected path into the settings file even when it's Chinese. You really helped me out :)

Legace
30th September 2010 14:13 UTC

Thanks a lot gringoloco023!