I made a script to convert .reg files to .nsi.
It is simple to use.
I don't know how to read a Unicode file. FileRead only returns an ANSI string. Maybe MultiByteToWideChar, WideCharToMultiByte and the System plugin can do this, but I don't know how to use them.
You can use FileReadByte
-Stu
But how do I convert a binary value to a character?
Use IntFmt with "%c" as the format string:
IntFmt $1 "%c" $2
will convert the binary value in $2 into a character in $1
Originally posted by pengyou:
Use IntFmt with "%c" as the format string:
IntFmt $1 "%c" $2
will convert the binary value in $2 into a character in $1

ASCII is OK. But when I try to convert "604F7D59" (你好 in Chinese), it returns "\".
IntFmt $1 "%c" $2
just converts a single byte into an ASCII character. NSIS uses MBCS instead of Unicode (so it can be used on Win9x systems).
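As a minimal sketch of the single-byte case described above, the standalone script below formats the value 65 with "%c". The output file name is a placeholder and not something from this thread.

```nsis
; Standalone demo script; the OutFile name is a placeholder.
OutFile "intfmt-c-demo.exe"
ShowInstDetails show

Section
  StrCpy $2 65            ; a byte value, as FileReadByte would return it
  IntFmt $1 "%c" $2       ; format it as one character
  DetailPrint "$2 -> $1"  ; 65 -> A
SectionEnd
```

This works for one byte at a time, which is why a two-byte Unicode character cannot be produced this way.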
Have you tried a forum search for "Unicode" references?
There is something in the Archive which converts registry data or .reg files but I do not know if it will do what you want:
http://nsis.sourceforge.net/archive/...php?pageid=296
FileOpen $0 unicode-file r
# don't forget to skip marker here
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1
IntOp $1 $1 << 8
Ahnnn, kichik. I don't think this ever appeared in the docs. That's why only the developer fully knows his/her program's features.
Oh kichik, you are great. This code can read a Unicode file with both ASCII and Chinese characters.
Hello everyone
In this function:
FileOpen $0 unicode-file r
# don't forget to skip marker here
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1
What does this line do?
IntOp $1 $1 << 8
I searched the manual but I couldn't find the << operator.
Thanks in advance
See deguix's post (above)
-Stu
<< is the bit shift operator; see NSIS.chm.
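To make the shift concrete, here is a small worked sketch using the bytes 60 4F mentioned earlier in the thread. It assumes the first byte read is the low byte and the second is the high byte; the output file name is a placeholder.

```nsis
; Standalone demo script; the OutFile name is a placeholder.
OutFile "shift-demo.exe"
ShowInstDetails show

Section
  StrCpy $2 0x60          ; first byte read (low byte)
  StrCpy $1 0x4F          ; second byte read (high byte)
  IntOp $1 $1 << 8        ; shift left by 8 bits: 0x4F becomes 0x4F00
  IntOp $1 $1 | $2        ; OR in the low byte: 0x4F00 | 0x60 = 0x4F60
  IntFmt $3 "0x%04X" $1   ; hexadecimal view of the result
  DetailPrint "$1 = $3"   ; 20320 = 0x4F60
SectionEnd
```

Shifting by 8 bits multiplies the value by 256, which moves the high byte into the upper half of a 16-bit value.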
Hello everyone
The problem was that I was working with version 2.00.
I updated NSIS to the latest version (2.04) and there is no problem with the documentation now.
Thanks a lot
Hello everyone
Hello Bluenet!! I updated your CheckFileEncoding function a little and added some documentation:
; Usage : Push file
; Call CheckFileEncoding
; Pop $R0
;
; At this moment $R0 is:
; EBCDIC (the encoding declaration must be read)
; UCS-4 (little-endian)
; UCS-4 (big-endian)
; UTF-16 (big-endian:the encoding declaration must be read)
; UTF-16 (little-endian:the encoding declaration must be read)
; UTF-16BE
; UTF-16LE
; UTF-8 (BOM)
; UTF-8 (the default value)
Function CheckFileEncoding
; This function reads the Byte Order Mark (BOM), a file signature that
; indicates the file encoding (see XML Spec, Appendix F.1).
; (This function is a variation of the CheckFileEncoding written by Bluenet.)
; NOTE:
; - Little-endian: a byte order that stores the least significant byte first.
; The PDP-11 and VAX families of computers and Intel microprocessors are little-endian.
; - Big-endian: a byte order that stores the most significant byte first. The IBM 370 family,
; the PDP-10, the Motorola microprocessor families, and most of the various
; RISC designs are big-endian.
; - The term ANSI is used to collectively refer to all the non-Unicode single
; and multi-byte character sets used in Windows operating systems. These include
; the single-byte systems for Europe and the "double-byte" systems for Chinese
; and Japanese, which actually use one or two bytes per character.
; - UCS: Universal Character Set (UCS-2 uses 2 bytes per character; UCS-4 uses 4 bytes)
; - UTF: Unicode Transformation Format (UTF-8, UTF-16)
Exch $R5 ; $R5 holds the name of the file
Push $R0
Push $R1
Push $R2
Push $R3
Push $R4
FileOpen $R0 $R5 r
FileReadByte $R0 $R1 ;$R1 holds the first byte of the file
FileSeek $R0 1
FileReadByte $R0 $R2 ;$R2 holds the second byte of the file
FileSeek $R0 2
FileReadByte $R0 $R3 ;$R3 holds the third byte of the file
FileSeek $R0 3
FileReadByte $R0 $R4 ;$R4 holds the fourth byte of the file
FileClose $R0
StrCpy $R0 $R1$R2$R3$R4
StrCmp $R0 76111167148 0 +3 ;hex = 4C 6F A7 94
StrCpy $R0 "EBCDIC (the encoding declaration must be read)"
Goto end
StrCmp $R0 25525400 0 +3 ;hex = FF FE 00 00 (BOM)
StrCpy $R0 "UCS-4 (little-endian)"
Goto end
StrCmp $R0 00254255 0 +3 ;hex = 00 00 FE FF (BOM)
StrCpy $R0 "UCS-4 (big-endian)"
Goto end
StrCmp $R0 060063 0 +3 ;hex = 00 3C 00 3F
StrCpy $R0 "UTF-16 (big-endian:the encoding declaration must be read)"
Goto end
StrCmp $R0 600630 0 +3 ;hex = 3C 00 3F 00
StrCpy $R0 "UTF-16 (little-endian:the encoding declaration must be read)"
Goto end
StrCpy $R0 $R1$R2
StrCmp $R0 255254 0 +3 ;hex = FF FE (BOM)
StrCpy $R0 "UTF-16LE" ;On Windows, "Unicode" usually means UTF-16
Goto end
StrCmp $R0 254255 0 +3 ;hex = FE FF (BOM)
StrCpy $R0 "UTF-16BE" ;On Windows, "Unicode" usually means UTF-16
Goto end
StrCpy $R0 $R1$R2$R3
StrCmp $R0 239187191 0 +3 ;hex = EF BB BF (BOM)
StrCpy $R0 "UTF-8 (BOM)"
Goto end
StrCpy $R0 "UTF-8" ;Default value according to the XML spec (the encoding declaration must be read)
end:
Pop $R5
Pop $R4
Pop $R3
Pop $R2
Pop $R1
Push $R0
FunctionEnd
Thanks kike_velez for the improvement.
There may be a problem in this function: the order in which you pop the variables is incorrect. I have rewritten it here:
Exch $R0 ; $R0 holds the name of the file
Push $R1
Push $R2
Push $R3
Push $R4
FileOpen $R0 $R0 r ; the file name is not needed afterwards, so $R0 is reused for the handle
...........
Pop $R4
Pop $R3
Pop $R2
Pop $R1
Exch $R0
The default value should be ANSI, because for ASCII characters ANSI and UTF-8 are identical, but for Asian languages ANSI != UTF-8.
Thanks Bluenet
I made the changes you suggested, but I don't understand your last message. Can you explain a little?
The default value should be ANSI, because for ASCII characters ANSI and UTF-8 are identical, but for Asian languages ANSI != UTF-8.
Hello everyone:
The updated function
; Usage : Push file
; Call CheckFileEncoding
; Pop $R0
;
; At this moment $R0 is:
; EBCDIC (the encoding declaration must be read)
; UCS-4 (little-endian)
; UCS-4 (big-endian)
; UTF-16 (big-endian:the encoding declaration must be read)
; UTF-16 (little-endian:the encoding declaration must be read)
; UTF-16BE
; UTF-16LE
; UTF-8 (BOM)
; ANSI (the default value)
Function CheckFileEncoding
; This function reads the Byte Order Mark (BOM), a file signature that
; indicates the file encoding (see XML Spec, Appendix F.1).
; (This function is a variation of the CheckFileEncoding written by Bluenet;
; it is part of his reg2nsis utility.)
; NOTE:
; - Little-endian: a byte order that stores the least significant byte first.
; The PDP-11 and VAX families of computers and Intel microprocessors are little-endian.
; - Big-endian: a byte order that stores the most significant byte first. The IBM 370 family,
; the PDP-10, the Motorola microprocessor families, and most of the various
; RISC designs are big-endian.
; - The term ANSI is used to collectively refer to all the non-Unicode single
; and multi-byte character sets used in Windows operating systems. These include
; the single-byte systems for Europe and the "double-byte" systems for Chinese
; and Japanese, which actually use one or two bytes per character.
; - UCS: Universal Character Set (UCS-2 uses 2 bytes per character; UCS-4 uses 4 bytes)
; - UTF: Unicode Transformation Format (UTF-8, UTF-16)
Exch $R0
Push $R1
Push $R2
Push $R3
Push $R4
FileOpen $R0 $R0 r ; the file name is not needed afterwards, so $R0 is reused for the handle
FileReadByte $R0 $R1 ;$R1 holds the first byte of the file
FileSeek $R0 1
FileReadByte $R0 $R2 ;$R2 holds the second byte of the file
FileSeek $R0 2
FileReadByte $R0 $R3 ;$R3 holds the third byte of the file
FileSeek $R0 3
FileReadByte $R0 $R4 ;$R4 holds the fourth byte of the file
FileClose $R0
StrCpy $R0 $R1$R2$R3$R4
StrCmp $R0 76111167148 0 +3 ;hex = 4C 6F A7 94
StrCpy $R0 "EBCDIC (the encoding declaration must be read)"
Goto end
StrCmp $R0 25525400 0 +3 ;hex = FF FE 00 00 (BOM)
StrCpy $R0 "UCS-4 (little-endian)"
Goto end
StrCmp $R0 00254255 0 +3 ;hex = 00 00 FE FF (BOM)
StrCpy $R0 "UCS-4 (big-endian)"
Goto end
StrCmp $R0 060063 0 +3 ;hex = 00 3C 00 3F
StrCpy $R0 "UTF-16 (big-endian:the encoding declaration must be read)"
Goto end
StrCmp $R0 600630 0 +3 ;hex = 3C 00 3F 00
StrCpy $R0 "UTF-16 (little-endian:the encoding declaration must be read)"
Goto end
StrCpy $R0 $R1$R2
StrCmp $R0 255254 0 +3 ;hex = FF FE (BOM)
StrCpy $R0 "UTF-16LE" ;On Windows, "Unicode" usually means UTF-16. The least significant byte is written first.
Goto end
StrCmp $R0 254255 0 +3 ;hex = FE FF (BOM)
StrCpy $R0 "UTF-16BE" ;On Windows, "Unicode" usually means UTF-16. The most significant byte is written first.
Goto end
StrCpy $R0 $R1$R2$R3
StrCmp $R0 239187191 0 +3 ;hex = EF BB BF (BOM)
StrCpy $R0 "UTF-8 (BOM)" ;A character can be written in one to four bytes
Goto end
StrCpy $R0 "ANSI" ;The default value should be ANSI, because
;for ASCII characters ANSI and UTF-8 are identical,
;but for Asian languages ANSI != UTF-8.
end:
Pop $R4
Pop $R3
Pop $R2
Pop $R1
Exch $R0
FunctionEnd
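One caveat with comparing concatenated decimal byte values: different byte sequences can produce the same string. For example, the bytes 255 25 40 0 and 255 254 0 0 both concatenate to 25525400. The sketch below is a hypothetical variant, not the function from this thread, that puts commas between the values to avoid this. The name DetectBom, the output file name and the sample path are placeholders, and only the BOM cases are covered.

```nsis
OutFile "bom-demo.exe"
ShowInstDetails show

; Hypothetical helper. Usage: Push file / Call DetectBom / Pop result
Function DetectBom
  Exch $R0                 ; $R0 = file name
  Push $R1
  Push $R2
  Push $R3
  Push $R4
  ClearErrors
  FileOpen $R0 $R0 r       ; $R0 is reused as the file handle
  IfErrors 0 +3
    StrCpy $R0 "error"     ; the file could not be opened
    Goto done
  FileReadByte $R0 $R1     ; each read advances the file position,
  FileReadByte $R0 $R2     ; so no FileSeek is needed in between
  FileReadByte $R0 $R3
  FileReadByte $R0 $R4
  FileClose $R0
  StrCpy $R0 "ANSI"        ; default when no BOM matches
  StrCmp "$R1,$R2" "255,254" 0 +2
    StrCpy $R0 "UTF-16LE"
  StrCmp "$R1,$R2" "254,255" 0 +2
    StrCpy $R0 "UTF-16BE"
  StrCmp "$R1,$R2,$R3" "239,187,191" 0 +2
    StrCpy $R0 "UTF-8 (BOM)"
  StrCmp "$R1,$R2,$R3,$R4" "255,254,0,0" 0 +2
    StrCpy $R0 "UCS-4 (little-endian)"   ; overrides the UTF-16LE match
  StrCmp "$R1,$R2,$R3,$R4" "0,0,254,255" 0 +2
    StrCpy $R0 "UCS-4 (big-endian)"
done:
  Pop $R4
  Pop $R3
  Pop $R2
  Pop $R1
  Exch $R0
FunctionEnd

Section
  Push "$EXEDIR\sample.reg"  ; placeholder path
  Call DetectBom
  Pop $0
  DetailPrint "Encoding: $0"
SectionEnd
```

The later comparisons overwrite the earlier result, so a file starting with FF FE 00 00 ends up reported as UCS-4 rather than UTF-16LE, matching the order of checks in the function above.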
Hello
Another question. If a Unicode file is, for example, a sequence like this:
3C 00 EF 00 ....
why use this code?
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1
And not this?
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntFmt $1 %lc $2
Is the problem just skipping the 00 bytes, or is there something more?
Thanks in advance
See the picture: in Unicode, English characters have the form xx00xx00, but Chinese characters do not.
Also, English characters are the same in ANSI and UTF-8, but Chinese characters are not.
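The difference can be sketched with two byte pairs: 41 00 (the letter A) and 7D 59 (the second character from the 604F7D59 example). The macro name and output file name below are placeholders for illustration only.

```nsis
OutFile "highbyte-demo.exe"
ShowInstDetails show

; Hypothetical macro: print the low byte alone and the combined 16-bit value.
!macro ShowCodeUnit LOW HIGH
  IntOp $1 ${HIGH} << 8
  IntOp $1 $1 | ${LOW}
  IntFmt $1 "0x%04X" $1
  IntFmt $2 "0x%04X" ${LOW}
  DetailPrint "bytes ${LOW} ${HIGH}: low byte only = $2, combined = $1"
!macroend

Section
  !insertmacro ShowCodeUnit 0x41 0x00   ; low byte only = 0x0041, combined = 0x0041
  !insertmacro ShowCodeUnit 0x7D 0x59   ; low byte only = 0x007D, combined = 0x597D
SectionEnd
```

For the English letter the high byte is zero, so ignoring it changes nothing. For the Chinese character the high byte carries information, and using the low byte alone would give the code for "}" instead.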
Thanks Bluenet for your explanation. I was ignorant about this. :eek:
I think writing programs is more complicated for you than for me. Am I wrong?
Best regards
Please, Bluenet, can you attach a sample Chinese Unicode file to play with?
Thanks
What sample? In this code
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1
$1 is an ANSI character converted from Unicode
Thanks bluenet for your reply.
The problem is that I don't understand this:
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8 ; (*1)
IntOp $1 $1 | $2 ; (*2)
IntFmt $1 %lc $1
For example, if the file contains this sequence:
60 4F 7D 59
then $2 = 60 and $1 = 4F, and after the two statements (*1 and *2)
we have (I think) $1 = 4F 60.
I think that only changes the order of the bytes. And why is this ANSI?
Thanks
IntFmt works like the wsprintf function.
I don't know why. But after converting to ANSI you can work with FileWrite, StrCpy and so on.
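A possible reading, offered as an assumption rather than a confirmed answer: a little-endian Unicode file stores the low byte first, so 60 4F is the single 16-bit value 0x4F60, and "%lc" asks wsprintf to treat that value as a wide character and convert it for the ANSI string. The sketch below extends kichik's snippet to a whole line. The function name, output file name and sample path are hypothetical, it assumes the file starts with a 2-byte FF FE marker, and the result is limited by the NSIS string length.

```nsis
OutFile "utf16-line-demo.exe"
ShowInstDetails show

; Hypothetical helper. Usage: Push file / Call ReadUtf16LeLine / Pop text
Function ReadUtf16LeLine
  Exch $0                  ; $0 = file name, then the file handle
  Push $1
  Push $2
  Push $3
  StrCpy $3 ""             ; accumulated result
  ClearErrors
  FileOpen $0 $0 r
  IfErrors done
  FileSeek $0 2            ; skip the 2-byte marker (assumed to be FF FE)
loop:
  FileReadByte $0 $2       ; low byte
  FileReadByte $0 $1       ; high byte
  StrCmp $1 "" close       ; FileReadByte returns an empty value at end of file
  IntOp $1 $1 << 8
  IntOp $1 $1 | $2         ; $1 = 16-bit value
  IntCmp $1 13 close       ; stop at carriage return
  IntCmp $1 10 close       ; or at line feed
  IntFmt $1 "%lc" $1       ; convert the value to a character
  StrCpy $3 "$3$1"
  Goto loop
close:
  FileClose $0
done:
  StrCpy $0 $3
  Pop $3
  Pop $2
  Pop $1
  Exch $0
FunctionEnd

Section
  Push "$EXEDIR\sample.reg"  ; placeholder path
  Call ReadUtf16LeLine
  Pop $0
  DetailPrint "First line: $0"
SectionEnd
```

Characters that need two 16-bit values (surrogate pairs) are not handled in this sketch.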
Hello Bluenet
Thanks a lot for your patience