Archive: I made a script to convert .reg to .nsi


I made a script to convert .reg files to .nsi.
It is simple to use.
I don't know how to read a Unicode file: FileRead only returns an ANSI string. Maybe MultiByteToWideChar, WideCharToMultiByte and the System plugin can do this, but I don't know how to use them.


You can use FileReadByte

-Stu


But how do I convert a binary value to a character?


Use IntFmt with "%c" as the format string:

IntFmt $1 "%c" $2

will convert the binary value in $2 into a character in $1
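
For context, here is a minimal sketch of how FileReadByte and IntFmt fit together. The file name example.txt and the output name are hypothetical, and the snippet assumes the file sits next to the installer.

```nsis
OutFile "readbyte-demo.exe"             ; hypothetical output name

Section
  ClearErrors
  FileOpen $0 "$EXEDIR\example.txt" r   ; hypothetical file name
  IfErrors done                         ; the file could not be opened
  FileReadByte $0 $2                    ; $2 = first byte as a decimal number (0-255)
  FileClose $0
  IntFmt $1 "%c" $2                     ; $1 = the character with that code
  DetailPrint "byte $2 -> character $1"
done:
SectionEnd
```

This only handles one byte at a time, so it is enough for ASCII text but not for two-byte Unicode characters.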


Originally posted by pengyou
Use IntFmt with "%c" as the format string:

IntFmt $1 "%c" $2

will convert the binary value in $2 into a character in $1
ASCII is OK. But when I try to convert "604F7D59" (the Chinese text 你好), it returns "\".

:rolleyes:

IntFmt $1 "%c" $2

just converts a single byte into an ASCII character. NSIS uses MBCS instead of Unicode (so it can be used on Win9x systems).

Have you tried a forum search for "Unicode" references?

There is something in the Archive which converts registry data or .reg files but I do not know if it will do what you want:
http://nsis.sourceforge.net/archive/...php?pageid=296
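
This thread predates Unicode support in NSIS. As a hedged sketch only: NSIS 3 and later can build Unicode installers, and those provide FileReadUTF16LE for reading UTF-16LE text directly. The file and output names below are hypothetical, and the explicit FileSeek assumes the file starts with the two-byte FF FE marker.

```nsis
Unicode true                            ; requires NSIS 3 or later
OutFile "utf16-read-demo.exe"           ; hypothetical output name

Section
  ClearErrors
  FileOpen $0 "$EXEDIR\unicode-file.txt" r   ; hypothetical UTF-16LE file
  IfErrors done
  FileSeek $0 2                         ; step over the 2-byte FF FE marker
  FileReadUTF16LE $0 $1                 ; read one line of UTF-16LE text into $1
  FileClose $0
  DetailPrint "first line: $1"
done:
SectionEnd
```

On the ANSI-only 2.x releases discussed here this instruction is not available, which is why the byte-by-byte approach is needed.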


FileOpen $0 unicode-file r
# don't forget to skip marker here
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1

IntOp $1 $1 << 8
Ah, kichik, I don't think this ever appeared in the docs. That's why only the developer fully knows his or her program's features.

Oh kichik, you are great. This code can read a Unicode file, with both ASCII and Chinese characters.


Hello everyone

In this function:

FileOpen $0 unicode-file r
# don't forget to skip marker here
FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1


What does this line do?

IntOp $1 $1 << 8

I searched the manual but I didn't find the << operator.

Thanks in advance


See deguix's post (above)

-Stu


<< is the bit-shift operator; see NSIS.chm.
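
A small worked example may help. It uses the two bytes 60 4F mentioned earlier in the thread as fixed values instead of reading them from a file; the output name is hypothetical.

```nsis
OutFile "shift-demo.exe"                ; hypothetical output name

Section
  StrCpy $2 0x60                        ; first byte in the file (low byte)
  StrCpy $1 0x4F                        ; second byte in the file (high byte)
  IntOp $1 $1 << 8                      ; 0x4F becomes 0x4F00
  IntOp $1 $1 | $2                      ; 0x4F00 | 0x60 gives 0x4F60
  IntFmt $3 "0x%04X" $1
  DetailPrint "combined value: $3"      ; logs "combined value: 0x4F60"
SectionEnd
```

Shifting left by 8 bits moves the high byte into its place, and the OR then fills in the low byte, so two separate bytes become one 16-bit value.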


Hello everyone

The problem was that I was working with version 2.00.
I updated NSIS to the latest version (2.04) and the documentation now covers it.

Thanks a lot


Hello everyone

Hello Bluenet! I updated your CheckFileEncoding function a little and added some documentation:

; Usage: Push file
; Call CheckFileEncoding
; Pop $R0
;
; On return, $R0 is one of:
; EBCDIC (the encoding declaration must be read)
; UCS-4 (little-endian)
; UCS-4 (big-endian)
; UTF-16 (big-endian:the encoding declaration must be read)
; UTF-16 (little-endian:the encoding declaration must be read)
; UTF-16BE
; UTF-16LE
; UTF-8 (BOM)
; UTF-8 (the default value)
Function CheckFileEncoding

; This function reads the Byte Order Mark (BOM): a file signature
; that indicates the file encoding (see the XML spec, Appendix F.1).
; (This function is a variation of the CheckFileEncoding made by Bluenet.)

; NOTE:
; - Little-endian: the least significant byte is stored first. The PDP-11 and VAX
; families of computers and Intel microprocessors are little-endian.

; - Big-endian: the most significant byte is stored first. The IBM 370 family,
; the PDP-10, the Motorola microprocessor families, and most of the various
; RISC designs are big-endian.

; - The term ANSI is used to collectively refer to all the non-Unicode single
; and multi-byte character sets used in Windows operating systems. These include
; the single-byte systems for Europe and the "double-byte" systems for Chinese
; and Japanese, which actually use one or two bytes per character.

; - UCS: Universal Character Set (UCS-2 => 2 bytes; UCS-4 => 4 bytes)

; - UTF: Unicode Transformation Format (UTF-8, UTF-16)

Exch $R5 ; $R5 holds the name of the file
Push $R0
Push $R1
Push $R2
Push $R3
Push $R4

FileOpen $R0 $R5 r
FileReadByte $R0 $R1 ; $R1 holds the first byte of the file
FileSeek $R0 1
FileReadByte $R0 $R2 ; $R2 holds the second byte of the file
FileSeek $R0 2
FileReadByte $R0 $R3 ; $R3 holds the third byte of the file
FileSeek $R0 3
FileReadByte $R0 $R4 ; $R4 holds the fourth byte of the file

FileClose $R0

StrCpy $R0 $R1$R2$R3$R4
StrCmp $R0 76111167148 0 +3 ;hex = 4C 6F A7 94
StrCpy $R0 "EBCDIC (the encoding declaration must be read)"
Goto end
StrCmp $R0 25525400 0 +3 ;hex = FF FE 00 00 (BOM)
StrCpy $R0 "UCS-4 (little-endian)"
Goto end
StrCmp $R0 00254255 0 +3 ;hex = 00 00 FE FF (BOM)
StrCpy $R0 "UCS-4 (big-endian)"
Goto end
StrCmp $R0 060063 0 +3 ;hex = 00 3C 00 3F
StrCpy $R0 "UTF-16 (big-endian:the encoding declaration must be read)"
Goto end
StrCmp $R0 600630 0 +3 ;hex = 3C 00 3F 00
StrCpy $R0 "UTF-16 (little-endian:the encoding declaration must be read)"
Goto end
StrCpy $R0 $R1$R2
StrCmp $R0 255254 0 +3 ;hex = FF FE (BOM)
StrCpy $R0 "UTF-16LE" ; On Windows, "Unicode" usually means UTF-16
Goto end
StrCmp $R0 254255 0 +3 ;hex = FE FF (BOM)
StrCpy $R0 "UTF-16BE" ; On Windows, "Unicode" usually means UTF-16
Goto end
StrCpy $R0 $R1$R2$R3
StrCmp $R0 239187191 0 +3 ;hex = EF BB BF (BOM)
StrCpy $R0 "UTF-8 (BOM)"
Goto end
StrCpy $R0 "UTF-8" ; Default value according to the XML spec (the encoding declaration must be read)
end:

Pop $R5
Pop $R4
Pop $R3
Pop $R2
Pop $R1
Push $R0

FunctionEnd


Thanks, kike_velez, for the improvement.
There may be a problem in this function, though: the order in which you pop the variables is incorrect. Rewritten here:

Exch $R0 ; $R0 holds the name of the file
Push $R1
Push $R2
Push $R3
Push $R4

FileOpen $R0 $R0 r ; the file name is not needed afterwards, so $R0 is reused for the handle

...........

Pop $R4
Pop $R3
Pop $R2
Pop $R1
Exch $R0


The default value should be ANSI: for ASCII characters ANSI and UTF-8 are identical, but for Asian languages they are not.


Thanks, Bluenet

I made the changes you suggested, but I don't understand your last message. Can you explain a little?

The default value should be ANSI: for ASCII characters ANSI and UTF-8 are identical, but for Asian languages they are not.



Thanks in advance

Hello everyone:

The updated function:

; Usage: Push file
; Call CheckFileEncoding
; Pop $R0
;
; On return, $R0 is one of:
; EBCDIC (the encoding declaration must be read)
; UCS-4 (little-endian)
; UCS-4 (big-endian)
; UTF-16 (big-endian:the encoding declaration must be read)
; UTF-16 (little-endian:the encoding declaration must be read)
; UTF-16BE
; UTF-16LE
; UTF-8 (BOM)
; ANSI (the default value)
Function CheckFileEncoding

; This function reads the Byte Order Mark (BOM): a file signature
; that indicates the file encoding (see the XML spec, Appendix F.1).
; (This function is a variation of the CheckFileEncoding made by Bluenet;
; it is part of his reg2nsis utility.)

; NOTE:
; - Little-endian: the least significant byte is stored first. The PDP-11 and VAX
; families of computers and Intel microprocessors are little-endian.

; - Big-endian: the most significant byte is stored first. The IBM 370 family,
; the PDP-10, the Motorola microprocessor families, and most of the various
; RISC designs are big-endian.

; - The term ANSI is used to collectively refer to all the non-Unicode single
; and multi-byte character sets used in Windows operating systems. These include
; the single-byte systems for Europe and the "double-byte" systems for Chinese
; and Japanese, which actually use one or two bytes per character.

; - UCS: Universal Character Set (UCS-2 => 2 bytes; UCS-4 => 4 bytes)

; - UTF: Unicode Transformation Format (UTF-8, UTF-16)


Exch $R0
Push $R1
Push $R2
Push $R3
Push $R4


FileOpen $R0 $R0 r ; the file name is not needed afterwards, so $R0 is reused for the handle
FileReadByte $R0 $R1 ; $R1 holds the first byte of the file
FileSeek $R0 1
FileReadByte $R0 $R2 ; $R2 holds the second byte of the file
FileSeek $R0 2
FileReadByte $R0 $R3 ; $R3 holds the third byte of the file
FileSeek $R0 3
FileReadByte $R0 $R4 ; $R4 holds the fourth byte of the file

FileClose $R0

StrCpy $R0 $R1$R2$R3$R4
StrCmp $R0 76111167148 0 +3 ;hex = 4C 6F A7 94
StrCpy $R0 "EBCDIC (the encoding declaration must be read)"
Goto end
StrCmp $R0 25525400 0 +3 ;hex = FF FE 00 00 (BOM)
StrCpy $R0 "UCS-4 (little-endian)"
Goto end
StrCmp $R0 00254255 0 +3 ;hex = 00 00 FE FF (BOM)
StrCpy $R0 "UCS-4 (big-endian)"
Goto end
StrCmp $R0 060063 0 +3 ;hex = 00 3C 00 3F
StrCpy $R0 "UTF-16 (big-endian:the encoding declaration must be read)"
Goto end
StrCmp $R0 600630 0 +3 ;hex = 3C 00 3F 00
StrCpy $R0 "UTF-16 (little-endian:the encoding declaration must be read)"
Goto end
StrCpy $R0 $R1$R2
StrCmp $R0 255254 0 +3 ;hex = FF FE (BOM)
StrCpy $R0 "UTF-16LE" ; On Windows, "Unicode" usually means UTF-16. Least significant byte is written first.
Goto end
StrCmp $R0 254255 0 +3 ;hex = FE FF (BOM)
StrCpy $R0 "UTF-16BE" ; On Windows, "Unicode" usually means UTF-16. Most significant byte is written first.
Goto end
StrCpy $R0 $R1$R2$R3
StrCmp $R0 239187191 0 +3 ;hex = EF BB BF (BOM)
StrCpy $R0 "UTF-8 (BOM)" ; A character can be written in one to four bytes
Goto end
StrCpy $R0 "ANSI" ; The default value should be ANSI, because
; for ASCII characters ANSI = UTF-8,
; but for Asian languages ANSI != UTF-8.
end:

Pop $R4
Pop $R3
Pop $R2
Pop $R1
Exch $R0

FunctionEnd
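
One thing to be aware of in the function above: joining decimal byte values without a separator can collide. For example, the bytes FF 19 28 00 (255, 25, 40, 0) also produce the string 25525400, the same as the FF FE 00 00 marker. The following is a hedged sketch of one way to avoid that by formatting each byte as two hex digits. The function name GetBomEncoding, the "error" result and the file names are hypothetical, and the sketch only checks the BOM cases, not the XML-declaration cases.

```nsis
!include LogicLib.nsh
OutFile "bom-demo.exe"                  ; hypothetical output name

; Hypothetical variant of CheckFileEncoding that only looks at BOMs.
; Usage: Push file / Call GetBomEncoding / Pop result
Function GetBomEncoding
  Exch $R0                              ; $R0 = file name
  Push $R1
  Push $R2
  Push $R3
  Push $R4
  StrCpy $R4 ""                         ; first bytes, two hex digits each
  ClearErrors
  FileOpen $R1 $R0 r
  ${If} ${Errors}
    StrCpy $R0 "error"                  ; the file could not be opened
  ${Else}
    ${For} $R2 1 4
      FileReadByte $R1 $R3
      ${If} $R3 != ""                   ; empty means the file has ended
        IntFmt $R3 "%02X" $R3
        StrCpy $R4 "$R4$R3"
      ${EndIf}
    ${Next}
    FileClose $R1
    StrCpy $R1 $R4 4                    ; hex digits of the first two bytes
    StrCpy $R2 $R4 6                    ; hex digits of the first three bytes
    ${If} $R4 == "FFFE0000"
      StrCpy $R0 "UCS-4 (little-endian)"
    ${ElseIf} $R4 == "0000FEFF"
      StrCpy $R0 "UCS-4 (big-endian)"
    ${ElseIf} $R1 == "FFFE"
      StrCpy $R0 "UTF-16LE"
    ${ElseIf} $R1 == "FEFF"
      StrCpy $R0 "UTF-16BE"
    ${ElseIf} $R2 == "EFBBBF"
      StrCpy $R0 "UTF-8 (BOM)"
    ${Else}
      StrCpy $R0 "ANSI"                 ; default, as suggested in the thread
    ${EndIf}
  ${EndIf}
  Pop $R4
  Pop $R3
  Pop $R2
  Pop $R1
  Exch $R0                              ; restore $R0, leave the result on the stack
FunctionEnd

Section
  Push "$EXEDIR\test.reg"               ; hypothetical path
  Call GetBomEncoding
  Pop $0
  DetailPrint "Encoding: $0"
SectionEnd
```

Because every byte always takes exactly two characters, two different byte sequences can no longer produce the same comparison string.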


Hello

Another question. If a Unicode file is, for example, a sequence of this type:

3C 00 EF 00 ....

why use this function?

FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1

And not this?

FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntFmt $1 %lc $2

Is the point only to skip the 00s, or is there something more?

Thanks in advance


See the picture: in Unicode format, English characters have the form xx 00 xx 00, but Chinese characters don't.
Likewise, English characters are the same in ANSI and UTF-8, but Chinese characters aren't.
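
To make the difference concrete, this sketch compares "low byte only" with "both bytes combined" for the two byte pairs mentioned in the thread (3C 00 and 60 4F). The helper name ShowBoth and the output name are hypothetical.

```nsis
OutFile "compare-demo.exe"              ; hypothetical output name

; Hypothetical helper: expects $2 = low byte, $1 = high byte.
Function ShowBoth
  IntOp $3 $1 << 8                      ; high byte moved into bits 8..15
  IntOp $3 $3 | $2                      ; low byte merged in
  IntFmt $4 "0x%04X" $2                 ; low byte only
  IntFmt $5 "0x%04X" $3                 ; both bytes combined
  DetailPrint "low byte only: $4, combined: $5"
FunctionEnd

Section
  StrCpy $2 0x3C
  StrCpy $1 0x00
  Call ShowBoth                         ; low byte only: 0x003C, combined: 0x003C
  StrCpy $2 0x60
  StrCpy $1 0x4F
  Call ShowBoth                         ; low byte only: 0x0060, combined: 0x4F60
SectionEnd
```

For the English character the high byte is zero, so dropping it changes nothing. For the Chinese character the high byte carries information, so it has to be combined with the low byte.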


Thanks, Bluenet, for your explanation. I was ignorant about this. :eek:

I think writing programs is more complicated for you than for me. Am I wrong?

Best regards


Please, Bluenet, can you attach a sample file in Unicode Chinese to play with?

Thanks


What sample? In this code

FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8
IntOp $1 $1 | $2
IntFmt $1 %lc $1

$1 is an ANSI character converted from Unicode.


Thanks, Bluenet, for your reply.

The problem is that I don't understand this:

FileReadByte $0 $2
FileReadByte $0 $1
FileClose $0
IntOp $1 $1 << 8 ; *1
IntOp $1 $1 | $2 ; *2
IntFmt $1 %lc $1

If, for example, we have this sequence in the file:

60 4F 7D 59

then $2 = 60 and $1 = 4F. After the two statements (*1 and *2)
we have (I think) $1 = 4F 60.

I think this only changes the order of the bytes. And why is this ANSI?

Thanks


IntFmt works the same as the C++ wsprintf function.
I don't know why. But after converting to ANSI you can work with the result using FileWrite, StrCpy and so on.
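
As far as I understand it, the reason is that "%lc" tells wsprintf the value is a wide (16-bit) character, and in an ANSI build the result is written out in the system code page. A hedged sketch of the "deal with FileWrite" step follows: it converts a whole UTF-16LE file to an ANSI file, two bytes at a time. The file names are hypothetical, and the sketch assumes the input starts with the FF FE marker and contains no characters outside the 16-bit range.

```nsis
OutFile "utf16-to-ansi.exe"             ; hypothetical output name

Section
  ClearErrors
  FileOpen $0 "$EXEDIR\unicode-file.txt" r   ; hypothetical UTF-16LE input
  IfErrors done
  FileOpen $4 "$EXEDIR\ansi-file.txt" w      ; hypothetical output file
  IfErrors closein
  FileSeek $0 2                         ; skip the FF FE marker
loop:
  FileReadByte $0 $2                    ; low byte
  FileReadByte $0 $1                    ; high byte
  StrCmp $1 "" closeout                 ; an empty result means end of file
  IntOp $1 $1 << 8                      ; high byte into bits 8..15
  IntOp $1 $1 | $2                      ; merge in the low byte
  IntFmt $3 "%lc" $1                    ; 16-bit value -> character string
  FileWrite $4 $3
  Goto loop
closeout:
  FileClose $4
closein:
  FileClose $0
done:
SectionEnd
```

Characters that do not exist in the system code page cannot be represented in the output, so this only round-trips text that the current ANSI code page supports.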


Hello Bluenet


Thanks a lot for your patience