Special Characters Displaying As Question Marks

Spilly
30th April 2008 01:07 UTC

Hey guys,
I'm working on an English-language installer that should display properly on a Chinese-language OS (Win XP).

All the text displays correctly, except the copyright character and three bullet points, which display as question marks.

Is this Unicode-related, or is it something I'm doing wrong?

Thanks

Pidgeot
30th April 2008 09:00 UTC

Those characters are most likely not in the Chinese code page (Copyright isn't, at least - not sure about your bullet points, since I can't see them). You need to either emulate these characters using something else (for example, using (c) instead of ©), or make it a Unicode script and use the Unicode version of NSIS (http://forums.winamp.com/showthread....hreadid=277381).

If you want to check if a character appears in a code page, use Character Map. Select a Chinese font, and in Advanced view, select one of the Chinese Windows character sets (depending on the one you'll be using). Then type part of the character name in the Search box. If the character appears, you can use it without Unicode.
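The same check can also be scripted. The following is only a rough sketch in Python, not NSIS code: the helper name in_code_page is made up for illustration, and it assumes Python's cp936 codec (an alias for GBK) is close enough to the Simplified Chinese Windows code page, which may differ in a few edge cases.

```python
def in_code_page(ch: str, code_page: str) -> bool:
    """Return True if `ch` can be encoded in the named code page.

    Hypothetical helper: relies on Python's bundled codec tables,
    which approximate (but are not guaranteed to match) Windows' own.
    """
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    candidates = [("copyright sign", "\u00a9"), ("bullet", "\u2022")]
    for name, ch in candidates:
        for cp in ("cp1252", "cp936"):
            print(f"{name} in {cp}: {in_code_page(ch, cp)}")
```

Any character reported as missing from the target code page is a candidate for an ASCII stand-in such as (c), or a reason to move to the Unicode build.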

Spilly
30th April 2008 18:12 UTC

Isn't there a way through MS Shell Dlg to get these characters to display properly? I thought that was the whole point of the logical fonts, so that the English character set could be correctly displayed on any system.

For instance, the Chinese system I'm testing on has MS Shell Dlg mapped to Microsoft Sans Serif and MS Shell Dlg 2 mapped to Tahoma. Microsoft Sans Serif has the copyright symbol, although I don't see the bullet point, and Tahoma has both.

What am I missing?

Pidgeot
30th April 2008 19:01 UTC

The issue is not the font, but the code page (or character encoding, if you prefer).

Essentially, a code page maps a given sequence of bytes to a specific character. The mapping differs between code pages: 0xA0 can map to one character in one code page and to a different character in another.
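To make the 0xA0 example concrete, here is a small Python sketch (again, not NSIS code). The helper name decode_in and the choice of the three code pages are mine, picked only because they disagree about this byte.

```python
import unicodedata


def decode_in(raw: bytes, code_page: str) -> str:
    """Interpret `raw` as text in the given code page (hypothetical helper)."""
    return raw.decode(code_page, errors="replace")


if __name__ == "__main__":
    raw = b"\xa0"  # one byte, three different meanings
    for cp in ("cp1252", "cp437", "cp866"):
        ch = decode_in(raw, cp)
        # Print the code point and its official Unicode name
        print(f"{cp}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The byte never changes; only the table used to look it up does.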

When you're using the ANSI version of NSIS, the system translates your byte sequences to characters based on the code page for the language it expects (on NT-based OSes, this is set in the Control Panel). If it encounters a byte it can't make sense of (for example, if there is no mapping), then it will substitute a question mark.
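A rough simulation of that substitution, in Python rather than on a real Chinese system. It assumes the script was saved in code page 1252 and is read back as code page 936. The helper name as_seen_on is hypothetical, and Python marks undecodable bytes with U+FFFD, which the sketch swaps for a question mark to mimic what Windows shows.

```python
def as_seen_on(text: str, authored_cp: str, target_cp: str) -> str:
    """Encode `text` as the author's system would, then decode it the way
    a system using `target_cp` would (hypothetical helper)."""
    raw = text.encode(authored_cp)
    decoded = raw.decode(target_cp, errors="replace")
    # Python uses U+FFFD for undecodable bytes; show '?' instead
    return decoded.replace("\ufffd", "?")


if __name__ == "__main__":
    print(as_seen_on("Copyright \u00a9 2008", "cp1252", "cp936"))
    print(as_seen_on("Copyright (c) 2008", "cp1252", "cp936"))
```

Plain ASCII text survives the trip because its byte values mean the same thing in both code pages, which is why the (c) workaround is safe.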

This also means that any characters that don't exist in the code page won't be displayed as expected - this is why NSIS won't show Chinese characters if it tries to use an English code page (as is usual for an English system).

The fact that you have fonts that contain the character you want is irrelevant - even if it's the same font - because the system doesn't know a byte sequence that maps to that character.

Unicode solves this issue because it includes all characters - Arabic, Russian, Chinese, you name it. Every character has a byte sequence associated with it in a Unicode-based code page, allowing you to mix and match characters exactly as you need to - and you don't have to make configuration changes or copy your installer to a Chinese system to see if the Chinese text appears properly.
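As an illustration of the mix-and-match point, the Python sketch below checks whether a string with Western, Chinese and Russian characters survives a round trip through several encodings. The sample string and the helper name round_trips are made up for the example; UTF-16 is included because it is what the Unicode Windows APIs use.

```python
def round_trips(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded and decoded without loss
    (hypothetical helper)."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # At least one character has no byte sequence in this encoding
        return False


if __name__ == "__main__":
    # Copyright sign, bullets, Chinese and Russian text in one string
    mixed = ("Copyright \u00a9 2008 \u2022 \u4e2d\u6587 \u2022 "
             "\u0420\u0443\u0441\u0441\u043a\u0438\u0439")
    for enc in ("utf-8", "utf-16-le", "cp1252", "cp936"):
        print(f"{enc}: {round_trips(mixed, enc)}")
```

Only the Unicode encodings can carry the whole string; each legacy code page is missing at least one of the characters.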