[closed] Viewing files as UTF-8?

dandv
Posts: 23
Joined: Fri Jan 07, 2005 4:02 am
Location: San Jose, CA

Post by dandv »

Hi,

Is there a way to view files as UTF-8?

For example, a file containing the bytes 'm', 0xC3, 0xA2, 'ine' currently shows up as "m├óine", but it should display "mâine" ("tomorrow" in Romanian).
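To make the example concrete, here is a small Python sketch of the same five-byte sequence decoded two ways. The choice of code page 437 for the "wrong" rendering is an assumption on my part, picked only because it reproduces the garbled characters quoted above; the viewer's actual fallback code page is not stated in this thread.

```python
# The bytes from the example: 'm', 0xC3, 0xA2, then "ine".
raw = bytes([0x6D, 0xC3, 0xA2]) + b"ine"

# Interpreted as UTF-8, 0xC3 0xA2 is a single character (U+00E2).
as_utf8 = raw.decode("utf-8")

# Interpreted as a single-byte code page (cp437 is assumed here),
# each of the two bytes becomes its own, unrelated character.
as_cp437 = raw.decode("cp437")

print(as_utf8)
print(as_cp437)
```

The UTF-8 result is five characters long, while the single-byte result is six, which is why the word looks both wrong and one character too long in a non-UTF-8 view.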

This would be an extremely handy addition for software localization engineers like me :-) And anyone dealing with files in non-English languages, I guess.

Thanks,
Dan

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

In fact, at least the Unicode version should correctly identify and handle UTF-8 files. Have you tried it?

Guest

Post by Guest »

grigsoft wrote: In fact, at least the Unicode version should correctly identify and handle UTF-8 files. Have you tried it?

Thanks for the tip. The Unicode version correctly displayed all the UTF-8 characters I tested, once I changed the font from Courier (which doesn't support ş and ţ) to Courier New. BTW, you might want to make Courier New the default fixed font.

grigsoft wrote: In fact, at least the Unicode version should correctly identify and handle UTF-8 files.

How exactly does it identify UTF-8 files that have no BOM at the beginning? By scanning for bytes > 127, or by checking for valid UTF-8 in general? The Unicode version did show the files as UTF-8, but what if the user wants to view the raw bytes (no UTF-8 interpretation, for instance because they need a specific code page)? Is there an option for this in the Unicode version?

Thanks,
Dan

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

No, there is currently no option to view Unicode/UTF files as plain text; I think I will add this feature. To identify text as UTF, it analyzes the first several KB for valid UTF sequences. It can also identify such a sequence in the middle of a file and switch to UTF mode from that point on.
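For readers curious what BOM-less detection along these lines might look like, here is a rough Python sketch. It is not CompareIt!'s actual code: the function name `looks_like_utf8`, the 8 KB sample size, and the rule that pure-ASCII data does not count as UTF-8 are all assumptions made for illustration, and the mid-file switch mentioned above is not covered.

```python
import codecs

SAMPLE_SIZE = 8 * 1024  # assumed stand-in for "the first several KB"


def looks_like_utf8(data: bytes, sample_size: int = SAMPLE_SIZE) -> bool:
    """Guess whether data is UTF-8 by validating its leading bytes."""
    # A UTF-8 BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return True

    sample = data[:sample_size]

    # Pure ASCII is valid UTF-8 but gives no evidence either way;
    # this sketch (by assumption) reports it as "not UTF-8".
    if not any(byte > 0x7F for byte in sample):
        return False

    # Strict validation. An incremental decoder is used so that a
    # multi-byte sequence cut off at the sample boundary is not
    # treated as an error unless the sample is the whole input.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(sample, final=len(data) <= sample_size)
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8(b"m\xc3\xa2ine"))  # valid two-byte sequence
print(looks_like_utf8(b"m\xe2ine"))      # the same word in Latin-1
```

Checking for full validity rather than just "any byte > 127" matters because every 8-bit code page also produces bytes above 127; only the specific lead-byte and continuation-byte pattern distinguishes UTF-8.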

dandv
Posts: 23
Joined: Fri Jan 07, 2005 4:02 am
Location: San Jose, CA

Post by dandv »

In the Unicode version of CompareIt!, could View -> View Whitespace also mark the Unicode no-break space character (U+00A0) as whitespace?
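As an illustration of the request, here is a minimal Python sketch of a "view whitespace" pass that treats U+00A0 like the other whitespace characters. The marker glyphs and the name `show_whitespace` are made up for this example and have nothing to do with how CompareIt! actually renders whitespace.

```python
# Hypothetical marker glyphs; any visible characters would do.
WHITESPACE_MARKERS = str.maketrans({
    " ": "\u00b7",       # ordinary space -> middle dot
    "\t": "\u2192",      # tab -> rightwards arrow
    "\u00a0": "\u237d",  # no-break space -> shouldered open box
})


def show_whitespace(line: str) -> str:
    """Return line with whitespace replaced by visible markers."""
    return line.translate(WHITESPACE_MARKERS)


# A regular space and a no-break space get different markers,
# so the two are easy to tell apart in a diff.
print(show_whitespace("a b\u00a0c"))
```

Giving U+00A0 its own marker, rather than reusing the one for a plain space, would make differences between the two visible in a comparison.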

That might help,
Dan Dascalescu

grigsoft
Site Admin
Posts: 1673
Joined: Tue Sep 23, 2003 7:37 pm

Post by grigsoft »

Thank you, Dan! I didn't know this. I will try to do it.
