Wide character: Difference between revisions


Revision as of 20:02, 15 December 2009

Wide character is a computer programming term for a character datatype that is wider than the traditional 8-bit character. The term is loosely defined and is not synonymous with Unicode.

wchar_t is a data type in ANSI/ISO C, ANSI/ISO C++, and some other programming languages that is intended to represent wide characters.

The Unicode Standard, Version 4.0, says that

"ANSI/ISO C leaves the semantics of the wide character set to the specific implementation but requires that the characters from the portable C execution set correspond to their wide character equivalents by zero extension."

and that

"The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers."

Under Win32, wchar_t is 16 bits wide and represents a UTF-16 code unit. On Unix-like systems wchar_t is commonly 32 bits wide and represents a UTF-32 code unit.
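
Because the width differs between platforms, a program can inspect it rather than assume it. The following is a minimal sketch, not taken from the article; the function names wchar_bits and zero_extended are hypothetical and chosen only for illustration. The second function checks the zero-extension property quoted above for a given pair of strings.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on this implementation
   (typically 16 under Win32 and 32 on Unix-like systems). */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Returns 1 if every character of s has the same numeric value as the
   corresponding wide character of ws, and both strings end together. */
int zero_extended(const char *s, const wchar_t *ws)
{
    assert(s != NULL && ws != NULL);
    for (; *s != '\0'; ++s, ++ws) {
        if ((wchar_t)(unsigned char)*s != *ws)
            return 0;
    }
    return *ws == L'\0';
}
```

A caller might print wchar_bits() at startup, or call zero_extended("abc", L"abc") to confirm that characters from the portable execution set map to wide characters of the same value.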

In the ANSI C library, the header files <wchar.h> and <wctype.h> deal with wide characters.

Functions

C's stdlib.h declares several functions for converting between wchar_t values and multibyte characters.

wctomb() - wide character to multibyte character ^[1]
mbtowc() - multibyte character to wide char ^[2]
wcstombs() - wide-char string to multibyte character string ^[3]
mbstowcs() - multibyte character string to wide-char string ^[4]
mblen() - number of bytes in a multibyte character ^[5]
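
The functions listed above might be combined as in the following sketch. The helper names round_trip and count_chars and the fixed buffer size of 64 are assumptions made for illustration, not part of any library. The code runs in whatever locale is current; for non-ASCII input a program would normally call setlocale(LC_ALL, "") first.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts a multibyte string to wide characters and back again.
   Returns the number of bytes written to out (excluding the terminator),
   or (size_t)-1 on an invalid sequence or if a buffer is too small. */
size_t round_trip(const char *in, char *out, size_t out_size)
{
    wchar_t wide[64];   /* illustrative fixed-size scratch buffer */
    assert(in != NULL && out != NULL && out_size > 0);

    size_t n = mbstowcs(wide, in, 64);
    if (n == (size_t)-1 || n >= 64)
        return (size_t)-1;   /* invalid input, or no room for L'\0' */

    size_t m = wcstombs(out, wide, out_size);
    if (m == (size_t)-1 || m >= out_size)
        return (size_t)-1;   /* unconvertible, or no room for '\0' */
    return m;
}

/* Counts the characters (not bytes) in a multibyte string using mblen().
   Returns (size_t)-1 if an invalid sequence is found. */
size_t count_chars(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    (void)mblen(NULL, 0);    /* reset the function's internal shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX);
        if (len <= 0)
            return (size_t)-1;
        s += len;
        ++count;
    }
    return count;
}
```

Both helpers report failure with (size_t)-1, mirroring the convention of mbstowcs() and wcstombs() themselves.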

The author of GNU libc advises against using these functions because of the 'state' mechanism they involve, and instead suggests the 'restartable' functions such as mbsrtowcs. ^[6]
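
As a rough illustration of the restartable style, the sketch below uses mbsrtowcs() from <wchar.h> with a conversion state owned by the caller instead of state hidden inside the library. The wrapper name to_wide_restartable is hypothetical, and treating a too-small destination as an error is a design choice of this sketch rather than something the library requires.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Converts src into dst (capacity dst_len wide characters, including the
   terminator). Returns the number of wide characters written, excluding
   the terminator, or (size_t)-1 on invalid input or insufficient space. */
size_t to_wide_restartable(const char *src, wchar_t *dst, size_t dst_len)
{
    mbstate_t state;
    assert(src != NULL && dst != NULL && dst_len > 0);
    memset(&state, 0, sizeof state);   /* zeroed state = initial state */

    const char *p = src;
    size_t n = mbsrtowcs(dst, &p, dst_len, &state);
    if (n == (size_t)-1)
        return (size_t)-1;   /* invalid multibyte sequence */
    if (p != NULL)
        return (size_t)-1;   /* stopped early: dst filled before the terminator */
    return n;
}
```

Because the mbstate_t object is local to each call, two conversions in progress cannot disturb one another, which is not guaranteed when the state is internal to the function.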

External links

The Unicode Standard, Version 4.0 - online edition

Notes

^ C++ Resources Network - wctomb, accessed 2009-12-15
^ C++ Resources Network - mbtowc, accessed 2009-12-15
^ C++ Resources Network - wcstombs, accessed 2009-12-15
^ C++ Resources Network - mbstowcs, accessed 2009-12-15
^ C++ Resources Network - mblen, accessed 2009-12-15
^ GNU.org libc source code, libc/stdlib/mbstowcs.c, accessed 2009-12-15



@@ Line 23: / Line 23: @@
 * mbstowcs() - [[multibyte character]] string to wide-char string <ref>[http://www.cplusplus.com/reference/clibrary/cstdlib/mbtowc/ C++ Resources Network - mbstowcs], access 2009 12 15</ref>
 * mblen() - number of bytes in a [[multibyte character]] <ref>[http://www.cplusplus.com/reference/clibrary/cstdlib/mbtowc/ C++ Resources Network - mblen], access 2009 12 15</ref>
+The author of [[GNU]] [[libc]] advises to avoid these due to the 'state' mechanism they involve, and instead suggests the 'restartable' mbsrtowcs et al functions. <ref>[http://cvs.savannah.gnu.org/viewvc/libc/stdlib/mbstowcs.c?revision=1.9&root=libc&view=markup GNU.org libc source code, libc/stdlib/mbstowcs.c], accessed 2009 12 15</ref>
 ==External links==