C wide string handling
Revision as of 01:39, 24 December 2011
C wide string handling refers to a group of functions implementing operations on wide strings in the C Standard Library. Various operations, such as copying, concatenation, tokenization and searching are supported.[1]
The only support in the C programming language itself for wide strings is that the compiler will translate a quoted wide string constant in the source into a null-terminated wide string stored in static memory.
Wide Characters
C was developed in an environment where the dominant character set was the 7-bit ASCII code, and the 8-bit byte has since remained the most common unit of encoding. Software written for an international audience, however, has to represent more than 256 different characters; for example, it needs character encoding schemes for the Indian, Chinese and Japanese writing systems. This can only be done by using more than one byte per character. Early encodings used a variable number of bytes per character, primarily so that ASCII characters could continue to occupy one byte each for compatibility.
The inconvenience of handling variable-length multibyte characters can be avoided by giving every character the same number of bytes, which makes the string an array of a larger data type. ANSI C provides a type for storing characters as uniformly sized data objects, called wide characters.[2]
Declarations and Definitions
Macros
The standard header wchar.h defines the following macros.
- NULL
- A null pointer constant. It never points to a real object.
- WCHAR_MIN
- The minimum value representable by the type wchar_t.
- WCHAR_MAX
- The maximum value representable by the type wchar_t.
- WEOF
- A constant of type wint_t whose value does not correspond to any member of the extended character set. Wide character functions return WEOF to indicate the end of a character stream (end of file) or an error.[3]
Data Types
- mbstate_t
- An object of type mbstate_t holds the conversion state that must be carried from one call of a multibyte conversion function to the next.
- size_t
- An unsigned integer type used for sizes and counts; it is the type of the result of the sizeof operator.
- wchar_t
- An object of type wchar_t can hold a wide character. It is also required for declaring or referencing wide characters and wide strings.
- wint_t
- An integer type that can hold any value corresponding to a member of the extended character set. It can hold all values of the type wchar_t as well as the value of the macro WEOF. This type is unchanged by the default argument promotions.
Functions
- Wide string manipulation
wcscpy
- copies one wide string to another
wcsncpy
- writes exactly n characters to a wide string, copying from given string or adding nulls
wcscat
- appends one wide string to another
wcsncat
- appends no more than n characters from one wide string to another
wcsxfrm
- transforms a wide string according to the current locale
- Wide string examination
wcslen
- returns the length of a wide string
wcscmp
- compares two wide strings
wcsncmp
- compares a specific number of characters in two wide strings
wcscoll
- compares two wide strings according to the current locale
wcschr
- finds the first occurrence of a character in a wide string
wcsrchr
- finds the last occurrence of a character in a wide string
wcsspn
- returns the length of the initial part of a wide string that consists only of characters in a set of characters
wcscspn
- returns the length of the initial part of a wide string that consists only of characters not in a set of characters
wcspbrk
- finds in a wide string the first occurrence of a character in a set of characters
wcsstr
- finds in a wide string the first occurrence of a substring
wcstok
- finds in a wide string the next occurrence of a token
- Memory manipulation
wmemset
- fills a buffer with a repeated wide character
wmemcpy
- copies one buffer to another
wmemmove
- copies one buffer to another, possibly overlapping, buffer
wmemcmp
- compares two buffers
wmemchr
- finds the first occurrence of a wide character in a buffer
- Conversion Functions
mbtowc
- converts the first multibyte character in a string to the matching wide character
wctomb
- converts a wide character to the matching multibyte character
References
- ^ ISO/IEC 9899:1999 specification (PDF). p. 371, § 7.24.4 "General wide string utilities". Retrieved 30 November 2011.
- ^ http://books.google.co.in/books?id=4Mfe4sAMFUYC&pg=PT26&lpg=PT26&dq=wide+characters+as+superset&source=bl&ots=tPLP1nN4qh&sig=f2W0ys85Ms9lRdT4HBEf_yoNL2U&hl=en&ei=H3SMTpiJFpGzrAfzvOycAg&sa=X&oi=book_result&ct=result&resnum=2&sqi=2&ved=0CCIQ6AEwAQ#v=onepage&q&f=false
- ^ http://publib.boulder.ibm.com/infocenter/zos/v1r12/index.jsp?topic=%2Fcom.ibm.zos.r12.bpxbd00%2Fwcharh.htm