
C wide string handling

From Wikipedia, the free encyclopedia

Revision as of 01:39, 24 December 2011

C wide string handling refers to a group of functions in the C Standard Library that implement operations on wide strings. Supported operations include copying, concatenation, tokenization and searching.[1]

The only support for wide strings in the C language itself is that the compiler translates a wide string literal in the source, written with an L prefix such as L"text", into a null-terminated array of wchar_t with static storage duration.

Wide Characters

C was developed in an environment where the dominant character set was 7-bit ASCII, and since then the 8-bit byte has been the most common unit of character encoding. However, software developed for international use must be able to represent more than 256 different characters; for example, encodings for the writing systems of India, China and Japan must be available. This requires more than one byte per character. The first such encodings used a variable number of bytes per character (multibyte characters), mainly so that ASCII characters could keep using a single byte for compatibility.

The inconvenience of handling multibyte characters of varying length can be eliminated by using characters that all occupy the same number of bytes, so that a string becomes an array of a wider data type. ANSI C provides such a type, which stores characters as uniform-sized data objects called wide characters.[2]

Declarations and Definitions

Macros

The standard header wchar.h defines the following macros.

NULL
A null pointer constant. A null pointer compares unequal to a pointer to any object or function.
WCHAR_MIN
The minimum value representable by an object of type wchar_t.
WCHAR_MAX
The maximum value representable by an object of type wchar_t.
WEOF
A constant expression of type wint_t whose value does not correspond to any member of the extended character set. Wide-character input functions return WEOF to indicate end-of-file (no more input from a stream) or an error.[3]

Data Types

mbstate_t
An object type that holds the conversion state information that must be carried from one call of a restartable conversion function, such as mbrtowc or wcrtomb, to the next.
size_t
An unsigned integer type that is the type of the result of the sizeof operator; it is used for sizes and counts, such as the value returned by wcslen.
wchar_t
An integer type whose range of values can represent distinct codes for all members of the largest extended character set among the supported locales. Wide characters and wide strings are declared with this type.
wint_t
An integer type, unchanged by the default argument promotions, that can hold any value corresponding to a member of the extended character set, as well as at least one value that does not correspond to any such member: the value of the macro WEOF.

Functions

Wide string manipulation
  • wcscpy - copies one wide string to another
  • wcsncpy - writes exactly n characters to a wide string, copying from given string or adding nulls
  • wcscat - appends one wide string to another
  • wcsncat - appends no more than n characters from one wide string to another
  • wcsxfrm - transforms a wide string according to the current locale
Wide string examination
  • wcslen - returns the length of a wide string
  • wcscmp - compares two wide strings
  • wcsncmp - compares a specific number of characters in two wide strings
  • wcscoll - compares two wide strings according to the current locale
  • wcschr - finds the first occurrence of a character in a wide string
  • wcsrchr - finds the last occurrence of a character in a wide string
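  • wcsspn - returns the length of the initial segment of a wide string that consists only of characters from a set (the index of the first character not in the set)
  • wcscspn - returns the length of the initial segment of a wide string that contains no characters from a set (the index of the first character in the set)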
  • wcspbrk - finds in a wide string the first occurrence of a character in a set of characters
  • wcsstr - finds in a wide string the first occurrence of a substring
  • wcstok - finds in a wide string the next occurrence of a token
Memory manipulation
  • wmemset - fills a buffer with a repeated wide character
  • wmemcpy - copies one buffer to another
  • wmemmove - copies one buffer to another, possibly overlapping, buffer
  • wmemcmp - compares two buffers
  • wmemchr - finds the first occurrence of a wide character in a buffer
Conversion Functions
  • mbtowc - converts the first multibyte character in a string to the matching wide character
  • wctomb - converts a wide character to the matching multibyte character

References

  1. ISO/IEC 9899:1999 specification (PDF). p. 371, § 7.24.4 "General wide string utilities". Retrieved 30 November 2011.
  2. http://books.google.co.in/books?id=4Mfe4sAMFUYC&pg=PT26&lpg=PT26&dq=wide+characters+as+superset&source=bl&ots=tPLP1nN4qh&sig=f2W0ys85Ms9lRdT4HBEf_yoNL2U&hl=en&ei=H3SMTpiJFpGzrAfzvOycAg&sa=X&oi=book_result&ct=result&resnum=2&sqi=2&ved=0CCIQ6AEwAQ#v=onepage&q&f=false
  3. http://publib.boulder.ibm.com/infocenter/zos/v1r12/index.jsp?topic=%2Fcom.ibm.zos.r12.bpxbd00%2Fwcharh.htm