C wide string handling
wchar.h is a header file in the C standard library. It was added by the 1995 amendment to the C programming language standard. It declares the extended multibyte and wide character utilities: including the standard header <wchar.h> gives access to input and output operations on wide streams and to functions that manipulate wide strings.[1]
Wide Characters
C was developed in an environment where the dominant character set was the 7-bit ASCII code, and the 8-bit byte has remained the most common unit of encoding ever since. Software developed for an international audience, however, has to represent many more characters, for example those of the Indian, Chinese and Japanese writing systems. Encoding schemes for such scripts often use a varying number of bytes per character. The inconvenience of handling these multibyte characters can be avoided by using characters that all occupy the same number of bytes. ANSI C provides a type, wchar_t, that allows variable-width characters to be manipulated as uniformly sized data objects called wide characters. The wide character set is a superset of the already existing character sets, including 7-bit ASCII.[2]
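As a minimal sketch of the "uniform size" idea, the helpers below walk a wide string one element at a time and compute how much storage it occupies. The names wide_length and wide_storage_bytes are hypothetical and chosen only for this illustration; the size of wchar_t itself is implementation-defined.

```c
#include <stddef.h>
#include <wchar.h>

/* Counts the wide characters in s, not counting the terminating
   null wide character. Every element has the same size, so a plain
   index is enough to step from one character to the next. */
size_t wide_length(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != L'\0')
        n++;
    return n;
}

/* Storage occupied by the wide string s in bytes, including the
   terminating null wide character. */
size_t wide_storage_bytes(const wchar_t *s)
{
    return (wide_length(s) + 1) * sizeof(wchar_t);
}
```

A call such as wide_length(L"abc") yields 3 regardless of how many bytes each wchar_t occupies on the platform.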
Declarations and Definitions
Macros
The standard header wchar.h contains the definitions or declarations of some constants.
- NULL
- A null pointer constant. A null pointer never points to a real object.
- WCHAR_MIN
- The minimum value of the type wchar_t.
- WCHAR_MAX
- The maximum value of the type wchar_t.
- WEOF
- A value of type wint_t that does not correspond to any member of the extended character set. Functions return WEOF to indicate the end of a character stream, that is the end of file (EOF), or an error.[3]
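The following sketch shows one way these macros might be used. It relies on fgetwc, a standard <wchar.h> function that is not listed in this article, to read wide characters until WEOF is returned. The function names count_wide_chars and fits_in_wchar are hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Counts the wide characters that can be read from fp before fgetwc
   returns WEOF (end of file or a read/conversion error). */
long count_wide_chars(FILE *fp)
{
    long count = 0;
    wint_t wc;                  /* wint_t rather than wchar_t, so WEOF fits */

    while ((wc = fgetwc(fp)) != WEOF)
        count++;
    return count;
}

/* Returns 1 if value lies within the range of wchar_t, 0 otherwise. */
int fits_in_wchar(long long value)
{
    return value >= WCHAR_MIN && value <= WCHAR_MAX;
}
```

The result of fgetwc is stored in a wint_t because WEOF is not required to be representable as a wchar_t.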
Data Types
- mbstate_t
- A variable of type mbstate_t holds the conversion state that has to be carried over from one call of a multibyte conversion function to the next.
- size_t
- An unsigned integer type used for sizes and counts. It is the type of the result of the sizeof operator.
- wchar_t
- An object of type wchar_t can hold a wide character. It is also required for declaring or referencing wide characters and wide strings.
- wint_t
- This type is an integer type that can hold any value corresponding to the members of the extended character set. It can hold all values of the type wchar_t as well as the value of the macro WEOF. This type is unchanged by integral promotions.
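A minimal sketch of how these types can work together, assuming the standard function mbrtowc (declared in <wchar.h> but not covered in this article) is used for the conversion. The helper name count_mb_chars is hypothetical. The mbstate_t object carries the conversion state between successive calls.

```c
#include <string.h>
#include <wchar.h>

/* Counts the characters in the multibyte string s under the current
   locale. Returns (size_t)-1 if s contains an invalid or incomplete
   multibyte sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);    /* initial conversion state */

    while (remaining > 0) {
        wchar_t wc;
        size_t used = mbrtowc(&wc, s, remaining, &state);

        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;          /* invalid or truncated sequence */
        if (used == 0)
            break;                      /* null character reached */

        s += used;                      /* skip the bytes just converted */
        remaining -= used;
        count++;
    }
    return count;
}
```

In the default "C" locale each ASCII byte is one character, so count_mb_chars("hello") returns 5; for other encodings the result depends on the locale selected with setlocale.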
Functions
Wide-character string functions
Name | Notes |
---|---|
wchar_t *wcscat(wchar_t *s1, const wchar_t *s2); | appends a copy of the wide string that s2 points to to the end of the wide string that s1 points to. |
wchar_t *wcschr(const wchar_t *s, wchar_t c); | searches the wide string s for the first occurrence of the wide character c and returns a pointer to it, or a null pointer if c is not found. |
int wcscmp(const wchar_t *s1, const wchar_t *s2); | compares the two wide strings that s1 and s2 point to. |
int wcscoll(const wchar_t *s1, const wchar_t *s2); | compares the two wide strings s1 and s2 using the current locale's collating order. |
wchar_t *wcscpy(wchar_t *s1, const wchar_t *s2); | copies the wide string that s2 points to to the location that s1 points to. |
size_t wcscspn(const wchar_t *s1, const wchar_t *s2); | returns the index of the first element of s1 that equals any element of s2, or the length of s1 if there is no such element. |
size_t wcslen(const wchar_t *s); | returns the number of wide characters (excluding the terminating null wide character) in the wide string that s points to. |
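The functions in the table can be combined as in the following sketch. The helper names join_words and index_of are hypothetical, and the capacity check is an assumption of this example, since wcscpy and wcscat do not check buffer sizes themselves.

```c
#include <stddef.h>
#include <wchar.h>

/* Writes "first second" into dst, which has room for cap wide
   characters. Returns 0 on success, -1 if dst is too small. */
int join_words(wchar_t *dst, size_t cap,
               const wchar_t *first, const wchar_t *second)
{
    /* two strings, one separating space, one terminating null */
    size_t needed = wcslen(first) + 1 + wcslen(second) + 1;

    if (needed > cap)
        return -1;

    wcscpy(dst, first);     /* overwrite dst with first        */
    wcscat(dst, L" ");      /* append the separator            */
    wcscat(dst, second);    /* append second at the end of dst */
    return 0;
}

/* Returns the index of the first occurrence of c in s, or -1. */
long index_of(const wchar_t *s, wchar_t c)
{
    const wchar_t *p = wcschr(s, c);
    return p != NULL ? (long)(p - s) : -1;
}
```

For example, after join_words(buf, 16, L"wide", L"string") the buffer compares equal to L"wide string" under wcscmp, and wcscspn(buf, L" ") gives 4, the index of the space.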
Wide-character array functions
Name | Notes |
---|---|
wchar_t *wmemchr(const wchar_t *s, wchar_t c, size_t n); | searches the first n elements of the array that s points to for the first element that equals c. |
int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n); | compares successive elements of the two arrays that s1 and s2 point to, up to n elements, until it finds elements that are not equal. |
wchar_t *wmemcpy(wchar_t *s1, const wchar_t *s2, size_t n); | copies n wide characters from the array pointed to by s2 to the array pointed to by s1. If the objects in s1 and s2 overlap, the behavior is undefined. |
wchar_t *wmemmove(wchar_t *s1, const wchar_t *s2, size_t n); | works like wmemcpy, but copies correctly even if the objects in arrays s1 and s2 overlap. |
wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n); | sets the first n elements of the array that s points to to the wide character c. |
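The difference between wmemcpy and wmemmove matters when source and destination overlap. The sketch below, with the hypothetical helpers push_front and index_in_array, shifts an array by one position within the same buffer, which is a case where only wmemmove is safe.

```c
#include <stddef.h>
#include <wchar.h>

/* Shifts the first len elements of buf one position to the right and
   stores c in buf[0]. buf must have room for len + 1 elements.
   Source and destination overlap, so wmemmove is used, not wmemcpy. */
void push_front(wchar_t *buf, size_t len, wchar_t c)
{
    wmemmove(buf + 1, buf, len);
    buf[0] = c;
}

/* Returns the index of the first of the n elements of a that equals c,
   or -1 if there is none. */
long index_in_array(const wchar_t *a, size_t n, wchar_t c)
{
    const wchar_t *p = wmemchr(a, c, n);
    return p != NULL ? (long)(p - a) : -1;
}
```

Unlike the wide string functions, these array functions take an explicit element count and do not treat the null wide character specially.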
Conversion Functions
Name | Notes |
---|---|
wint_t btowc(int c); | returns the wide character equivalent of the single-byte character c, or WEOF on error. |
int wctob(wint_t c); | returns the single-byte equivalent of the wide character c, or EOF if c has no single-byte representation. |
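A short sketch of a round trip through both conversion functions. The helper name roundtrip_byte is hypothetical, and which bytes convert successfully depends on the current locale.

```c
#include <stdio.h>      /* EOF */
#include <wchar.h>

/* Converts the single byte c to a wide character and back again.
   Returns the resulting byte value, or EOF if either step fails. */
int roundtrip_byte(int c)
{
    wint_t wide = btowc(c);

    if (wide == WEOF)
        return EOF;         /* c is EOF or not a valid single-byte character */
    return wctob(wide);     /* EOF if wide has no one-byte representation */
}
```

Note the asymmetry in the error values: btowc signals failure with WEOF, whereas wctob returns an int and signals failure with EOF.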
References
- ^ http://www.qnx.com/developers/docs/6.4.1/dinkum_en/c99/wchar.html
- ^ http://books.google.co.in/books?id=4Mfe4sAMFUYC&pg=PT26&lpg=PT26&dq=wide+characters+as+superset&source=bl&ots=tPLP1nN4qh&sig=f2W0ys85Ms9lRdT4HBEf_yoNL2U&hl=en&ei=H3SMTpiJFpGzrAfzvOycAg&sa=X&oi=book_result&ct=result&resnum=2&sqi=2&ved=0CCIQ6AEwAQ#v=onepage&q&f=false
- ^ http://publib.boulder.ibm.com/infocenter/zos/v1r12/index.jsp?topic=%2Fcom.ibm.zos.r12.bpxbd00%2Fwcharh.htm
- ^ http://www.prenhall.com/jaeschke/pdf/w.pdf