Unicode

Unicode — Unicode and UTF-8 utility functions.

Synopsis

typedef             raptor_unichar;
int                 raptor_unicode_utf8_string_put_char (raptor_unichar c,
                                                         unsigned char *output,
                                                         size_t length);
int                 raptor_unicode_utf8_string_get_char (const unsigned char *input,
                                                         size_t length,
                                                         raptor_unichar *output);
int                 raptor_unicode_is_xml11_namestartchar
                                                        (raptor_unichar c);
int                 raptor_unicode_is_xml10_namestartchar
                                                        (raptor_unichar c);
int                 raptor_unicode_is_xml11_namechar    (raptor_unichar c);
int                 raptor_unicode_is_xml10_namechar    (raptor_unichar c);
int                 raptor_unicode_check_utf8_string    (const unsigned char *string,
                                                         size_t length);
int                 raptor_unicode_utf8_strlen          (const unsigned char *string,
                                                         size_t length);
size_t              raptor_unicode_utf8_substr          (unsigned char *dest,
                                                         size_t *dest_length_p,
                                                         const unsigned char *src,
                                                         size_t src_length,
                                                         int startingLoc,
                                                         int length);

Description

Functions for converting to and from Unicode encoded as UTF-8, which is the native internal string format of all the Redland libraries. Also includes checks for characters that are legal in names under either the XML 1.0 or the XML 1.1 rules.

Details

raptor_unichar

typedef unsigned long raptor_unichar;

Raptor Unicode codepoint.

raptor_unicode_utf8_string_put_char ()

int                 raptor_unicode_utf8_string_put_char (raptor_unichar c,
                                                         unsigned char *output,
                                                         size_t length);

Encode a Unicode character to a UTF-8 string.

If output is NULL, the function calculates the length rather than performing the encoding. The caller can use this to allocate space and then call the function again with the new buffer.

`c` :	Unicode character
`output` :	UTF-8 string buffer or NULL
`length` :	length of output buffer
Returns :	number of bytes encoded to output buffer or <0 on failure
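
The sizing convention described above can be illustrated with a standalone sketch. This is not Raptor's implementation: `sketch_unichar` and `sketch_utf8_put_char` are hypothetical names that only mirror the documented contract (NULL output means "report the size", a negative return means failure), and the choice of -1 as the single failure value is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for raptor_unichar (assumption for this sketch). */
typedef unsigned long sketch_unichar;

/* Sketch only: returns the number of UTF-8 bytes needed or written,
 * or -1 if c is above U+10FFFF or the buffer is too small.
 * Surrogate code points are not rejected here. */
int sketch_utf8_put_char(sketch_unichar c, unsigned char *output, size_t length)
{
  int size;

  if(c < 0x80UL)
    size = 1;
  else if(c < 0x800UL)
    size = 2;
  else if(c < 0x10000UL)
    size = 3;
  else if(c <= 0x10FFFFUL)
    size = 4;
  else
    return -1;

  if(!output)
    return size;              /* sizing pass: nothing is written */
  if(length < (size_t)size)
    return -1;

  switch(size) {
    case 1:
      output[0] = (unsigned char)c;
      break;
    case 2:
      output[0] = (unsigned char)(0xC0 | (c >> 6));
      output[1] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      output[0] = (unsigned char)(0xE0 | (c >> 12));
      output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[2] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      output[0] = (unsigned char)(0xF0 | (c >> 18));
      output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[3] = (unsigned char)(0x80 | (c & 0x3F));
      break;
  }
  return size;
}
```

A caller would typically make one call with a NULL output to learn the size, allocate that many bytes, and then call again with the real buffer.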

raptor_unicode_utf8_string_get_char ()

int                 raptor_unicode_utf8_string_get_char (const unsigned char *input,
                                                         size_t length,
                                                         raptor_unichar *output);

Decode a UTF-8 encoded string to get a Unicode character.

If output is NULL, the function calculates the number of bytes that would be used from the input buffer and does not perform the conversion.

`input` :	UTF-8 string buffer
`length` :	buffer size
`output` :	Pointer to the Unicode character or NULL
Returns :	bytes used from input buffer, or <0 on failure: -1 if the input buffer is too short or there is a length error, -2 for an overlong UTF-8 sequence, -3 for illegal code positions, -4 for a code outside the range U+0000 to U+10FFFF. In cases -2, -3 and -4 the decoded character is still stored in output.
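
To make the return codes listed above concrete, here is a standalone decoder sketch that follows the same numbering. It is not Raptor's source: `sketch_unichar` and `sketch_utf8_get_char` are hypothetical names, and two details are assumptions of this sketch rather than documented behaviour: a malformed continuation byte is reported as -1, and "illegal code positions" is taken to mean the surrogate range plus U+FFFE and U+FFFF.

```c
#include <stddef.h>

/* Hypothetical stand-in for raptor_unichar (assumption for this sketch). */
typedef unsigned long sketch_unichar;

int sketch_utf8_get_char(const unsigned char *input, size_t length,
                         sketch_unichar *output)
{
  sketch_unichar c;
  size_t size, i;
  unsigned char first;

  if(!input || length < 1)
    return -1;

  /* The lead byte determines the sequence length. */
  first = input[0];
  if(first < 0x80)                { size = 1; c = first; }
  else if((first & 0xE0) == 0xC0) { size = 2; c = first & 0x1F; }
  else if((first & 0xF0) == 0xE0) { size = 3; c = first & 0x0F; }
  else if((first & 0xF8) == 0xF0) { size = 4; c = first & 0x07; }
  else
    return -1;                    /* not a valid lead byte */

  if(!output)
    return (int)size;             /* sizing only: no conversion is done */
  if(length < size)
    return -1;                    /* input buffer too short */

  for(i = 1; i < size; i++) {
    if((input[i] & 0xC0) != 0x80)
      return -1;                  /* sketch assumption: bad continuation */
    c = (c << 6) | (sketch_unichar)(input[i] & 0x3F);
  }

  *output = c;                    /* stored even if a check below fails */

  if((size == 2 && c < 0x80UL) || (size == 3 && c < 0x800UL) ||
     (size == 4 && c < 0x10000UL))
    return -2;                    /* overlong sequence */
  if((c >= 0xD800UL && c <= 0xDFFFUL) || c == 0xFFFEUL || c == 0xFFFFUL)
    return -3;                    /* illegal code position (assumed set) */
  if(c > 0x10FFFFUL)
    return -4;                    /* outside U+0000 to U+10FFFF */

  return (int)size;
}
```

Because the character is stored before the -2, -3 and -4 checks, a caller can still inspect the decoded value when reporting one of those errors.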

raptor_unicode_is_xml11_namestartchar ()

int                 raptor_unicode_is_xml11_namestartchar
                                                        (raptor_unichar c);

Check if Unicode character is legal to start an XML 1.1 Name

See Namespaces in XML 1.1 REC 2004-02-04 NameStartChar updating Extensible Markup Language (XML) 1.1 REC 2004-02-04 sec 2.3, [4a] excluding the ':'

`c` :	Unicode character to check
Returns :	non-0 if legal
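
For reference, the XML 1.1 NameStartChar production with ':' removed can be written as a plain range test. The sketch below transcribes those ranges directly; `sketch_unichar` and `sketch_is_xml11_namestartchar` are hypothetical names, and the code is an illustration of the rule, not Raptor's implementation.

```c
/* Hypothetical stand-in for raptor_unichar (assumption for this sketch). */
typedef unsigned long sketch_unichar;

/* XML 1.1 NameStartChar ranges, with ':' (U+003A) left out. */
int sketch_is_xml11_namestartchar(sketch_unichar c)
{
  return (c >= 0x41UL    && c <= 0x5AUL)   ||  /* A-Z */
         (c == 0x5FUL)                     ||  /* _ */
         (c >= 0x61UL    && c <= 0x7AUL)   ||  /* a-z */
         (c >= 0xC0UL    && c <= 0xD6UL)   ||
         (c >= 0xD8UL    && c <= 0xF6UL)   ||
         (c >= 0xF8UL    && c <= 0x2FFUL)  ||
         (c >= 0x370UL   && c <= 0x37DUL)  ||
         (c >= 0x37FUL   && c <= 0x1FFFUL) ||
         (c >= 0x200CUL  && c <= 0x200DUL) ||
         (c >= 0x2070UL  && c <= 0x218FUL) ||
         (c >= 0x2C00UL  && c <= 0x2FEFUL) ||
         (c >= 0x3001UL  && c <= 0xD7FFUL) ||
         (c >= 0xF900UL  && c <= 0xFDCFUL) ||
         (c >= 0xFDF0UL  && c <= 0xFFFDUL) ||
         (c >= 0x10000UL && c <= 0xEFFFFUL);
}
```

Digits, '-' and '.' are deliberately absent: they may continue a name but not start one.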

raptor_unicode_is_xml10_namestartchar ()

int                 raptor_unicode_is_xml10_namestartchar
                                                        (raptor_unichar c);

Check if Unicode character is legal to start an XML 1.0 Name

See Namespaces in XML REC 1999-01-14 updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 excluding the ':'

`c` :	Unicode character to check
Returns :	non-0 if legal

raptor_unicode_is_xml11_namechar ()

int                 raptor_unicode_is_xml11_namechar    (raptor_unichar c);

Check if a Unicode codepoint is legal to continue an XML 1.1 Name

See Namespaces in XML 1.1 REC 2004-02-04 updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 sec 2.3, [4a] excluding the ':'

`c` :	Unicode character
Returns :	non-0 if legal
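
In XML 1.1, NameChar is NameStartChar plus a small set of extra characters. The sketch below tests only that extra set, so a full name-continuation check would accept a character if it is either a name start character or in this set. `sketch_unichar` and `sketch_is_xml11_namechar_extra` are hypothetical names used for illustration; this is not Raptor's code.

```c
/* Hypothetical stand-in for raptor_unichar (assumption for this sketch). */
typedef unsigned long sketch_unichar;

/* Characters that XML 1.1 NameChar allows in addition to NameStartChar. */
int sketch_is_xml11_namechar_extra(sketch_unichar c)
{
  return (c == 0x2DUL)                     ||  /* - */
         (c == 0x2EUL)                     ||  /* . */
         (c >= 0x30UL   && c <= 0x39UL)    ||  /* 0-9 */
         (c == 0xB7UL)                     ||  /* middle dot */
         (c >= 0x300UL  && c <= 0x36FUL)   ||  /* combining diacritics */
         (c >= 0x203FUL && c <= 0x2040UL);
}
```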

raptor_unicode_is_xml10_namechar ()

int                 raptor_unicode_is_xml10_namechar    (raptor_unichar c);

Check if a Unicode codepoint is legal to continue an XML 1.0 Name

See Namespaces in XML REC 1999-01-14 NCNameChar updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 excluding the ':'

`c` :	Unicode character
Returns :	non-0 if legal

raptor_unicode_check_utf8_string ()

int                 raptor_unicode_check_utf8_string    (const unsigned char *string,
                                                         size_t length);

Check that a string is valid UTF-8.

`string` :	UTF-8 string
`length` :	length of string
Returns :	non-0 if the string is valid UTF-8

raptor_unicode_utf8_strlen ()

int                 raptor_unicode_utf8_strlen          (const unsigned char *string,
                                                         size_t length);

Calculate the number of Unicode characters in the given UTF-8 encoded buffer

`string` :	buffer
`length` :	buffer length
Returns :	number of characters or <0 if sequence is invalid
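
The difference between byte length and character count can be shown with a standalone counting sketch. It is not Raptor's implementation: `sketch_utf8_strlen` is a hypothetical name, it checks only sequence structure (lead byte, length, continuation bytes), and using -1 as the failure value is an assumption.

```c
#include <stddef.h>

/* Sketch only: counts UTF-8 sequences in a buffer of `length` bytes,
 * returning -1 if a sequence is malformed or truncated. */
int sketch_utf8_strlen(const unsigned char *string, size_t length)
{
  int count = 0;
  size_t i = 0;

  while(i < length) {
    unsigned char b = string[i];
    size_t size, k;

    if(b < 0x80)                size = 1;
    else if((b & 0xE0) == 0xC0) size = 2;
    else if((b & 0xF0) == 0xE0) size = 3;
    else if((b & 0xF8) == 0xF0) size = 4;
    else
      return -1;                /* not a valid lead byte */

    if(length - i < size)
      return -1;                /* sequence runs past the buffer */

    for(k = 1; k < size; k++) {
      if((string[i + k] & 0xC0) != 0x80)
        return -1;              /* not a continuation byte */
    }

    i += size;
    count++;
  }

  return count;
}
```

For the six bytes of "h", "é" and "€" encoded as UTF-8, this sketch reports three characters.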

raptor_unicode_utf8_substr ()

size_t              raptor_unicode_utf8_substr          (unsigned char *dest,
                                                         size_t *dest_length_p,
                                                         const unsigned char *src,
                                                         size_t src_length,
                                                         int startingLoc,
                                                         int length);

Get a Unicode (UTF-8) substring of an existing UTF-8 string.

If dest is NULL, the function returns the number of bytes that would be written and copies nothing.

`dest` :	destination string buffer to write to (or NULL)
`dest_length_p` :	location to store actual destination length (or NULL)
`src` :	source string
`src_length` :	source length in bytes
`startingLoc` :	starting location offset, 0 for the first Unicode character
`length` :	number of Unicode characters to copy at offset `startingLoc` (or < 0)
Returns :	number of bytes used in destination string or 0 on failure
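
The substring operation works in characters but returns bytes, which the following standalone sketch tries to make visible. It is not Raptor's implementation, and `sketch_seq_len` and `sketch_utf8_substr` are hypothetical names. Three behaviours are assumptions of this sketch and should be checked against the library before relying on them: a negative `length` copies to the end of the source, `*dest_length_p` receives the number of characters copied, and a terminating NUL is written after the copied bytes.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence starting at s, or 0 if the lead
 * byte is invalid or the sequence does not fit in `avail` bytes. */
static size_t sketch_seq_len(const unsigned char *s, size_t avail)
{
  size_t size;

  if(avail == 0)                 return 0;
  if(s[0] < 0x80)                size = 1;
  else if((s[0] & 0xE0) == 0xC0) size = 2;
  else if((s[0] & 0xF0) == 0xE0) size = 3;
  else if((s[0] & 0xF8) == 0xF0) size = 4;
  else                           return 0;

  return (size <= avail) ? size : 0;
}

size_t sketch_utf8_substr(unsigned char *dest, size_t *dest_length_p,
                          const unsigned char *src, size_t src_length,
                          int startingLoc, int length)
{
  size_t chars = 0;   /* characters copied so far */
  size_t bytes = 0;   /* bytes copied (or needed) so far */
  int index = 0;      /* character index into src */
  unsigned char *p = dest;

  if(!src)
    return 0;

  while(src_length > 0) {
    size_t n = sketch_seq_len(src, src_length);
    if(n == 0)
      break;                      /* stop at malformed input */

    if(index >= startingLoc) {
      if(length >= 0 && chars == (size_t)length)
        break;                    /* requested count reached */
      if(p) {
        memcpy(p, src, n);
        p += n;
      }
      bytes += n;
      chars++;
    }

    src += n;
    src_length -= n;
    index++;
  }

  if(p)
    *p = '\0';                    /* sketch assumption: NUL-terminate */
  if(dest_length_p)
    *dest_length_p = chars;       /* sketch assumption: character count */

  return bytes;
}
```

With this sketch, a caller would first pass a NULL dest to get the byte count, allocate that many bytes plus one for the terminator, and then call again with the buffer.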

			Raptor RDF Syntax Library Manual