Raptor RDF Syntax Library Manual
typedef raptor_unichar;
int raptor_unicode_utf8_string_put_char (raptor_unichar c
,unsigned char *output
,size_t length
);
int raptor_unicode_utf8_string_get_char (const unsigned char *input
,size_t length
,raptor_unichar *output
);
int raptor_unicode_is_xml11_namestartchar (raptor_unichar c
);
int raptor_unicode_is_xml10_namestartchar (raptor_unichar c
);
int raptor_unicode_is_xml11_namechar (raptor_unichar c
);
int raptor_unicode_is_xml10_namechar (raptor_unichar c
);
int raptor_unicode_check_utf8_string (const unsigned char *string
,size_t length
);
int raptor_unicode_utf8_strlen (const unsigned char *string
,size_t length
);
size_t raptor_unicode_utf8_substr (unsigned char *dest
,size_t *dest_length_p
,const unsigned char *src
,size_t src_length
,int startingLoc
,int length
);
Functions that support converting to and from Unicode written in UTF-8, which is the native internal string format of all the Redland libraries. They include checks of Unicode characters against either the XML 1.0 or the XML 1.1 name rules.
int raptor_unicode_utf8_string_put_char (raptor_unichar c
,unsigned char *output
,size_t length
);
Encode a Unicode character to a UTF-8 string.
If output is NULL, the function calculates the length rather than performing the encoding. The caller can use this to allocate space and then call the function again with the new buffer.
c : | Unicode character |
output : | UTF-8 string buffer or NULL |
length : | length of output buffer |
Returns : | number of bytes encoded to output buffer or <0 on failure |
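The two-pass pattern described above (query the length with a NULL buffer, then encode) can be illustrated with a standalone sketch. This is not Raptor's implementation: `sketch_utf8_put_char` is a hypothetical name, `unsigned long` stands in for `raptor_unichar`, and rejecting surrogates and values above U+10FFFF with -1 is an assumption about what counts as failure.

```c
#include <stddef.h>

/* Hypothetical stand-in for the documented contract, not Raptor's code.
 * Returns the number of UTF-8 bytes for code point c. When output is
 * non-NULL the bytes are also written there. Returns -1 if c is not
 * encodable (assumption: surrogates and > U+10FFFF) or the buffer is
 * too small. */
static int sketch_utf8_put_char(unsigned long c, unsigned char *output,
                                size_t length)
{
  int size;

  if (c >= 0xD800UL && c <= 0xDFFFUL)
    return -1;                      /* UTF-16 surrogate, not a character */

  if (c < 0x80UL)
    size = 1;
  else if (c < 0x800UL)
    size = 2;
  else if (c < 0x10000UL)
    size = 3;
  else if (c <= 0x10FFFFUL)
    size = 4;
  else
    return -1;                      /* beyond U+10FFFF */

  if (!output)
    return size;                    /* length query only, nothing written */

  if (length < (size_t)size)
    return -1;                      /* buffer too small */

  switch (size) {
    case 1:
      output[0] = (unsigned char)c;
      break;
    case 2:
      output[0] = (unsigned char)(0xC0 | (c >> 6));
      output[1] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      output[0] = (unsigned char)(0xE0 | (c >> 12));
      output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[2] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      output[0] = (unsigned char)(0xF0 | (c >> 18));
      output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[3] = (unsigned char)(0x80 | (c & 0x3F));
      break;
  }
  return size;
}
```

A caller would typically invoke the function once with a NULL buffer, allocate that many bytes, and invoke it again with the new buffer.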
int raptor_unicode_utf8_string_get_char (const unsigned char *input
,size_t length
,raptor_unichar *output
);
Decode a UTF-8 encoded string to get a Unicode character.
If output is NULL, the function calculates the number of bytes that would be used from the input buffer and does not perform the conversion.
input : | UTF-8 string buffer |
length : | buffer size |
output : | Pointer to the Unicode character or NULL |
Returns : | bytes used from input buffer or <0 on failure: -1 input buffer too short or length error, -2 overlong UTF-8 sequence, -3 illegal code positions, -4 code out of range U+0000 to U+10FFFF. In cases -2, -3 and -4 the coded character is stored in the output. |
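The return codes listed above can be made concrete with a standalone decoder sketch. It is not Raptor's code: `sketch_utf8_get_char` is a hypothetical name, `unsigned long` stands in for `raptor_unichar`, and treating surrogates plus U+FFFE and U+FFFF as the "illegal code positions" is an assumption. As described above, the decoded value is stored before the -2, -3 and -4 checks.

```c
#include <stddef.h>

/* Hypothetical decoder mirroring the documented return codes.
 * Not Raptor's implementation. */
static int sketch_utf8_get_char(const unsigned char *input, size_t length,
                                unsigned long *output)
{
  /* smallest code point that legitimately needs 2, 3 or 4 bytes */
  static const unsigned long min_for_size[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  unsigned long c;
  int size;
  int i;

  if (!input || length < 1)
    return -1;

  c = input[0];
  if (c < 0x80) {
    size = 1;
  } else if ((c & 0xE0) == 0xC0) {
    size = 2; c &= 0x1F;
  } else if ((c & 0xF0) == 0xE0) {
    size = 3; c &= 0x0F;
  } else if ((c & 0xF8) == 0xF0) {
    size = 4; c &= 0x07;
  } else {
    return -1;                      /* not a valid lead byte */
  }

  if (length < (size_t)size)
    return -1;                      /* input buffer too short */

  if (!output)
    return size;                    /* byte count only, no conversion */

  for (i = 1; i < size; i++) {
    if ((input[i] & 0xC0) != 0x80)
      return -1;                    /* not a continuation byte */
    c = (c << 6) | (input[i] & 0x3F);
  }

  *output = c;                      /* stored even for -2, -3 and -4 */

  if (size > 1 && c < min_for_size[size])
    return -2;                      /* overlong sequence */
  if ((c >= 0xD800 && c <= 0xDFFF) || c == 0xFFFE || c == 0xFFFF)
    return -3;                      /* assumed illegal code positions */
  if (c > 0x10FFFF)
    return -4;                      /* out of range */

  return size;
}
```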
int raptor_unicode_is_xml11_namestartchar
(raptor_unichar c
);
Check if Unicode character is legal to start an XML 1.1 Name
See Namespaces in XML 1.1 REC 2004-02-04 NameStartChar updating Extensible Markup Language (XML) 1.1 REC 2004-02-04 sec 2.3, [4a] excluding the ':'
c : | Unicode character to check |
Returns : | non-0 if legal |
int raptor_unicode_is_xml10_namestartchar
(raptor_unichar c
);
Check if Unicode character is legal to start an XML 1.0 Name
See Namespaces in XML REC 1999-01-14 updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 excluding the ':'
c : | Unicode character to check |
Returns : | non-0 if legal |
int raptor_unicode_is_xml11_namechar (raptor_unichar c
);
Check if a Unicode codepoint is legal to continue an XML 1.1 Name
See Namespaces in XML 1.1 REC 2004-02-04 updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 sec 2.3, [4a] excluding the ':'
c : | Unicode character |
Returns : | non-0 if legal |
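As an illustration of what the XML 1.1 checks test for, the sketch below transcribes the NameStartChar and NameChar productions from XML 1.1 section 2.3 with ':' excluded, as described above. The function names are hypothetical and `unsigned long` stands in for `raptor_unichar`; this is not Raptor's own code.

```c
/* XML 1.1 NameStartChar (production [4]) without ':'.
 * Hypothetical helper, not Raptor's implementation. */
static int sketch_is_xml11_namestartchar(unsigned long c)
{
  return ((c >= 'A' && c <= 'Z') ||
          c == '_' ||
          (c >= 'a' && c <= 'z') ||
          (c >= 0xC0 && c <= 0xD6) ||
          (c >= 0xD8 && c <= 0xF6) ||
          (c >= 0xF8 && c <= 0x2FF) ||
          (c >= 0x370 && c <= 0x37D) ||
          (c >= 0x37F && c <= 0x1FFF) ||
          (c >= 0x200C && c <= 0x200D) ||
          (c >= 0x2070 && c <= 0x218F) ||
          (c >= 0x2C00 && c <= 0x2FEF) ||
          (c >= 0x3001 && c <= 0xD7FF) ||
          (c >= 0xF900 && c <= 0xFDCF) ||
          (c >= 0xFDF0 && c <= 0xFFFD) ||
          (c >= 0x10000 && c <= 0xEFFFF));
}

/* XML 1.1 NameChar (production [4a]) without ':': every start
 * character plus '-', '.', digits, U+00B7 and two combining ranges. */
static int sketch_is_xml11_namechar(unsigned long c)
{
  return (sketch_is_xml11_namestartchar(c) ||
          c == '-' ||
          c == '.' ||
          (c >= '0' && c <= '9') ||
          c == 0xB7 ||
          (c >= 0x0300 && c <= 0x036F) ||
          (c >= 0x203F && c <= 0x2040));
}
```

A name check would apply the first function to the first decoded character and the second function to every character after it.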
int raptor_unicode_is_xml10_namechar (raptor_unichar c
);
Check if a Unicode codepoint is legal to continue an XML 1.0 Name
See Namespaces in XML REC 1999-01-14 NCNameChar updating Extensible Markup Language (XML) 1.0 (Third Edition) REC 2004-02-04 excluding the ':'
c : | Unicode character |
Returns : | non-0 if legal |
int raptor_unicode_check_utf8_string (const unsigned char *string
,size_t length
);
Check a string is valid Unicode UTF-8.
string : | UTF-8 string |
length : | length of string |
Returns : | non-0 if the string is valid UTF-8 |
int raptor_unicode_utf8_strlen (const unsigned char *string
,size_t length
);
Calculate the number of Unicode characters in the given UTF-8 encoded buffer
string : | buffer |
length : | buffer length |
Returns : | number of characters or <0 if sequence is invalid |
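Counting characters, as opposed to bytes, means walking the buffer one UTF-8 sequence at a time. The following standalone sketch shows the idea under stated assumptions: `sketch_utf8_strlen` is a hypothetical name, it checks only the structure of each sequence (lead byte and continuation bytes), and it does not reject overlong forms or illegal code points as a full validator would. It is not Raptor's implementation.

```c
#include <stddef.h>

/* Count the UTF-8 sequences in a buffer. Returns -1 if a lead byte is
 * invalid, a sequence is cut off by the end of the buffer, or a
 * continuation byte is malformed. Hypothetical helper. */
static int sketch_utf8_strlen(const unsigned char *string, size_t length)
{
  int count = 0;
  size_t i = 0;

  while (i < length) {
    unsigned char lead = string[i];
    size_t size;
    size_t k;

    if (lead < 0x80)
      size = 1;
    else if ((lead & 0xE0) == 0xC0)
      size = 2;
    else if ((lead & 0xF0) == 0xE0)
      size = 3;
    else if ((lead & 0xF8) == 0xF0)
      size = 4;
    else
      return -1;                    /* stray continuation or bad lead byte */

    if (size > length - i)
      return -1;                    /* sequence runs past the buffer */

    for (k = 1; k < size; k++) {
      if ((string[i + k] & 0xC0) != 0x80)
        return -1;                  /* not a continuation byte */
    }

    i += size;
    count++;
  }

  return count;
}
```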
size_t raptor_unicode_utf8_substr (unsigned char *dest
,size_t *dest_length_p
,const unsigned char *src
,size_t src_length
,int startingLoc
,int length
);
Get a Unicode (UTF-8) substring of an existing UTF-8 string.
If dest is NULL, returns the number of bytes needed to write and does no work.
dest : | destination string buffer to write to (or NULL) |
dest_length_p : | location to store actual destination length (or NULL) |
src : | source string |
src_length : | source length in bytes |
startingLoc : | starting location offset 0 for first Unicode character |
length : | number of Unicode characters to copy at offset startingLoc (or < 0) |
Returns : | number of bytes used in destination string or 0 on failure |
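Because offsets and lengths here are in characters while buffers are in bytes, a substring operation has to translate between the two. The standalone sketch below shows one way to do that. Several points are assumptions and not statements about Raptor: the names `sketch_seq_len` and `sketch_utf8_substr` are hypothetical, a negative `length` is taken to mean "up to the end of the source", the value stored through `dest_length_p` is taken to be the number of characters copied, and no terminating NUL is written.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the sequence starting with this lead byte, 0 if invalid. */
static size_t sketch_seq_len(unsigned char lead)
{
  if (lead < 0x80) return 1;
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 0;
}

/* Hypothetical substring helper, not Raptor's implementation.
 * Returns the number of bytes selected (and copied when dest is
 * non-NULL), or 0 on failure. The caller sizes dest. */
static size_t sketch_utf8_substr(unsigned char *dest, size_t *dest_length_p,
                                 const unsigned char *src, size_t src_length,
                                 int startingLoc, int length)
{
  size_t i = 0;             /* byte offset into src */
  int pos = 0;              /* character index of the sequence at i */
  size_t start_byte = 0;    /* byte offset of the first selected character */
  size_t nbytes = 0;        /* bytes selected so far */
  int copied = 0;           /* characters selected so far */

  if (!src || startingLoc < 0)
    return 0;

  while (i < src_length) {
    size_t size = sketch_seq_len(src[i]);

    if (size == 0 || size > src_length - i)
      return 0;             /* malformed or truncated sequence */

    if (pos >= startingLoc) {
      if (length >= 0 && copied >= length)
        break;              /* requested character count reached */
      if (copied == 0)
        start_byte = i;
      nbytes += size;
      copied++;
    }

    i += size;
    pos++;
  }

  if (dest && nbytes)
    memcpy(dest, src + start_byte, nbytes);
  if (dest_length_p)
    *dest_length_p = (size_t)copied;  /* assumption: characters, not bytes */

  return nbytes;
}
```

Calling it first with a NULL `dest` gives the byte count needed for the destination buffer, matching the behaviour described above.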