CoreComponents 3.0.0
A Modern C++ Toolkit
Iterate code points of a UTF-8 encoded string.
#include <cc/Utf8Iterator>
Public Types | |
using | iterator_category = std::bidirectional_iterator_tag |
using | value_type = char32_t |
using | difference_type = std::ptrdiff_t |
using | pointer = void |
using | reference = char32_t |
Public Member Functions | |
Utf8Iterator () | |
Create an invalid iterator. | |
Utf8Iterator (const char *start, const char *end, const char *pos) | |
Create a new iterator. | |
Utf8Iterator (const Utf8Iterator &b)=default | |
Utf8Iterator & | operator++ () |
Prefix increment operator: step a single code point forward | |
Utf8Iterator & | operator-- () |
Prefix decrement operator: step a single code point backward | |
Utf8Iterator | operator++ (int) |
Postfix increment operator: step a single code point forward and return old position. | |
Utf8Iterator | operator-- (int) |
Postfix decrement operator: step a single code point backward and return old position. | |
Utf8Iterator & | operator+= (std::ptrdiff_t d) |
Assignment addition operator: step in forward direction a given distance d. | |
Utf8Iterator & | operator-= (std::ptrdiff_t d) |
Assignment subtraction operator: step in backward direction a given distance d. | |
Utf8Iterator | operator+ (std::ptrdiff_t d) const |
Addition operator: get iterator in forward direction at given distance d. | |
Utf8Iterator | operator- (std::ptrdiff_t d) const |
Subtraction operator: get iterator in backward direction at given distance d. | |
std::ptrdiff_t | operator- (const Utf8Iterator &b) const |
Difference operator: compute distance in number of characters. | |
operator bool () const | |
Cast to bool operator: indicate if this iterator can step forward another code point. | |
char32_t | operator* () const |
Dereference operator: decode current code point. | |
std::ptrdiff_t | operator+ () const |
Unary plus operator: return the current decoding position as a byte offset. | |
bool | operator== (const Utf8Iterator &b) const |
Compare for equality. | |
bool | operator!= (const Utf8Iterator &b) const |
Compare for inequality. | |
Iterate code points of a UTF-8 encoded string.
The Utf8Iterator allows iterating over the Unicode characters of a UTF-8 byte sequence. The iterator always halts at the string boundaries: when stepped over a string boundary, it automatically switches to an invalid state in which it delivers zero code points.
An iterator placed at a string's terminating zero character can still step backward into the string.
Illegal code sequences are overrun without error. In forward iteration, a corrupted code prefix may cause at most one correct character to be overrun and up to 3 additional corrupt characters to be delivered. Any bit error outside the code prefixes leads to at most one illegal character being delivered, with one exception: a byte switching to all bits zero, because a zero byte always terminates the string.
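The tolerant decoding described above can be pictured with a small standalone sketch. It is not the CoreComponents implementation: decodeNext and decodeAll are hypothetical helpers written only for illustration, and the exact code points the real iterator delivers for corrupt input may differ. The sketch reads the length from the code prefix, consumes at most that many following bytes, never reads past the end of the range, and never reports an error.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of CoreComponents): decode one code point
// starting at pos and advance pos past it. Returns 0 at the end of the range.
inline char32_t decodeNext(const char *&pos, const char *end)
{
    if (pos >= end) return 0;
    unsigned char lead = static_cast<unsigned char>(*pos++);
    if (lead < 0x80) return lead;                        // single byte (ASCII)

    int extra = 0;                                       // expected continuation bytes
    char32_t ch = 0;
    if ((lead & 0xE0) == 0xC0)      { extra = 1; ch = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; ch = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; ch = lead & 0x07; }
    else return lead;                                    // stray byte: passed through, no error

    // Consume the continuation bytes, but halt at the string boundary.
    while (extra-- > 0 && pos < end) {
        ch = (ch << 6) | (static_cast<unsigned char>(*pos++) & 0x3F);
    }
    return ch;
}

// Hypothetical helper: decode a whole string into its code points.
inline std::vector<char32_t> decodeAll(const std::string &text)
{
    std::vector<char32_t> out;
    const char *pos = text.data();
    const char *end = pos + text.size();
    while (pos < end) out.push_back(decodeNext(pos, end));
    return out;
}
```

Because the sketch does not validate the continuation bytes, a damaged prefix can swallow bytes that belong to the next character, which is the kind of overrun the description refers to.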
Utf8Iterator | ( | ) |
Create an invalid iterator.
Utf8Iterator | ( | const char * | start, |
const char * | end, | ||
const char * | pos ) |
Create a new iterator.
start | Pointer to start of UTF-8 encoded string |
end | Pointer to end of UTF-8 encoded string (behind last valid byte) |
pos | Current position within the UTF-8 encoded string |
Utf8Iterator & operator++ | ( | ) |
Prefix increment operator: step a single code point forward
Utf8Iterator & operator-- | ( | ) |
Prefix decrement operator: step a single code point backward
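One common way to step a single code point backward in UTF-8 is to move back one byte and then keep moving while the current byte is a continuation byte (bit pattern 10xxxxxx). The following is a minimal sketch of that idea; stepBack is a hypothetical free function and not necessarily how Utf8Iterator implements operator--.

```cpp
#include <cassert>

// Hypothetical helper (not part of CoreComponents): return the position of
// the code point that precedes pos, never moving before start.
inline const char *stepBack(const char *start, const char *pos)
{
    assert(start <= pos);
    if (pos == start) return pos;          // already at the beginning: halt
    --pos;
    int skipped = 0;
    // A UTF-8 sequence has at most 3 continuation bytes after its lead byte.
    while (pos > start && skipped < 3 &&
           (static_cast<unsigned char>(*pos) & 0xC0) == 0x80) {
        --pos;
        ++skipped;
    }
    return pos;
}
```

Starting from the end pointer (for example the terminating zero character) works the same way, so stepping backward into the string needs no special case in this sketch.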
Utf8Iterator operator++ | ( | int | ) |
Postfix increment operator: step a single code point forward and return old position.
Utf8Iterator operator-- | ( | int | ) |
Postfix decrement operator: step a single code point backward and return old position.
Utf8Iterator & operator+= | ( | std::ptrdiff_t | d | ) |
Assignment addition operator: step in forward direction a given distance d.
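Stepping a distance d forward amounts to repeating a single-code-point step d times while halting at the end of the string. The sketch below illustrates this with a hypothetical stepForward function. Note that it uses a simplified technique, skipping continuation bytes instead of trusting the length in the code prefix, so its behavior on corrupt input is an assumption and may differ from Utf8Iterator.

```cpp
#include <cstddef>

// Hypothetical helper (not part of CoreComponents): advance pos by up to d
// code points, stopping early at end.
inline const char *stepForward(const char *pos, const char *end, std::ptrdiff_t d)
{
    while (d-- > 0 && pos < end) {
        ++pos;                             // consume the lead byte
        // Skip the continuation bytes (10xxxxxx) that belong to it.
        while (pos < end && (static_cast<unsigned char>(*pos) & 0xC0) == 0x80) {
            ++pos;
        }
    }
    return pos;
}
```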
Utf8Iterator & operator-= | ( | std::ptrdiff_t | d | ) |
Assignment subtraction operator: step in backward direction a given distance d.
Utf8Iterator operator+ | ( | std::ptrdiff_t | d | ) | const |
Addition operator: get iterator in forward direction at given distance d.
Utf8Iterator operator- | ( | std::ptrdiff_t | d | ) | const |
Subtraction operator: get iterator in backward direction at given distance d.
std::ptrdiff_t operator- | ( | const Utf8Iterator & | b | ) | const |
Difference operator: compute distance in number of characters.
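A distance in characters differs from a distance in bytes, because one code point occupies one to four bytes. As a rough illustration, the number of code points between two byte positions can be obtained by counting every byte that is not a continuation byte. countCodePoints is a hypothetical helper that assumes well-formed input and from <= to; how the real operator treats reversed operands or corrupt data is not specified here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CoreComponents): count the code points
// that start in the byte range [from, to).
inline std::ptrdiff_t countCodePoints(const char *from, const char *to)
{
    assert(from <= to);
    std::ptrdiff_t n = 0;
    for (const char *p = from; p < to; ++p) {
        // Every byte except a continuation byte (10xxxxxx) starts a code point.
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the 6-byte string "a", "ä", "€" (1 + 2 + 3 bytes) this yields 3, whereas the byte offset reported by the unary plus operator would advance by 6 over the same span.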
explicit operator bool | ( | ) | const |
Cast to bool operator: indicate if this iterator can step forward another code point.
char32_t operator* | ( | ) | const |
Dereference operator: decode current code point.
std::ptrdiff_t operator+ | ( | ) | const |
Unary plus operator: return the current decoding position as a byte offset.
bool operator== | ( | const Utf8Iterator & | b | ) | const |
Compare for equality.
bool operator!= | ( | const Utf8Iterator & | b | ) | const |
Compare for inequality.