CoreComponents 3.0.0
A Modern C++ Toolkit
Utf8Iterator Class Reference

Iterate code points of a UTF-8 encoded string.

#include <cc/Utf8Iterator>

Public Types

using iterator_category = std::bidirectional_iterator_tag
 
using value_type = char32_t
 
using difference_type = std::ptrdiff_t
 
using pointer = void
 
using reference = char32_t
 

Public Member Functions

 Utf8Iterator ()
 Create an invalid iterator.
 
 Utf8Iterator (const char *start, const char *end, const char *pos)
 Create a new iterator.
 
 Utf8Iterator (const Utf8Iterator &b)=default
 
Utf8Iterator & operator++ ()
 Prefix increment operator: step a single code point forward.
 
Utf8Iterator & operator-- ()
 Prefix decrement operator: step a single code point backward.
 
Utf8Iterator operator++ (int)
 Postfix increment operator: step a single code point forward and return old position.
 
Utf8Iterator operator-- (int)
 Postfix decrement operator: step a single code point backward and return old position.
 
Utf8Iterator & operator+= (std::ptrdiff_t d)
 Assignment addition operator: step in forward direction a given distance d.
 
Utf8Iterator & operator-= (std::ptrdiff_t d)
 Assignment subtraction operator: step in backward direction a given distance d.
 
Utf8Iterator operator+ (std::ptrdiff_t d) const
 Addition operator: get iterator in forward direction at given distance d.
 
Utf8Iterator operator- (std::ptrdiff_t d) const
 Subtraction operator: get iterator in backward direction at given distance d.
 
std::ptrdiff_t operator- (const Utf8Iterator &b) const
 Difference operator: compute distance in number of characters.
 
 operator bool () const
 Cast to bool operator: indicate if this iterator can step forward another code point.
 
char32_t operator* () const
 Dereference operator: decode current code point.
 
std::ptrdiff_t operator+ () const
 Unary plus operator: return the current decoding position as a byte offset.
 
bool operator== (const Utf8Iterator &b) const
 Compare for equality.
 
bool operator!= (const Utf8Iterator &b) const
 Compare for inequality.
 

Detailed Description

Iterate code points of a UTF-8 encoded string.

The Utf8Iterator allows iterating over the Unicode characters of a UTF-8 byte sequence. The iterator always halts at the string boundaries. When stepped past a string boundary, the iterator automatically switches to an invalid state in which it delivers zero code points.

If the iterator is placed at a string's terminating zero character, it can still step backwards into the string.
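Stepping backwards in UTF-8 generally means skipping continuation bytes (bit pattern 10xxxxxx) until a lead byte is found. The following is a minimal standalone sketch of that idea. The helper name stepBack is hypothetical and not part of CoreComponents, and the library's actual implementation may differ.

```cpp
#include <cassert>

// Hypothetical helper (not a CoreComponents function): move pos back to the
// first byte of the previous code point without ever moving before start.
inline const char *stepBack(const char *start, const char *pos)
{
    assert(start <= pos);
    if (pos == start) return pos; // already at the string boundary: halt
    --pos;
    int skipped = 0;
    // Continuation bytes look like 10xxxxxx; a code point has at most three.
    while (pos > start && skipped < 3 &&
           (static_cast<unsigned char>(*pos) & 0xC0) == 0x80) {
        --pos;
        ++skipped;
    }
    return pos;
}
```

Called with pos pointing at the terminating zero byte, the helper lands on the first byte of the last code point, which mirrors the behaviour described above.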

Illegal code sequences are overrun without error. In forward iteration, a corrupted code prefix may cause at most one correct character to be overrun and up to 3 additional corrupt characters to be delivered. Any bit error outside the code prefixes leads to at most one illegal character being delivered, with one exception: a byte switching to all bits zero, because a zero byte always terminates the string.
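To make the decoding behaviour more concrete, here is a minimal standalone sketch of a lenient forward decoder. The names Decoded and decodeAt are hypothetical and not part of CoreComponents. The sketch only illustrates the general approach (no error reporting, a zero byte terminates the string) and does not claim to reproduce the exact overrun behaviour described above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: the decoded value and the number of bytes consumed.
struct Decoded {
    char32_t codePoint; // 0 at the end of the string
    std::size_t length; // 0 at the end of the string
};

// Hypothetical helper (not a CoreComponents function): decode the code point
// starting at pos. Malformed input is not rejected; a truncated sequence
// simply yields a shorter, incorrect code point.
inline Decoded decodeAt(const char *pos, const char *end)
{
    assert(pos != nullptr && end != nullptr);
    if (pos >= end) return {0, 0};
    auto byte = [](const char *p) { return static_cast<unsigned char>(*p); };
    unsigned char lead = byte(pos);
    if (lead == 0) return {0, 0}; // a zero byte always terminates the string

    std::size_t extra = 0; // continuation bytes announced by the prefix
    char32_t ch = lead;
    if (lead >= 0xF0)      { extra = 3; ch = lead & 0x07; }
    else if (lead >= 0xE0) { extra = 2; ch = lead & 0x0F; }
    else if (lead >= 0xC0) { extra = 1; ch = lead & 0x1F; }

    std::size_t n = 1;
    // Consume continuation bytes (10xxxxxx) while they are available.
    while (extra > 0 && pos + n < end && (byte(pos + n) & 0xC0) == 0x80) {
        ch = (ch << 6) | (byte(pos + n) & 0x3F);
        ++n;
        --extra;
    }
    return {ch, n};
}
```

In this sketch a stray continuation byte is returned as a single one-byte value, which corresponds to an illegal character being delivered rather than an error being raised.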

Constructor & Destructor Documentation

◆ Utf8Iterator() [1/2]

Create an invalid iterator.

◆ Utf8Iterator() [2/2]

Utf8Iterator ( const char * start,
const char * end,
const char * pos )

Create a new iterator.

Parameters
start   Pointer to start of UTF-8 encoded string
end     Pointer to end of UTF-8 encoded string (behind last valid byte)
pos     Current position within the UTF-8 encoded string

Member Function Documentation

◆ operator++() [1/2]

Utf8Iterator & operator++ ( )

Prefix increment operator: step a single code point forward.

◆ operator--() [1/2]

Utf8Iterator & operator-- ( )

Prefix decrement operator: step a single code point backward.

◆ operator++() [2/2]

Utf8Iterator operator++ ( int )

Postfix increment operator: step a single code point forward and return old position.

◆ operator--() [2/2]

Utf8Iterator operator-- ( int )

Postfix decrement operator: step a single code point backward and return old position.

◆ operator+=()

Utf8Iterator & operator+= ( std::ptrdiff_t d)

Assignment addition operator: step in forward direction a given distance d.

◆ operator-=()

Utf8Iterator & operator-= ( std::ptrdiff_t d)

Assignment subtraction operator: step in backward direction a given distance d.

◆ operator+() [1/2]

Utf8Iterator operator+ ( std::ptrdiff_t d) const

Addition operator: get iterator in forward direction at given distance d.

◆ operator-() [1/2]

Utf8Iterator operator- ( std::ptrdiff_t d) const

Subtraction operator: get iterator in backward direction at given distance d.

◆ operator-() [2/2]

std::ptrdiff_t operator- ( const Utf8Iterator & b) const

Difference operator: compute distance in number of characters.
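A distance in characters is generally not the same as a distance in bytes, because a single code point occupies one to four bytes. As a rough illustration, the sketch below counts code points between two byte positions by counting every byte that is not a continuation byte. The helper name countCodePoints is hypothetical, the sketch assumes well-formed UTF-8 with both positions on code point boundaries, and how the library itself computes the distance is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a CoreComponents function): number of code points
// in the byte range [from, to), assuming well-formed UTF-8.
inline std::ptrdiff_t countCodePoints(const char *from, const char *to)
{
    assert(from <= to);
    std::ptrdiff_t n = 0;
    for (const char *p = from; p < to; ++p) {
        // Every code point contributes exactly one byte that is not 10xxxxxx.
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For a six-byte string holding one 1-byte, one 2-byte and one 3-byte code point, the byte distance is 6 while the character distance is 3.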

◆ operator bool()

explicit operator bool ( ) const

Cast to bool operator: indicate if this iterator can step forward another code point.

◆ operator*()

char32_t operator* ( ) const

Dereference operator: decode current code point.

◆ operator+() [2/2]

std::ptrdiff_t operator+ ( ) const

Unary plus operator: return the current decoding position as a byte offset.

◆ operator==()

bool operator== ( const Utf8Iterator & b) const

Compare for equality.

◆ operator!=()

bool operator!= ( const Utf8Iterator & b) const

Compare for inequality.