ICU 76.1
unistr.h File Reference

C++ API: Unicode String.

#include "unicode/utypes.h"
#include <cstddef>
#include <string_view>
#include "unicode/char16ptr.h"
#include "unicode/rep.h"
#include "unicode/std_string.h"
#include "unicode/stringpiece.h"
#include "unicode/bytestream.h"

Data Structures

class  icu::UnicodeString
 UnicodeString is a string class that stores Unicode characters directly and provides functionality similar to the Java String and StringBuffer/StringBuilder classes.
 

Namespaces

namespace  icu
 

Macros

#define US_INV   icu::UnicodeString::kInvariant
 Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.
 
#define UNICODE_STRING(cs, _length)
 Obsolete macro approximating UnicodeString literals.
 
#define UNICODE_STRING_SIMPLE(cs)
 Unicode String literals in C++.
 
#define UNISTR_FROM_CHAR_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_FROM_STRING_EXPLICIT
 This can be defined to be empty or "explicit".
 
#define UNISTR_OBJECT_SIZE   64
 Desired sizeof(UnicodeString) in bytes.
 

Typedefs

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)
 Internal string case mapping function type.
 

Functions

U_CAPI int32_t u_strlen (const UChar *s)
 
U_COMMON_API UnicodeString icu::operator+ (const UnicodeString &s1, const UnicodeString &s2)
 Creates a new UnicodeString from the concatenation of two others.
 
template<typename S , typename = std::enable_if_t<ConvertibleToU16StringView<S>>>
UnicodeString icu::operator+ (const UnicodeString &s1, const S &s2)
 Creates a new UnicodeString from the concatenation of a UnicodeString and s2 which is, or which is implicitly convertible to, a std::u16string_view or (if U_SIZEOF_WCHAR_T==2) std::wstring_view.
 
U_COMMON_API UnicodeString icu::unistr_internalConcat (const UnicodeString &s1, std::u16string_view s2)
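
The templated operator+ listed above accepts any right-hand operand that is, or implicitly converts to, a UTF-16 string view. The following is a minimal sketch of that kind of constraint, not ICU's implementation: DemoText and kDemoConvertibleToU16View are hypothetical stand-ins for icu::UnicodeString and ConvertibleToU16StringView, and the std::wstring_view case is left out.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for ICU's ConvertibleToU16StringView trait:
// true when a const S& converts implicitly to std::u16string_view.
template<typename S>
constexpr bool kDemoConvertibleToU16View =
    std::is_convertible_v<const S&, std::u16string_view>;

// Hypothetical stand-in for icu::UnicodeString.
struct DemoText {
    std::u16string data;
};

// Takes part in overload resolution only when S converts to a UTF-16 view.
template<typename S, typename = std::enable_if_t<kDemoConvertibleToU16View<S>>>
DemoText operator+(const DemoText& s1, const S& s2) {
    std::u16string_view view = s2;  // implicit conversion allowed by the constraint
    DemoText result{s1.data};
    result.data.append(view);
    return result;
}

static_assert(kDemoConvertibleToU16View<std::u16string>, "std::u16string converts");
static_assert(kDemoConvertibleToU16View<const char16_t*>, "char16_t pointers convert");
static_assert(!kDemoConvertibleToU16View<int>, "unrelated types are rejected");
```

With this sketch, an expression such as DemoText{u"ab"} + u"cd" compiles, while DemoText{u"ab"} + 42 finds no matching operator.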
 

Detailed Description

C++ API: Unicode String.

Definition in file unistr.h.

Macro Definition Documentation

◆ UNICODE_STRING

#define UNICODE_STRING ( cs,
_length )
Value:
icu::UnicodeString(true, u ## cs, _length)

Obsolete macro approximating UnicodeString literals.

Prior to the availability of C++11 and u"UTF-16 string literals", this macro was provided for portability and efficiency when initializing UnicodeStrings from literals.

Since C++17 and ICU 76, you can use UTF-16 string literals with compile-time length determination:

UnicodeString str(u"literal");
if (str == u"other literal") { ... }

The string parameter must be a C string literal. The length of the string, not including the terminating NUL, must be specified as a constant.
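
The token pasting in the macro value (u ## cs) can be tried without ICU. The following is a minimal sketch under that assumption: DEMO_U16_LITERAL is a hypothetical stand-in that builds a std::u16string_view instead of an icu::UnicodeString, so it only illustrates how a narrow literal plus a constant length becomes a UTF-16 literal with a known length.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for UNICODE_STRING: pastes the u prefix onto a
// narrow string literal and pairs it with a caller-supplied constant length.
#define DEMO_U16_LITERAL(cs, _length) std::u16string_view(u ## cs, _length)

// "literal" becomes u"literal"; the length 7 excludes the terminating NUL.
constexpr std::u16string_view kDemoPasted = DEMO_U16_LITERAL("literal", 7);

// The modern spelling: write the UTF-16 literal directly and let the
// compiler determine the length.
constexpr std::u16string_view kDemoDirect = u"literal";

static_assert(kDemoPasted == kDemoDirect, "both spellings name the same text");
static_assert(kDemoPasted.size() == 7, "length excludes the terminating NUL");
```

Passing a non-literal argument, such as a variable, does not compile, because the preprocessor pastes u onto the argument's spelling. This matches the requirement that the string parameter be a C string literal.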

Stable
ICU 2.0

Definition at line 121 of file unistr.h.

◆ UNICODE_STRING_SIMPLE

#define UNICODE_STRING_SIMPLE ( cs)
Value:
UNICODE_STRING(cs, -1)

Unicode String literals in C++.

Obsolete macro approximating UnicodeString literals. See UNICODE_STRING.

The string parameter must be a C string literal.

Stable
ICU 2.0
See also
UNICODE_STRING

Definition at line 135 of file unistr.h.

◆ UNISTR_FROM_CHAR_EXPLICIT

#define UNISTR_FROM_CHAR_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(char16_t) and UnicodeString(UChar32) constructors are marked as explicit, preventing their inadvertent use.

Stable
ICU 49

Definition at line 150 of file unistr.h.

◆ UNISTR_FROM_STRING_EXPLICIT

#define UNISTR_FROM_STRING_EXPLICIT

This can be defined to be empty or "explicit".

If explicit, then the UnicodeString(const char *) and UnicodeString(const char16_t *) constructors are marked as explicit, preventing their inadvertent use.

In particular, this helps prevent accidentally depending on ICU conversion code by passing a string literal into an API with a const UnicodeString & parameter.
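
The effect of defining such a macro as empty or as "explicit" can be sketched without ICU. In the example below, the macro and type names are hypothetical stand-ins and the constructors do nothing. The sketch only shows that the explicit form still permits direct construction but blocks implicit conversion from a string pointer.

```cpp
#include <type_traits>

// Hypothetical macros standing in for the two ways that
// UNISTR_FROM_STRING_EXPLICIT can be defined.
#define DEMO_FROM_STRING_EXPLICIT_ON explicit
#define DEMO_FROM_STRING_EXPLICIT_OFF

// Constructor marked explicit through the macro.
struct DemoStrictString {
    DEMO_FROM_STRING_EXPLICIT_ON DemoStrictString(const char16_t*) {}
};

// Same constructor with the macro defined to be empty.
struct DemoLenientString {
    DEMO_FROM_STRING_EXPLICIT_OFF DemoLenientString(const char16_t*) {}
};

// Direct construction works in both cases.
static_assert(std::is_constructible_v<DemoStrictString, const char16_t*>, "direct init is allowed");
static_assert(std::is_constructible_v<DemoLenientString, const char16_t*>, "direct init is allowed");

// Implicit conversion, as used when passing a literal to a function that
// takes a const reference to the string type, works only without explicit.
static_assert(!std::is_convertible_v<const char16_t*, DemoStrictString>, "implicit use is blocked");
static_assert(std::is_convertible_v<const char16_t*, DemoLenientString>, "implicit use is allowed");
```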

Stable
ICU 49

Definition at line 170 of file unistr.h.

◆ UNISTR_OBJECT_SIZE

#define UNISTR_OBJECT_SIZE   64

Desired sizeof(UnicodeString) in bytes.

It should be a multiple of sizeof(pointer) to avoid unusable space for padding. Ideally, the object size is also a multiple of 16 bytes, which is a common granularity for heap allocation.

Any space inside the object beyond sizeof(vtable pointer) + 2 is available for storing short strings inside the object. The bigger the object, the longer a string that can be stored inside the object, without additional heap allocation.

Depending on a platform's pointer size, pointer alignment requirements, and struct padding, the compiler will usually round up sizeof(UnicodeString) to 4 * sizeof(pointer) (or 3 * sizeof(pointer) for P128 data models), to hold the fields for heap-allocated strings. Such a minimum size also ensures that the object is easily large enough to hold at least 2 char16_ts, for one supplementary code point (U16_MAX_LENGTH).

sizeof(UnicodeString) >= 48 should work for all known platforms.

For example, on a 64-bit machine where sizeof(vtable pointer) is 8, sizeof(UnicodeString) = 64 would leave space for (64 - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR = (64 - 8 - 2) / 2 = 27 char16_ts stored inside the object.

The minimum object size on a 64-bit machine would be 4 * sizeof(pointer) = 4 * 8 = 32 bytes, and the internal buffer would hold up to 11 char16_ts in that case.
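
The two capacity figures above can be checked with a small compile-time sketch. The helper name demoInlineCapacity is hypothetical and not part of ICU. It applies the formula quoted in the text and assumes a code unit size of 2 bytes (U_SIZEOF_UCHAR).

```cpp
#include <cstddef>

// Hypothetical helper, not part of ICU: number of char16_t units that fit
// inside the object, following the formula quoted in the text:
// (objectSize - sizeof(vtable pointer) - 2) / U_SIZEOF_UCHAR.
constexpr std::size_t demoInlineCapacity(std::size_t objectSize,
                                         std::size_t vtablePointerSize) {
    constexpr std::size_t kCodeUnitSize = 2;  // assumed value of U_SIZEOF_UCHAR
    return (objectSize - vtablePointerSize - 2) / kCodeUnitSize;
}

// 64-byte object on a 64-bit machine: (64 - 8 - 2) / 2 = 27.
static_assert(demoInlineCapacity(64, 8) == 27, "default object size");

// Minimum 32-byte object on a 64-bit machine: (32 - 8 - 2) / 2 = 11.
static_assert(demoInlineCapacity(32, 8) == 11, "minimum object size");
```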

See also
U16_MAX_LENGTH
Stable
ICU 56

Definition at line 208 of file unistr.h.

◆ US_INV

#define US_INV   icu::UnicodeString::kInvariant

Constant to be used in the UnicodeString(char *, int32_t, EInvariant) constructor which constructs a Unicode string from an invariant-character char * string.

For details about invariant characters, see utypes.h. This constructor has no runtime dependency on conversion code and is therefore recommended over ones taking a charset name string (where the empty string "" indicates invariant-character conversion).

Stable
ICU 3.2

Definition at line 98 of file unistr.h.

Typedef Documentation

◆ UStringCaseMapper

typedef int32_t UStringCaseMapper(int32_t caseLocale, uint32_t options, icu::BreakIterator *iter, char16_t *dest, int32_t destCapacity, const char16_t *src, int32_t srcLength, icu::Edits *edits, UErrorCode &errorCode)

Internal string case mapping function type.

All error checking must be done. src and dest must not overlap.

Internal
Do not use. This API is for internal use only.

Definition at line 71 of file unistr.h.