Class Utf8.Processor
- java.lang.Object
  - com.google.protobuf.Utf8.Processor
- Direct Known Subclasses:
  Utf8.SafeProcessor, Utf8.UnsafeProcessor
- Enclosing class:
  Utf8
abstract static class Utf8.Processor extends java.lang.Object
A processor of UTF-8 strings, providing methods for checking validity and encoding.
Constructor Summary
- Processor()
Method Summary
- (package private) abstract java.lang.String decodeUtf8(byte[] bytes, int index, int size)
  Decodes the given byte array slice into a String.
- (package private) java.lang.String decodeUtf8(java.nio.ByteBuffer buffer, int index, int size)
  Decodes the given portion of the ByteBuffer into a String.
- (package private) java.lang.String decodeUtf8Default(java.nio.ByteBuffer buffer, int index, int size)
  Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
- (package private) abstract java.lang.String decodeUtf8Direct(java.nio.ByteBuffer buffer, int index, int size)
  Decodes direct ByteBuffer instances into a String.
- (package private) abstract int encodeUtf8(java.lang.CharSequence in, byte[] out, int offset, int length)
  Encodes an input character sequence (in) to UTF-8 in the target array (out).
- (package private) void encodeUtf8(java.lang.CharSequence in, java.nio.ByteBuffer out)
  Encodes an input character sequence (in) to UTF-8 in the target buffer (out).
- (package private) void encodeUtf8Default(java.lang.CharSequence in, java.nio.ByteBuffer out)
  Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches.
- (package private) abstract void encodeUtf8Direct(java.lang.CharSequence in, java.nio.ByteBuffer out)
  Encodes the input character sequence to a direct ByteBuffer instance.
- (package private) boolean isValidUtf8(byte[] bytes, int index, int limit)
  Returns true if the given byte array slice is a well-formed UTF-8 byte sequence.
- (package private) boolean isValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
  Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence.
- (package private) abstract int partialIsValidUtf8(int state, byte[] bytes, int index, int limit)
  Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence.
- (package private) int partialIsValidUtf8(int state, java.nio.ByteBuffer buffer, int index, int limit)
  Tells whether the given portion of the ByteBuffer is a well-formed, malformed, or incomplete UTF-8 byte sequence.
- private static int partialIsValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
  Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
- (package private) int partialIsValidUtf8Default(int state, java.nio.ByteBuffer buffer, int index, int limit)
  Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
- (package private) abstract int partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit)
  Performs validation for direct ByteBuffer instances.
Method Detail
-
isValidUtf8
final boolean isValidUtf8(byte[] bytes, int index, int limit)
Returns true if the given byte array slice is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, bytes, index, limit) == Utf8.COMPLETE.
-
partialIsValidUtf8
abstract int partialIsValidUtf8(int state, byte[] bytes, int index, int limit)
Tells whether the given byte array slice is a well-formed, malformed, or incomplete UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
- Parameters:
  state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
- Returns:
  Utf8.MALFORMED if the partial byte sequence is definitely not well-formed; Utf8.COMPLETE if it is well-formed (no additional input needed); or, if the byte sequence is "incomplete" (i.e. apparently terminated in the middle of a character), an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
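The state contract above can be illustrated with a small incremental validator. This is a minimal sketch, not protobuf's implementation: the class PartialUtf8Sketch, the constant values MALFORMED = -1 and COMPLETE = 0, and the way the pending character is packed into the state int are all assumptions made for illustration (the real state value is opaque). The accepted byte ranges follow the Unicode table of well-formed UTF-8 sequences, so overlong forms, encoded surrogates and code points above U+10FFFF are rejected.

```java
/** Hypothetical stand-in for Utf8's partial validation; not the protobuf implementation. */
final class PartialUtf8Sketch {
  static final int MALFORMED = -1; // values assumed for this sketch
  static final int COMPLETE = 0;

  // state: COMPLETE, or a value previously returned by this method for the preceding bytes.
  static int partialIsValidUtf8(int state, byte[] bytes, int index, int limit) {
    if (state == MALFORMED) {
      return MALFORMED;
    }
    // Unpack the pending character: continuation bytes still needed, and the
    // allowed [lo, hi] range for the very next byte.
    int remaining = state >>> 16;
    int lo = (state >>> 8) & 0xFF;
    int hi = state & 0xFF;
    for (int i = index; i < limit; i++) {
      int b = bytes[i] & 0xFF;
      if (remaining > 0) {
        if (b < lo || b > hi) {
          return MALFORMED;
        }
        remaining--;
        lo = 0x80; // after the second byte, continuation bytes are always 80..BF
        hi = 0xBF;
        continue;
      }
      if (b < 0x80) {
        continue; // ASCII
      }
      lo = 0x80;
      hi = 0xBF;
      if (b >= 0xC2 && b <= 0xDF) {
        remaining = 1;
      } else if (b >= 0xE0 && b <= 0xEF) {
        remaining = 2;
        if (b == 0xE0) {
          lo = 0xA0; // reject overlong 3-byte forms
        } else if (b == 0xED) {
          hi = 0x9F; // reject encoded surrogates U+D800..U+DFFF
        }
      } else if (b >= 0xF0 && b <= 0xF4) {
        remaining = 3;
        if (b == 0xF0) {
          lo = 0x90; // reject overlong 4-byte forms
        } else if (b == 0xF4) {
          hi = 0x8F; // reject code points above U+10FFFF
        }
      } else {
        return MALFORMED; // 80..C1 and F5..FF can never start a character
      }
    }
    // remaining >= 1 keeps an "incomplete" state distinct from COMPLETE and MALFORMED.
    return remaining == 0 ? COMPLETE : (remaining << 16) | (lo << 8) | hi;
  }

  // Convenience form: the slice must be complete on its own.
  static boolean isValidUtf8(byte[] bytes, int index, int limit) {
    return partialIsValidUtf8(COMPLETE, bytes, index, limit) == COMPLETE;
  }
}
```

Splitting the three-byte euro sign (E2 82 AC) after its first byte, for example, makes the first call return a nonzero "incomplete" state; passing that state together with the remaining two bytes then yields COMPLETE.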
-
isValidUtf8
final boolean isValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
Returns true if the given portion of the ByteBuffer is a well-formed UTF-8 byte sequence. The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
This is a convenience method, equivalent to partialIsValidUtf8(Utf8.COMPLETE, buffer, index, limit) == Utf8.COMPLETE.
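The ByteBuffer overloads have to cope with heap buffers (whose backing array may start at a nonzero arrayOffset()), direct buffers, and buffers that expose no array at all, which is presumably why the summary lists separate Direct and Default variants. The sketch below shows one way to route such a check. The class name BufferValidationSketch is hypothetical, validity is delegated to the JDK's strict UTF-8 decoder purely to keep the example short, and the array-less path copies bytes where the real processors would read them in place.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical routing sketch; not the protobuf implementation. */
final class BufferValidationSketch {
  // Strict check of bytes[index, limit): REPORT makes malformed or truncated input an error.
  static boolean isValidSlice(byte[] bytes, int index, int limit) {
    try {
      StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(bytes, index, limit - index));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  // index and limit are absolute buffer indices, not relative to position().
  static boolean isValidUtf8(ByteBuffer buffer, int index, int limit) {
    if (buffer.hasArray()) {
      // Heap buffer: translate buffer indices into backing-array indices.
      int offset = buffer.arrayOffset();
      return isValidSlice(buffer.array(), offset + index, offset + limit);
    }
    // Direct (or read-only) buffer: copy the portion out through a duplicate,
    // leaving the caller's position and limit untouched.
    ByteBuffer view = buffer.duplicate();
    view.limit(limit);
    view.position(index);
    byte[] copy = new byte[limit - index];
    view.get(copy);
    return isValidSlice(copy, 0, copy.length);
  }
}
```

Forgetting the arrayOffset() translation is a common bug with sliced heap buffers: for ByteBuffer.wrap(data, 2, 3).slice(), buffer index 0 corresponds to data[2].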
-
partialIsValidUtf8
final int partialIsValidUtf8(int state, java.nio.ByteBuffer buffer, int index, int limit)
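Tells whether the given portion of the ByteBuffer is a well-formed, malformed, or incomplete UTF-8 byte sequence, following the same contract as partialIsValidUtf8(int, byte[], int, int). The range of bytes to be checked extends from index index, inclusive, to limit, exclusive.
- Parameters:
  state - either Utf8.COMPLETE (if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytes
  buffer - the buffer to check
- Returns:
  Utf8.MALFORMED if the bytes are definitely not well-formed, Utf8.COMPLETE if they are well-formed, or an opaque "incomplete" state value to pass to a subsequent partial decoding call.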
-
partialIsValidUtf8Direct
abstract int partialIsValidUtf8Direct(int state, java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for direct ByteBuffer instances.
-
partialIsValidUtf8Default
final int partialIsValidUtf8Default(int state, java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches. This first completes validation for the current character (provided by state) and then finishes validation for the sequence.
-
partialIsValidUtf8
private static int partialIsValidUtf8(java.nio.ByteBuffer buffer, int index, int limit)
Performs validation for ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
-
decodeUtf8
abstract java.lang.String decodeUtf8(byte[] bytes, int index, int size) throws InvalidProtocolBufferException
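Decodes the given byte array slice into a String. The slice starts at index and spans size bytes; note that size is a length, not an exclusive end index as in isValidUtf8.
- Throws:
  InvalidProtocolBufferException - if the byte array slice is not valid UTF-8.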
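The strict behaviour described here (fail on invalid input instead of substituting a replacement character) can be approximated with the JDK's CharsetDecoder configured to REPORT errors. In this minimal sketch the class name StrictDecodeSketch is hypothetical, and it throws java.nio.charset.CharacterCodingException as a stand-in for InvalidProtocolBufferException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical strict decoder; not the protobuf implementation. */
final class StrictDecodeSketch {
  // Decodes bytes[index, index + size); size is a length, not an end index.
  static String decodeUtf8(byte[] bytes, int index, int size) throws CharacterCodingException {
    return StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT) // throw instead of substituting U+FFFD
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(ByteBuffer.wrap(bytes, index, size))
        .toString();
  }
}
```

By contrast, new String(bytes, index, size, StandardCharsets.UTF_8) never fails: a lone 0xFF byte, for example, becomes U+FFFD.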
-
decodeUtf8
final java.lang.String decodeUtf8(java.nio.ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes the given portion of the ByteBuffer into a String.
- Throws:
  InvalidProtocolBufferException - if the portion of the buffer is not valid UTF-8.
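For the ByteBuffer overload, one way to honour absolute index/size arguments without disturbing the caller's position and limit is to decode a duplicate() view, which shares the bytes but has its own cursor. This hypothetical sketch works for heap and direct buffers alike and again throws CharacterCodingException in place of InvalidProtocolBufferException; whether the real method leaves the buffer's position untouched is not stated here, so treat that as a property of this sketch only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical ByteBuffer decoding sketch; not the protobuf implementation. */
final class BufferDecodeSketch {
  static String decodeUtf8(ByteBuffer buffer, int index, int size) throws CharacterCodingException {
    // duplicate() shares content but has an independent position and limit.
    ByteBuffer view = buffer.duplicate();
    view.limit(index + size); // set limit first so that position(index) is always legal
    view.position(index);
    return StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT)
        .decode(view)
        .toString();
  }
}
```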
-
decodeUtf8Direct
abstract java.lang.String decodeUtf8Direct(java.nio.ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes direct ByteBuffer instances into a String.
- Throws:
  InvalidProtocolBufferException
-
decodeUtf8Default
final java.lang.String decodeUtf8Default(java.nio.ByteBuffer buffer, int index, int size) throws InvalidProtocolBufferException
Decodes ByteBuffer instances using the ByteBuffer API rather than potentially faster approaches.
- Throws:
  InvalidProtocolBufferException
-
encodeUtf8
abstract int encodeUtf8(java.lang.CharSequence in, byte[] out, int offset, int length)
Encodes an input character sequence (in) to UTF-8 in the target array (out). For a string, this method is similar to

    byte[] a = string.getBytes(UTF_8);
    System.arraycopy(a, 0, out, offset, a.length);
    return offset + a.length;

but is more efficient in both time and space. One key difference is that this method requires paired surrogates, and therefore does not support chunking. While String.getBytes(UTF_8) replaces unpaired surrogates with the default replacement character, this method throws Utf8.UnpairedSurrogateException.
To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.CharSequence) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.
- Parameters:
  in - the input character sequence to be encoded
  out - the target array
  offset - the starting offset in out to start writing at
  length - the number of bytes of out available for writing, starting from offset
- Returns:
  the new offset, equivalent to offset + Utf8.encodedLength(in)
- Throws:
  Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
  java.lang.ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than length, the space available in out starting at offset
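The byte[] contract above (return the new offset, reject unpaired surrogates, fail when the output slice is too small) can be sketched with a plain code-point loop. The class EncodeSketch and its nested UnpairedSurrogateException are hypothetical stand-ins for Utf8 and Utf8.UnpairedSurrogateException; unlike a tuned implementation, this sketch checks for room before writing each character.

```java
/** Hypothetical encoder sketch; not the protobuf implementation. */
final class EncodeSketch {
  // Stand-in for Utf8.UnpairedSurrogateException.
  static final class UnpairedSurrogateException extends IllegalArgumentException {
    UnpairedSurrogateException(int index) {
      super("Unpaired surrogate at index " + index);
    }
  }

  // Writes in as UTF-8 into out[offset, offset + length) and returns the new offset.
  static int encodeUtf8(CharSequence in, byte[] out, int offset, int length) {
    int limit = offset + length;
    int j = offset;
    for (int i = 0; i < in.length(); i++) {
      char c = in.charAt(i);
      int cp;
      if (Character.isHighSurrogate(c) && i + 1 < in.length()
          && Character.isLowSurrogate(in.charAt(i + 1))) {
        cp = Character.toCodePoint(c, in.charAt(++i)); // consume the whole pair
      } else if (Character.isSurrogate(c)) {
        throw new UnpairedSurrogateException(i); // no '?' substitution
      } else {
        cp = c;
      }
      int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
      if (j + n > limit) {
        throw new ArrayIndexOutOfBoundsException("Failed writing " + c + " at index " + j);
      }
      switch (n) {
        case 1:
          out[j++] = (byte) cp;
          break;
        case 2:
          out[j++] = (byte) (0xC0 | (cp >>> 6));
          out[j++] = (byte) (0x80 | (cp & 0x3F));
          break;
        case 3:
          out[j++] = (byte) (0xE0 | (cp >>> 12));
          out[j++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
          out[j++] = (byte) (0x80 | (cp & 0x3F));
          break;
        default:
          out[j++] = (byte) (0xF0 | (cp >>> 18));
          out[j++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
          out[j++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
          out[j++] = (byte) (0x80 | (cp & 0x3F));
      }
    }
    return j;
  }
}
```

For valid input the bytes written match String.getBytes(UTF_8); the two differ only on unpaired surrogates, where this sketch throws.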
-
encodeUtf8
final void encodeUtf8(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes an input character sequence (in) to UTF-8 in the target buffer (out). Upon returning from this method, the position of out will point to the position after the last encoded byte. This method requires paired surrogates, and therefore does not support chunking.
To ensure sufficient space in the output buffer, either call Utf8.encodedLength(java.lang.CharSequence) to compute the exact amount needed, or leave room for Utf8.MAX_BYTES_PER_CHAR * in.length(), which is the largest possible number of bytes that any input can be encoded to.
- Parameters:
  in - the source character sequence to be encoded
  out - the target buffer
- Throws:
  Utf8.UnpairedSurrogateException - if in contains ill-formed UTF-16 (unpaired surrogates)
  java.lang.ArrayIndexOutOfBoundsException - if in encoded in UTF-8 is longer than out.remaining()
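Sizing the output as described above can be sketched as follows. Here encodedLength is a hypothetical re-implementation, not a call into Utf8.encodedLength: a char below U+0080 takes one byte, below U+0800 two, a valid surrogate pair four (two bytes per char), and any other char three, so three bytes per char is a safe upper bound, which presumably is the value of Utf8.MAX_BYTES_PER_CHAR. The ByteBuffer encoder delegates to the JDK's CharsetEncoder with REPORT so that unpaired surrogates fail rather than being replaced, and it advances the position of out as described above; the exception types are chosen to mirror this Javadoc, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sizing and encoding sketch; not the protobuf implementation. */
final class BufferEncodeSketch {
  // Exact UTF-8 length of in; rejects unpaired surrogates.
  static int encodedLength(CharSequence in) {
    int length = 0;
    for (int i = 0; i < in.length(); i++) {
      char c = in.charAt(i);
      if (c < 0x80) {
        length += 1;
      } else if (c < 0x800) {
        length += 2;
      } else if (Character.isHighSurrogate(c) && i + 1 < in.length()
          && Character.isLowSurrogate(in.charAt(i + 1))) {
        length += 4; // one supplementary code point = two chars = four bytes
        i++;
      } else if (Character.isSurrogate(c)) {
        throw new IllegalArgumentException("Unpaired surrogate at index " + i);
      } else {
        length += 3;
      }
    }
    return length;
  }

  // Writes in at out.position() and leaves the position just past the last encoded byte.
  static void encodeUtf8(CharSequence in, ByteBuffer out) {
    // REPORT turns unpaired surrogates into an error result (String.getBytes uses REPLACE).
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    CoderResult result = encoder.encode(CharBuffer.wrap(in), out, true);
    if (result.isUnderflow()) {
      result = encoder.flush(out);
    }
    if (result.isError()) {
      throw new IllegalArgumentException("Unpaired surrogate in input");
    }
    if (result.isOverflow()) {
      // Bytes written before the overflow stay in out; this sketch does not roll them back.
      throw new ArrayIndexOutOfBoundsException("Not enough room: " + out.remaining() + " bytes left");
    }
  }
}
```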
-
encodeUtf8Direct
abstract void encodeUtf8Direct(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes the input character sequence to a direct ByteBuffer instance.
-
encodeUtf8Default
final void encodeUtf8Default(java.lang.CharSequence in, java.nio.ByteBuffer out)
Encodes the input character sequence to a ByteBuffer instance using the ByteBuffer API, rather than potentially faster approaches.