Class RopeByteString
- All Implemented Interfaces:
Serializable,Iterable<Byte>
ByteStrings formed by concatenation of other ByteStrings, without
copying the data in the pieces. The concatenation is represented as a tree whose leaf nodes are
each a ByteString.LeafByteString.
Most of the operation here is inspired by the now-famous paper BAP95 Ropes: an Alternative to Strings hans-j. boehm, russ atkinson and michael plass
The algorithms described in the paper have been implemented for character strings in
com.google.common.string.Rope and in the c++ class cord.cc.
Fundamentally the Rope algorithm represents the collection of pieces as a binary tree. BAP95 uses a Fibonacci bound relating depth to a minimum sequence length, sequences that are too short relative to their depth cause a tree rebalance. More precisely, a tree of depth d is "balanced" in the terminology of BAP95 if its length is at least F(d+2), where F(n) is the n-th Fibonacci number. Thus for depths 0, 1, 2, 3, 4, 5,... we have minimum lengths 1, 2, 3, 5, 8, 13,...
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static classThis class implements the balancing algorithm of BAP95.private static final classThis class is a continuable tree traversal, which keeps the state information which would exist on the stack in a recursive traversal instead on a stack of "Bread Crumbs".private classThis class is theRopeByteStringequivalent forByteArrayInputStream.Nested classes/interfaces inherited from class com.google.protobuf.ByteString
ByteString.AbstractByteIterator, ByteString.ByteIterator, ByteString.CodedBuilder, ByteString.LeafByteString, ByteString.Output -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate final ByteStringprivate final int(package private) static final int[]BAP95.private final ByteStringprivate static final longprivate final intprivate final intFields inherited from class com.google.protobuf.ByteString
CONCATENATE_BY_COPY_SIZE, EMPTY, MAX_READ_FROM_CHUNK_SIZE, MIN_READ_FROM_CHUNK_SIZE -
Constructor Summary
ConstructorsModifierConstructorDescriptionprivateRopeByteString(ByteString left, ByteString right) Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings. -
Method Summary
Modifier and TypeMethodDescriptionConstructs a read-onlyjava.nio.ByteBufferwhose content is equal to the contents of this byte string.Constructs a list of read-onlyjava.nio.ByteBufferobjects such that the concatenation of their contents is equal to the contents of this byte string.bytebyteAt(int index) Gets the byte at the given index.(package private) static ByteStringconcatenate(ByteString left, ByteString right) Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count.private static ByteStringconcatenateBytes(ByteString left, ByteString right) Concatenates two strings by copying data values.voidcopyTo(ByteBuffer target) Copies bytes into a ByteBuffer.protected voidcopyToInternal(byte[] target, int sourceOffset, int targetOffset, int numberToCopy) Internal (package private) implementation ofByteString.copyTo(byte[],int,int,int).booleanprivate booleanequalsFragments(ByteString other) Determines if this string is equal to another of the same length by iterating over the leaf nodes.protected intReturn the depth of the tree representing thisByteString, if any, whose root is this node.(package private) byteinternalByteAt(int index) Gets the byte at the given index, assumes bounds checking has already been performed.protected booleanDetermines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds.booleanTells whether thisByteStringrepresents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.iterator()Return aByteString.ByteIteratorover the bytes in the ByteString.(package private) static intminLength(int depth) Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds.Creates aCodedInputStreamwhich can be used to read the bytes.newInput()Creates anInputStreamwhich can be used to read the bytes.(package private) static RopeByteStringnewInstanceForTest(ByteString left, ByteString right) Create a new RopeByteString for testing only while bypassing all the defenses ofconcatenate(ByteString, ByteString).protected intpartialHash(int h, int offset, int length) Compute the hash across the value bytes starting with the given hash, and return the result.protected intpartialIsValidUtf8(int state, int offset, int length) Tells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence.private voidintsize()Gets the number of bytes.substring(int beginIndex, int endIndex) Takes a substring of this one.protected StringtoStringInternal(Charset charset) Constructs a newStringby decoding the bytes using the specified charset.(package private) Object(package private) voidwriteTo(ByteOutput output) Writes thisByteStringto the providedByteOutput.voidwriteTo(OutputStream outputStream) Writes a copy of the contents of this byte string to the specified output stream argument.(package private) voidwriteToInternal(OutputStream out, int sourceOffset, int numberToWrite) Internal version ofByteString.writeTo(OutputStream,int,int)that assumes all error checking has already been done.(package private) voidwriteToReverse(ByteOutput output) This method behaves exactly the same asByteString.writeTo(ByteOutput)unless theByteStringis a rope.Methods inherited from class com.google.protobuf.ByteString
checkIndex, checkRange, concat, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFrom, copyFromUtf8, copyTo, copyTo, empty, endsWith, fromHex, hashCode, isEmpty, newCodedBuilder, newOutput, newOutput, peekCachedHashCode, readFrom, readFrom, readFrom, startsWith, substring, toByteArray, toString, toString, toString, toStringUtf8, unsignedLexicographicalComparator, wrap, wrap, wrap, writeToMethods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, waitMethods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
minLengthByDepth
static final int[] minLengthByDepthBAP95. Let Fn be the nth Fibonacci number. ARopeByteStringof depth n is "balanced", i.e flat enough, if its length is at least Fn+2, e.g. a "balanced"RopeByteStringof depth 1 must have length at least 2, of depth 4 must have length >= 8, etc.There's nothing special about using the Fibonacci numbers for this, but they are a reasonable sequence for encapsulating the idea that we are OK with longer strings being encoded in deeper binary trees.
For 32-bit integers, this array has length 46.
The correctness of this constant array is validated in tests.
-
totalLength
private final int totalLength -
left
-
right
-
leftLength
private final int leftLength -
treeDepth
private final int treeDepth -
serialVersionUID
private static final long serialVersionUID- See Also:
-
-
Constructor Details
-
RopeByteString
Create a new RopeByteString, which can be thought of as a new tree node, by recording references to the two given strings.- Parameters:
left- string on the left of this node, should havesize() > 0right- string on the right of this node, should havesize() > 0
-
-
Method Details
-
concatenate
Concatenate the given strings while performing various optimizations to slow the growth rate of tree depth and tree node count. The result is either aByteString.LeafByteStringor aRopeByteStringdepending on which optimizations, if any, were applied.Small pieces of length less than
ByteString.CONCATENATE_BY_COPY_SIZEmay be copied by value here, as in BAP95. Large pieces are referenced without copy.- Parameters:
left- string on the leftright- string on the right- Returns:
- concatenation representing the same sequence as the given strings
-
concatenateBytes
Concatenates two strings by copying data values. This is called in a few cases in order to reduce the growth of the number of tree nodes.- Parameters:
left- string on the leftright- string on the right- Returns:
- string formed by copying data bytes
-
newInstanceForTest
Create a new RopeByteString for testing only while bypassing all the defenses ofconcatenate(ByteString, ByteString). This allows testing trees of specific structure. We are also able to insert empty leaves, though these are dis-allowed, so that we can make sure the implementation can withstand their presence.- Parameters:
left- string on the left of this noderight- string on the right of this node- Returns:
- an unsafe instance for testing only
-
minLength
static int minLength(int depth) Returns the minimum length for which a tree of the given depth is considered balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Defaults toInteger.MAX_VALUEifdepth >= minLengthByDepth.lengthin order to avoid anArrayIndexOutOfBoundsException.- Parameters:
depth- tree depth- Returns:
- minimum balanced length
-
byteAt
public byte byteAt(int index) Gets the byte at the given index. ThrowsArrayIndexOutOfBoundsExceptionfor backwards-compatibility reasons although it would more properly beIndexOutOfBoundsException.- Specified by:
byteAtin classByteString- Parameters:
index- index of byte- Returns:
- the value
- Throws:
ArrayIndexOutOfBoundsException-indexis < 0 or >= size
-
internalByteAt
byte internalByteAt(int index) Description copied from class:ByteStringGets the byte at the given index, assumes bounds checking has already been performed.- Specified by:
internalByteAtin classByteString- Parameters:
index- index of byte- Returns:
- the value
-
size
public int size()Description copied from class:ByteStringGets the number of bytes.- Specified by:
sizein classByteString- Returns:
- size in bytes
-
iterator
Description copied from class:ByteStringReturn aByteString.ByteIteratorover the bytes in the ByteString. To avoid auto-boxing, you may get the iterator manually and callByteString.ByteIterator.nextByte().- Specified by:
iteratorin interfaceIterable<Byte>- Overrides:
iteratorin classByteString- Returns:
- the iterator
-
getTreeDepth
protected int getTreeDepth()Description copied from class:ByteStringReturn the depth of the tree representing thisByteString, if any, whose root is this node. If this is a leaf node, return 0.- Specified by:
getTreeDepthin classByteString- Returns:
- tree depth or zero
-
isBalanced
protected boolean isBalanced()Determines if the tree is balanced according to BAP95, which means the tree is flat-enough with respect to the bounds. Note that this definition of balanced is one where sub-trees of balanced trees are not necessarily balanced.- Specified by:
isBalancedin classByteString- Returns:
- true if the tree is balanced
-
substring
Takes a substring of this one. This involves recursive descent along the left and right edges of the substring, and referencing any wholly contained segments in between. Any leaf nodes entirely uninvolved in the substring will not be referenced by the substring.Substrings of
length < 2should result in at most a single recursive call chain, terminating at a leaf node. Thus the result will be aByteString.LeafByteString.- Specified by:
substringin classByteString- Parameters:
beginIndex- start at this indexendIndex- the last character is the one before this index- Returns:
- substring leaf node or tree
-
copyToInternal
protected void copyToInternal(byte[] target, int sourceOffset, int targetOffset, int numberToCopy) Description copied from class:ByteStringInternal (package private) implementation ofByteString.copyTo(byte[],int,int,int). It assumes that all error checking has already been performed and thatnumberToCopy > 0.- Specified by:
copyToInternalin classByteString
-
copyTo
Description copied from class:ByteStringCopies bytes into a ByteBuffer.To copy a subset of bytes, you call this method on the return value of
ByteString.substring(int, int). Example:byteString.substring(start, end).copyTo(target)- Specified by:
copyToin classByteString- Parameters:
target- ByteBuffer to copy into.
-
asReadOnlyByteBuffer
Description copied from class:ByteStringConstructs a read-onlyjava.nio.ByteBufferwhose content is equal to the contents of this byte string. The result uses the same backing array as the byte string, if possible.- Specified by:
asReadOnlyByteBufferin classByteString- Returns:
- wrapped bytes
-
asReadOnlyByteBufferList
Description copied from class:ByteStringConstructs a list of read-onlyjava.nio.ByteBufferobjects such that the concatenation of their contents is equal to the contents of this byte string. The result uses the same backing arrays as the byte string.By returning a list, implementations of this method may be able to avoid copying even when there are multiple backing arrays.
- Specified by:
asReadOnlyByteBufferListin classByteString- Returns:
- a list of wrapped bytes
-
writeTo
Description copied from class:ByteStringWrites a copy of the contents of this byte string to the specified output stream argument.- Specified by:
writeToin classByteString- Parameters:
outputStream- the output stream to which to write the data.- Throws:
IOException- if an I/O error occurs.
-
writeToInternal
Description copied from class:ByteStringInternal version ofByteString.writeTo(OutputStream,int,int)that assumes all error checking has already been done.- Specified by:
writeToInternalin classByteString- Throws:
IOException
-
writeTo
Description copied from class:ByteStringWrites thisByteStringto the providedByteOutput. Calling this method may result in multiple operations on the targetByteOutput.This method may expose internal backing buffers of the
ByteStringto theByteOutputin order to avoid additional copying overhead. It would be possible for a maliciousByteOutputto corrupt theByteString. Use with caution!- Specified by:
writeToin classByteString- Parameters:
output- the output target to receive the bytes- Throws:
IOException- if an I/O error occurs- See Also:
-
writeToReverse
Description copied from class:ByteStringThis method behaves exactly the same asByteString.writeTo(ByteOutput)unless theByteStringis a rope. For ropes, the leaf nodes are written in reverse order to thebyteOutput.- Specified by:
writeToReversein classByteString- Parameters:
output- the output target to receive the bytes- Throws:
IOException- if an I/O error occurs- See Also:
-
UnsafeByteOperations#unsafeWriteToReverse(ByteString, ByteOutput)
-
toStringInternal
Description copied from class:ByteStringConstructs a newStringby decoding the bytes using the specified charset.- Specified by:
toStringInternalin classByteString- Parameters:
charset- encode using this charset- Returns:
- new string
-
isValidUtf8
public boolean isValidUtf8()Description copied from class:ByteStringTells whether thisByteStringrepresents a well-formed UTF-8 byte sequence, such that the original bytes can be converted to a String object and then round tripped back to bytes without loss.More precisely, returns
truewhenever:Arrays.equals(byteString.toByteArray(), new String(byteString.toByteArray(), "UTF-8").getBytes("UTF-8"))This method returns
falsefor "overlong" byte sequences, as well as for 3-byte sequences that would map to a surrogate character, in accordance with the restricted definition of UTF-8 introduced in Unicode 3.1. Note that the UTF-8 decoder included in Oracle's JDK has been modified to also reject "overlong" byte sequences, but (as of 2011) still accepts 3-byte surrogate character byte sequences.See the Unicode Standard,
Table 3-6. UTF-8 Bit Distribution,
Table 3-7. Well Formed UTF-8 Byte Sequences.- Specified by:
isValidUtf8in classByteString- Returns:
- whether the bytes in this
ByteStringare a well-formed UTF-8 byte sequence
-
partialIsValidUtf8
protected int partialIsValidUtf8(int state, int offset, int length) Description copied from class:ByteStringTells whether the given byte sequence is a well-formed, malformed, or incomplete UTF-8 byte sequence. This method accepts and returns a partial state result, allowing the bytes for a complete UTF-8 byte sequence to be composed from multipleByteStringsegments.- Specified by:
partialIsValidUtf8in classByteString- Parameters:
state- either0(if this is the initial decoding operation) or the value returned from a call to a partial decoding method for the previous bytesoffset- offset of the first byte to checklength- number of bytes to check- Returns:
-1if the partial byte sequence is definitely malformed,0if it is well-formed (no additional input needed), or, if the byte sequence is "incomplete", i.e. apparently terminated in the middle of a character, an opaque integer "state" value containing enough information to decode the character when passed to a subsequent invocation of a partial decoding method.
-
equals
- Specified by:
equalsin classByteString
-
equalsFragments
Determines if this string is equal to another of the same length by iterating over the leaf nodes. On each step of the iteration, the overlapping segments of the leaves are compared.- Parameters:
other- string of the same length as this one- Returns:
- true if the values of this string equals the value of the given one
-
partialHash
protected int partialHash(int h, int offset, int length) Description copied from class:ByteStringCompute the hash across the value bytes starting with the given hash, and return the result. This is used to compute the hash across strings represented as a set of pieces by allowing the hash computation to be continued from piece to piece.- Specified by:
partialHashin classByteString- Parameters:
h- starting hash valueoffset- offset into this value to start looking at data valueslength- number of data values to include in the hash computation- Returns:
- ending hash value
-
newCodedInput
Description copied from class:ByteStringCreates aCodedInputStreamwhich can be used to read the bytes. Using this is often more efficient than creating aCodedInputStreamthat wraps the result ofByteString.newInput().- Specified by:
newCodedInputin classByteString- Returns:
- stream based on wrapped data
-
newInput
Description copied from class:ByteStringCreates anInputStreamwhich can be used to read the bytes.The
InputStreamreturned by this method is guaranteed to be completely non-blocking. The methodInputStream.available()returns the number of bytes remaining in the stream. The methodsInputStream.read(byte[]),InputStream.read(byte[],int,int)andInputStream.skip(long)will read/skip as many bytes as are available. The methodInputStream.markSupported()returnstrue.The methods in the returned
InputStreammight not be thread safe.- Specified by:
newInputin classByteString- Returns:
- an input stream that returns the bytes of this byte string.
-
writeReplace
Object writeReplace() -
readObject
- Throws:
IOException
-