/** * Copyright (c) Meta Platforms, Inc. and affiliates. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */
folly/FBVector.h
Simply replacing std::vector
with
folly::fbvector
(after having included the
folly/FBVector.h
header file) will improve the performance
of your C++ code using vectors with common coding patterns. The
improvements are always non-negative, almost always measurable,
frequently significant, sometimes dramatic, and occasionally
spectacular.
::fbvector<int> numbers({0, 1, 2, 3});
folly.reserve(10);
numbersfor (int i = 4; i < 10; i++) {
.push_back(i * 2);
numbers}
assert(numbers[6] == 12);
std::vector
is the stalwart abstraction many use for
dynamically-allocated arrays in C++. It is also the best known and most
used of all containers. It may therefore seem a surprise that
std::vector
leaves important - and sometimes vital -
efficiency opportunities on the table. This document explains how our
own drop-in abstraction fbvector
improves key performance
aspects of std::vector
. Refer to
folly/test/FBVectorTest.cpp for a few benchmarks.
It is well known that std::vector
grows exponentially
(at a constant factor) in order to avoid quadratic growth performance.
The trick is choosing a good factor. Any factor greater than 1 ensures
O(1) amortized append complexity towards infinity. But a factor that’s
too small (say, 1.1) causes frequent vector reallocation, and one that’s
too large (say, 3 or 4) forces the vector to consume much more memory
than needed.
The initial HP implementation by Stepanov used a growth factor of 2;
i.e., whenever you’d push_back
into a vector without there
being room, it would double the current capacity. This was not a good
choice: it can be mathematically proven that a growth factor of 2 is
rigorously the worst possible because it never allows the vector
to reuse any of its previously-allocated memory. Despite other compilers
reducing the growth factor to 1.5, gcc has staunchly maintained its
factor of 2. This makes std::vector
cache- unfriendly and
memory manager unfriendly.
To see why that’s the case, consider a large vector of capacity C. When there’s a request to grow the vector, the vector (assuming no in-place resizing, see the appropriate section in this document) will allocate a chunk of memory next to its current chunk, copy its existing data to the new chunk, and then deallocate the old chunk. So now we have a chunk of size C followed by a chunk of size k * C. Continuing this process we’ll then have a chunk of size k * k * C to the right and so on. That leads to a series of the form (using ^^ for power):
C, C*k, C*k^^2, C*k^^3, ...
If we choose k = 2 we know that every element in the series will be strictly larger than the sum of all previous ones because of the remarkable equality:
1 + 2^^1 + 2^^2 + 2^^3... + 2^^n = 2^^(n+1) - 1
This means that any new chunk requested will be larger than all previously used chunks combined, so the vector must crawl forward in memory; it can’t move back to its deallocated chunks. But any number smaller than 2 guarantees that you’ll at some point be able to reuse the previous chunks. For instance, choosing 1.5 as the factor allows memory reuse after 4 reallocations; 1.45 allows memory reuse after 3 reallocations; and 1.3 allows reuse after only 2 reallocations.
Of course, the above makes a number of simplifying assumptions about
how the memory allocator works, but definitely you don’t want to choose
the theoretically absolute worst growth factor. fbvector
uses a growth factor of 1.5. That does not impede good performance at
small sizes because of the way fbvector
cooperates with
jemalloc (below).
Virtually all modern allocators allocate memory in fixed-size quanta
that are chosen to minimize management overhead while at the same time
offering good coverage at low slack. For example, an allocator may
choose blocks of doubling size (32, 64, 128,
As discussed above, std::vector
also needs to
(re)allocate in quanta. The next quantum is usually defined in terms of
the current size times the infamous growth constant. Because of this
setup, std::vector
has some slack memory at the end much
like an allocated block has some slack memory at the end.
It doesn’t take a rocket surgeon to figure out that an allocator-
aware std::vector
would be a marriage made in heaven: the
vector could directly request blocks of “perfect” size from the
allocator so there would be virtually no slack in the allocator. Also,
the entire growth strategy could be adjusted to work perfectly with
allocator’s own block growth strategy. That’s exactly what
fbvector
does - it automatically detects the use of
jemalloc and adjusts its reallocation strategy accordingly.
But wait, there’s more. Many memory allocators do not support in-
place reallocation, although most of them could. This comes from the now
notorious design of realloc()
to opaquely perform either
in-place reallocation or an allocate-memcpy-deallocate cycle. Such lack
of control subsequently forced all clib-based allocator designs to avoid
in-place reallocation, and that includes C++’s new
and
std::allocator
. This is a major loss of efficiency because
an in-place reallocation, being very cheap, may mean a much less
aggressive growth strategy. In turn that means less slack memory and
faster reallocations.
One particularly sensitive topic about handling C++ values is that
they are all conservatively considered non- relocatable. In
contrast, a relocatable value would preserve its invariant even if its
bits were moved arbitrarily in memory. For example, an
int32_t
is relocatable because moving its 4 bytes would
preserve its actual value, so the address of that value does not
“matter” to its integrity.
C++’s assumption of non-relocatable values hurts everybody for the benefit of a few questionable designs. The issue is that moving a C++ object “by the book” entails (a) creating a new copy from the existing value; (b) destroying the old value. This is quite vexing and violates common sense; consider this hypothetical conversation between Captain Picard and an incredulous alien:
Incredulous Alien: “So, this teleporter, how does it work?”
Picard: “It beams people and arbitrary matter from one place to
another.”
Incredulous Alien: “Hmmm… is it safe?”
Picard: “Yes,
but earlier models were a hassle. They’d clone the person to another
location. Then the teleporting chief would have to shoot the original.
Ask O’Brien, he was an intern during those times. A bloody mess, that’s
what it was.”
Only a tiny minority of objects are genuinely non-relocatable:
class Ew {
char buffer[1024];
char * pointerInsideBuffer;
public:
() : pointerInsideBuffer(buffer) {}
Ew...
}
The first class of designs can always be redone at small or no cost
in efficiency. The second class of objects should not be values in the
first place - they should be allocated with new
and
manipulated using (smart) pointers. It is highly unusual for a value to
have observers that alias pointers to it.
Relocatable objects are of high interest to std::vector
because such knowledge makes insertion into the vector and vector
reallocation considerably faster: instead of going to Picard’s
copy-destroy cycle, relocatable objects can be moved around simply by
using memcpy
or memmove
. This optimization can
yield arbitrarily high wins in efficiency; for example, it transforms
vector< vector<double> >
or
vector< hash_map<int, string> >
from risky
liabilities into highly workable compositions.
In order to allow fast relocation without risk, fbvector
uses a trait folly::IsRelocatable
defined in
"folly/Traits.h"
. By default,
folly::IsRelocatable::value
conservatively yields false. If
you know that your type Widget
is in fact relocatable, go
right after Widget
’s definition and write this:
// at global namespace level
namespace folly {
struct IsRelocatable<Widget> : boost::true_type {};
}
If you don’t do this, fbvector<Widget>
will fail
to compile with a static_assert
.
fbvector
uses a careful implementation all around to
make sure it doesn’t lose efficiency through the cracks. Some future
directions may be in improving raw memory copying (memcpy
is not an intrinsic in gcc and does not work terribly well for large
chunks) and in furthering the collaboration with jemalloc. Have fun!