Initial push / pull
Optimal case
As a motivating example of ultimate performance, assume there is a single file containing exactly the right data in compressed form. This may be a tarred branch, a bundle, or a blob format. Performance in this case scales with the size of the file.
Disk case
Assume the current repository format. Attempt to achieve parity with cp -r. Read each file only once.
- Read the knit graph for revisions
- Write a filtered copy of the revision knit O(d+a)
- Write a filtered copy of the revision knit index O(d)
- Open the knit index for the inventory
- Write a filtered copy of the inventory knit, simultaneously noting all referenced file-ids O(b+d)
- Write a filtered copy of the inventory knit index O(d)
- For each referenced file-id:
  - Open the knit index for each file knit O(e)
  - If the amount of irrelevant data is within an acceptable threshold, hard-link the knit O(f)
  - Otherwise, write a filtered copy of the text knit and simultaneously write the fulltext to the tree transform O(h)
- Write format markers O(1)
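The core filtering steps above can be sketched in Python under simplifying assumptions. The index layout (offset, length, parents), the helper names relevant_revisions, filtered_copy and should_hard_link, and the 10% irrelevant-data threshold are all hypothetical, not the real knit API. A filtered copy is only valid if every retained record's delta basis is retained too; that holds for an initial push or pull, where every ancestor of the tip is copied.

```python
from io import BytesIO
from typing import BinaryIO, Dict, Set, Tuple

# Hypothetical, simplified knit index entry: (byte offset, byte length, parents).
IndexEntry = Tuple[int, int, Tuple[str, ...]]


def relevant_revisions(graph: Dict[str, Tuple[str, ...]], tip: str,
                       present: Set[str]) -> Set[str]:
    """Collect ancestors of tip (inclusive) that the target does not already have."""
    needed: Set[str] = set()
    pending = [tip]
    while pending:
        rev = pending.pop()
        # Skip revisions already collected, already in the target, or absent (ghosts).
        if rev in needed or rev in present or rev not in graph:
            continue
        needed.add(rev)
        pending.extend(graph[rev])
    return needed


def filtered_copy(src: BinaryIO, index: Dict[str, IndexEntry], wanted: Set[str],
                  dst: BinaryIO) -> Dict[str, IndexEntry]:
    """Copy only wanted records from src to dst and return the rewritten index."""
    new_index: Dict[str, IndexEntry] = {}
    # Visit wanted records in on-disk order so src is read front to back, once.
    for version in sorted((v for v in wanted if v in index), key=lambda v: index[v][0]):
        offset, length, parents = index[version]
        src.seek(offset)
        record = src.read(length)
        new_index[version] = (dst.tell(), length, parents)  # new offset in dst
        dst.write(record)
    return new_index


def should_hard_link(index: Dict[str, IndexEntry], wanted: Set[str],
                     max_irrelevant: float = 0.1) -> bool:
    """Hypothetical policy: hard-link if at most max_irrelevant of the bytes are unwanted."""
    total = sum(length for _, length, _ in index.values())
    if total == 0:
        return True
    unwanted = sum(length for v, (_, length, _) in index.items() if v not in wanted)
    return unwanted / total <= max_irrelevant


if __name__ == "__main__":
    graph = {"A": (), "B": ("A",), "C": ("B",)}
    index = {"A": (0, 3, ()), "B": (3, 4, ("A",)), "C": (7, 2, ("B",))}
    wanted = relevant_revisions(graph, "C", present=set())
    out = BytesIO()
    if not should_hard_link(index, wanted):
        print(filtered_copy(BytesIO(b"aaabbbbcc"), index, wanted, out), out.getvalue())
```

Sorting by offset is what keeps the "read each file only once" goal: the source knit is consumed sequentially rather than seeking back and forth per revision.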
- a: size of aggregate revision metadata
- b: size of inventory changes for all revisions
- c: size of text changes for all files and all revisions (e * g)
- d: number of relevant revisions
- e: number of relevant versioned files
- f: size of the particular versioned file knit index
- g: size of the filtered versioned file knit
- h: size of the versioned file fulltext
- i: size of the largest file fulltext
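Summing the per-step bounds exactly as listed gives a rough total for the disk case. This is a back-of-the-envelope aggregation that assumes the steps with no stated bound (reading the revision graph, opening the inventory index) are dominated by the others; x_j, f_j and h_j are notation introduced here for the cost, index size and fulltext size of the j-th relevant versioned file:

```latex
\begin{aligned}
T_{\text{disk}} &= O(d+a) + O(d) + O(b+d) + O(d) + O(e) + \sum_{j=1}^{e} O(x_j) + O(1) \\
                &= O\Big(a + b + d + e + \sum_{j=1}^{e} x_j\Big), \\
x_j &= \begin{cases}
         f_j & \text{if file } j \text{ is hard-linked,} \\
         h_j & \text{otherwise.}
       \end{cases}
\end{aligned}
```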
Smart Network Case
Phase 1
Push: ask whether a repository exists and, if not, which formats are acceptable.
Pull: nothing.
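A minimal sketch of the Phase 1 push decision, assuming the server replies with a flag plus a list of acceptable format names. ProbeReply, choose_format, plan_initial_push and the format strings are hypothetical; the text does not say what should happen when a repository already exists, so this sketch simply refuses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ProbeReply:
    """Hypothetical server answer to the Phase 1 question."""
    has_repository: bool
    acceptable_formats: List[str] = field(default_factory=list)


def choose_format(preferred: Sequence[str], acceptable: Sequence[str]) -> Optional[str]:
    """Return the client's most preferred format that the server accepts, if any."""
    accepted = set(acceptable)
    for fmt in preferred:
        if fmt in accepted:
            return fmt
    return None


def plan_initial_push(reply: ProbeReply, preferred: Sequence[str]) -> str:
    """Pick the format for an initial push, or fail if no format is acceptable."""
    if reply.has_repository:
        # Not an initial push; handling this case is outside this sketch.
        raise ValueError("target already has a repository")
    fmt = choose_format(preferred, reply.acceptable_formats)
    if fmt is None:
        raise ValueError("no mutually acceptable repository format")
    return fmt
```

Letting the client's preference order win means the server only has to advertise what it accepts, not rank formats.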
Phase 2
Push: send the initial push command, streaming data in an acceptable format and following the disk case strategy.
Pull: receive the initial pull command, specifying the format.
Pull client complexity: O(a); memory cost: O(1).
Push client complexity: processing and memory cost are the same as in the disk case.
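The O(1) memory bound for the pull client comes from handling the stream in fixed-size pieces rather than buffering it whole. A minimal sketch, assuming the stream and the local destination are ordinary binary file objects; receive_stream and the 64 KiB chunk size are hypothetical choices:

```python
from io import BytesIO
from typing import BinaryIO

# Hypothetical buffer size: memory use is bounded by this, not by the stream length.
CHUNK_SIZE = 64 * 1024


def receive_stream(stream: BinaryIO, out: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy an incoming stream to local storage in fixed-size chunks; return bytes copied."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)  # at most chunk_size bytes held at once
        if not chunk:
            break
        out.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    incoming = BytesIO(b"revision data " * 1000)
    local = BytesIO()
    print(receive_stream(incoming, local, chunk_size=4096))
```

Processing time stays linear in the amount of data received while memory stays constant; the standard library's shutil.copyfileobj performs the same bounded-buffer copy.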
Dumb Network Case
Pull: same as the disk case, but request all file knit indices at once and all file knits at once.
Push: same as the disk case, but write all files at once.
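The benefit of batching on a dumb transport is that round trips stop scaling with the number of files. A sketch under stated assumptions: CountingTransport, its get and get_many methods, and the knits/<id>.kndx and knits/<id>.knit paths are hypothetical stand-ins, not a real transport API or on-disk layout.

```python
from typing import Dict, Iterable, List, Tuple


class CountingTransport:
    """Hypothetical stand-in for a dumb HTTP/SFTP transport that counts round trips."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files
        self.round_trips = 0

    def get(self, path: str) -> bytes:
        """Fetch one file: one round trip per call."""
        self.round_trips += 1
        return self.files[path]

    def get_many(self, paths: Iterable[str]) -> Dict[str, bytes]:
        """Fetch several files in one pipelined request: one round trip in total."""
        self.round_trips += 1
        return {p: self.files[p] for p in paths}


def knit_paths(file_ids: List[str]) -> Tuple[List[str], List[str]]:
    # Hypothetical layout; a real repository names its files differently.
    return ([f"knits/{fid}.kndx" for fid in file_ids],
            [f"knits/{fid}.knit" for fid in file_ids])


def pull_naive(t: CountingTransport, file_ids: List[str]) -> None:
    indices, knits = knit_paths(file_ids)
    for path in indices + knits:
        t.get(path)  # 2 * len(file_ids) round trips


def pull_batched(t: CountingTransport, file_ids: List[str]):
    indices, knits = knit_paths(file_ids)
    # All file knit indices at once, and then all file knits at once: 2 round trips.
    return t.get_many(indices), t.get_many(knits)
```

With e relevant files, the naive loop costs 2e round trips while the batched version costs 2, which is what matters on high-latency links.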
Wants
- Read partial graph
- Read multiple segments of multiple files on HTTP and SFTP
- Write multiple files over SFTP
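Reading multiple segments of a file efficiently usually means merging nearby byte ranges before requesting them. A minimal sketch, assuming ranges are (offset, length) pairs with positive lengths; coalesce_ranges, range_header and the max_gap parameter are hypothetical names. HTTP allows several ranges in one Range header (end offsets inclusive), though a server may merge them or ignore the header.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length); lengths are assumed to be positive


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge ranges that overlap, touch, or are separated by at most max_gap bytes."""
    merged: List[Range] = []
    for start, length in sorted(ranges):
        end = start + length
        if merged:
            m_start, m_len = merged[-1]
            m_end = m_start + m_len
            if start <= m_end + max_gap:
                # Extend the previous range instead of issuing another request.
                merged[-1] = (m_start, max(m_end, end) - m_start)
                continue
        merged.append((start, length))
    return merged


def range_header(ranges: List[Range]) -> str:
    """Format ranges as an HTTP multi-range header value (end offsets are inclusive)."""
    return "bytes=" + ",".join(f"{s}-{s + n - 1}" for s, n in ranges)
```

A larger max_gap trades some wasted bytes for fewer requests, which is the same trade-off as the hard-link threshold in the disk case.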