Recent comments posted to this site:
I have found one way to graft in the S3 bucket. It involves running git-annex initremote cloud type=S3, which unavoidably creates a new dummy bucket (you can use bucket=dummy to identify it), and then running git-annex enableremote cloud bucket=cloud- to use the original bucket without having to copy/move over all the files.
I did try it in one shot with git-annex initremote cloud type=S3 bucket=cloud-, but unfortunately that fails: the bucket-creation step appears to be mandatory, and the S3 API errors out with an "already created bucket" type of error.
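A minimal sketch of that two-step workaround (the remote and bucket names are placeholders, and the encryption setting has to match whatever the original remote used):

    # sketch only; remote/bucket names are placeholders and encryption=none is an
    # assumption -- it must match the original remote's settings
    git annex initremote cloud type=S3 encryption=none bucket=dummy-graft   # creates the throwaway bucket
    git annex enableremote cloud bucket=cloud-annex                         # re-point it at the original bucket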
However, if there is general guidance somewhere on... I guess importing/exporting the special remote metadata (including stored encryption keys), that would be very much appreciated.
Sorry, I should just clarify. Trying to do this via sync from the old, non-tuned git-annex repo fails with:
git-annex: Remote repository is tuned in incompatible way; cannot be merged with local repository.
Which I understand for the wider branch-data implications... but I don't know enough to understand why just the special remote data can't be merged in.
Naively, I put myself in a position where my rather large, untuned git-annex had to be recovered due to not appreciating the effect of case-insensitive filesystems.
Specifically, NTFS-3G is deadly in this case: whilst Windows has advanced, and with WSL added the ability to make a folder case-sensitive (a setting that is also inherited by folders created under it), NTFS-3G does not do this.
So beware if you try to work in an "interoperable" way. NTFS-3G will handle mixed case, but will create child folders that are not case-sensitive.
To that end, I want to migrate this rather large git-annex to be tuned with annex.tune.objecthashlower. I already have a good strategy for this: I'll just create a completely new stream of git-annexes originating from a newly formed one. I will also be able to create new type=directory special remotes for my existing "tape-out" git-annex, and will just use git annex fsck --fast --from $remote to rebuild the location data for it.
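A rough sketch of that plan (the repo path, description and remote name are placeholders; as I understand it, the tuning has to be set at init time):

    # rough sketch; repo path, description and remote name are placeholders
    git init /path/to/new-annex && cd /path/to/new-annex
    git -c annex.tune.objecthashlower=true annex init 'tuned repo'
    git annex initremote tapeout type=directory directory=/mnt/tapeout encryption=none
    git annex fsck --fast --from tapeout    # rebuild location data for that remote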
I've also tested this with an S3 git-annex as a proof of concept. In the new git-annex, I ran git-annex initremote cloud type=S3... to create a new bucket, copied a file over from the old bucket, and rebuilt the location data for that file.
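Something like this, for the record (bucket names, key and file are placeholders, and the object was copied over out of band):

    # proof-of-concept sketch; bucket names, key and file are placeholders
    git annex initremote cloud type=S3 encryption=none bucket=new-bucket
    aws s3 cp s3://old-bucket/SOMEKEY s3://new-bucket/SOMEKEY    # out-of-band copy of the object
    git annex fsck --fast --from cloud somefile                  # rebuild its location data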
But I really, really would like to avoid creating a new bucket. I am happy to lose the file presence/location data for the old bucket, but I'd like to graft it back in, or initremote the cloud bucket with matching parameters. The same goes, I guess, for an encrypted special remote, i.e. importing the encryption keys, etc.
Are there "plumbing" commands that can do this? Or does it require knowing about the low-level storage of this metadata, which seems to just send me back to the earlier comment about using a filter-branch... which I am hoping to avoid (because of all the potential pitfalls)?
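For what it's worth, as far as I understand it the special remote configuration (including any embedded encryption keys) lives in remote.log on the git-annex branch, so it can at least be inspected directly:

    # inspect the special remote configuration stored on the git-annex branch
    git show git-annex:remote.log
    git show git-annex:uuid.log    # uuid <-> repository description mapping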
I set androiddirectory=/, then ran git annex wanted thephone 'include=/storage/self/primary/DCIM and include=/storage/33A8-601A/DCIM' (I guess a trailing /* would have been necessary in addition), but that gives a gigantic amount of find: [...] read/permission denied errors upon import (I canceled it), and I guess it is very inefficient, as it traverses the entire file tree.
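A rough reconstruction of that setup, mostly for clarity (the importtree/exporttree and encryption parameters are my guesses at what such an adb remote needs):

    # rough reconstruction; importtree/exporttree and encryption are assumptions
    git annex initremote thephone type=adb androiddirectory=/ importtree=yes exporttree=yes encryption=none
    # ("or" instead of "and" may be needed to match files in either DCIM directory)
    git annex wanted thephone 'include=/storage/self/primary/DCIM/* and include=/storage/33A8-601A/DCIM/*'
    git annex import master --from thephone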
There's not currently a way to do that without some scripting to get the keys, and then git-annex whereis --key.
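A minimal sketch of that kind of scripting (the file path is a placeholder, and it assumes the file has always been stored annexed, as a symlink or pointer file):

    # walk a file's history and ask whereis about each old key
    f=path/to/file
    for rev in $(git log --format=%H -- "$f"); do
        # the annex symlink target (or pointer file) ends in the key
        key=$(basename "$(git cat-file blob "$rev:$f")")
        echo "== $rev $key"
        git annex whereis --key "$key"
    done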
I think this idea is worth doing something about, so I made this todo: wherewas.
The repo that contains the latest/current version of a file is not accessible. Can git annex whereis find the last available version of a file in other repos (or a specific repo)?
I could loop through the commit log and run whereis for each commit until an earlier version of the file is found, but perhaps there is a better way to do it with a single command?
Hi,
I'm wondering whether there is any easy way to delay "progress reporting" (a.k.a. "report progress for ALL transfer_store operations ONCE", a.k.a. "bulk transfer") for a special remote?
What I'm trying to achieve: there is an archiver called dar, for which I would like to implement a special remote. It can write many files into a single archive and also supports incremental/differential backups. One can create an archive with this utility by providing a list of files or directories as parameters.
The problem with the current git annex special remote API is that it does not allow reporting transfer progress for ALL keys/files of a special remote (e.g. with transfer_store) and then confirming success ONCE, for ALL files, at the end of the process.
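To illustrate (the key and file path here are made up), the exchange currently happens per key, and the success/failure reply has to come back before git annex moves on:

    git-annex:  TRANSFER STORE SHA256E-s1048576--d0b2... /path/to/content
    remote:     PROGRESS 524288
    remote:     PROGRESS 1048576
    remote:     TRANSFER-SUCCESS STORE SHA256E-s1048576--d0b2...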
Ideally, the protocol would have some kind of "write test" command to check the written archive for errors, and only then report the transfer as "successful".
What I was thinking of is to just write all the files into a temporary list during transfer_store, and then externally archive that list of files after git annex copy --to dar-remote is done. But it seems like git annex would then think that the process of writing files to that remote was successful, while it may not have been (e.g. a file access error happened, or the archive was corrupted, etc.).
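Roughly, from the outside, that workaround would look like this (the remote name, list path and dar invocation are assumptions, not a tested recipe):

    # sketch only; remote name, list path and dar options are assumptions
    git annex copy --to dar-remote .
    # transfer_store only appended each stored file to this list instead of archiving it
    dar -c /backups/annex-$(date +%F) -[ /tmp/dar-remote-pending.lst
    dar -t /backups/annex-$(date +%F)    # only now do we learn whether the archive is good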
How can this be achieved? Do we need to extend git annex with another protocol extension? How difficult might that be, and where should I start? I suppose there is no way Joey or anyone else will work on it any time soon, so if there is no workaround I'll have to submit a patch?
P.S.: I've seen the async extension, but it seems to be tied to threads, which most likely won't allow achieving the described goals.