If you work in an organization like a newsroom, it might seem the world runs on Dropbox. Which is kind of a shame for a few reasons:
- um, why do I have to query a remote server if I want to share a document with someone sitting right next to me?
- You know they are very aware of the content you store!
- Terms of Service.
So, what are my options if I want locally-hosted synced storage? There are a few notable options, like BitTorrent Sync, but those aren’t open enough (~cough~ at all). (Open Source is important. Anyone reading this knows why…) There’s also Sparkleshare, which my employer, The Guardian Project (“hacker ethos with a 1099”), uses with the help of Tor’s hidden services. Sparkleshare is really great, and I highly recommend it, but I wanted something that was less of a service than a framework: Sparkleshare doesn’t leave much room for developers to build on top of it.
Then, I found out about git-annex. It’s a simple piece of software that automatically syncs files across multiple repositories using git: you add a file, it’s immediately checked into your repository and pushed to any remotes, and on the receiving end, remotes automatically pull and apply any changes. Very simple.
My gears started turning, and I thought: what if we just tried using git like we use Dropbox? So, I did a proof of concept using two machines: my own laptop as the client, and a Linux box somewhere else on my network as the server. My example server is in python and includes a very lightweight REST API, but you could modify this to suit whatever languages you want, or even handle the whole thing in bash.
So, without further ado, here’s how to make a Dropbox clone using git-annex.
Step 1: set up your server
1. install dependencies.
We’re going to need requests and tornado packages to handle the REST API.
You’ll notice I have you download git-annex as a pre-built tarball. Although you could totally just apt-get install git-annex, through trial and error I found this to be the most efficient. Substitute the URL for the one that matches your architecture.
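Something like the following should do it — the tarball URL and the /opt install location here are examples, so substitute whatever matches your architecture and layout:

```shell
# packages for the REST API
pip install requests tornado

# pre-built git-annex tarball (this amd64 URL is an example --
# grab the one matching your architecture)
cd /opt
wget https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-amd64.tar.gz
tar -xzf git-annex-standalone-amd64.tar.gz
```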
2. add git-annex’s path to your PATH in environment variables
(this is usually done in your shell profile, e.g. ~/.bashrc)
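Assuming the tarball unpacked to /opt/git-annex.linux (adjust the path for wherever yours landed), something like:

```shell
# make the standalone git-annex binaries available in every shell
echo 'export PATH=$PATH:/opt/git-annex.linux' >> ~/.bashrc
source ~/.bashrc
```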
3. init our git-annex remote repository
Now that everything is installed, let’s init our annex.
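A sketch, assuming the repository lives at ~/annex (the path and the repo’s description are up to you):

```shell
# on the server: create the repository and initialize the annex
mkdir -p ~/annex
cd ~/annex
git init
git annex init "server"
# one way to let clients push straight into this non-bare repository
git config receive.denyCurrentBranch ignore
```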
4. create a git hook
Git hooks are scripts that execute whenever a certain git event occurs. With git-annex, whenever a file is pushed to our remote repository on the server, git’s post-receive hook fires. So, let’s use that to automatically sync the files, pulling them and their metadata into our remote.
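A minimal version of that hook, assuming the ~/annex path from the previous step, might look like:

```shell
ANNEX=~/annex   # adjust to wherever your server-side repo lives
mkdir -p "$ANNEX/.git/hooks"
cat > "$ANNEX/.git/hooks/post-receive" <<'EOF'
#!/bin/sh
# a push just arrived: poke the API server so it runs the sync
curl -s http://localhost:8888/sync/
EOF
chmod +x "$ANNEX/.git/hooks/post-receive"
```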
5. make a little tornado server to handle git hooks.
In the previous step, we wrote a post-receive script that calls out to localhost:8888/sync. Let’s create that little tornado server to respond to these calls. This is a sample script in python, api.py, that reads the stdout of our sync action and uses regular expressions to find out whether a new file was added:
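A minimal sketch of api.py — the repo path, and the exact “add &lt;file&gt; ok” lines it greps out of git-annex’s output, are assumptions you’d adjust for your own setup:

```python
# api.py -- a tiny tornado endpoint for the post-receive hook to poke.
import re
import subprocess

import tornado.ioloop
import tornado.web

ANNEX_PATH = "/home/user/annex"  # assumed location of the server-side annex

# git-annex reports newly added content with lines like "add some_file ok";
# a regular expression pulls those filenames out of the sync output.
NEW_FILE_RE = re.compile(r"^add (.+) ok$", re.MULTILINE)


def find_new_files(sync_output):
    """Return the filenames that a 'git annex sync' run reports as added."""
    return NEW_FILE_RE.findall(sync_output)


class SyncHandler(tornado.web.RequestHandler):
    def get(self):
        # pull the latest changes (and their content) into this repository
        result = subprocess.run(
            ["git", "annex", "sync", "--content"],
            cwd=ANNEX_PATH, capture_output=True, text=True,
        )
        self.write({"synced": True, "new_files": find_new_files(result.stdout)})


def make_app():
    return tornado.web.Application([(r"/sync/?", SyncHandler)])


if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```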
6. test it!
…and open localhost:8888/sync/. You should see something like this…
… which means that everything’s working, but since there are no new files, there’s nothing else to do.
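Running the test from the command line might look like this (paths as before):

```shell
# start the API server in the background, then hit the endpoint
python api.py &
sleep 1
curl http://localhost:8888/sync/
```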
Step 2: set up your client
1. install dependencies
Revisit the first step for setting up the server. You should download git-annex on the client machine and set its path in your bash profile.
2. create your ssh key for your annex and link it up to establish trust.
You will be prompted for a password: make it a good one. Please replace LOCAL_PATH with whatever makes sense for you.
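Something along these lines — the key filename and the server hostname are placeholders:

```shell
# generate a dedicated key; this is where you'll be prompted for the passphrase
ssh-keygen -t rsa -b 4096 -f ~/.ssh/annex_key
# install the public key on the server to establish trust
ssh-copy-id -i ~/.ssh/annex_key.pub user@annex-server
```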
3. init your local repository
Choose a place for your local repository folder. This folder should not exist already! So, /home/my/computer/ersatz_dropbox must not already exist (but its parent directory, /home/my/computer/, should).
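A sketch, assuming the server repo from step 1 lives at /home/user/annex on a host reachable as annex-server:

```shell
# clone the server's annex into a folder that does not yet exist
git clone ssh://user@annex-server/home/user/annex /home/my/computer/ersatz_dropbox
cd /home/my/computer/ersatz_dropbox
git annex init "laptop"
```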
Step 3. test it out!
You saw the server in action at the end of step one, but now, try dropping a file into your local ersatz dropbox folder and refresh localhost:8888/sync/. If all goes well, you should see something like this:
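Dropping a file in by hand might look like this (using the example paths from above); the add/sync pair is what pushes the file to the server and fires its post-receive hook:

```shell
cd /home/my/computer/ersatz_dropbox
echo "hello, annex" > test_file.txt
# check the file into the annex, then push it to the server
git annex add test_file.txt
git annex sync --content
```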
So, there you have it: the first step towards a free, easy, open source Dropbox clone. Next steps naturally include:
- sharing files to other users not synced to your repo
- having web access to said files
- running files through pre-processing scripts to do fun things
I’ve done plenty of work on that, too, so stay tuned for the next post!