🖥️ Coding and Computation
Writing code using version control
You must use version control when you code ☺️
Graphical Interface
Version control is much easier with a graphical interface than on the command line. This is the place to start: you can do everything you need, and see the whole history of your code base visually.
I recommend GitKraken, which provides free educational accounts.
You can get free access here: https://help.gitkraken.com/gitkraken-client/gitkraken-edu-pack/#enabling-pro-for-students
The free (for everyone) SourceTree is also really good.
An alternative is GitHub Desktop.
Command-line git
On a server, such as a compute cluster, you must run git on the command line.
Here is a quick introduction article to version control using Git.
Some nice tutorials for this have been written by Atlassian.
Basics
Commit regularly
Focus on text files, like code, where changes can be stored efficiently. Avoid binary files (datasets, image files, etc.) that will bloat your repository (use the .gitignore file to exclude them).
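As a minimal sketch of these two habits (not a prescribed setup), the commands below create a throwaway repository, list some binary file types in .gitignore, and commit a small code file. The directory, file names, ignore patterns, and placeholder identity are all illustrative assumptions.

```shell
# Work in a throwaway directory so nothing real is touched (illustrative only).
repo="$(mktemp -d)"
cd "$repo" || exit 1
git init --quiet

# Placeholder identity for this demo repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# Tell git to skip binary files (these patterns are assumptions; adapt them to your project).
cat > .gitignore <<'EOF'
# Datasets and image outputs stay out of the repository
data/
*.mat
*.png
EOF

# A small text file stands in for your analysis code.
echo 'disp("hello")' > analysis.m
# A stand-in for a large binary dataset that should not be tracked.
mkdir data
echo "binary stand-in" > data/big_dataset.mat

# Stage everything; files matching .gitignore are skipped automatically.
git add .
git commit --quiet -m "Add analysis script and .gitignore"

# List the tracked files: the dataset under data/ is not among them.
git ls-files
```

In a real project you would run `git add` and `git commit` like this each time you reach a working state, rather than once at the end.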
GitHub
Once you're up and running with git locally, GitHub is a free (for education) service that allows you to host your code. Then you can share what you've done so others can help you develop it through collaboration, or can exactly reproduce the steps of your analysis as outlined in your code.
Make an educational account on GitHub, which provides you with unlimited public and private repositories.
The first time you set up git, you should generate an ssh key so that you don't have to type your username and password with every push to GitHub. Instructions are here. This means that you should use the ssh (not https) version of the GitHub repository reference:
git clone git@github.com:benfulcher/TestGitRepository.git
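The key-generation step might look roughly like the sketch below. It writes the key to a temporary directory with an empty passphrase so that it runs unattended; both choices, along with the email address and file name, are assumptions for illustration. On your own machine you would normally accept the default location (~/.ssh/id_ed25519) and set a passphrase.

```shell
# Illustrative location and file name, not the ssh defaults.
key_dir="$(mktemp -d)"
key_file="$key_dir/id_ed25519_github_demo"

# Generate an Ed25519 key pair.
# -C adds a comment (placeholder email here); -N "" sets an empty passphrase for this demo.
ssh-keygen -q -t ed25519 -C "you@example.com" -f "$key_file" -N ""

# Print the public half, which is the part you add to your GitHub account.
cat "$key_file.pub"

# Once the key is registered with GitHub, this checks the connection (needs network access):
# ssh -T git@github.com
```

Only the `.pub` file is ever shared; the private key stays on your machine.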
Workflow
Whenever you start a project that involves coding, be sure to first initialize a repository on GitHub, clone it to your local machine (using the ssh, not https, version), and then regularly commit snapshots of your code.
Whenever you produce an output figure or statistic, you should have a clear way to map it back to the state of your code when you produced it (which can be done by committing the code at that point).
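One possible way to make that mapping explicit is to save the current commit hash next to each output. The helper below is a hypothetical sketch: the function name `record_provenance` and the `.provenance.txt` suffix are invented for illustration, and it assumes you call it from inside a git repository with at least one commit.

```shell
# Hypothetical helper: write the current commit hash next to an output file.
record_provenance() {
    output_file="$1"
    # Full hash of the commit currently checked out; fails outside a repository.
    commit="$(git rev-parse HEAD)" || return 1
    # Flag uncommitted changes to tracked files, since the hash alone would then mislead.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        state="dirty (uncommitted changes to tracked files)"
    else
        state="clean"
    fi
    printf 'output: %s\ncommit: %s\nworking tree: %s\n' \
        "$output_file" "$commit" "$state" > "$output_file.provenance.txt"
}

# Example call, after your code has written figure1.png (hypothetical file name):
# record_provenance figure1.png
```

If the recorded state is "dirty", commit your changes and regenerate the output so that the hash really does identify the code that produced it.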
Code to reproduce every analysis and figure in your paper should be easy for others to access and use. A good way to do this is to describe all your analyses in a markdown file, like the online-rendered README.md file, where you work through the exact steps to reproduce every analysis in your paper, and embed the outputs.
When you're happy with your code, you should version it with a DOI to keep a preserved snapshot that cannot be modified or deleted. It's actually really easy to do this, by linking Zenodo to your GitHub (and you get a nice DOI logo on your repo!): instructions here.
If you're really into it, building a Docker container for your project allows full reproduction into the future, regardless of changes to operating systems or software packages. See Aria's notes and this Simple Rules paper.
Coding better
Some helpful coding tips (mostly Matlab) are on this Medium blog.
And >20h of tutorials, including version control by Jeremy Howard, on YouTube.
VS Code is a full-featured and efficient integrated text editor for coding. You should use its Jupyter support.
Terminals
On Mac, iTerm2 is free and far superior to the built-in Terminal app.
🔥 If you are fancy, you can change to using the fish shell, which has lots of cool features.
🔥 If you want to set up your Mac with lots of cool terminal superpowers, look no further.
Working with a cluster
As well as on your local computer, you should take the time to set up an ssh key for GitHub on the server, to avoid typing in your password each time.
You should set up an ssh key for each server you use regularly, to avoid typing in your password every time you connect.
To avoid typing in the full username and server name every time you connect to your server, you can set up an ssh config file.
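A minimal sketch of such a config entry is shown below. The alias `mycluster`, the hostname, the username, and the key path are all placeholders, and `SSH_CONFIG_FILE` is a hypothetical variable used here only so the target file can be redirected; by default the entry is appended to ~/.ssh/config.

```shell
# Where to write the entry; SSH_CONFIG_FILE is a hypothetical override for this sketch.
ssh_config="${SSH_CONFIG_FILE:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$ssh_config")"

# Append a host entry (all values are placeholders; replace them with your own).
cat >> "$ssh_config" <<'EOF'

Host mycluster
    HostName cluster.example.edu
    User your_username
    IdentityFile ~/.ssh/id_ed25519
EOF

# Keep the config file readable and writable by you only.
chmod 600 "$ssh_config"

# To log in without a password, copy your public key to the server once (needs network access):
# ssh-copy-id -i ~/.ssh/id_ed25519.pub mycluster
```

With an entry like this in place, `ssh mycluster` stands in for `ssh your_username@cluster.example.edu`.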
There are also fancy ways of running code (like in Julia) on the cluster via VS Code. E.g., if you set up a VS Code server on a compute node, then you can use remote-dev-over-tunnel to connect from a local VS Code instance, or from the browser. Brendan Harris has kindly written up details here.
