🖥️ Coding and Computation

Writing code using version control

You must use version control when you code ☺️

Graphical Interface

Version control is much easier with a graphical interface than on the command line. This is the place to start: you can do everything you need, and see the whole history of your code base visually.

Command-line git

On a server, such as a compute cluster, you must run git on the command line.

Basics

  • Commit regularly

  • Focus on text files, like code, where changes can be stored efficiently. Avoid binary files (datasets, image files, etc.) that will bloat your repository; list them in a .gitignore file so that git does not track them.
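
As a rough illustration of the last point, a .gitignore file might look something like the sketch below. The patterns are only examples (the data/ folder and the file extensions are assumptions, not taken from any particular project); adjust them to whatever large or binary files your own work produces.

```text
# Example .gitignore (hypothetical patterns)

# Datasets and other large binary files
data/
*.mat
*.h5
*.zip

# Generated image files
*.png
*.jpg

# Operating-system clutter
.DS_Store
```

Each line is a pattern; git leaves matching files untracked, so they never enter the repository history.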

GitHub

Once you're up and running with git locally, GitHub is a service (free for education) that hosts your code. You can then share what you've done so that others can help you develop it through collaboration, or exactly reproduce the steps of your analysis as outlined in your code.

  1. Make an educational account on GitHub, which provides you with unlimited public and private repositories.

  2. The first time you set up git, you should generate an SSH key so that you don't have to type your username and password with every push to GitHub. Instructions are here. This means that you should use the SSH (not HTTPS) version of the GitHub repository reference: git clone git@github.com:benfulcher/TestGitRepository.git

Workflow

Whenever you start a project that involves coding, be sure to first initialize a repository on GitHub, clone it to your local machine (using the SSH, not HTTPS, version of the address), and then regularly commit snapshots of your code.

Whenever you produce an output figure or statistic, you should have a clear way to map it back to the state of your code when you produced it (for example, by committing the code at that point).
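
One possible way to keep this mapping is sketched below in Python, on the assumption that git is installed and the script runs inside a repository: look up the current commit hash and put it in the name of each output file. The function names git_state and stamped_path are made up for this illustration and are not part of any library.

```python
import subprocess
from pathlib import Path


def git_state(repo_dir="."):
    """Return (short_commit_hash, is_dirty) for the repository at repo_dir.

    Assumes git is installed and repo_dir is inside a git repository.
    """
    commit = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    # With --porcelain, git prints nothing when there are no changes.
    # --untracked-files=no ignores untracked files (such as outputs),
    # so only modifications to tracked files count as "dirty".
    status = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout.strip()
    return commit, bool(status)


def stamped_path(path, commit, dirty=False):
    """Insert the commit hash (plus '-dirty' if flagged) before the extension."""
    p = Path(path)
    tag = f"{commit}-dirty" if dirty else commit
    return p.with_name(f"{p.stem}_{tag}{p.suffix}")
```

In an analysis script you might call commit, dirty = git_state() once at the top, then save each figure to stamped_path("figures/fig1.pdf", commit, dirty). A "-dirty" suffix in a file name is a hint that the output came from uncommitted code, so it is worth committing and rerunning before you rely on it.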

Code to reproduce every analysis and figure in your paper should be easy for others to access and use. A good way to do this is to describe all your analyses in a Markdown file, such as the README.md file that is rendered online, in which you work through the exact steps to reproduce every analysis in your paper and embed the outputs.

When you're happy with your code, you should version it with a DOI to keep a preserved snapshot that cannot be modified or deleted. It's actually really easy to do this by linking Zenodo to your GitHub (and you get a nice DOI logo on your repo!): instructions here.

If you're really into it, building a Docker container for your project allows full reproduction into the future, regardless of changes to operating systems or software packages. See Aria's notes and this Simple Rules paper.

Coding better

Terminals

Working with a cluster

  • As well as on your local computer, you should take the time to set up an SSH key for GitHub on the server, to avoid typing in your password each time.

  • You should have a nice interactive way of getting data/outputs to/from the server. The best is Transmit; free alternatives are Cyberduck and FileZilla.

  • You should set up an SSH key for each server you use regularly, to avoid typing in your password every time you connect.

  • To avoid typing in the full username and server name every time you connect to your server, you can set up an SSH config file.

  • There are also fancy ways of running code (e.g., in Julia) on the cluster via VS Code. For example, if you set up a VS Code server on a compute node, you can connect to it over a remote tunnel from a local VS Code instance or from the browser. Brendan Harris has kindly written up details here.
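
For the SSH config file mentioned in the list above, a minimal sketch of an entry in ~/.ssh/config might look like this. The alias, host name, username, and key path are all placeholders to be replaced with the details of your own server.

```text
# ~/.ssh/config (all values below are placeholders)
Host cluster
    HostName cluster.example.edu
    User your_username
    IdentityFile ~/.ssh/id_ed25519
```

With an entry like this, typing ssh cluster would stand in for ssh your_username@cluster.example.edu, and the same alias should also work in tools that read the SSH config, such as scp.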
