I have started to use Git and GitHub together with RStudio. Git is a distributed version control system, which is very useful when doing reproducible research and a good way to keep track of changes to code. Moreover, Git (via GitHub) allows groups of people to work on the same documents (often code) at the same time without stepping on each other’s toes. RStudio is an excellent integrated development environment built specifically for R.
The learning curve is quite steep; however, once you are past it you have an effective version control system working together with RStudio, which makes for a good research workflow.
To make your Windows system ready, do the following:
- Install R and RStudio
- Install Git and set up a GitHub account
- Tell Git your username and email address. These are used to label each commit so it is clear who made each change. In the long run you need to be familiar with using Git from a shell/console/command line. Here you just start up a shell from RStudio’s menu “Tools -> Shell”. In the shell, run:
git config --global user.name "YOUR GITHUB USER NAME"
git config --global user.email "YOUR EMAIL ADDRESS"
- In RStudio, set the path to the Git executable in “Tools -> Global Options -> Git/SVN”, e.g. “C:/Program Files (x86)/Git/bin/git.exe”.
- If you do not want to use https, you may use an SSH key to communicate with GitHub. You can generate a key at “Tools -> Global Options -> Git/SVN” where you click “Create RSA key…”. Copy the public key and add it to GitHub (the easiest way to find the key is to click “View public key” in RStudio’s Git/SVN preferences pane).
- Make Git remember your GitHub username and password, so that every time you, e.g., push, you do NOT have to authenticate yourself interactively:
git config --global credential.helper wincred
You will know you have truly succeeded once you have at least one successful push to GitHub in which you are NOT challenged for your user name and password.
- Restart RStudio and you are ready
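As a quick check of the identity step above, you can read the values back from Git. The following is a minimal sketch, not part of the original setup: it writes to a throwaway file with `--file` instead of `--global` so that it can be run without touching your real configuration, and the name and email address are placeholders.

```shell
# Throwaway config file (an assumption for this sketch; the real setup uses --global)
demo_config="$(mktemp)"

# Same keys as in the setup step, with placeholder values
git config --file "$demo_config" user.name "Jane Doe"
git config --file "$demo_config" user.email "jane.doe@example.com"

# Read the values back to confirm that they were stored
git config --file "$demo_config" --get user.name
git config --file "$demo_config" --get user.email
```

On your own machine, `git config --global --list` shows everything stored in your global configuration, including the credential helper if you set one.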
Useful Git and GitHub commands and phrases
- Repository: A directory or storage space where your projects can live. Sometimes GitHub users shorten this to “repo.” It can be local to a folder on your computer, or it can be a storage space on GitHub or another online host. You can keep code files, text files, image files, you name it, inside a repo. Commands:
git init # create an empty git repo in the current directory
git status # status of your repo
- Commit: Locally, Git keeps a directory (.git) to which you commit your files; this is your ‘local repo’. When you commit, you take a “snapshot” of your repo, giving you a checkpoint that you can later review or restore your project to.
git add [file] # add (modified) file [file] to Git’s attention (added to the repo at next commit)
git commit -m "Message text" # create a snapshot in your local repo
You can add and commit without a shell by using the Git pane in the RStudio GUI (see Hadley Wickham’s chapter for more details).
- Branch: Branches allow you to keep the main code (the ‘master’ branch), make a copy (a new branch) and then work within that new branch. When you’ve finished, you merge the changes made in the branch back into the master branch.
git branch [branch-name] # create a new branch [branch-name]
git checkout -b [branch-name] # create branch and switch to it
git push --set-upstream origin [branch-name] # tell Git that your local branch has a remote equivalent
git checkout master # once you are done with a branch switch to master
git merge [branch-name] # merge back into master
git branch -d [branch-name] # delete the branch locally
git push origin --delete [branch-name] # delete the remote branch
- Fork: When you fork a repo, you create your own copy of a repository. Forking a repository allows you to freely experiment with changes without affecting the original project. Most commonly, forks are used to either propose changes to someone else’s project or to use someone else’s project as a starting point for your own idea. GitHub also makes it very simple to implement pull requests. A pull request asks the owner of the original repository to “pull” the changes from your fork back into the original repo. That way, everyone can use source control and have a history of all the changes, including theirs, but not everyone needs write access to the original repo.
- Clone: A clone is simply a copy of a repository. When you clone, you are actually copying the entire source repository, including all the history and branches. You now have a new repository on your machine and any commits you make go into that repository. Nobody will see any changes until you push those commits to another repository (or the original one) or until someone pulls commits from your repository, if it is publicly accessible.
- Pull: If you’re working on your local computer and want the most up-to-date version of your repository to work with, you “pull” the changes down from GitHub with the command git pull. You can also use the Git pane in the RStudio GUI to make a pull.
- Push: If you’re working on your local computer and want your commits to be visible online on GitHub as well, you “push” the changes up to GitHub with the command git push. You can also use the Git pane in the RStudio GUI to make a push.
- Tag: Git has the ability to tag specific points in history as being important. Typically people use this functionality to mark release points (v1.0, and so on). The easiest way to do this is directly on GitHub. Good notes about giving version tags to your releases can be seen here.
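To see how several of the terms above fit together, here is a minimal sketch of a commit, branch, merge and tag cycle. It runs in a throwaway directory; the file name `analysis.R`, the branch name `experiment`, the identity values and the tag `v1.0` are placeholders chosen for illustration. The tag is created from the command line here, although doing it on GitHub works as well.

```shell
# Work in a throwaway directory so nothing existing is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"

git init                                        # create an empty repo here
# Identity for this repo only (placeholders), in case no global identity is set
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# Name of the default branch: 'master' or 'main', depending on your Git settings
main_branch="$(git symbolic-ref --short HEAD)"

echo 'x <- 1' > analysis.R
git add analysis.R                              # stage the new file
git commit -m "Add analysis script"             # first snapshot

git checkout -b experiment                      # create a branch and switch to it
echo 'y <- 2' >> analysis.R
git add analysis.R
git commit -m "Add y to the analysis"           # snapshot on the branch

git checkout "$main_branch"                     # back to the main line
git merge experiment                            # bring the branch's commit in
git branch -d experiment                        # delete the merged branch

git tag v1.0                                    # mark this commit as a release point
git log --oneline --decorate                    # history with branch and tag names
```

Because the main branch did not change while you worked on `experiment`, the merge is a simple fast-forward and no merge commit is created.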
Creating a new R project using Git
There are different ways to do this. If you want GitHub to host your repository, then:
- Create a new repo on GitHub. Give it the same name as the folder for your project/package, and include the package title as the repo description. Copy the “https clone url” of the GitHub repo (normally https://github.com/user_name/folder_name.git)
- In RStudio, create a new project (see the button in the upper right corner) using “Version control -> Git” and paste the copied https url. Choose the location of your folder and create the project, which now automatically clones your GitHub repo.
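What RStudio does in this step corresponds to a `git clone` from the shell. The sketch below illustrates that under one assumption: a local bare repository stands in for the GitHub repo so the example runs without network access. The directory name `my_project` is a placeholder.

```shell
# Stand-in for the GitHub repo: a local bare repository (assumption for this sketch)
demo_root="$(mktemp -d)"
git init --bare "$demo_root/folder_name.git"

# With a real GitHub repo you would pass the copied https clone url instead, e.g.
#   git clone https://github.com/user_name/folder_name.git
cd "$demo_root"
git clone "$demo_root/folder_name.git" my_project

cd my_project
git remote -v                                   # 'origin' points at the repo you cloned from
```

Git warns that the cloned repository is empty, which is expected for a newly created repo without any commits.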
If you only want to have a local repo (for the moment), then:
- In RStudio, create a new project (see the button in the upper right corner) using “New Directory” (not “Version control”)
- Initialize the Git repo in the folder (open the Git shell from RStudio):
git init
Restart RStudio and reopen your package. You now have a Git panel in RStudio.
- If you later want to add your local repo to GitHub then create an empty GitHub repo and run from the shell:
git remote add origin https://github.com/user_name/folder_name.git
git push -u origin master
The first line tells Git that your local repo has a remote version on GitHub, and calls it “origin”. The second line pushes all your current work to that repo.
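The two commands can be tried out safely before you point them at GitHub. In the sketch below, a local bare repository plays the role of the empty GitHub repo (an assumption made so the example runs offline), and the file name and identity values are placeholders. The branch name is looked up rather than hard-coded, because newer Git setups may call the default branch `main` instead of `master`.

```shell
demo_root="$(mktemp -d)"
git init --bare "$demo_root/remote.git"         # stand-in for the empty GitHub repo

git init "$demo_root/local"                     # the local repo with your work
cd "$demo_root/local"
git config user.name "Jane Doe"                 # placeholder identity for this repo only
git config user.email "jane.doe@example.com"
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -m "First commit"

branch="$(git symbolic-ref --short HEAD)"       # 'master' in the text above
git remote add origin "$demo_root/remote.git"   # name the remote "origin"
git push -u origin "$branch"                    # push and remember the upstream branch

git status -sb                                  # first line shows the branch and its upstream
```

Because of `-u`, later calls to `git push` and `git pull` on this branch need no further arguments.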