Derek Powell

Conducting reproducible research with Docker (Part 3 of 3)

2018-02-26T00:00:00-08:00

In the last two entries in this tutorial series I showed you how to use Docker to maintain a reproducible environment for conducting statistical analyses. Conducting reproducible research is primarily about scientific honesty, transparency, and the maintenance of high scientific standards. However, the choice to use Docker for reproducible research also has an awesome side-benefit: the ability to run Docker containers remotely on cloud servers. In this tutorial I’ll show you how to run your Docker containers on a virtual cloud “workstation” using DigitalOcean.

What’s DigitalOcean?

DigitalOcean is a web-hosting company that lets you rent virtual servers called “droplets.” Using droplets or other virtual private servers as remote workstations is a very economical way to access extra computing power. For at least 80% of my workday my laptop is totally sufficient, but sometimes I really need some extra cores or extra RAM (mostly for MCMC sampling with the excellent but computationally demanding brms R package). For those times I can spin up an 8-core workstation with 16 GB of RAM whenever I like at a rate of $0.24 per hour, billed to the minute. Because I only pay for the time I actually use the droplet, the extra computing power usually costs me less than my daily coffee. Perhaps that’s not enough for you? As of this writing, droplets scale all the way up to 32 cores and 192 GB of RAM for $1.43 an hour.
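To make the pricing concrete, here is a small sketch of the arithmetic using the $0.24 hourly rate quoted above. The three-hour session length is an assumption chosen for illustration, and real billing may include details this simple multiplication ignores.

```shell
# Estimate the cost of a short session on the 8-core droplet.
rate_per_hour=0.24   # USD per hour, from the pricing quoted above
hours=3              # hypothetical session length

# The shell has no floating-point arithmetic, so delegate to awk
cost=$(awk -v r="$rate_per_hour" -v h="$hours" 'BEGIN { printf "%.2f", r * h }')
echo "Estimated cost: \$${cost}"
```

Change hours to match your own usage to see how the cost scales.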

In addition to DigitalOcean, there are a number of other VPS providers that offer similar services. The heaviest hitters are Amazon Web Services and Google Compute Engine. I like DigitalOcean because it offers “dedicated” cpu instances, has a nice web interface, has nice CLI tools, offers great documentation, and is about as cheap as you’ll find anywhere for similar quality.

Running Docker containers on DigitalOcean droplets

In the first part of the tutorial we’ll …

Sign up for DigitalOcean
Create SSH keys to make it easy to access our remote instances
Create a DigitalOcean droplet with Docker
Run RStudio on our remote droplet
“Destroy” the droplet so we stop getting billed for it

Step 1: Sign up with DigitalOcean

I suggest you sign up using my DigitalOcean referral link to get a $10 credit for DigitalOcean. That way you can finish this tutorial and try out DigitalOcean for free!

Step 2: Create SSH keys

Per Wikipedia:

Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network. The best known example application is for remote login to computer systems by users.

We’ll use SSH to communicate with the remote droplet on DigitalOcean and to remotely run commands, such as initiating the docker container. In addition, we can use it to create a “tunnel” between a port on our local machine and the remote machine, and can also use it to transfer files back and forth. Using an SSH key will let us do all of this securely without having to enter passwords at every step of the way.

DigitalOcean actually offers a great tutorial for creating and using SSH keys on your account, so I’ll leave the heavy lifting to them. You’ll need to follow at least steps 1 through 4 of the linked tutorial.
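For orientation, here is a minimal sketch of what generating a key pair looks like. It is not a substitute for the linked tutorial: the ed25519 key type, the file name, and the empty passphrase are assumptions made so the example can run unattended in a temporary directory. For real use you would normally write the key under ~/.ssh and protect it with a passphrase.

```shell
# Generate a throwaway key pair in a temporary directory so nothing in ~/.ssh is touched.
key_dir=$(mktemp -d)
key_file="$key_dir/id_ed25519_do"   # hypothetical file name

if command -v ssh-keygen >/dev/null 2>&1; then
  # -t key type, -C comment label, -f output file, -N "" empty passphrase (demo only), -q quiet
  ssh-keygen -t ed25519 -C "digitalocean-demo" -f "$key_file" -N "" -q
  # The .pub file is the public half; this is the part you paste into DigitalOcean
  cat "${key_file}.pub"
else
  echo "ssh-keygen not found; install OpenSSH first" >&2
fi
```

The private key (the file without the .pub extension) stays on your machine and should never be uploaded anywhere.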

Step 3: Create a DigitalOcean droplet with Docker

Now the main event. We’ll create a new DigitalOcean droplet running docker. Sign in to DigitalOcean and choose “create droplet”.

From the “choose an image” menu select “One-click apps”. Then choose “Docker 17.12.0 on 16.04”. This will create a docker droplet running Ubuntu 16.04 with Docker pre-installed.¹

Next, you’ll choose a droplet size. For our purposes let’s choose the 2 vCPU dedicated instance. This will have some oomph to play around with but without costing us too much for the purposes of the tutorial.

Then, choose your datacenter region. You can choose whichever you like, though some options are only available in certain regions.

Finally, make sure to “include the SSH key” you created in step 2. Name your droplet however you like and click “Create”.

Step 4: Running RStudio remotely

Once your droplet is created, copy its address to your clipboard by clicking on it. Now switch back over to terminal and run (being sure to use your droplet’s ip address):

ssh root@138.68.6.84

Then type yes at the prompt. This will give you a shell prompt on your remote DigitalOcean server as the root user. Now, you can start your docker container exactly as you would on your local computer.² Run:

docker run -d -p 8787:8787 -e USER=yourName -e PASSWORD=secretPassword -e ROOT=TRUE rocker/tidyverse:3.4.3

Hop on your browser and point it to your droplet’s ip address and port 8787. As I made this tutorial mine was 138.68.1.215:8787. You should be greeted with the RStudio sign-in page.

Do note that using a strong, unique password (and possibly a non-default username) is much more important now that you’re working on a remote server. Anyone in the world can type in that ip address and port and potentially access your droplet, so you want to ensure there’s real protection.
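One way to avoid exposing port 8787 to the whole internet is the SSH tunnel mentioned in Step 2. The sketch below is not part of the original walkthrough: it publishes RStudio only on the droplet’s loopback interface and forwards a local port to it over SSH. The ip address is a placeholder from the documentation range, and the commands are printed rather than executed so the snippet is safe to paste anywhere.

```shell
DROPLET_IP="203.0.113.10"   # placeholder; substitute your droplet's address

# On the droplet: publish RStudio on 127.0.0.1 only, so it is not reachable from outside
remote_cmd="docker run -d -p 127.0.0.1:8787:8787 -e PASSWORD=secretPassword rocker/tidyverse:3.4.3"

# On your laptop: forward local port 8787 to port 8787 on the droplet (-N = no remote shell)
tunnel_cmd="ssh -N -L 8787:localhost:8787 root@${DROPLET_IP}"

# Print rather than execute
printf 'On the droplet:  %s\n' "$remote_cmd"
printf 'On your laptop:  %s\n' "$tunnel_cmd"
```

With the tunnel open, RStudio should be available at localhost:8787 in your browser, just as it was when running Docker locally.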

Now that you’ve got RStudio running remotely, there are a few different ways to get your files onto it. The most direct is to upload them from the files window in the web interface. You can also securely copy them using ssh and the scp command.
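As a rough sketch of the scp route: files copied this way land on the droplet itself, not inside the container, so the container needs a volume mapping to see them. The ip address and both directory names below are hypothetical placeholders, and the commands are printed rather than run.

```shell
DROPLET_IP="203.0.113.10"      # placeholder address
LOCAL_DIR="./my-project"        # hypothetical local project folder
REMOTE_DIR="/root/my-project"   # hypothetical destination on the droplet

# From your laptop: recursively copy the project folder to the droplet over SSH
copy_cmd="scp -r ${LOCAL_DIR} root@${DROPLET_IP}:${REMOTE_DIR}"

# On the droplet: start the container with that folder mounted where RStudio can see it
run_cmd="docker run -d -p 8787:8787 -v ${REMOTE_DIR}:/home/rstudio/working -e PASSWORD=secretPassword rocker/tidyverse:3.4.3"

echo "$copy_cmd"
echo "$run_cmd"
```

Swapping the two arguments of scp copies in the other direction, which is how you would pull results back to your laptop before destroying the droplet.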

My personal preference is to interface with github. I save all my R projects as github repositories, and clone whatever I’m working on to the remote machine. You can do this through the command line, or directly in the RStudio interface: Go to File -> New Project -> Version Control -> Git and enter the repository URL. After you enter your username and password, the files will be cloned to the remote machine and you can commit and push when you are done working.

Step 5: Destroying the droplet

Once you’re done working you’ll want to “destroy” the droplet so that you are no longer billed for it. This sounds dramatic but I think it’s so-named to ensure you won’t forget to save your work from the droplet to your local machine or to a repository like github. To destroy the droplet, navigate to its page on the DigitalOcean website and choose “Destroy”.

Creature comforts

Working from within a Docker container offers some great advantages, but it can also have some drawbacks. Because reproducibility demands the container be available to anyone, there’s a limit to the amount of customization that we should build into the container itself. For instance, we should never put any passwords, keys, authentication info, etc. into a Docker container. Here I’ll show how we can add some creature comforts to our RStudio environment within our docker container, without compromising security or preventing others from using it easily.

Setting up git username and password

Using the git and github integration in RStudio server requires telling git who is authoring the commits. As is, this means running the following commands at the shell every time we create a new docker container:

git config --global user.name "Your Name"
git config --global user.email "yourEmail@gmail.com"

That’s a pain.

We’ll fix this by adding a script to the /etc/cont-init.d startup directory of our Rocker-based RStudio container that will perform this step for us. Rather than hard-coding our name and email, which could make this difficult for others to use, we’ll pass that info in as environment variables.

Here’s the script we’ll create in our docker project folder (the same folder with the Dockerfile) as git_config.sh:

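#!/usr/bin/with-contenv bash
# Runs at container startup; with-contenv exposes the variables passed via docker run -e.

# Fall back to "none" when a variable was not supplied
GIT_USER=${GIT_USER:=none}
GIT_EMAIL=${GIT_EMAIL:=none}

# Only write the git identity if a user name was provided
if [ "$GIT_USER" != none ]; then
	echo -e "[user]\n\tname=$GIT_USER\n\temail=$GIT_EMAIL" > /home/rstudio/.gitconfig
fi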

Then, we’ll modify our Dockerfile to add this file to the appropriate startup directory. Here’s how we’d modify the Dockerfile I created in part 2 of this tutorial:

####### Dockerfile #######
FROM rocker/tidyverse:3.4.3

ENV DEBIAN_FRONTEND noninteractive

COPY git_config.sh /etc/cont-init.d/gitconfig

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
	libglu1-mesa-dev \
&& install2.r --error \
    --deps TRUE \
    lme4 \
    car

Copying this script into /etc/cont-init.d sets it to run at startup. The script looks for the environment variables GIT_USER and GIT_EMAIL, and if GIT_USER is set it writes the corresponding entries to /home/rstudio/.gitconfig for us. When we start the docker container we can pass in that info with -e flags and it will set things up for us.
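Before waiting on a full image build, it can help to sanity-check the script’s logic on its own. The sketch below mirrors git_config.sh but writes to a temporary directory instead of /home/rstudio, and it uses printf in place of echo -e for portability. The name and email are made-up test values.

```shell
# Same logic as git_config.sh, but writing to a temporary directory.
GIT_USER="Jane Doe"            # hypothetical test values
GIT_EMAIL="jane@example.com"
fake_home=$(mktemp -d)

# Fall back to "none" when a variable was not supplied
GIT_USER=${GIT_USER:=none}
GIT_EMAIL=${GIT_EMAIL:=none}

if [ "$GIT_USER" != none ]; then
  # printf expands \n and \t in the format string, like echo -e does in bash
  printf '[user]\n\tname=%s\n\temail=%s\n' "$GIT_USER" "$GIT_EMAIL" > "$fake_home/.gitconfig"
fi

# Show the generated file
cat "$fake_home/.gitconfig"
```

Unsetting GIT_USER before running it should leave the temporary directory empty, which is the behavior we want for people who use the image without passing any git details.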

Changing themes

Personally, I like using the “Solarized Dark” theme in RStudio. Rather than manually changing the themes each time we run the container, we can also make these changes using environment variables.

To do so, create a set_theme.sh script in the docker project directory, with the following content:

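#!/usr/bin/with-contenv bash
# Runs at container startup; sets the RStudio editor theme from the THEME variable.

# Fall back to "none" when THEME was not supplied
THEME=${THEME:=none}

if [ "$THEME" != none ]; then
	# Write the theme preference into RStudio's user-settings file
	mkdir -p /home/rstudio/.rstudio/monitored/user-settings
	echo "uiPrefs={\"theme\" : \"$THEME\"}" > \
	/home/rstudio/.rstudio/monitored/user-settings/user-settings
	# The script runs as root, so hand the files back to the rstudio user
	chown -R rstudio /home/rstudio
fi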

Then, just like before we add another line to the dockerfile:

####### Dockerfile #######
FROM rocker/tidyverse:3.4.3

ENV DEBIAN_FRONTEND noninteractive

COPY git_config.sh /etc/cont-init.d/gitconfig
COPY set_theme.sh /etc/cont-init.d/theme

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
	libglu1-mesa-dev \
&& install2.r --error \
    --deps TRUE \
    lme4 \
    car

Putting it all together

When you’ve got your scripts and Dockerfile written correctly, add those scripts to the git repo, commit, and push to trigger the automated build. Once the image is ready, we can pass in our preferred defaults as environment variables to the docker run command. Note that this must be your own custom image (swap in your image name), since the stock rocker/tidyverse image doesn’t contain the new startup scripts.

docker run -d -p 8787:8787 -e USER=yourName -e PASSWORD=secretPassword -e ROOT=TRUE -e GIT_USER="gitUsername" -e GIT_EMAIL="yourEmail@gmail.com" -e THEME="Solarized Dark" derekpowell/docker-tut-example

Voilà!

You can extend this general approach to run whatever commands or set whatever settings you like. For more advanced users, here’s more information on the init setup being used by the Rocker images.

Conclusions

One virtue of using Docker containers for reproducible research is that they are complete and yet fully portable. This allows others (including our future selves) to reproduce our work, but with the help of RStudio and RStudio server, it also means we can do that work wherever we want.

¹ Another option here is to choose Container Distributions and CoreOS. This is a more minimal Linux distribution that also has Docker pre-installed. If you choose to go this route you’ll need to log in as user “core”, using ssh core@ip.address in the next step.
² You might note I’m not mapping a volume into the container. That’s because there aren’t any data or files on this remote server, and instead I plan to do pretty much everything from within the container. If we wanted to scp some files or something, then we would want to do some mapping.

Conducting reproducible research with Docker (Part 2 of 3)

2018-02-14T00:00:00-08:00

This post picks up right where part 1 of this tutorial left off. If you haven’t read that, I strongly recommend you start there.

In part 1 of this series you saw how to get started using Docker for reproducible research. Here, we’ll build a Docker image with our own custom R environment. This will allow us to work in a reproducible environment with all the packages and libraries we need at hand.

The Rocker Project

Carl Boettiger and Dirk Eddelbuettel at The Rocker Project have done the hard work of properly organizing R and RStudio server applications into well-maintained and versioned Docker images. They maintain images that run RStudio with sensible security options and some helpful base packages. Their website is also a good resource for help using these images. In part 1, we used their image to run RStudio server in Docker. Now, we’ll build off their work in this tutorial to make our own image with whatever packages we like.

Tutorial

In this tutorial we’ll …

Create a repository on github for our docker image
Create a dockerfile
Set up github and dockerhub integration
Practice testing and installing packages
Build our dockerfile locally (for testing)
Initiate an automated build
Tag our dockerfile with a version so we can refer to it later

Along the way, I’ll assume you have some basic working knowledge of git/github and the use of the terminal (though I’ve added some footnotes to help). I’ll also assume you’re on a mac, though things shouldn’t be that different on Linux. I can’t really speak to Windows, but the overall process should be similar.

Step 1. Create a git repository on github

I like to create my repos on github and then clone them to my local machine so I can get the license and .gitignore files from github. I’ve made a repo in my github called “docker-tut-example” for this tutorial. You should name your repo whatever you like, but be sure to use your own repo name in all the code below. Whatever name you use will also appear on Docker hub.

Step 2. Create a Dockerfile

In the root of your git repo directory, create a file called “Dockerfile” (no file extension). If you like, you can do this directly on github, or you can clone the repo to your local machine.¹ In the editor of your choosing, add the following:

####### Dockerfile #######
FROM rocker/tidyverse:3.4.3

This will create a Docker image that starts from the rocker/tidyverse:3.4.3 image. We’ll use this to build up our own custom image.

Save the Dockerfile. If you’re working locally, add it to your git repository, commit, and push those changes up to github.² Refresh your github page and you should see the Dockerfile has been added.

Step 3. Setting github and docker hub integration

Now we’ll set up integration between github and dockerhub for automated builds. This will allow other researchers to use your docker container and allow you to use it across different machines.

First, follow this guide to link your github and dockerhub accounts. Once your accounts are set up, head to the “settings” tab on your github repo page. Then, click on the “integrations & services” tab and add the docker service.

Finally, head over to dockerhub and click “Create” –> “Create Automated Build”, and choose to do so from your github. Choose the appropriate repo from the list.

Now, whenever you push to this repo, an automated build will be triggered on dockerhub. You can watch the builds occur by checking “Build Details” on the dockerhub image page. If nothing is listed on the build details page, make a change in your README.md file so you can make a new commit and push it to github. This should trigger the build.

Step 4. Testing installing packages

The automated build will take a few minutes. Once it’s ready, download and run it on your own machine using the docker run command. In the terminal, enter (after replacing with your repo name):

docker run -d derekpowell/docker-tut-example

The container will download to your machine and start running in the background.

Entering running containers

To interact with running containers we need to find out what they’re named. To see a list of running containers, at terminal enter:

docker ps

Docker assigns each running container an id and a weird randomized name. As I was putting together this tutorial it was xenodochial_kilby.

To get a bash prompt inside your running docker container, copy the name or container id and run the following command:

docker exec -i -t xenodochial_kilby /bin/bash
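If you just started the container, you can skip the copy-and-paste step. As a sketch that goes slightly beyond the steps above, docker ps accepts -l to show only the most recently created container (running or not) and -q to print just its id, which can be fed straight to docker exec. The command is printed here rather than run.

```shell
# -l = most recently created container, -q = print only the container id
exec_cmd='docker exec -i -t "$(docker ps -l -q)" /bin/bash'
echo "$exec_cmd"
```

This is only a convenience for the common case of a single fresh container; with several containers running, naming the one you want is safer.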

From here we can experiment with installing R packages and any other changes we might want to make. This allows us to test the install process without having to trigger a full automated build every time we want to add a package.

Installing R packages from CRAN

Let’s try installing an R package using install2.r from littler (which is already in the container). Suppose we wanted to install the lme4 package (a popular package for hierarchical linear models). At your bash prompt, enter:

install2.r --error --deps TRUE lme4

This will install the lme4 R package and its dependencies, throwing an error if anything fails along the way. If it’s a success, we can safely add this line to our Dockerfile.

Installing R packages from github

Generally speaking, install2.r is the preferred way to install packages inside a docker container. But, suppose that instead of installing from CRAN, we wanted to install the latest development version from github. We can do that like so:

R --no-restore --no-save -e 'devtools::install_github("lme4/lme4",dependencies=TRUE)'

Installing R packages while specifying a specific version

Finally, let’s say instead of the latest version we actually wanted to install an older version or maybe we just want to be explicit about the version that’s installed. We can do that too:

R --no-restore --no-save -e 'devtools::install_version("lme4", version="1.1-14")'

To generalize, we can run any R command we want from the command line and we can do this in the creation of our docker container image.

Installing system packages

In some cases, installing an R package might not work as expected and you might end up with something like the following: Error: installation of package ‘rgl’ had non-zero exit status.

This is, in fact, the error you’ll get if you try to install the car package at this stage. This error occurs because we are missing some linux headers for libraries that are required. Unfortunately, littler’s install2.r script can’t take care of these dependencies. This is (a big part of) why we test!

Google this error and you’ll find this stackoverflow post is the first result. The solution is to install libglu1-mesa-dev before installing the car package. To do so, use the following commands:

apt-get update -qq
apt-get -y --no-install-recommends install libglu1-mesa-dev

Run those commands, then you can install car using install2.r (I’ll leave this as an exercise for the reader, as they say).

Step 5. Building Dockerfile locally

Now we’re ready to add the steps we just tested to our docker image. If you haven’t already, clone your github repo to your local machine.¹ Then, open up the dockerfile and edit it so it looks like this:

####### Dockerfile #######
FROM rocker/tidyverse:3.4.3

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
	libglu1-mesa-dev \
&& install2.r --error \
    --deps TRUE \
    lme4 \
    car

A few notes on what’s going on here: RUN is a Dockerfile instruction that executes shell commands while the image is being built. In the shell, you can chain commands together with && and split them onto multiple lines with \. Everything must be done in noninteractive mode because the build is automated and you won’t be there to press “y” to continue.

At this stage, we could commit and push this up to github to trigger an automated build, but before we do that let’s just make sure everything works by building it locally. From terminal on your local machine,³ navigate up to the parent directory that contains your github repo with cd .., and enter the following command (swapping in your repo name):

docker build docker-tut-example

This will build the docker image from the dockerfile locally. Building locally can let you test more quickly, and without cluttering up your github repo with tons of commits.⁴
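Built this way, the image has no name and can only be referred to by its id. As a possible variation, the -t flag of docker build attaches a name and tag so the freshly built image can be started directly. The local-test tag below is a hypothetical label, and the commands are printed rather than executed.

```shell
IMAGE="derekpowell/docker-tut-example:local-test"   # hypothetical tag for local testing

# Build from the repo folder and name the result
build_cmd="docker build -t ${IMAGE} docker-tut-example"

# Start the locally built image just like the rocker image in part 1
run_cmd="docker run -d -p 8787:8787 -e PASSWORD=rstudio ${IMAGE}"

echo "$build_cmd"
echo "$run_cmd"
```

Once RStudio comes up at localhost:8787 you can check that the new packages load before pushing anything to github.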

Step 6. Push to github for automated build

Once the build succeeds, you can commit and push your repo to github and docker hub will begin automatically building your docker image. You can check on its progress in the “Build Details” tab of your docker hub repo.

Using the automated build feature of dockerhub might seem like a bit of extra work right now, but it is important for security and trust when you share the images. When you share a docker container that was produced with an automated build, your recipients can check the dockerfile and be sure of its contents.

Step 7. Add version tags

There are lots of different ways you might organize Docker containers to achieve a reproducible workflow. As far as I can see, the simplest would be to maintain a single “personal” image with the libraries you use most. To maintain reproducibility between different projects, you can version this image using tags. Tags let you have multiple versions of the same image, as we saw when we used the tidyverse:3.4.3 image in Part 1.

Under this approach, each time you publish a paper or release some work, you would make sure that the docker container was tagged with a version and you would include that with your publication. Something like, “Analyses were conducted using derekpowell/docker-tut-example:0.0.1 docker image.”

To set up your dockerhub repo for tagging, head to the “Build Settings” tab on its page.

This is the tag configuration area. The first row shows that the “master” branch of the github repo is assigned the “latest” tag. This is a special, default tag. If you run docker run rocker/tidyverse with no specific tag, it will assume that the “latest” version should be used. On the next row, change “Branch” to “Tag”. Now, when you tag your github repo, that tag will also be reflected on dockerhub. Be sure to click “save changes” when you’re done.

Now, head back to terminal and tag the current version of your repo:

git tag -a 0.0.1 -m "very first version"

This will tag the repo with “0.0.1”. If you don’t like that number scheme, you can use any other you like, or even more descriptive tags, e.g., “dissertation.” Do note, the normal git push command will not push tags. To push all tags, enter:

git push origin --tags

Then head over to dockerhub and check the “Build Details” tab, where you should see the version being built. For more on git tags, check out this resource.
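If you’d like to try tagging without touching your real repo, the sketch below creates a scratch repository in a temporary directory, tags it, and lists the result. The author identity is a made-up value supplied per command so the example doesn’t depend on your global git configuration.

```shell
# Create a scratch repository in a temporary directory
repo=$(mktemp -d)
git init -q "$repo"

# An annotated tag needs a commit to point at, so make an empty one
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.com" \
  commit -q --allow-empty -m "initial commit"

# Same tagging command as above, run against the scratch repo
git -C "$repo" -c user.name="Demo" -c user.email="demo@example.com" \
  tag -a 0.0.1 -m "very first version"

# List the tags that now exist
tags=$(git -C "$repo" tag -l)
echo "Tags in repo: $tags"
```

In a repo with a remote, git push origin 0.0.1 pushes just that one tag, which can be handy if you have other local tags you aren’t ready to publish.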

Conclusions

In this tutorial we created a simple docker container with a custom R environment. Using the steps covered here, you should be able to create a docker container with R, RStudio, and the packages you use most. This will give you a reproducible environment to conduct analyses and a way to share that environment with other researchers.

In Part 3 of this tutorial series, I plan to cover one of the side benefits of using Docker for reproducible research: the ability to run Docker containers remotely on cloud services like AWS and Digital Ocean. I’ll also cover some ways to reduce the (minor) pain points associated with running RStudio from within a container in day-to-day use.

¹ Open terminal, navigate to the directory you’d like it to appear in, and type git clone followed by your repo’s URL.
² Add it with git add Dockerfile and make a new commit with git commit -m "added dockerfile". Finally, push this to github with git push.
³ If your terminal window is still at the prompt in your docker container, you can type exit to exit out (should be familiar if you’ve ever used ssh).
⁴ On a mac, you may need to make sure Docker has been granted sufficient RAM. If not, you may get compiler errors. Access the docker app preferences via the menu bar and bump up the RAM if this happens.

Conducting reproducible research with Docker (Part 1 of 3)

2018-02-09T00:00:00-08:00

In scientific research, reproducibility is a necessary (though not sufficient) condition for validity. But conducting reproducible research is hard! Sadly, many psychological studies fail tests of empirical reproducibility. Unfortunately, there’s no software package that can solve the set of structural and statistical issues likely at the root of those non-replications.

Still, there are some tools that can help us achieve statistical or computational reproducibility. This kind of reproducibility means that another researcher can take our data and reproduce the analyses we conducted in a published paper. Sadly again, many studies in psychology fail here too. However, here the problem really might be solved with better tools, such as R Markdown, that can help ensure that our results sections are reflective of our actual analyses.

Here I’m going to describe another tool for producing statistically and computationally reproducible research, Docker. Reproducibility demands we make available the data and analysis scripts used in our research projects, but sometimes the line between our personal computer systems and our projects can start to blur. Our projects have “dependencies” that are required for them to run properly. So, to ensure other researchers can reproduce our projects, we need to clue them in to these dependencies in some way. The simplest way would be to dump our sessionInfo() at the bottom of the page. That’s easy in the moment, but not easy down the road for those who want to reproduce our work. The easier we can make reproducible research the whole way through, the better. Keep in mind, the researcher most likely to attempt to reproduce your work is future you.

Here, I’ll show you how to use Docker to create reproducible workflows for scientific research.

What is Docker?

Docker is a tool for making containerized applications. The docker engine is like a very lightweight virtual machine engine. A virtual machine is (to oversimplify) a computer program that simulates another computer system, typically another operating system. This allows you to run a windows app on your mac, or a linux program on windows, and so forth.

Docker creates a “containerized” version of an application that includes everything needed to run the app: OS, headers, libraries, packages, etc. This container is saved as an “image”, that can then be shared with others. This allows people working on different computers, with different OS versions, package versions, etc to share and execute code or apps. So long as you have Docker installed on your computer and the right Docker image, you can spin up a container that will exactly reproduce the environment needed for the app, no matter what your own personal computing environment looks like.

Maybe you’re seeing how this can help us do reproducible research: if we create a containerized version of R, we can ensure we have R, R packages, system libraries, etc all in the right versions to reproduce the analyses. And because everything is held together in the container, if we share the image with another researcher, or with our future selves, it won’t matter that they might have a different computer with different OS, packages, etc.

The Rocker Project

Carl Boettiger and Dirk Eddelbuettel at The Rocker Project have done the hard work of properly organizing R and RStudio server applications into well-maintained and versioned Docker images. These Docker containers run R and RStudio server with sensible security options and some helpful base packages. Their website is also a good resource for help using these images. In this tutorial, we’ll use their image to run RStudio server in a Docker container. In Part 2, we’ll build off their work to make our own image with whatever packages we like.

Docker vs. Packrat

There’s another solution to the problem of statistical and computational reproducibility in R, called packrat. I will admit I don’t have a great deal of familiarity with packrat, but I can discuss some differences. First, packrat is focused on R and R alone. This means if you incorporate other languages (e.g., python) in your projects, you will need multiple reproducibility solutions. In contrast, Docker handles everything. Second, Docker gives us other nice and powerful features, such as an easy way to run code remotely on cloud servers. Finally, there’s nothing stopping you from using both approaches (even using packrat inside your Docker container).

Tutorial

In this tutorial we’ll …

Install docker
Make a docker cloud account
Run a docker image from docker cloud

Along the way, I’ll assume you are comfortable using the terminal. I’ll also assume you’re on a mac, though things shouldn’t be that different on Linux. I can’t really speak to Windows, but the overall process should be similar.

1. Installing Docker (on Mac)

I’d advocate for installing Docker on Mac using homebrew. If you don’t have homebrew, Docker has an installation guide for mac that covers all the steps to install the traditional way.

To install using homebrew, open up terminal and run:

brew update && brew install --cask docker

Then launch docker from your applications (or with spotlight, cmd-space and type “docker”). You’ll need to enter your administrator password.

Optional: set up bash completion for docker by running the below commands in terminal:

brew install bash-completion
brew install docker-completion
brew install docker-compose-completion
brew install docker-machine-completion

2. Make your Docker cloud account

When you first launch Docker it should prompt you to sign in or create a Docker cloud account. Alternately, you can go to hub.docker.com and create an account there. Dockerhub is a centralized store for docker images (saved containers). In the next step, we’ll grab an image from dockerhub to run a container on our machine. Eventually, this will host your own personalized docker images (part 2 of this series).

In the next step, we’ll load the tidyverse container from the rocker project’s page on dockerhub.

3. Run a docker image from docker hub

Ok, now let’s actually get a docker container image running on our machine. First, make sure Docker is running on your machine (check the menubar for the icon). Then, head back over to terminal and enter the following command:

docker run -d -p 8787:8787 -v "`pwd`":/home/rstudio/working -e PASSWORD=rstudio -e ROOT=TRUE rocker/tidyverse:3.4.3

There’s a lot going on in here so let’s break down this command.

The first part, docker run, says we want to start running a docker container.
The -d flag tells the container to run in the background (detached).
The -p 8787:8787 flag maps port 8787 inside the docker container to port 8787 on the main computer. This container will end up running an instance of RStudio server, which will be available at localhost:8787. Port 8787 is RStudio server’s default, and the -p flag is what makes it reachable from your machine.
The -v "`pwd`":/home/rstudio/working flag uses the --volume option to connect the filesystem on our machine to our docker container. It maps our present working directory to a folder in the docker container called “working” that’s in a location we can access through the RStudio interface. This lets you access whatever data or project files you need from your computer in the docker container.
The -e PASSWORD=rstudio flag sets an environment variable “PASSWORD” to “rstudio”. This sets the password to access the RStudio server instance. Here we’re just explicitly setting the password to the default, “rstudio”. If you run this remotely (part 3 teaser!), this should obviously be changed.
The -e ROOT=TRUE flag gives us root access from inside RStudio. This can be helpful for installing linux dependencies when installing R packages.
Finally, rocker/tidyverse:3.4.3 specifies the docker image to run. That is, version 3.4.3 of the rocker/tidyverse image. If we didn’t specify a tag, docker would default to the “latest” tag.
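The breakdown above can be sketched as a small script in which each piece of the command gets its own named variable. The variable names are my own choice for illustration, and the assembled command is printed rather than executed so you can inspect it first.

```shell
HOST_PORT=8787                          # port on your machine
CONTAINER_PORT=8787                     # port RStudio server listens on inside the container
HOST_DIR="$(pwd)"                       # folder on your machine to share
CONTAINER_DIR="/home/rstudio/working"   # where that folder appears inside the container
IMAGE="rocker/tidyverse:3.4.3"          # image name and version tag

# -p takes host:container, -v takes host_path:container_path
cmd="docker run -d -p ${HOST_PORT}:${CONTAINER_PORT} -v \"${HOST_DIR}\":${CONTAINER_DIR} -e PASSWORD=rstudio -e ROOT=TRUE ${IMAGE}"

echo "$cmd"
```

Changing HOST_PORT to, say, 8888 while leaving CONTAINER_PORT alone would make RStudio available at localhost:8888 instead, which is useful if something else already occupies 8787 on your machine.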

When you run a container without its image present locally, Docker will automatically download it.

Using RStudio

Now open up your browser and navigate to localhost:8787. Enter “rstudio” as your username and whatever password you set as the password (defaults to “rstudio”).

You will then be met with a fully-functioning RStudio interface. In the lower right you should see the file browser with the “working” directory we mapped when we ran the container.

If you make changes to files in “working” inside this Docker container, they will also be reflected on your computer’s file system.

Feel free to play around with this. You can see the R version and the currently loaded packages by typing sessionInfo().

Configuring your container

Finally, you may need to adjust how much of your machine you allow Docker to use. On mac, Docker is very “polite” so it doesn’t give itself very much of your machine’s resources. But, because you plan to be working in this container, you will probably want to give it some more juice.

To fix this, access the docker preferences via the menu button and select the “advanced” tab. Then, adjust to your liking. There doesn’t seem to be any harm to letting docker have full access to your system resources, at least not when used in this fashion.

Coming up next …

That’s it for Part 1 of this series. Next, in Part 2 we’ll discuss customizing a docker image with your own personal R environment. Till then you might want to poke around a bit and see what’s available on dockerhub. I won’t cover its use in this series, but if you do any work in python, the jupyter notebook datascience container is worth checking out.
