Running R on the SC clusters

Using R in Rstudio is very convenient. On the other hand, if you R programm uses multiple threads or processes it will soon bring the Rstudio servers to a halt and other user may not be able to work because your R programm is using all the Rstudio computing resources.

Fortunately, computing-intensive R tasks can be run on the SC clusters as well.

Preparing your R script

In order to run your script in a queueing system, all interactive parts must be replaced

any input must be done using files,
any graphical output from plots,
and so on

Running you R script locally

To verify that your R script (test.R) is working type

Rscript test.R

This should work without error.

Here is an example script (iris.R) with graphical output:

PL <- iris$Petal.Length
PW <- iris$Petal.Width
speciesID <- as.numeric(iris$Species)

jpeg("iris.jpg", width = 600, height = 350)

plot(PL, PW,                               #x and y
    pch = speciesID,                      #marker type
    col = rainbow(3)[speciesID],          # color
    xlab = "Petal length (cm)",           # x label
    ylab = "Petal width (cm)",            # y label
    main = "Petal width vs. length")      # main title

dev.off()

Running

Rscript iris.R

will produce a file iris.jpg that look like this

Iris Dataset example

Running your R script on the cluster

To run your script on the SC clusters you have to connect to the SC clusters and create a slurm job script (R.job) which may look like this:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=R-test
#SBATCH --partition=clara
#SBATCH --time=01:00:00

module load R
Rscript iris.R

Now you can submit your job script to the queueing system using sbatch on login01.sc.uni-leipzig.de like this:

sbatch R.job

Your job will be started as soon as resources are available. When the job is finished, your output as well as a slurm-output file should be in the directory where you started the sbatch before.

For a more detailed Slurm documentation follow this link.

Troubleshooting "version `GLIBC_2.29' not found"

This error can occur if you switch from RStudio to the cluster environment. This error is shown due different versions of the named dependency. To fix the error you can create a own R library environment on the cluster. Afterwards, you have to recompile all packages. To so, you can do the following:

# create specific directory for cluster environment
mkdir ~/R/x86_64-pc-linux-gnu-library/cluster

# export directory for installing packages
export R_LIBS_USER=/home/sc.uni-leipzig.de/<scuser>/R/x86_64-pc-linux-gnu-library/cluster

# Install packages

module load R

R

install.packages("<package~name>")

Afterwards, you can use the defined library path in your R scripts by adding the following line:

# Load directory as library directory in R script 

.libPaths("/home/sc.uni-leipzig.de/<scuser>/R/x86_64-pc-linux-gnu-library/cluster")