Running R on the SC clusters
Using R in Rstudio is very convenient. On the other hand, if you R programm uses multiple threads or processes it will soon bring the Rstudio servers to a halt and other user may not be able to work because your R programm is using all the Rstudio computing resources.
Fortunately, computing-intensive R tasks can be run on the SC clusters as well.
Preparing your R script
In order to run your script in a queueing system, all interactive parts must be replaced
- any input must be done using files,
- any graphical output from plots,
- and so on
Running you R script locally
To verify that your R script (test.R) is working type
This should work without error.
Here is an example script (iris.R) with graphical output:
PL <- iris$Petal.Length
PW <- iris$Petal.Width
speciesID <- as.numeric(iris$Species)
options(bitmapType='cairo')
jpeg("iris.jpg", width = 600, height = 350)
plot(PL, PW, #x and y
pch = speciesID, #marker type
col = rainbow(3)[speciesID], # color
xlab = "Petal length (cm)", # x label
ylab = "Petal width (cm)", # y label
main = "Petal width vs. length") # main title
dev.off()
Running
will produce a file iris.jpg that look like this
Running your R script on the cluster
To run your script on the SC clusters you have to connect to the SC clusters and create a slurm job script (R.job) which may look like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name=R-test
#SBATCH --partition=clara
#SBATCH --time=01:00:00
export LC_ALL=C.UTF-8
module load R
Rscript iris.R
Now you can submit your job script to the queueing system using sbatch on login01.sc.uni-leipzig.de like this:
Your job will be started as soon as resources are available. When the job is finished, your output as well as a slurm-output file should be in the directory where you started the sbatch before.
For a more detailed Slurm documentation follow this link.
Troubleshooting "version `GLIBC_2.29' not found"
This error can occur if you switch from RStudio to the cluster environment. This error is shown due different versions of the named dependency. To fix the error you can create a own R library environment on the cluster. Afterwards, you have to recompile all packages. To so, you can do the following:
# create specific directory for cluster environment
mkdir ~/R/x86_64-pc-linux-gnu-library/cluster
# export directory for installing packages
export R_LIBS_USER=/home/sc.uni-leipzig.de/<user>/R/x86_64-pc-linux-gnu-library/cluster
# Install packages
module load R
R
install.packages("<package~name>")
Afterwards, you can use the defined library path in your R scripts by adding the following line:
# Load directory as library directory in R script
.libPaths("/home/sc.uni-leipzig.de/<user>/R/x86_64-pc-linux-gnu-library/cluster")
Troubleshooting "LC_CTYPE failed, using 'C'"
This error can occur during the start of R and stats that some locale settings are missing.
R
...
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_MONETARY failed, using "C"
6: Setting LC_PAPER failed, using "C"
7: Setting LC_MEASUREMENT failed, using "C"
To solve this error, add export LC_ALL=C.UTF-8
before you start R or add the line in your job script.