Personal Structural Biology (PSB) pipeline usage example
Warning
This is a work in progress and may contain wrong and/or dangerous information.
The PSB pipeline is currently used to run a number of structural biology analyses and to combine the results into a web site.
The pipeline has 4 phases:
- Plan
- Launch
- Monitor
- Report
The pipeline is currently implemented inside a singularity container with most of the necessary software preconfigured.
All the databases are outside the container but are available from within the container.
Prepare input data for the pipeline
The pipeline requires one directory for each case.
The root directory for all cases is /work/pipeline/cases/UDN/.
Inside that directory, every case gets its own subdirectory.
For this example, we assume the case is called mycase.
Inside the case directory the user needs to create a csv file named casename_missense.csv.
In this example: 'mycase_missense.csv'
The file contains the information to be included in the case, e.g.
,gene,refseq,mutation,unp
0,KCNA2,NM_001204269,R300C,P16389-2
1,FRAS1,NM_025074,H3285Y,Q86XX4-2
2,FRAS1,NM_025074,P214L,Q86XX4-2
3,AP3B2,NM_004644.4,E465K,Q13367-1
4,ATP6V0A1,NM_001130020.1,R741Q,Q93050-3
5,RAPGEF6,NM_016340,N1075S,Q8TEU7-1
6,ALG11,NM_001004127,Q213P,Q2TAA5
A more detailed description of the file content is coming soon.
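As a quick sanity check before running the pipeline, the column layout can be verified with a short shell snippet. The snippet below writes the first rows of the example above to a file and checks that every line has the five expected columns (index, gene, refseq, mutation, unp):

```shell
# Write the first rows of the example input and check that every
# line has the five expected columns: index,gene,refseq,mutation,unp
cat > mycase_missense.csv <<'EOF'
,gene,refseq,mutation,unp
0,KCNA2,NM_001204269,R300C,P16389-2
1,FRAS1,NM_025074,H3285Y,Q86XX4-2
EOF
awk -F',' 'NF != 5 { bad=1; print "line " NR " has " NF " columns" }
           END { exit bad }' mycase_missense.csv && echo "format OK"
```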
Define the PSB customisations
PSB has a global configuration file as well as a user-specific one. The global configuration should work for all users on the Clara, Paula, and Paul clusters.
The user-specific config file is placed in the parent directory of the case directories (currently /work/pipeline/cases/UDN/) and must be named /work/pipeline/cases/UDN/config/global.config. It contains additional, user-specific settings.
[UserSpecific]
[Genome_PDB_Mapper]
[SlurmParametersAll]
# It is often the case that only the SLURM mail-user is overwritten
# in this file
mail-user=nospam@uni-leipzig.de
mail-type=end,fail
Start the PSB container
For the pipeline to work we need to bind-mount a few directories into the container and also change into the case directory before starting the container.
cd /work/pipeline/cases/UDN/mycase
singularity exec --env UDN=/work/pipeline/cases/UDN --bind /work/pipeline/cases/UDN/,/work/pipeline/db:/data/ ../image_phase9.simg /bin/bash
Plan the pipeline
The next step is to "plan" the work that will be done for the case. In this step, the csv file defined earlier is parsed and evaluated.
This step is done by calling the planning command inside the singularity container.
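As a sketch only: in comparable PSB pipeline installations, planning is a single command run from the case directory. The command name psb_plan.py below is an assumption and does not come from this document; check the container for the actual name.

```
# Inside the container, from the case directory.
# NOTE: psb_plan.py is a hypothetical command name -- this document
# does not state the exact plan command.
cd /work/pipeline/cases/UDN/mycase
psb_plan.py
```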
Submitting the jobs to SLURM
After the plan is created, jobs that calculate the data which will later be used to create the web site need to be submitted. Since jobs cannot yet be submitted from within the container, we need to create a script that will be used to launch the jobs outside the container.
Create the launch script
This step will create a script called launch_mycase.py in the mycase directory.
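For illustration, the script generation might look as follows; the command name psb_launch.py is an assumption, not taken from this document:

```
# Inside the container, from the case directory.
# NOTE: psb_launch.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_launch.py
```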
Submit the jobs to SLURM
This step needs to be done outside the singularity container.
Log in to login01 using a different shell, go to the case directory, and call the launch script like this:
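One plausible invocation, assuming the generated launch_mycase.py is executable (the script name and case directory are taken from above; the exact call may differ on your system):

```
# On login01, outside the container:
cd /work/pipeline/cases/UDN/mycase
./launch_mycase.py    # submits the planned jobs to SLURM
```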
Now you should be able to see that a number of SLURM jobs have been submitted by running squeue --me.
Depending on your configuration, you will get an email when each job finishes and/or fails.
Monitor the pipeline progress
Running the jobs will take some time, depending on the availability of compute resources as well as the complexity of your case. The progress of the pipeline can be monitored by running the monitoring command inside the singularity container.
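As a sketch, monitoring in comparable PSB installations is again a single command run from the case directory; psb_monitor.py is an assumed name, not stated in this document:

```
# Inside the container, from the case directory.
# NOTE: psb_monitor.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_monitor.py
```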
Create the report/web site
The final step of the pipeline is to create the files for the web site. This is done by running the report command within the PSB container.
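As a sketch, the report generation in comparable PSB installations is one more command from the case directory; psb_rep.py is an assumed name, not stated in this document:

```
# Inside the container, from the case directory.
# NOTE: psb_rep.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_rep.py
```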
This will create an archive file to copy to your web server.
In the mycase example, the archive is mycase.tar.gz.
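The archive can be inspected and unpacked with standard tar commands. The snippet below builds a small stand-in archive purely to demonstrate them; on the cluster you would run the tar commands on the real mycase.tar.gz:

```shell
# Build a stand-in archive (demonstration only -- the real archive is
# produced by the report step).
mkdir -p mycase/html && echo "<html></html>" > mycase/html/index.html
tar -czf mycase.tar.gz mycase

tar -tzf mycase.tar.gz          # list the archived files
tar -xzf mycase.tar.gz -C /tmp  # unpack, e.g. into a web server docroot
```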
Upload the results to a web server
We currently do not have an automatic setup for this.
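A manual upload can be done with standard tools such as scp; the host name webhost and the target path /var/www/psb below are placeholders, not part of any existing setup:

```
# Placeholders: replace webhost and /var/www/psb with your server.
scp mycase.tar.gz webhost:/var/www/psb/
ssh webhost 'tar -xzf /var/www/psb/mycase.tar.gz -C /var/www/psb/'
```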