Personal Structural Biology (PSB) pipeline usage example
Warning
This is a work in progress and may contain wrong and/or dangerous information.
The PSB pipeline is currently used to run a number of structural biology analyses and to combine the results into a web site.
The pipeline has 4 phases:
- Plan
- Launch
- Monitor
- Report
The pipeline is currently implemented inside a singularity container with most of the necessary software preconfigured.
All the databases are outside the container but are available from within the container.
Prepare input data for the pipeline
The pipeline requires one directory for each case.
The root directory for all cases is /work/pipeline/cases/UDN/.
Inside that directory, every case gets its own subdirectory.
For this example, we assume the case is called mycase.
Inside the case directory the user needs to create a csv file named casename_missense.csv.
In this example: 'mycase_missense.csv'
The file contains the information to be included in the case, e.g.
,gene,refseq,mutation,unp
0,KCNA2,NM_001204269,R300C,P16389-2
1,FRAS1,NM_025074,H3285Y,Q86XX4-2
2,FRAS1,NM_025074,P214L,Q86XX4-2
3,AP3B2,NM_004644.4,E465K,Q13367-1
4,ATP6V0A1,NM_001130020.1,R741Q,Q93050-3
5,RAPGEF6,NM_016340,N1075S,Q8TEU7-1
6,ALG11,NM_001004127,Q213P,Q2TAA5
A more detailed description of the file content is coming soon.
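As a quick sanity check before running the pipeline, the column layout can be verified with a short shell snippet. The snippet below writes the first rows of the example above to a file and checks that every line has the five expected columns (index, gene, refseq, mutation, unp):

```shell
# Write the first rows of the example input and check that every
# line has the five expected columns: index,gene,refseq,mutation,unp
cat > mycase_missense.csv <<'EOF'
,gene,refseq,mutation,unp
0,KCNA2,NM_001204269,R300C,P16389-2
1,FRAS1,NM_025074,H3285Y,Q86XX4-2
EOF
awk -F',' 'NF != 5 { bad=1; print "line " NR " has " NF " columns" }
           END { exit bad }' mycase_missense.csv && echo "format OK"
```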
Define the PSB customisations
PSB has a global configuration file as well as a user-specific one. The global configuration should work for all users on the Clara, Paula, and Paul clusters.
The user-specific config file is placed in the parent directory of the case directories (currently /work/pipeline/cases/UDN/) and must be named /work/pipeline/cases/UDN/config/global.config. It contains additional, user-specific settings.
[UserSpecific]
[Genome_PDB_Mapper]
[SlurmParametersAll]
# It is often the case that only the SLURM mail-user is overwritten
# in this file
mail-user=nospam@uni-leipzig.de
mail-type=end,fail
Start the PSB container
For the pipeline to work we need to bind-mount a few directories into the container and also change into the case directory before starting the container.
cd /work/pipeline/cases/UDN/mycase
singularity exec --env UDN=/work/pipeline/cases/UDN --bind /work/pipeline/cases/UDN/,/work/pipeline/db:/data/ ../image_phase9.simg /bin/bash
Plan the pipeline
The next step is to "plan" the work that will be done for the case. In this step, the csv file defined earlier is parsed and evaluated.
This step is done by calling the planning command inside the singularity container.
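As a sketch only: in comparable PSB pipeline installations, planning is a single command run from the case directory. The command name psb_plan.py below is an assumption and does not come from this document; check the container for the actual name.

```
# Inside the container, from the case directory.
# NOTE: psb_plan.py is a hypothetical command name -- this document
# does not state the exact plan command.
cd /work/pipeline/cases/UDN/mycase
psb_plan.py
```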
Submitting the jobs to SLURM
After the plan is created, jobs that calculate the data which will later be used to create the web site need to be submitted. Since jobs cannot yet be submitted from within the container, we need to create a script that will be used to launch the jobs outside the container.
Create the launch script
This step will create a script called launch_mycase.py in the mycase directory.
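For illustration, the script generation might look as follows; the command name psb_launch.py is an assumption, not taken from this document:

```
# Inside the container, from the case directory.
# NOTE: psb_launch.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_launch.py
```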
Submit the jobs to SLURM
This step needs to be done outside the singularity container.
Log in to login01 using a different shell, go to the case directory, and call the launch script like this:
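One plausible invocation, assuming the generated launch_mycase.py is executable (the script name and case directory are taken from above; the exact call may differ on your system):

```
# On login01, outside the container:
cd /work/pipeline/cases/UDN/mycase
./launch_mycase.py    # submits the planned jobs to SLURM
```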
Now you should be able to see that a number of SLURM jobs have been submitted by running squeue --me.
Depending on your configuration, you will get an email when each job finishes and/or fails.
Monitor the pipeline progress
Running the jobs will take some time, depending on the availability of compute resources as well as the complexity of your case. The progress of the pipeline can be monitored by running the monitoring command inside the singularity container.
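As a sketch, monitoring in comparable PSB installations is again a single command run from the case directory; psb_monitor.py is an assumed name, not stated in this document:

```
# Inside the container, from the case directory.
# NOTE: psb_monitor.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_monitor.py
```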
Create the report/web site
The final step of the pipeline is to create the files for the web site. This is done by running the report command within the PSB container.
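As a sketch, the report generation in comparable PSB installations is one more command from the case directory; psb_rep.py is an assumed name, not stated in this document:

```
# Inside the container, from the case directory.
# NOTE: psb_rep.py is a hypothetical command name.
cd /work/pipeline/cases/UDN/mycase
psb_rep.py
```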
This will create an archive file to copy to your web server.
In the mycase example, the archive is mycase.tar.gz.
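The archive can be inspected and unpacked with standard tar commands. The snippet below builds a small stand-in archive purely to demonstrate them; on the cluster you would run the tar commands on the real mycase.tar.gz:

```shell
# Build a stand-in archive (demonstration only -- the real archive is
# produced by the report step).
mkdir -p mycase/html && echo "<html></html>" > mycase/html/index.html
tar -czf mycase.tar.gz mycase

tar -tzf mycase.tar.gz          # list the archived files
tar -xzf mycase.tar.gz -C /tmp  # unpack, e.g. into a web server docroot
```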
Upload the results to a web server
We currently do not have an automatic setup for this.
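A manual upload can be done with standard tools such as scp; the host name webhost and the target path /var/www/psb below are placeholders, not part of any existing setup:

```
# Placeholders: replace webhost and /var/www/psb with your server.
scp mycase.tar.gz webhost:/var/www/psb/
ssh webhost 'tar -xzf /var/www/psb/mycase.tar.gz -C /var/www/psb/'
```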