Introduction to High Performance Computing

Using your workstation or our cluster

A High Performance Computing (HPC) cluster is a group of interconnected computers that work together to solve complex computational problems, while a local workstation is a single computer system used by an individual.

The decision of whether to use an HPC cluster or a local workstation depends on several factors such as the size of the problem, the required computational resources, and the available time.

Here are some general guidelines:

When to use an HPC cluster

  1. When the problem is too large to be solved on a single computer or requires a significant amount of computational resources, such as memory or processing power.
  2. When the computation requires parallel processing, where the problem can be divided into smaller parts and processed simultaneously on multiple computers.
  3. When the researcher needs access to specialized software or hardware that is only available on the HPC cluster.
  4. When there is a need to run multiple jobs concurrently or in batch mode, where jobs are submitted to a queue and executed as resources become available on the cluster.

When to use a local workstation

  1. When the problem is small enough to be solved on a single computer, and the required computational resources are available on the local workstation.
  2. When the researcher needs immediate access to the computation results, and there is no need to wait for the jobs to complete on an HPC cluster.
  3. When the researcher needs a high degree of control over the computation environment, such as the ability to customize the software and hardware configuration.

In general, an HPC cluster is most suitable for large-scale and complex computations, while a local workstation is best for small to medium-sized problems that require fewer computational resources. However, the decision of which platform to use ultimately depends on the specific problem you are trying to solve.

Warning

Running a simple, serial program might not be faster on the cluster.
In fact, it may even be slower, because cluster CPUs typically run at lower clock speeds (between 2 GHz and 3 GHz) than the CPUs in a good workstation or even a laptop (often around 4 GHz).

Shared resources

When using a High-Performance Computing (HPC) environment that is shared among multiple users, it is important to be mindful of the shared resources to ensure that everyone can access and utilize them efficiently.

Here are some guidelines to help promote responsible use of shared HPC resources:

Fair use of shared resources

Be considerate of others: Remember that you are sharing the HPC environment with other users who also need access to the resources. Avoid occupying resources for longer than necessary, and avoid running jobs that are unnecessarily long or resource-intensive.
Use the queueing system: Our HPC environment uses a queueing system to manage the allocation of resources. When submitting a job, make sure that you are using the appropriate queue for the job and that you are not monopolizing the resources.
Use the resources efficiently: Make sure that your job is using only the resources that it needs to complete the task. For example, if your job only requires a single CPU core, do not request more cores than necessary.
Clean up after yourself: When your job has finished running, make sure that you remove any files or data that you no longer need. This will help to free up space for other users.
Follow the rules and guidelines: We have specific rules and guidelines for usage that are designed to ensure fair and equitable access for all users. Make sure that you are familiar with these rules and guidelines, and follow them at all times.

By following these instructions, users can help to ensure that the shared HPC resources are used efficiently and effectively, and that everyone has access to the resources they need to complete their work.

Queueing system - SLURM

SLURM (Simple Linux Utility for Resource Management) is a job scheduler and resource manager that is commonly used in High-Performance Computing (HPC) environments. Its main function is to allocate and manage resources on a cluster of computers in a way that maximizes utilization and minimizes waiting times for users.

As a new user, here are some key concepts and commands that you should be familiar with in SLURM:

SLURM terms

Partitions: A partition is a subset of the computing resources on a cluster, such as a specific group of nodes or a particular type of hardware. Users can request access to specific partitions when submitting a job.
Jobs: A job is a unit of work that is submitted to the cluster for execution. Users submit jobs using the sbatch command, providing information such as the required resources, the expected duration of the job, and the command to be executed (a minimal example script is shown after this list).
Job status: Users can check the status of their jobs using the squeue command, which shows information such as the job ID, the state (e.g. pending or running), and the elapsed run time.
Resource allocation: SLURM allocates resources based on a set of policies, such as fair-share scheduling or priority-based scheduling. Users can set various options to influence the allocation of resources, such as the number of CPUs, the amount of memory, and the duration of the job.
Job management: Users can manage their jobs using various commands, such as scancel to cancel a job, scontrol to modify a job's parameters, or sacct to view job accounting information.
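
As a concrete illustration, below is a minimal batch script. It is only a sketch: the partition name, resource requests, time limit, and the program being run (a hypothetical my_analysis.py) are placeholders that you should replace with values that are valid on our cluster.

    #!/bin/bash
    #SBATCH --job-name=example          # name shown by squeue
    #SBATCH --partition=standard        # placeholder: use a partition that exists on our cluster
    #SBATCH --ntasks=1                  # one task (process)
    #SBATCH --cpus-per-task=1           # request only the cores you actually need
    #SBATCH --mem=4G                    # memory for the whole job
    #SBATCH --time=01:00:00             # wall-time limit (hh:mm:ss)
    #SBATCH --output=example_%j.log     # %j is replaced by the job ID

    # The actual work; my_analysis.py is a hypothetical script.
    python my_analysis.py

You would submit this script with sbatch job.sh, monitor it with squeue -u $USER, cancel it with scancel <jobid>, and inspect finished jobs with sacct -j <jobid>.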

Overall, SLURM is a powerful tool for managing resources and scheduling jobs in an HPC environment. As a new user, it is important to familiarize yourself with the basic concepts and commands so that you can submit and manage jobs efficiently and effectively.

Understanding shared memory and distributed memory

Shared-memory and distributed-memory are two different approaches to managing and accessing memory in computer systems.

Shared-memory refers to a type of memory architecture where all the processors in a computer system share a common pool of memory. This means that any processor can access any part of the shared memory at any time, which allows for easy and efficient communication between the processors. Shared-memory systems can be either symmetric multiprocessing (SMP) or non-uniform memory access (NUMA) systems. In an SMP system, every processor accesses the shared memory with the same latency, while in a NUMA system each processor has a region of memory that is local to it, and accessing another processor's memory region is slower than accessing its own.
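
To make the shared-memory model concrete, here is a minimal Python sketch: several threads update the same counter, and a lock guards access to the shared variable. (Note that CPython's global interpreter lock limits true CPU parallelism for threads; the example only illustrates the shared-memory programming model.)

    import threading

    counter = 0                 # data shared by all threads
    lock = threading.Lock()     # synchronization primitive protecting the counter

    def work(n_increments):
        global counter
        for _ in range(n_increments):
            with lock:          # only one thread updates the shared memory at a time
                counter += 1

    threads = [threading.Thread(target=work, args=(10_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)              # 40000: every thread operated on the same memory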

On the other hand, distributed memory refers to a type of memory architecture where each processor has its own local memory, and the processors communicate with each other through message passing. This means that each processor can only access its own local memory, and to access data from other processors, it needs to communicate with them explicitly through message passing. This approach is typically used in distributed systems where the processors are located in different physical locations and are connected through a network.
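
A matching distributed-memory sketch, assuming mpi4py and an MPI implementation are installed: each process owns its data, and results are combined through explicit message passing. You would launch it with something like mpirun -n 4 python partial_sums.py (the file name is just an example).

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()       # this process's ID
    size = comm.Get_size()       # total number of processes

    # Each process computes a partial sum over its own, private data.
    local_sum = sum(range(rank * 100, (rank + 1) * 100))

    # Explicit communication: partial results are combined on rank 0.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)

    if rank == 0:
        print(f"Total computed by {size} processes: {total}")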

In terms of programming, shared-memory systems are generally easier to program because all the processors can access a common pool of memory. This means that variables and data structures can be shared across all processors, and synchronization mechanisms such as locks and barriers can be used to manage access to shared resources. On the other hand, distributed memory systems require more complex programming because the processors cannot directly access each other's memory. In this case, the programmer needs to explicitly manage the message passing between processors to ensure that data is correctly shared and synchronized.

Overall, the choice between shared-memory and distributed memory architectures depends on the specific requirements of the application and the available hardware. Applications that require high levels of parallelism and communication between processors may benefit from shared-memory systems, while applications that require scalability and fault-tolerance may benefit from distributed memory systems.

CPU vs GPU programming

CPU and GPU programming differ in how they utilize processing units and memory resources to perform computations.

CPU (Central Processing Unit) programming refers to the traditional approach of programming for the main processor of a computer, which is responsible for executing general-purpose tasks such as running the operating system, executing software applications, and managing input/output operations. CPUs have a relatively small number of cores (usually 4-16 in a workstation; more in an HPC node) and are optimized for low-latency, single-threaded performance, meaning they execute a small number of instruction streams very quickly.

GPU (Graphics Processing Unit) programming, on the other hand, refers to programming for specialized hardware designed to handle computationally intensive tasks, such as image and video processing, machine learning, and scientific simulations. GPUs have many more cores (typically hundreds to thousands) than CPUs and are optimized for parallelism, meaning they can perform multiple operations simultaneously across large sets of data. Additionally, GPUs have specialized memory structures, such as high-bandwidth memory (HBM), that can quickly move data between the processor cores and memory.

In terms of programming, CPU programming typically uses general-purpose programming languages such as C, C++, or Python, while GPU programming uses specialized languages such as CUDA (for NVIDIA GPUs) or OpenCL (for multiple GPU architectures). Additionally, GPU programming requires careful management of memory usage and data transfers between the CPU and GPU, as well as efficient parallel algorithms that can fully utilize the GPU's processing power.
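
As a rough illustration of the difference, the sketch below runs the same array operation first with NumPy on the CPU and then with CuPy on the GPU. It assumes CuPy is installed and a CUDA-capable NVIDIA GPU is available; note the explicit transfers between host and device memory, which real GPU codes must keep to a minimum.

    import numpy as np
    import cupy as cp                  # assumes CuPy and a CUDA-capable GPU

    n = 10_000_000

    # CPU: the array lives in host memory and is processed by the CPU cores.
    x_cpu = np.random.rand(n)
    y_cpu = np.sqrt(x_cpu) * 2.0

    # GPU: the data is copied to device memory and processed by many GPU cores.
    x_gpu = cp.asarray(x_cpu)          # host-to-device transfer
    y_gpu = cp.sqrt(x_gpu) * 2.0       # executed as GPU kernels
    y_back = cp.asnumpy(y_gpu)         # device-to-host transfer

    print(np.allclose(y_cpu, y_back))  # the two results agree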

Overall, the choice between CPU and GPU programming depends on the specific requirements of the application. CPU programming is generally used for general-purpose computing tasks and can handle small to moderate data sets, while GPU programming is ideal for computationally intensive tasks that involve large data sets and require high levels of parallelism.