

Once you've obtained an account and successfully logged into TARDIS, a natural question to ask is: what next?

Here, we'll try to give an answer to that question and, hopefully, help you get going in no time.

A word of warning, though: this page does not discuss the details of individual software packages, but only how to run them on the cluster. For example, if you want to know how to set up a molecular dynamics simulation with LAMMPS, please read the LAMMPS documentation first. Once you are ready, come back to this page to see how to submit and run that simulation on TARDIS.

Layout

TARDIS has 36 compute nodes, each with 16 CPU cores and 1 graphics card. The nodes are connected by a fast, low-latency interconnect (InfiniBand), which allows users to run communication-intensive applications such as molecular dynamics simulations or parallel linear algebra software.

Ground rules

Resource management is handled by Sun Grid Engine (SGE). Please take a look at the SGE user documentation. Compute nodes can only be accessed through the SGE queuing system. Although users can log into each node directly, this is very rarely necessary. In fact, the only reason a user should log into an execution node is to kill a runaway job, that is, a job that was not properly terminated by SGE. This should almost never happen.

The master node (tardis) is not part of the SGE queuing system and cannot be used for running jobs. Users are permitted to run short tests on it. However, if your test job runs for more than 15 minutes and you really need interactive control over it, please request an SGE interactive session (see below). Also, no MPI jobs (not even tests) should be run on the master node.

Warning! Users are not allowed to bypass the SGE queuing system by logging into the nodes directly and running their jobs there. Doing so would severely interfere with resource management as well as with the jobs of other users. Such behavior will result in the offender's jobs being killed and the account being disabled, without notification!

Resources

File system

All users' home directories are located under /home, which is mounted on a RAID-5 disk array. The total available space in /home is 5 TB. The RAID system provides redundancy against disk failure, so there is no separate backup system for the /home directory. However, please note that nothing currently protects you from accidentally deleting a file. Be careful with the rm command!

The /home and /shared directories are NFS-mounted on all nodes, making their contents available on every compute node.

directory   mount point   purpose                   total space      quota   backup
home        /home         users' home directories   5 TB             no      automatic (RAID-5)
shared      /shared       custom software           approx. 250 GB   no      none
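
If you want to check how much space is currently free on these file systems, the standard df command will show it (the exact output format depends on the system):

df -h /home /shared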

Environment modules

To avoid adding long paths to the $PATH, $LD_LIBRARY_PATH, and similar environment variables in users' .bashrc and .tcshrc scripts, and to make switching between different versions of the same package as simple as possible, TARDIS uses the environment modules system.

Different packages are loaded and unloaded as needed using the

module




command. The module command automatically sets all the necessary environment variables, making the package available to the user. For example, if we need the GSL library, we type

module load gsl




and all necessary paths will be set. If we now want to compile a C program (e.g., test_gsl.c) that makes calls to the GSL library, we simply type

gcc -o test_gsl test_gsl.c -lgsl -lgslcblas




thus avoiding the need for lengthy -I and -L flags.
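
If you are curious about exactly which variables a module sets, the module command can display a module's definition. For example:

module show gsl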

Once we no longer need the module, we simply unload it:

module unload gsl




and all environment variables will be automatically set back to their original values.

A list of all available modules can be obtained by typing:

module avail




while

module list




will list all currently loaded modules. For more information, type

man module
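
Two further subcommands are worth knowing, since the whole point of the system is easy switching between versions. The version name below is only a placeholder; use module avail to see what is actually installed:

module switch gsl gsl/other_version   # replace the loaded gsl with another version (placeholder name)
module purge                          # unload all currently loaded modules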




Sun Grid Engine Queues

Access to the resources is controlled by three SGE queues: one for CPU jobs (serial, MPI, and shared-memory/OpenMP) and two for GPU jobs (long and short). There is a maximum allowed execution time for every queue. The SGE queues are summarized in the following table:

queue name   queue type                          max CPUs per job   total CPUs in queue   max GPUs per job   max time   parallel environment
cpu.q        CPU - serial, shared and parallel   540                540                   0                  4 days     parallel, shared
longgpu.q    GPU                                 1                  10                    1                  10 days    none
shortgpu.q   GPU                                 1                  26                    1                  24 hours   none


SGE requires that each parallel job specify a parallel environment (PE), that is, a set of configuration parameters that define how parallel jobs are handled. The parallel PE uses the $fill_up allocation rule, meaning that as many processors as possible are allocated on a single node before moving on to the next.

In addition, there is a shared PE, which allows users to submit shared-memory jobs, that is, jobs that use OpenMP or similar threaded code. This PE confines a job to the cores of a single node; the maximum number of slots that can be requested by a job using this PE is 15.
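
If you want to see exactly how these parallel environments are configured (allocation rule, slot limits, and so on), SGE can print their definitions:

qconf -spl          # list all parallel environments
qconf -sp parallel  # show the configuration of the parallel PE
qconf -sp shared    # show the configuration of the shared PE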

Parallel jobs can allocate at most 15 CPU cores per node, because one core is always reserved for GPU jobs. Even if no GPU jobs are running, one core per node will be unavailable to CPU jobs. GPU resources are limited to 36 graphics cards, and this policy ensures that a GPU job never has to wait for CPU-only jobs to finish.

There are two queues for GPU jobs. GPU jobs that take up to 10 days (240 hours) should be submitted to the longgpu.q queue. This queue can run up to ten such jobs at a time, executed on nodes 1 through 10. Short GPU jobs (up to 24 hours) go to the shortgpu.q queue, which has 26 slots spanning nodes 11 through 36. There are no unlimited GPU jobs.

Note that the default SGE queue (all.q) is disabled.
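
To get an overview of the current load of the queues (used, available, and total slots), you can ask SGE for a cluster queue summary:

qstat -g c          # summary of all cluster queues
qstat -f -q cpu.q   # detailed, per-node view of the cpu.q queue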

Submitting serial jobs

To submit a serial job we first need to create an SGE script file. The simplest way is to copy the following template

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q

module load required_module_goes_here

your_command_goes_here

into a file, e.g., serial.sh. We can now submit the job by typing:

qsub serial.sh




We can then check the status of the job using:

qstat
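
A few commonly used variations (the job ID below is just a placeholder; use the ID reported by qsub or shown by qstat):

qstat -u $USER   # show only your own jobs
qstat -j 12345   # detailed information about job 12345
qdel 12345       # delete (cancel) job 12345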






Let's take a moment to look at the serial.sh script. The first line:

#!/bin/bash




tells Linux that this is a bash shell script, i.e., that this text file contains commands the bash shell is supposed to execute. The second line:

#$ -S /bin/bash




is an SGE directive (all SGE directives start with #$). It tells SGE to use the bash shell to run this job. The next line:

#$ -cwd




makes SGE change to the current working directory, i.e., the directory from which the job was submitted. If this line is omitted, all output and error files are created in the user's home directory (/home/username). This is probably something you want to avoid, so it is advisable to keep this line. Finally,

#$ -q cpu.q




tells SGE that we want to place our job in the cpu.q queue. The remaining lines are standard shell script commands: we first load the required modules and then execute the main program. An SGE script for our gsl_test example would be:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q

module load gsl

./gsl_test
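
The directives above are the minimum needed. SGE accepts a number of additional, optional directives that are often convenient; as a sketch, here is the same serial job with a job name and explicit output/error files (the file names are arbitrary examples):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q
# -N sets the job name shown by qstat; -o and -e choose the output and error files
#$ -N gsl_test
#$ -o gsl_test.out
#$ -e gsl_test.err

module load gsl

./gsl_test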


Submitting parallel jobs

To submit a 20-processor parallel (MPI) job to the cpu.q queue we use the following template script

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q
#$ -pe parallel 20

module load mpi
module load other_required_modules

mpirun -np $NSLOTS your_command_goes_here

Save it to a file, e.g., parallel.sh (of course, you need to specify the modules and the executable you want to use), and submit it:

qsub parallel.sh




Again, we can check the status of the job with:

qstat






The parallel script is very similar to the serial version. The main difference is the line:

#$ -pe parallel 20




which tells SGE to use the parallel environment parallel (see above) and allocate 20 CPU cores for this job. An example of a parallel script for submitting a LAMMPS job:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q
#$ -pe parallel 20

module load mpi
module load lammps

mpirun -np $NSLOTS lmp_openmpi -in in.chain

Note that we could omit module load mpi, as the lammps module loads it automatically.

Submitting shared memory jobs

To submit a 10-core threaded (e.g., OpenMP) job to the cpu.q queue we use the following template script

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q cpu.q
#$ -pe shared 10

export OMP_NUM_THREADS=10 # in case you are using OpenMP

module load required_modules

your_command_goes_here

Save it to a file, e.g., shared.sh (of course, you need to specify the modules and the executable you want to use), and submit it:

qsub shared.sh




Again, we can check the status of the job with:

qstat






The shared-memory job script is very similar to the serial version. The main difference is the line:

#$ -pe shared 10




which tells SGE to use the parallel environment shared (see above) and allocate 10 CPU cores on a single node for this job.
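
Instead of hard-coding the thread count, you can reuse the slot count that SGE assigns to the job, so the number of threads always matches the number of requested cores. A small variation of the template above:

export OMP_NUM_THREADS=$NSLOTS   # match the thread count to the requested slots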

Submitting GPU jobs

To submit a GPU job to the shortgpu.q queue - execution time up to 24 hours - we can use the following script:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q shortgpu.q
#$ -l ngpu=1

module load hoomd

hoomd my_config.py

This will place our GPU job into the shortgpu.q queue. The line

#$ -l ngpu=1




tells SGE that we are requesting the complex resource ngpu (the number of GPUs) and that we need 1 (recall that there is one graphics card per node). If we save this script to a file gpu.sh, we can submit it with:

qsub gpu.sh
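
A GPU job that needs more than 24 hours (up to the 10-day limit) is submitted in exactly the same way, only to the longgpu.q queue. For example, reusing the hoomd script from above:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q longgpu.q
#$ -l ngpu=1

module load hoomd

hoomd my_config.py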




Submitting interactive jobs

On occasion it is convenient to have interactive access to the compute nodes, for example when debugging a large parallel application. Instead of bypassing SGE and violating the access policy by logging directly onto the nodes, interactive access can be gained by using, e.g.,

qrsh -S /bin/bash -q cpu.q -pe parallel 4




This command will allocate an interactive shell session with 4 CPU cores in the cpu.q queue. After you are done with an interactive session, please make sure to log out so the CPUs can be released back to the SGE scheduler.
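
The same mechanism should also work for interactive GPU work, by combining qrsh with a GPU queue and the ngpu resource used above (a sketch; request only what you actually need):

qrsh -S /bin/bash -q shortgpu.q -l ngpu=1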


Misc

Here we list a few tips that are not required for using the cluster, but can occasionally be helpful.

Add host name to the hosts file

If you access TARDIS from a Linux box and are tired of constantly having to type "ssh -X tardis.research.northwestern.edu", you might want to add the following line:

129.105.61.20 tardis.research.northwestern.edu tardis




to the end of the /etc/hosts file on your machine. Note that you have to be a superuser to do this. Now you can simply type:

ssh -X tardis




to access the cluster.
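
If you do not have superuser access on your workstation, an entry in your personal ~/.ssh/config file achieves the same effect, for example:

Host tardis
    HostName tardis.research.northwestern.edu
    ForwardX11 yes

After that, ssh tardis connects to the cluster with X forwarding enabled, without touching /etc/hosts.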

Password-less login (from Linux)

In case you prefer not to type your password each time you want to log on to the cluster, please follow these simple steps:

  1. In a terminal on your workstation type

    ssh-keygen -t rsa




    This will generate a public/private key pair in your ~/.ssh directory (if you used the default key location). Please leave the passphrase field empty (hit Enter when prompted).
  2. Copy the newly generated public key to the cluster with the following command:

    ssh-copy-id -i ~/.ssh/id_rsa.pub username@tardis.research.northwestern.edu




    where you should substitute username with your actual user name. This command appends the contents of ~/.ssh/id_rsa.pub to the ~/.ssh/authorized_keys file on the cluster.


If everything goes well, you will be able to ssh into the cluster without being asked to type your password.
