
Using MATLAB

MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran. (taken from http://en.wikipedia.org/wiki/MATLAB, more details there)

http://www.mathworks.de/products/matlab/

Usage

module add software/matlab/R2015a

See the vendor documentation: http://www.mathworks.de/de/help/index.html

Using the Matlab Compiler

There are several options to compile your Matlab code into stand-alone executables/libraries. Being independent of licenses is, of course, one of the major advantages here. But when running compiled code with the Matlab Runtime Environment (MRE) on the cluster, you have to consider the threading of your code just as you would when running Matlab itself. Generally, Matlab detects the number of physical cores and opens the same number of threads to make full use of the multithreading implemented in its built-in functions. So, if you compile your code without any further options, you obtain a multithreaded executable.
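A minimal sketch of such a compilation (my_function.m is a placeholder for your own entry-point function):

$ module add software/matlab/R2015a
$ mcc -m my_function.m    # multithreading of the built-in functions is enabled by default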

Often this is what you want, but you then have to make sure that you select the appropriate resources, namely the appropriate core affinity. Since Matlab wants to use everything on a host, you'll have to call bsub -n 1 -R 'affinity[cores(64)]' and add an appropriate memory reservation. If, on the other hand, your code does not need the full multithreading capability, you should compile your code with the flag 'singleCompThread':
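A sketch, using mcc's -R option to pass the runtime flag (my_function.m is again a placeholder):

$ mcc -m my_function.m -R -singleCompThread   # the resulting binary runs in a single computational thread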

This makes sure that your stand-alone code will run in a single computational thread, which not only avoids frustrating the core scheduler and the other users, but can also improve the performance of your code, because less time is spent scheduling many threads onto a single core.

Sometimes it can speed up your single-core job if you submit it using an additional affinity request like the following.
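A sketch (the exact affinity string is an assumption based on LSF's affinity syntax; check the LSF documentation for the form valid on your installation):

$ bsub -q short -W 60 -n 1 -R 'affinity[core(2,same=numa)]' -app Reserve1800M ./my_prog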

This binds 2 cores of the same memory block (NUMA node) to your process and can speed up your calculation by up to 50%, depending on your code, of course. Note that you can only use up to 32 GB for your task in this case, since one NUMA block contains 32 GB.

Examples

Submitting a Matlab job
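A minimal sketch of such a submission (queue, run time and the script name my_script.m are placeholders; since Matlab runs multithreaded by default, the whole node is requested here):

#!/bin/bash
# run_matlab.sh - sketch of a Matlab batch job
module add software/matlab/R2015a
matlab -nodisplay -nosplash -r "my_script, exit"

$ bsub -q short -W 300 -n 1 -R 'affinity[cores(64)]' -app Reserve1800M ./run_matlab.sh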

 

Compiling a m-file on a node

Using the following shell script you can compile an m-file into a stand-alone application. It's a variation of the script for using local scratch on a node.
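The following is a sketch of such a script (paths, file names and the module version are assumptions, adjust them to your setup):

#!/bin/bash
# compile_mfile.sh - compile an m-file on the node-local job directory
module add software/matlab/R2015a
WORKDIR=/jobdir/${LSB_JOBID}                    # node-local job directory
cp "$HOME/src/my_function.m" "$WORKDIR"
cd "$WORKDIR"
mcc -m my_function.m -R -singleCompThread       # single-threaded stand-alone binary
cp my_function run_my_function.sh "$HOME/bin/"  # copy the results back

Submit it, for example, with bsub -q short -W 60 -n 1 -app Reserve1800M ./compile_mfile.sh.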

 

Simple Submission

Simple command line submission

Submitting a job is as easy as typing the following on the command line:
$ bsub -q short -n 128 -R 'span[ptile=64]' -app Reserve1800M ./cpi
Now, what does this do? It asks LSF to run the binary ./cpi on 128 job slots distributed over 2 nodes (64 slots per node), with 1800 MB of RAM per job slot, in the queue short. But note: this job will be terminated after 15 minutes, because that is the default run time for the queue and no individual run time limit was set using -W.

Simple submission by script

The same can be achieved by using a bsub script. Here, you list all your bsub parameters in a file as special comments like this:
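A sketch of such a file (here called job), mirroring the command-line example above; the #BSUB lines are read by bsub when the file is piped into it:

#!/bin/bash
#BSUB -q short                 # queue
#BSUB -n 128                   # number of job slots
#BSUB -R 'span[ptile=64]'      # 64 slots per node, i.e. 2 nodes
#BSUB -app Reserve1800M        # memory reservation per slot
#BSUB -W 15                    # run time limit in minutes
./cpi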

And then you type
$ bsub < job

Memory reservation


When submitting a job, you need to set memory limits and a memory reservation (otherwise a default of 300 MB will be applied, which is most likely not enough).

All of these values are per process or requested core.
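A minimal sketch of an explicit reservation (-R 'rusage[mem=...]') and limit (-M), assuming values are given in MB as elsewhere on this page:

$ bsub -q short -W 60 -n 1 -R 'rusage[mem=2000]' -M 2000 ./my_prog   # reserve and limit 2000 MB per core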

Application profile

The easiest way to reserve memory for your job is to use pre-defined application profiles. You can list all available profiles using the bapp command. The exact definition of a profile can be viewed with bapp -l <Profile Name>.

To use an application profile, supply its name to the bsub command:
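For example (queue, run time and program name are placeholders):

$ bsub -q short -W 60 -n 1 -app Reserve1800M ./my_prog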

If you use full nodes (with -R 'span[ptile=64]'), you should use the application profile Reserve1800M, which reserves nearly all available memory on the node (subtracting a little buffer for the operating system) for your job. Adding this option is recommended.

High memory requirements

If you need a lot of memory for just one process, you have to combine this requirement with the affinity string in your bsub command. Imagine a single process that needs all the available RAM on a node. You would call
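a command like the following (a sketch; queue, run time and program name are placeholders, the affinity string follows the syntax used above):

$ bsub -q long -W 300 -n 1 -R 'affinity[cores(64)]' -app Reserve1800M ./my_memory_hungry_prog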

This binds all 64 cores to one process and allows that single process to use the whole memory.
 

Determine memory usage of applications

There are many ways to determine how much memory an application uses, e.g. looking at the output of ps or top while the program is running.

The easiest way to get the maximum memory consumption is to use the time binary (not the bash builtin) as described here:
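For example (GNU time; the maximum resident set size is reported in kilobytes):

$ /usr/bin/time -v ./my_prog 2>&1 | grep 'Maximum resident set size'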

Using GPU

Using GPUs

The titan queues (titanshort/long) currently contain nodes that carry 4 GeForce GTX TITAN cards each, hence a usage request of up to cuda=4 can be selected (see below). The tesla queues (teslashort/long), in contrast, contain nodes with 4 Tesla K20m cards installed.

GPU Usage

To use a GPU you have to explicitly reserve it as a resource in the bsub call:
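A sketch, assuming the GPU is requested via the cuda consumable resource mentioned above (my_gpu_job.sh must be an executable script with a shebang, see the requirements below):

$ bsub -q titanshort -W 60 -R 'rusage[cuda=1]' ./my_gpu_job.sh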

The code or application to be carried out needs to

  1. be an executable script or program.
  2. carry a shebang.

While this is good practice for LSF jobs in general, it is strictly enforced for GPU resource requests.

Using multiple GPUs

If supported by the queue, you can request multiple GPUs like
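the following (a sketch requesting all 4 GPUs of a titan node; queue, run time, slot count and script name are placeholders):

$ bsub -q titanshort -W 300 -n 4 -R 'span[ptile=4]' -R 'rusage[cuda=4]' ./my_gpu_job.sh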

Be sure to add a sufficient time estimate with -W. Also, multiple CPUs can be requested with the usual ptile option.

Using multiple nodes and multiple GPUs

In order to use multiple nodes, you have to request entire nodes and entire GPU sets, e.g.
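like this (a sketch; the number of cores per titannode is an assumption here, adjust -n and ptile accordingly; with Open MPI the environment can be exported via -x):

$ bsub -q titanshort -W 300 -n 32 -R 'span[ptile=16]' -R 'rusage[cuda=4]' mpirun -x PATH -x LD_LIBRARY_PATH ./my_gpu_prog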

In this example, 2 entire titannodes will be used (including their CPU sets).

Your job script / job command has to export the environment of your job. mpirun implementations do have an option for this (see your mpirun man page).

Using multiple GPU nodes requires taking entire nodes: the entire GPU set has to be claimed, as well as the entire CPU set, either by setting a core affinity or an appropriate ptile value.

Using ramdisk

Using ramdisk(s)

Especially for I/O intensive jobs, which issue many system calls, the local disk as well as our GPFS fileserver can be a bottleneck. Staging of files to a local ramdisk can be a solution for these jobs.

In order to create a ramdisk for your job you must specify an (additional) rusage statement within your bsub command, e.g. -R 'rusage[ramdisk=1000]', where the size of the ramdisk is stated in MByte. The amount of memory used for the ramdisk is automatically added to your requested memory. Depending on the way you reserve your memory (-R 'rusage[mem=1000]' or -app Reserve5G), bsub will indicate the new memory reservation settings caused by the ramdisk request.

The ramdisk is created in the job directory /jobdir/${LSB_JOBID}/ramdisk. The resource option -R 'span[ptile=<n>]' is mandatory for ramdisks! Adjust it to (at least) the number of processor cores your job requests, e.g. for a single-core job use -R 'span[ptile=1]'.
Examples:

1) Memory reservation via application profile:
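For instance (a sketch; queue, run time and script name are placeholders):

$ bsub -q short -W 60 -n 1 -R 'span[ptile=1]' -R 'rusage[ramdisk=1000]' -app Reserve5G ./my_io_job.sh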

2) Memory reservation via rusage directive:
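For instance, with an explicit per-slot memory reservation plus the additional ramdisk rusage statement (again a sketch):

$ bsub -q short -W 60 -n 1 -R 'span[ptile=1]' -R 'rusage[mem=2000]' -R 'rusage[ramdisk=1000]' ./my_io_job.sh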

Multi processor jobs

For multi-processor jobs one ramdisk per host is created. The memory occupied by the ramdisk is distributed equally over the RAM requests of the individual job slots. E.g. a 3-processor job with 2000 MB per process and a 1000 MB ramdisk (or, for example, a 6-processor job on two hosts with 3 processes per host, 2000 MB per process and a 1000 MB ramdisk) will end up with a memory request of 2333 MB per job slot (2000 MB + 1000 MB / 3 ≈ 2333 MB):
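A sketch of the 3-processor variant (queue, run time and script name are placeholders):

$ bsub -q short -W 60 -n 3 -R 'span[ptile=3]' -R 'rusage[mem=2000]' -R 'rusage[ramdisk=1000]' ./my_io_job.sh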

Access

If you need access to Mogon, you have to apply for a project via AHRP. In case you require help filling out the application, address us with a mail to hpc@uni-mainz.de.

If you're a member of a research group and can't log in to Mogon, ask the technically responsible person of your research group to add your account to the project. There should be at least one technically responsible person per group, but the project leader can do this as well.

According to the ZDV's rules for the use of its facilities, sharing accounts and passwords is not permitted.

Access to HIMster and Clover is managed outside of the ZDV at the Helmholtz-Institut Mainz.

Setting up environment in Python

To load environment modules in python:
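A minimal sketch, assuming the environment-modules Python initialization file is available via $MODULESHOME (the path and the loaded module are assumptions):

import os

# make the module() function available inside Python
exec(open(os.path.join(os.environ['MODULESHOME'], 'init', 'python.py')).read())

module('load', 'software/matlab/R2015a')   # load a module, e.g. the Matlab module from above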

Of course, all the other module commands work as well, e.g. module('list') to check which modules are already loaded and to avoid version conflicts.

For Python 3.4.1 on Mogon we additionally enabled a modules module ;-).

This, of course, requires an environment where the --system-site-packages option has been used during the setup of your (currently active) Python 3.4.1 environment.

Setting up environment

To run jobs on Mogon, you need to set up your environment correctly. This wiki page will help you to do so.

List of useful module commands

Command                       Description
module -h                     shows an overview of all available commands
module avail                  shows all available modules
module load/add [module]      loads the module [module]
module list                   shows all loaded modules
module rm/unload [module]     unloads the module [module]
module purge                  unloads all loaded modules

Show modules

In many cases there are several versions of the same compiler/program installed. You have to check that your desired version is available. You can get an overview of all available modules using the command module avail.

As module reports on standard error, a grep for a particular module can be done like:
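For example, to look for Matlab modules:

$ module avail 2>&1 | grep -i matlab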

Alternatively, you could call
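module avail with a name prefix (here assuming the MPI modules are grouped under the prefix mpi):

$ module avail mpi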

if you are looking for a list of all available mpi versions.

If you think there is an important version missing, feel free to contact us at hpc@uni-mainz.de!

Load modules

Now, knowing which modules are available, you can go on by loading the module(s) you need. This is done using the command module load [module] or module add [module].

Example: Maybe you want to load gcc (version 4.7.4) and intelmpi, so you would do:
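A sketch (the exact module names are assumptions, check module avail for the versions installed):

$ module load gcc/4.7.4
$ module load mpi/intelmpi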

That's all!

List active modules

If you want to check your loaded modules, simply use module list.

Example:
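For instance (the output is illustrative and depends on what you have loaded):

$ module list
Currently Loaded Modulefiles:
  1) gcc/4.7.4   2) mpi/intelmpi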

Unload modules

To unload a specific module you can use the command module rm [module] or module unload [module]. If you want to unload all your loaded modules, use module purge.

Example:
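For instance:

$ module unload gcc/4.7.4   # unload a single module
$ module purge              # unload all loaded modules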

Queues

List of queues on Mogon

Queue               Description
short               (default) for jobs using less than 64 cores and running up to 5 hours
long                for jobs using less than 64 cores and running up to 5 days
nodeshort/-long     for jobs using more than one node
gpushort/-long      Intel-nodes with different GPUs
teslashort/-long    Intel-nodes with TESLA K20(X)m GPUs
titanshort/-long    Intel-nodes with TITAN GPUs
micshort            Intel-nodes with XeonPhi
systemtest          for hpc-admins only

To get information about the queues on mogon, use the bqueues command, e.g.:
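For instance (the queue name is just an example):

$ bqueues short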

For detailed information, use bqueues -l, e.g.:
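$ bqueues -l short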