Troubleshooting 101

Dear Users,

To shoot your troubles (with job scripts), you first need to know what they are. Right? Alas, SLURM can be a little obfuscating when job scripts do not use proper job steps. Some hints can be found in our wiki.

Also, be sure to check for available modules rather than compiling standard software yourself. In case some software is missing, you may use our little form to ask for software or a library to be installed. And when you report issues with software we installed, at least we know what you are talking about.
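A minimal sketch of such a check, assuming the usual module command (Environment Modules or Lmod) is available on the login node. The helper name find_module is hypothetical and only serves the illustration:

```shell
# find_module is a hypothetical helper name, not part of the module system.
# It filters the output of "module avail" for a case-insensitive search term.
find_module() {
    # "module avail" commonly prints to stderr, hence the redirection.
    module avail 2>&1 | grep -i -- "$1"
}
```

Calling find_module gcc would then list every available module whose name contains "gcc", if there is one.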

Best,

Your HPC Team

Using mogonfs

Dear Users,

We occasionally mentioned that file transfers can be a lot faster if no encryption (as with scp) is applied. And for quite a while the wiki stated "example will follow asap" in the section on ftp. Well, a comprehensive introduction to (l)ftp is not our goal, but at least now we have the promised example.

Your HPC-Team

„Going Productive“

Dear Users,

Noticed a low turnover of your jobs? One potential (and, alas, not so infrequent) cause is requesting too many resources.

OK, what is "too much" with regard to resources (CPU time, RAM)? Of course, you do not want to see your jobs crashing because they hit the run time or memory limit. Therefore you would rather ask for a little headroom. And this is what we recommend as well. After all: What is the point if you lose time and have to re-submit?

Yet asking for 3 GB when the first 1000 jobs all stayed below 0.5 GB will cause you to occupy slots where other users' jobs, and also your own, could be running. This assumes, of course, jobs with only one or a few slots. If your jobs take multiple nodes, you will be waiting unnecessarily for nodes with more memory. (We sometimes observe memory requirement ratios which are even worse.)

Likewise for run time limits: Always asking for the maximum run time of a queue will impair backfilling, the mechanism which attempts to use all potential CPU time. As a consequence, your jobs will be pending longer than needed.

So, instead of stuffing queues with untested jobs, we ask you to test a few jobs first (which might require a generous overhead in terms of run time and memory). Look for the actual run time in the LSF report and also the actual memory used (its maximum value). It is then still alright to "round up" those values, just to be safe. However, try not to be too cautious, as a "too much" will result in a slower workflow for you.
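The "round up" step can be sketched as a small piece of shell arithmetic. The helper name suggest_request, the 20 % margin and the 100 MB step are assumptions chosen for illustration, not site policy:

```shell
# suggest_request MEASURED MARGIN_PERCENT STEP
# Adds MARGIN_PERCENT on top of MEASURED, then rounds up to a multiple of STEP.
# All three inputs are illustrative; pick values that fit your own jobs.
suggest_request() {
    measured=$1
    margin=$2
    step=$3
    # Integer arithmetic: add the margin, rounding up to the next whole unit.
    padded=$(( (measured * (100 + margin) + 99) / 100 ))
    # Round up to the next multiple of STEP.
    echo $(( (padded + step - 1) / step * step ))
}

# Example: a test job peaked at 480 MB; 20 % margin, rounded up to 100 MB steps.
suggest_request 480 20 100   # prints 600
```

The same arithmetic works for run time, for example with the measured run time in minutes and a step of 10.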

We reserve the right to point out problematic usage to you. But remember: We offer courses, individual counseling, etc. Just ask for our help if in doubt.

Your HPC team

Shellcheck

Dear Users,

We frequently see jobs dying because of faulty scripts. This is part of a development cycle. After all: Who is perfect?

There are better ways than correcting a script post mortem, and they save time. One is to check the syntax without executing the script; for bash scripts, the shell's -n option does exactly that.
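A minimal sketch of such a syntax check, assuming a bash script. The temporary demo script with its deliberate error (a missing "fi") is made up for illustration:

```shell
# Hypothetical demo script with a deliberate syntax error (missing "fi").
script=$(mktemp)
printf '%s\n' '#!/bin/bash' 'if true; then' '  echo "hello"' > "$script"

# -n makes bash read the script and check its syntax without executing anything.
if bash -n "$script"; then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax ok: $syntax_ok"
rm -f "$script"
```

For a real job script, the call reduces to bash -n followed by the script name. Note that this only catches syntax errors, not logical ones.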

Another very powerful tool is the online shell checker ShellCheck.

Your HPC-Team

Customizing bqueues

Many of you have noticed the increasing number of queues. This can be confusing when you just want to check the current status of the queues you're allowed to use. But there is an easy way to change this: you can tell bqueues to show just "your" queues by restricting its output to your user name.
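A minimal sketch of such a call, assuming LSF's -u option of bqueues, which limits the listing to queues that accept jobs from the given user. The wrapper name my_queues is hypothetical:

```shell
# my_queues is a hypothetical helper name, not part of LSF.
# Assumption: "bqueues -u <user>" lists only queues that accept jobs from <user>.
my_queues() {
    # Default to the current user when no name is given.
    bqueues -u "${1:-$USER}"
}
```

Calling my_queues without arguments then lists the queues for the current user.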

If you want bqueues to always show only your queues, you can add an alias to your .bashrc (found in your home directory); a single command that appends the alias line to the file is enough.
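A minimal sketch of adding the alias, again assuming that bqueues -u restricts the listing to one user's queues. The function name add_bqueues_alias is hypothetical; it takes the target file as an optional argument and avoids adding the line twice:

```shell
# add_bqueues_alias is a hypothetical helper name.
# Appends the alias to an rc file (default: ~/.bashrc) unless it is already there.
add_bqueues_alias() {
    rcfile="${1:-$HOME/.bashrc}"
    # Single quotes keep $USER literal in the file; it is expanded when .bashrc is read.
    alias_line='alias bqueues="bqueues -u $USER"'
    grep -qxF "$alias_line" "$rcfile" 2>/dev/null || printf '%s\n' "$alias_line" >> "$rcfile"
}
```

Running add_bqueues_alias once, without arguments, appends the line to ~/.bashrc.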

Now either reload your .bashrc in the current shell, for example with source ~/.bashrc, or wait until your next login to Mogon, when the change becomes active automatically.

Remark:
If you want to see all queues again, call unalias bqueues. Then bqueues is back to "normal" for the current shell.

Your HPC Team

Published on: 21 January 2016. Filed under TipsnTricks