"Going Productive" | High Performance Computing

Dear Users,

Noticed a low turnover of your jobs? One potential - and alas not so infrequent - cause is requesting more resources than your jobs actually need.

Ok, what is "too much" with regard to resources (CPU time, RAM)? Of course, you do not want to see your jobs crash because they hit the run time or memory limit. Therefore you would rather ask for a little headroom, and this is what we recommend as well. After all, what is the point if you lose time and have to re-submit?

Yet asking for 3 GB when the first 1000 jobs all stayed below 0.5 GB means you occupy slots where other users' jobs, and also your own, could be running. This applies, of course, to jobs with only one or a few slots. If your jobs span multiple nodes, you will be waiting unnecessarily for nodes with more memory. (We sometimes observe even worse ratios between requested and used memory.)
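To put a number on this, here is a minimal sketch of how many single-slot jobs fit on one node for a given memory request. The node size (64 GB, 32 cores) and the function name `jobs_per_node` are made up for illustration and do not describe our actual hardware; the 0.75 GB figure is simply the 0.5 GB from above with some headroom added.

```python
def jobs_per_node(node_mem_gb: float, node_cores: int, mem_request_gb: float) -> int:
    """Single-slot jobs that fit on one node, limited by memory or by cores."""
    by_memory = int(node_mem_gb // mem_request_gb)
    return min(by_memory, node_cores)


# Hypothetical node with 64 GB RAM and 32 cores (assumption, not a real node type).
over_requested = jobs_per_node(64, 32, 3.0)    # request of 3 GB per job
right_sized = jobs_per_node(64, 32, 0.75)      # 0.5 GB used plus some headroom

print(over_requested, right_sized)  # prints: 21 32
```

With the 3 GB request, memory runs out while 11 of the 32 cores on this imaginary node are still idle.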

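Likewise for run time limits: always asking for the maximum run time of a queue impairs backfilling, the mechanism which attempts to use all potential CPU time by fitting jobs into gaps they are sure to finish within. A job that claims the maximum run time rarely fits into such a gap. As a consequence, your jobs will be pending longer than needed.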

So, instead of stuffing queues with untested jobs, we ask you to test a few jobs first (for these tests, a generous overhead in terms of run time and memory is fine). Look for the actual run time in the LSF report, and also for the actual memory used (its maximum value). It is then still alright to "round up" those values, just to be safe. However, try not to be too cautious, as "too much" will result in a slower workflow for you.
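The "round up" step could be sketched roughly as follows. This is only an illustration: the function name `suggest_request`, the 25 % safety margin, the rounding steps and the sample measurements are all our assumptions, not fixed rules. How the resulting values translate into submission options (and in which units) depends on the local LSF configuration, so please check before copying numbers.

```python
import math


def suggest_request(max_mem_mb, runtimes_min, margin=0.25,
                    mem_step_mb=256, time_step_min=10):
    """Suggest (memory in MB, run time in minutes) from a few test jobs.

    Takes the worst case observed, adds a safety margin, and rounds up
    to the next multiple of the given step.
    """
    peak_mem = max(max_mem_mb)
    peak_time = max(runtimes_min)
    mem = math.ceil(peak_mem * (1 + margin) / mem_step_mb) * mem_step_mb
    time = math.ceil(peak_time * (1 + margin) / time_step_min) * time_step_min
    return mem, time


# Made-up values, as one might read them from the reports of three test jobs.
mem_mb, time_min = suggest_request([410, 455, 480], [42, 51, 47])
print(mem_mb, time_min)  # prints: 768 70
```

Here the three test jobs peaked at 480 MB and 51 minutes, so the sketch suggests asking for 768 MB and 70 minutes rather than several GB and the queue maximum.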

We reserve the right to point out problematic usage to you. But remember: we offer courses, individual counseling, etc. Just ask for our help if in doubt.

Your HPC team