Scheduler Update: Memory Limits

Recently, we noticed a substantial number of nodes crashing, causing job failures. Our investigation determined that the cause is memory exhaustion: jobs have been filling up all available RAM and swap on the affected nodes. Under Moab and RHEL 6.5, this issue did not show up because offending jobs were killed by the Linux kernel. With the new scheduler, these jobs cause compute nodes to crash and reboot, and any jobs running on those nodes fail without a meaningful error message to users ("node failure").

To solve this problem, we are activating the following memory-related changes on the HPC starting today:

  1. We have enabled memory management in the Slurm scheduler. This means that Slurm will take into account available RAM on compute nodes when scheduling jobs.
  2. We have disabled memory paging (swap) on all compute nodes in the HPC.
  3. In addition to specifying processor resources (i.e., nodes and cores) in your submission scripts, you can now specify the desired RAM per CPU. Use the --mem-per-cpu=<MB> or --mem=<MB> option to set this value (see the documentation or man sbatch, and the example script after this list).
  4. If you do not specify a memory request, Slurm allocates the default: 4GB per CPU for all partitions, except backfill and backfill2, which default to 2GB per CPU. The 48- and 8-core AMD nodes have only 2.6GB and 2.0GB of memory per core, respectively; if your partition is mapped to these nodes, you should set the memory explicitly in your submission script, because the default per-CPU allocation can exceed the memory actually available per core and leave some cores unusable. We are currently looking into a solution for this.
  5. Job start times will be affected by the requested memory. Jobs that request larger amounts of memory may take longer to start; jobs that request less memory will likely start sooner.
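
For reference, a minimal submission script that requests memory explicitly might look like the sketch below. The job name, partition, core count, walltime, and executable are placeholders; adjust them for your own job.

    #!/bin/bash
    #SBATCH --job-name=my_job          # placeholder job name
    #SBATCH --partition=backfill       # placeholder partition
    #SBATCH --nodes=1
    #SBATCH --ntasks=8                 # 8 cores on one node
    #SBATCH --mem-per-cpu=2000         # request 2000 MB of RAM per core
    #SBATCH --time=04:00:00            # placeholder walltime

    srun ./my_program                  # placeholder executable

With --mem-per-cpu=2000, this 8-core job is allotted 16000 MB in total; the equivalent request with --mem would be --mem=16000, since --mem specifies memory per node rather than per CPU. To see the configured core counts and memory (in MB) of the nodes in a partition, you can run, for example, sinfo -p backfill -N -o "%N %c %m".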

We have implemented these changes mainly to solve stability problems on the cluster, and our testing indicates that they will eliminate the recent node crashes. Both memory and processors are limited resources, so it makes sense for the scheduler to be aware of both.

If you have any questions or concerns, please let us know.