System Resource Allocation Units
NREL uses an allocation unit (AU) to allocate and charge time used on its high-performance computing (HPC) systems.
NREL allocates time on compute nodes and space on its file systems and MSS (archive) system. At normal priority, a Kestrel CPU node hour costs 10 AUs and a Kestrel GPU node hour costs 100 AUs. For comparison with other HPC systems, each Kestrel CPU node has 104 cores and a theoretical peak performance of 8,320 GigaFLOPS; a Kestrel GPU node has a theoretical peak performance of 245.42 TeraFLOPS.
Computing Allocation Unit Charges
AU charges are calculated for each job run upon job completion. The cost of the job is computed using this formula:
Walltime in hours * Number of Nodes * QoS Factor * Charge Factor
Quality of Service Factor
The quality of service (QoS) factor reflects the priority given to a job. The default QoS factor for all jobs is 1, meaning the job runs at normal priority. A user can run a job at high priority by adding `--qos=high` when the job is submitted. High priority gives the job a boost, so it runs sooner than other jobs in the queue. High-priority jobs have a QoS factor of 2, which means they are charged at twice the normal rate.
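As a sketch, the charging formula above can be expressed in a few lines of Python (the function name and example job sizes are illustrative, not part of NREL's tooling):

```python
def au_charge(walltime_hours, num_nodes, charge_factor, qos_factor=1):
    """AU cost of a job: walltime in hours * number of nodes * QoS factor * charge factor."""
    return walltime_hours * num_nodes * qos_factor * charge_factor

# A 4-hour job on 2 Kestrel CPU nodes (charge factor 10) at normal priority:
print(au_charge(4, 2, 10))                 # 4 * 2 * 1 * 10 = 80 AUs
# The same job submitted with --qos=high (QoS factor 2):
print(au_charge(4, 2, 10, qos_factor=2))   # 160 AUs
```

Note that the high-priority run costs exactly twice the normal-priority run, reflecting the QoS factor of 2.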
Charge Factor
The charge factor is the conversion from node hours to AUs. See the table below for the charge factors on each system. Some systems allow the use of fractional nodes. In particular, when running jobs on Kestrel, users can request one, two, or all four of the GPUs available on a GPU node; the charge is the corresponding fraction (¼, ½, or 1) of the full node charge factor.
| System | Eagle | Kestrel | Swift | Vermilion |
| --- | --- | --- | --- | --- |
| Charge Factor (CPU node) | 3 | 10 | 5 | Varies by partition |
| Charge Factor (GPU node) | 3 | 100 | 50 | 12 |
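Fractional GPU node charging on Kestrel can be sketched as follows (the function name is illustrative; the GPU node charge factor of 100 and the four GPUs per node are taken from the text above):

```python
def kestrel_gpu_au_charge(walltime_hours, gpus_requested,
                          gpus_per_node=4, node_charge_factor=100, qos_factor=1):
    """AU cost for a fractional Kestrel GPU node: the job is charged the
    fraction of the node corresponding to the GPUs requested (1/4, 1/2, or 1)."""
    fraction = gpus_requested / gpus_per_node
    return walltime_hours * fraction * node_charge_factor * qos_factor

# A 10-hour job using 2 of the 4 GPUs on a Kestrel GPU node:
print(kestrel_gpu_au_charge(10, 2))   # 10 * (2/4) * 100 = 500.0 AUs
```

Requesting only the GPUs a job actually needs, rather than a whole node, proportionally reduces the AUs charged.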
For detailed information and examples on calculating AU charges, please see the NREL Systems documentation on GitHub for each cluster.
Users should take care to make efficient use of the node hours they request. Users whose codes are not parallelized are encouraged to run arrays of jobs on a whole node, or to use a fraction of a node, to make efficient use of the AUs they will be charged.
Estimating Allocation Units for Allocation Requests
When possible, running jobs on NREL HPC systems prior to making an annual allocation request is encouraged, both to make sure the code is ready to run and to gather timings from test jobs that inform the estimate. You can request a pilot allocation at any time.
The basic formula for estimating AUs is:
Per-Job walltime in hours * Number of Nodes * Charge Factor * Number of Runs anticipated
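The estimation formula above can be sketched in Python (the function name and the example workload are hypothetical, using Kestrel's CPU node charge factor of 10 from the table above):

```python
def estimate_aus(walltime_hours, num_nodes, charge_factor, num_runs):
    """Estimate AUs for an allocation request:
    per-job walltime in hours * number of nodes * charge factor * anticipated runs."""
    return walltime_hours * num_nodes * charge_factor * num_runs

# 50 anticipated runs, each taking 6 hours on 4 Kestrel CPU nodes (charge factor 10):
print(estimate_aus(6, 4, 10, 50))   # 6 * 4 * 10 * 50 = 12000 AUs
```

Note this estimate assumes normal priority; jobs planned for `--qos=high` would cost twice as much, per the QoS factor described above.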
Reasoning Behind Allocation Units
Giving out allocations in AUs, instead of node hours or core hours, lets NREL keep the definition of an allocation consistent as we move from HPC system to HPC system. This is a best practice that NREL borrowed from other HPC facilities in the U.S. Department of Energy system.
AUs are indexed to a node-hour on Peregrine—the first system deployed at the HPC User Facility within the Energy Systems Integration Facility. A Peregrine node (for purposes of the AU standard) was a 24-core Intel Xeon (Haswell) node, which had a theoretical peak performance of 883.2 GigaFLOPS.