User Basics

Follow these instructions to start using NREL's high-performance computing (HPC) system resources.

Before you can access NREL HPC systems, you will need an HPC user account.

Internal Connection

If you are on the HPC VPN, NREL External VPN, or on site with an NREL device, you may access login nodes at kestrel.hpc.nrel.gov, which will round-robin forward you to one of the CPU login nodes:

kl1.hpc.nrel.gov
kl2.hpc.nrel.gov
kl3.hpc.nrel.gov

Two GPU login nodes are also available with software stacks suitable for compiling software compatible with the GPU compute nodes. These nodes have a different CPU than the The round-robin kestrel-gpu.hpc.nrel.gov will forward you to one of the following:

kl5.hpc.nrel.gov
kl6.hpc.nrel.gov

Similarly, DAV nodes can be accessed at kestrel-dav.hpc.nrel.gov, which will forward your sessions to one of:

kd1.hpc.nrel.gov
kd2.hpc.nrel.gov
kd3.hpc.nrel.gov
kd4.hpc.nrel.gov
kd5.hpc.nrel.gov
kd6.hpc.nrel.gov
kd7.hpc.nrel.gov

kl4 and kd8 are isolated in a DMZ and serve as the login nodes for external connections mentioned below.

External Connection

If you are an external HPC user, you will need a one-time password multifactor token (OTP) for two-factor authentication. Please request a multifactor token registration code if you did not receive one with your account.

Once you have a multifactor token, you may login directly to kestrel.nrel.gov or kestrel-dav.nrel.gov for login nodes or DAV nodes respectively. These will land your session on systems which are isolated in a DMZ.

Alternatively, you may connect to the NREL HPC VPN (which also requires an OTP; please see our instructions on connecting to the HPC VPN) to be able to resolve the hostnames listed in the Internal Connection section.

SSH Connection Examples

Here are examples of using SSH from a terminal to login to HPC systems:

Connect <username> to any <cluster> from the internal NREL network:

$ ssh [email protected]

Connect <username> to Kestrel on the Internal/NREL network:

$ ssh [email protected]

Connect <username> to Kestrel DAV nodes on the internal/NREL network:
$ ssh -Y [email protected]

Connect <username> to Kestrel from a non-NREL network:
$ ssh [email protected]

Please log out of all sessions once your work is complete. Idle sessions will be automatically disconnected after a period of inactivity, which may cause unsaved work to be lost.

Learn more about connecting to our systems.

Submitting Jobs

To get started quickly:

Use salloc with account, time limit, and resources/partition to request an interactive job
Use sbatch with a job script to submit a non-interactive job to the batch queue

Both of these commands require an account and walltime be specified with -A and -t.

Use srun during a job to submit executables to the pool of nodes within your job after using either of the commands above (if you use srun outside of a job, it will request a resource allocation for you similar to salloc).

Slurm will automatically route your job to the appropriate partition (logical collection of nodes) based on the hardware features, walltime, node quantity, and other attributes of your job submission, and insert your job into the queue to run at a scheduled time in the future.

Use squeue to observe the current status of the job queue

We have also constructed a streamlined PBS to Slurm Analogous Command Cheat Sheet to get experienced HPC users going quickly.

Accessing the System

Access to an NREL HPC system is available only from within the NREL firewall, by the secure shell (SSH) protocol 2.

You may be accustomed to the graphical interface of your laptop or personal workstation, but tasks on HPC systems are typically executed with a command line interface via a terminal application. If you are using a Mac, your computer already has a terminal application and SSH.

If you are connecting using a Windows system, Windows now supports ssh natively at the command line via the command prompt, in Powershell, or via Windows Subsystem for Linux.

Alternately, a terminal package such as PuTTY that supports SSH may need to be installed. Configuring PuTTY to connect to Kestrel is only necessary the first time Kestrel is accessed. You may then log into Kestrel using your HPC account username and password. For instructions on configuring PuTTY to connect to Kestrel, see connecting to HPC systems. You may also consider applications such as Git for Windows or Cmder, which provide a terminal-emulator and compatibility layer for many Linux commands in Windows—particularly of note is the ssh command—allowing you to emulate the workflow detailed below on your Windows device.

To access an HPC system from a system at NREL, start the terminal application and enter:

ssh <username>@kestrel.hpc.nrel.gov

...where <username> should be replaced with your NREL HPC username. Remember to finally execute this command by hitting the return or enter key, after which you will be prompted for your password once your SSH login request successfully reaches the system you specified.

Upon successfully logging into an HPC system, your command-line prompt should contain your username and the hostname of the system you landed on like so:

[username@kl1 ~] CPU $

Kestrel login nodes will also indicate whether the login node is a CPU node or a GPU node, because CPU and GPU nodes have different CPU architecture (Intel vs. AMD).

Note that $ prepends the prompt for input to your shell application. In our documentation, we also use $ as a symbol to indicate user-input of the text that follows as a command to your terminal, but since the prompt is placed by your terminal do not actually type $ before any commands. $ is a special character in Bash (the default shell application on our systems) when used in a command. The $ in terminal prompts typically indicates you are a standard user without elevated privileges in Linux operating systems.

HPC systems run Linux, specifically CentOS. If you are unfamiliar with Linux systems or a standard command-line workflow, we encourage viewing a quick-guide to getting started with the Linux command line, or one of the National Institute for Computational Sciences seminars available for HPC-centric command line usage. If these resources prove to be insufficient, a quick web-search should reveal no shortage of introductions to terminal usage.

Running Your Programs

At this point, you have started a session on one of the system's login nodes (denoted by the kl# hostname in your shell prompt such as kl4—short for "Kestrel Login node 4") which are systems that serve as a gateway to the rest of the system. Below is common etiquette you should follow to avoid inappropriate use:

Please do not run your intensive applications on login nodes as you are likely sharing them with dozens of other users who will notice the degradation in responsiveness and notify HPC Operations. If you need to run arbitrary commands in real time before making a batch job, please see Interactive Jobs. The system is comprised of thousands of non-login "compute nodes" which are dedicated to running your applications.
Your /home directory has an enforced capacity of 50GB and should only store utility files, not data for jobs. Please use /scratch and/projects accordingly for job data, as you will get much faster file-manipulation throughput on those filesystems. For more information on the intended usage of each of Kestrel's mountpoints, see NREL systems (GitHub).

Login nodes are a shared resource, and are subject to process limiting based on usage. Each user is permitted up to 8 cores and 100GB of RAM at a time, after which the Arbiter2 monitoring software will begin moderating resource consumption, reducing priority and restricting further processes by the user until usage is reduced to acceptable limits.

In this context, any software or task you wish to run on the compute nodes is referred to as a "job." Kestrel uses the Slurm Workload Manager to schedule jobs submitted by users across the system. Part of Slurm's responsibility is to make sure each user gets a fair, optimized timeshare of HPC resources including any specific hardware features (e.g. GPUs, nodes with extra RAM, etc). Jobs can be any executable file, whether this is a shell script, which invokes several commands, a precompiled binary with MPI functionality, or anything else you could launch from the command line.

The most common job submissions are shell scripts which contain calls to several programs within them. Below is an example of what such a script might resemble. It is a simple shell script that tells each compute node in the job to output its ID, which is a unique number that represents that particular node during the job's duration.

#!/bin/bash

#SBATCH -t 1:00
#SBATCH --job-name=node_rollcall
#SBATCH --output=node_rollcall.%j.out
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1

echo "Running on $SLURM_JOB_NUM_NODES nodes: $SLURM_NODELIST"
srun bash <<< 'echo "I am $SLURMD_NODENAME and my ID is $SLURM_NODEID"'

Using a text editor, you can create a file and paste in the contents of the above codeblock. The most common terminal-based text editors are listed below with links to a quickstart guide for them:

vi - see Colorado State University's Basic vi Commands

emacs - see GNU Emacs web page

nano - see nano Command Manual

Assuming you name the file rollcall.slurm, here is how you would submit it as a job:

$ sbatch -A <project_handle> rollcall.slurm

...where <project_handle> is the handle for one of the HPC project allocations you are associated with (you may also specify this with an #SBATCH directive at the top of your batch script). Note that every job must have an account (handle) specified, and that arguments to sbatch and srun must precede the executable file or they will be ignored. The #SBATCH directives allow you to specify command line arguments without having to supply them each time you call sbatch, however these directives will be ignored by srun or invoking this script manually from within an interactive job.

HPC systems at NREL set the environment variable $NREL_CLUSTER to help you identify what cluster your scripts are running on. You may use this variable in your shell or sbatch scripts to write code that is more portable between different clusters at NREL.

For more thorough demonstrations of Slurm's functionality and sample batch scripts, see Slurm Job Scheduler (GitHub).

NREL HPC on GitHub

The HPC User Documentation Site hosted on GitHub Pages contains more detailed information on the HPC systems hosted at NREL, including system details, application support, performance advice, and other helpful information for beginners and advanced users.

The NREL HPC GitHub repository features more tips and tricks for developing effective workflows on HPC systems. Users are welcomed and encouraged to contribute information they think will benefit the whole community.