User Basics
Follow these instructions to start using NREL's high-performance computing (HPC) system resources.
Before you can access NREL HPC systems, you will need an HPC user account.
Internal Connection
If you are on the HPC VPN, NREL External VPN, or on site with an NREL device, you may access login nodes at eagle.hpc.nrel.gov, which will round-robin forward you to one of:
- el1.hpc.nrel.gov
- el2.hpc.nrel.gov
- el3.hpc.nrel.gov
Similarly, DAV nodes can be accessed at eagle-dav.hpc.nrel.gov, which will forward your sessions to one of:
- ed1.hpc.nrel.gov
- ed2.hpc.nrel.gov
- ed3.hpc.nrel.gov
- ed5.hpc.nrel.gov
- ed6.hpc.nrel.gov
el4 and ed7 are isolated in a DMZ and serve as the login and DAV nodes, respectively, for the external connections described below.
External Connection
If you are an external HPC user, you will need a one-time password multifactor token (OTP) for two-factor authentication. Please request a multifactor token registration code if you did not receive one with your account.
Once you have a multifactor token, you may log in directly to eagle.nrel.gov or eagle-dav.nrel.gov for login nodes or DAV nodes, respectively. These will land your session on systems that are isolated in a DMZ.
Alternatively, you may connect to the NREL HPC VPN (which also requires an OTP; please see our instructions on connecting to the HPC VPN) to be able to resolve the hostnames listed in the Internal Connection section.
SSH Connection Examples
Here are examples of using SSH from a terminal to log in to HPC systems:
$ ssh username@eagle.hpc.nrel.gov # Internal connection
$ ssh -Y username@eagle-dav.hpc.nrel.gov # Internal connection with graphical capabilities
$ ssh username@eagle.nrel.gov # External connection
Idle login sessions will be automatically logged out.
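If your connection is instead being dropped by network timeouts, one common workaround (assuming an OpenSSH client such as the one included with macOS and most Linux distributions) is to have the client send periodic keep-alive messages; note that this only keeps the connection open and does not exempt you from the idle-session policy:
$ ssh -o ServerAliveInterval=240 username@eagle.hpc.nrel.gov # send a keep-alive message every 240 seconds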
Learn more about connecting to our systems.
Submitting Jobs
To get started quickly:
- Use srun [...] --pty $SHELL to request an interactive job
- Use sbatch with a job script to submit a job to be run without interaction
Both of these commands require that an account and a walltime be specified with -A and -t (see the example after this list).
- Use srun during a job to submit executables to the pool of nodes within your job after using either of the commands above (if you use srun outside of a job, it will request a resource allocation for you similar to salloc).
Slurm will automatically route your job to the appropriate partition (known as a "queue" on Peregrine) based on the hardware features, walltime, node quantity, and other attributes of your job submission.
- Use squeue to observe the current status of the job queue
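As a minimal sketch (the project handle, walltime, and script name are placeholders), an interactive session and a batch submission look like this:
$ srun -A <project_handle> -t 1:00:00 -N 1 --pty $SHELL # one-hour interactive job on a single node
$ sbatch -A <project_handle> -t 1:00:00 my_script.sh # queue a hypothetical batch script without interaction
$ squeue -u $USER # list only your jobs in the queue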
For a more thorough guide on job submission practices, please see running jobs (GitHub).
We have also constructed a streamlined PBS to Slurm Analogous Command Cheat Sheet to get experienced HPC users going quickly.
Accessing the System
Access to an NREL HPC system is available only from within the NREL firewall, using the secure shell (SSH) protocol, version 2.
You may be accustomed to the graphical interface of your laptop or personal workstation, but tasks on HPC systems are typically executed with a command line interface via a terminal application. If you are using a Mac, your computer already has a terminal application and SSH.
If you are connecting from a Windows system, a terminal package that supports SSH, such as PuTTY, needs to be installed. Configuring PuTTY to connect to Eagle is only necessary the first time Eagle is accessed. You may then log into Eagle using your HPC account username and password. For instructions on configuring PuTTY to connect to Eagle, see connecting to HPC systems. You may also consider applications such as Git for Windows or Cmder, which provide a terminal emulator and a compatibility layer for many Linux commands on Windows (notably the ssh command), allowing you to follow the workflow detailed below on your Windows device.
To access an HPC system from a system at NREL, start the terminal application and enter:
ssh <username>@eagle.hpc.nrel.gov
...where <username> should be replaced with your NREL HPC username. Press the return or enter key to execute the command; once your SSH login request reaches the specified system, you will be prompted for your password.
Upon successfully logging into an HPC system, your command-line prompt should contain your username and the hostname of the system you landed on like so:
[username@el1 ~]$
Note that $ marks the end of the prompt your shell prints before accepting input. In our documentation, we also use $ to indicate that the text following it should be typed as a command into your terminal; since the prompt is printed by your shell, do not actually type the $ before any commands. Within a command, $ is a special character to Bash (the default shell application on our systems). The $ in terminal prompts typically indicates you are a standard user without elevated privileges in Linux operating systems.
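For example, inside a command Bash treats $ as the start of a variable reference and substitutes that variable's value, which is one more reason not to type the prompt's $ yourself:
$ echo "I am $USER and my shell is $SHELL" # Bash expands the USER and SHELL variables before echo runs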
HPC systems run Linux, specifically CentOS. If you are unfamiliar with Linux systems or a standard command-line workflow, we encourage you to view a quick guide to getting started with the Linux command line, or one of the National Institute for Computational Sciences seminars on HPC-centric command-line usage. If these resources prove insufficient, a quick web search will reveal no shortage of introductions to terminal usage.
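If you only need enough to follow the rest of this page, the commands below (standard on any Linux system; the directory name is hypothetical) cover basic navigation:
$ pwd # print the directory you are currently in
$ ls -l # list the contents of the current directory in detail
$ mkdir my_project # create a new directory (hypothetical name)
$ cd my_project # move into that directory
$ man sbatch # read the manual page for a command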
At this point, you have started a session on one of the system's login nodes (denoted by the el# hostname in your shell prompt, such as el4, short for "Eagle Login node 4"), which serve as a gateway to the rest of the cluster. Below is common etiquette you should follow to avoid inappropriate use:
- Please do not run your intensive applications on login nodes, as you are likely sharing them with dozens of other users who will notice the degradation in responsiveness and notify HPC Operations. If you need to run arbitrary commands in real time before creating a batch job, please see Interactive Jobs. The system is composed of thousands of non-login "compute nodes" which are dedicated to running your applications.
- Your /home directory has an enforced capacity of 50GB and should only store utility files, not data for jobs. Please use /scratch and /projects for job data, as you will get much faster file-manipulation throughput on those filesystems. For more information on the intended usage of each of Eagle's mountpoints, see NREL systems (GitHub).
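For example (assuming your scratch directory follows the /scratch/$USER convention), you can check how much space your home directory uses and stage job data onto /scratch:
$ du -sh /home/$USER # total size of your home directory
$ cp -r my_dataset /scratch/$USER/ # stage a hypothetical dataset to scratch before running a job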
Login nodes are a shared resource, and are subject to process limiting based on usage. Each user is permitted up to 8 cores and 100GB of RAM at a time, after which the Arbiter2 monitoring software will begin moderating resource consumption, restricting further processes by the user until usage is reduced to acceptable limits.
In this context, any software or task you wish to run on the compute nodes is referred to as a "job." Eagle uses the Slurm (Simple Linux Utility for Resource Management) Workload Manager to schedule jobs submitted by users across the system. Part of Slurm's responsibility is to make sure each user gets a fair, optimized timeshare of HPC resources, including any specific hardware features (e.g. GPUs, nodes with extra RAM, etc.). Jobs can be any executable file, whether a shell script that invokes several commands, a precompiled binary with MPI functionality, or anything else you could launch from the command line.
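As a sketch of how such hardware features are requested (the GPU count, memory value, and script names below are hypothetical and should be matched to the actual node configuration), Slurm accepts options such as:
$ sbatch -A <project_handle> -t 2:00:00 --gres=gpu:2 gpu_job.sh # request 2 GPUs on each node of the job
$ sbatch -A <project_handle> -t 2:00:00 --mem=180000 bigmem_job.sh # request roughly 180 GB of RAM per node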
The most common job submissions are shell scripts which contain calls to several programs within them. Below is an example of such a script: a simple shell script that tells each compute node in the job to output its ID, a unique number that represents that particular node for the job's duration.
#!/bin/bash
#SBATCH -t 1:00
#SBATCH --job-name=node_rollcall
#SBATCH --output=node_rollcall.%j.out
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=1
echo "Running on $SLURM_JOB_NUM_NODES nodes: $SLURM_NODELIST"
srun bash <<< 'echo "I am $SLURMD_NODENAME and my ID is $SLURM_NODEID"'
Using a text editor, you can create a file and paste in the contents of the above codeblock. The most common terminal-based text editors are listed below with links to a quickstart guide for them:
- vi - see Colorado State University's Basic vi Commands
- emacs - see GNU Emacs web page
- nano - see nano Command Manual
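For instance, to create the file with nano (any of the editors above works equally well):
$ nano rollcall.slurm # opens the file for editing; paste in the script contents, then save and exit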
Assuming you name the file rollcall.slurm, here is how you would submit it as a job:
$ sbatch -A <project_handle> rollcall.slurm
...where <project_handle> is the handle for one of the HPC project allocations you are associated with (you may also specify this with an #SBATCH directive at the top of your batch script). Note that every job must have an account (handle) specified, and that arguments to sbatch and srun must precede the executable file or they will be ignored. The #SBATCH directives allow you to specify command-line arguments without having to supply them each time you call sbatch; however, these directives are ignored when the script is launched with srun or invoked manually from within an interactive job.
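For example, adding the following line near the top of rollcall.slurm, alongside the existing #SBATCH directives, records the account in the script itself so the job can be submitted with just sbatch rollcall.slurm (here <project_handle> remains a placeholder):
#SBATCH --account=<project_handle>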
HPC systems at NREL set the environment variable $NREL_CLUSTER to help you identify which cluster your scripts are running on. The example below checks this variable and submits jobs differently to accommodate each cluster's differences:
if [[ ${NREL_CLUSTER} = "peregrine" ]]; then
    qsub <batch_file> -A <project_handle>
elif [[ ${NREL_CLUSTER} = "eagle" ]]; then
    sbatch <batch_file>
fi
For more thorough demonstrations of Slurm's functionality and sample batch scripts, see running jobs (GitHub) and its child pages.
NREL HPC on GitHub
The NREL HPC GitHub repository features more tips and tricks for developing effective workflows on HPC systems. Users are welcomed and encouraged to contribute information they think will benefit the whole community.