Introduction to HyPER-C3

Cluster Structure

The Emory Hybrid High-Performance Computing Platform for Education and Research Community Cloud Cluster (HyPER-C3) is a distributed computing system hosted on AWS at Emory. The cluster is designed to provide researchers with the computational resources they need to conduct research. HyPER-C3 is HIPAA compliant, meaning it is designed to protect the privacy and security of patient health information.

HyPER-C3 Structure

Account

Login

🔹 For Windows Users: Open Command Prompt and run the following commands (adjust the path as needed):

set /p CA_KEY=<C:\Users\YourUsername\ssh_ca.pub
echo @cert-authority cirrostratus.it.emory.edu %CA_KEY% >> %USERPROFILE%\.ssh\known_hosts
echo @cert-authority ondemand.it.emory.edu %CA_KEY% >> %USERPROFILE%\.ssh\known_hosts

🔹 For Mac/Linux Users: Open a terminal and run the following commands (adjust the path as needed):

echo "@cert-authority cirrostratus.it.emory.edu $(cat /path/to/ssh_ca.pub)" >> ~/.ssh/known_hosts
echo "@cert-authority ondemand.it.emory.edu $(cat /path/to/ssh_ca.pub)" >> ~/.ssh/known_hosts
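Running the commands above more than once appends duplicate entries to known_hosts. The following is a minimal sketch of an idempotent variant, not an official procedure. The names CA_FILE, KNOWN_HOSTS, and add_cert_authority are illustrative, and the sketch uses a temporary directory with a dummy key so it can be tried safely; point the two variables at your real ssh_ca.pub and ~/.ssh/known_hosts to use it.

```shell
#!/bin/bash
# Sketch: add @cert-authority entries only if they are not already present.
# Demo setup (assumption): a temp dir stands in for ~/.ssh and a dummy key for ssh_ca.pub.
demo_dir=$(mktemp -d)
CA_FILE="$demo_dir/ssh_ca.pub"        # replace with /path/to/ssh_ca.pub
KNOWN_HOSTS="$demo_dir/known_hosts"   # replace with ~/.ssh/known_hosts
echo "ssh-ed25519 AAAAC3NzaDummyKeyForIllustrationOnly demo-ca" > "$CA_FILE"

add_cert_authority() {
    local host="$1"
    local entry
    entry="@cert-authority $host $(cat "$CA_FILE")"
    touch "$KNOWN_HOSTS"
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF "$entry" "$KNOWN_HOSTS"; then
        echo "$entry" >> "$KNOWN_HOSTS"
    fi
}

for host in cirrostratus.it.emory.edu ondemand.it.emory.edu; do
    add_cert_authority "$host"
    add_cert_authority "$host"   # the second call adds nothing
done

entry_count=$(grep -c '^@cert-authority ' "$KNOWN_HOSTS")
echo "cert-authority entries: $entry_count"
```

Each entry keeps the marker, the host name, and the key on a single line, which is the layout the Mac/Linux commands above produce.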

Node

Storage

Cost

Use of the community compute partitions, home storage, and scratch storage is free for users. Group share storage beyond 1 TB per Sponsor User (i.e., PI) is charged at $0.3 per GB per month.

Need help? Email hpc.help@emory.edu.

Mount Remote Directory to Local Computer

mkdir -p /Users/jyang51/HYPERC3/ # create directory on local computer
sshfs jyang51@cirrostratus.it.emory.edu:/users/jyang51/ /Users/jyang51/HYPERC3/ -o auto_cache -ovolname=HYPERC3 -o follow_symlinks
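When you are done, the mounted directory should be unmounted. A minimal sketch, assuming the mount point used above; the unmount command differs by operating system, and MOUNT_POINT is a placeholder to adjust. The sketch only prints the command so it can be reviewed before running it.

```shell
#!/bin/bash
# Sketch: pick the unmount command for the local operating system.
MOUNT_POINT="/Users/jyang51/HYPERC3/"   # same local directory as in the sshfs command

case "$(uname -s)" in
    Darwin) unmount_cmd="umount" ;;          # macOS
    *)      unmount_cmd="fusermount -u" ;;   # Linux (FUSE)
esac

# Printed rather than executed, so nothing is unmounted by accident.
echo "To unmount, run: $unmount_cmd $MOUNT_POINT"
```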

Copying Data to/from the Cluster

  1. rsync is recommended

  2. scp can also be used for general copying

  3. SFTP

  4. Globus Data Transfer. To gain access to Globus / HySci, please email the Network Team at hysci.help@emory.edu.

  5. AWS CLI v2
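As an illustration of the first three options, here is a hedged sketch of typical rsync, scp, and sftp invocations. NETID, LOCAL_DIR, and REMOTE_DIR are placeholders, the remote path layout /users/<netid>/ is assumed from the sshfs example, and the run helper is hypothetical: by default it only prints each command, and it executes them only when RUN=1 is set.

```shell
#!/bin/bash
# Placeholders (assumptions): replace with your own NetID and paths.
NETID="your_netid"
CLUSTER="cirrostratus.it.emory.edu"
LOCAL_DIR="./project_data"
REMOTE_DIR="/users/${NETID}/project_data"

# Set RUN=1 to execute the transfers; by default the commands are only printed.
RUN="${RUN:-0}"
run() {
    echo "+ $*"
    if [ "$RUN" = "1" ]; then "$@"; fi
}

# rsync: -a preserves permissions and timestamps, -v is verbose, -z compresses in transit.
# The trailing slash on the source copies the directory's contents.
run rsync -avz "${LOCAL_DIR}/" "${NETID}@${CLUSTER}:${REMOTE_DIR}/"   # local to cluster
run rsync -avz "${NETID}@${CLUSTER}:${REMOTE_DIR}/" "${LOCAL_DIR}/"   # cluster to local

# scp: -r copies directories recursively.
run scp -r "${LOCAL_DIR}" "${NETID}@${CLUSTER}:/users/${NETID}/"

# sftp opens an interactive session (use put and get inside it):
# sftp "${NETID}@${CLUSTER}"
```

rsync only transfers files that changed, so an interrupted copy can be resumed by rerunning the same command.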

Handle Data Files on Linux Cluster

Running man [command] opens the help page for any of the following commands, e.g., man rsync. Type q to exit the help page.

  1. rsync is recommended for copying data on the cluster
  2. cp also copies data within the cluster
  3. Delete files or directories with rm
  4. Make directories with mkdir
  5. Move a file or directory with mv
  6. List all files in a directory with ls
  7. List all files with their sizes with ls -l -h
  8. Use vi or nano to edit text files on the cluster
  9. Read text files with less or cat
  10. Consider compressing text files with gzip to save space
  11. Open a gzipped text file with zcat [file_name.gz] | less -S
  12. Use man to see the user manual for Linux tools, e.g., man ls
  13. Use a pipe | to pass the output of the command before | as input to the command after |
  14. awk is highly recommended for handling large text files
  15. Split your screen with tmux. See tmux cheat sheet.
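Several of the commands above combine naturally. The sketch below is only an illustration on made-up data: it creates a small tab-delimited file in a temporary directory, compresses it, and summarizes it with awk through a pipe. The file name and column names are hypothetical.

```shell
#!/bin/bash
# Demo data (hypothetical): a small tab-delimited table in a temp directory.
work_dir=$(mktemp -d)
printf 'gene\tchr\tscore\nA\t1\t0.5\nB\t2\t1.5\nC\t1\t2.0\n' > "$work_dir/scores.txt"

ls -l -h "$work_dir"          # list files with human-readable sizes
gzip "$work_dir/scores.txt"   # replaces scores.txt with scores.txt.gz

# Stream the compressed file through awk without unpacking it on disk.
# gzip -cd is equivalent to zcat; NR > 1 skips the header row.
chr1_count=$(gzip -cd "$work_dir/scores.txt.gz" | awk -F'\t' 'NR > 1 && $2 == 1 {n++} END {print n+0}')
score_sum=$(gzip -cd "$work_dir/scores.txt.gz" | awk -F'\t' 'NR > 1 {s += $3} END {print s}')

echo "rows on chr 1: $chr1_count"
echo "sum of scores: $score_sum"
rm -r "$work_dir"             # clean up the demo directory
```

Because awk reads its input line by line, the same pattern works on files far larger than available memory.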

Software Management

Available software

Conda

Using software installed on the cluster

Singularity: A container platform for HPC.

Installing software without root access

Setting up your .bashrc file

General Job Workflow

Small analysis that does not require large data input/output

Analysis that requires larger data

Submit Jobs by SLURM

Basic SLURM commands

#!/bin/bash
#SBATCH --job-name=normal.R
#SBATCH --account=general # Use --account=a100v100 for A100 GPUs
#SBATCH --nodes=1 # Number of nodes requested
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --partition=a10g-1-gm24-c32-m128 # Name of the partition
#SBATCH --time=4:00:00 # Request 4 hours of run time; the default is 7-00:00:00
#SBATCH --gpus=1 # Number of GPUs needed
#SBATCH --mem=8G # Memory needed

## Put all output files in separate directories.
## Create them before submitting (mkdir -p Out Err); SLURM does not create them.
#SBATCH --output=Out/normal.%A_%a.out
#SBATCH --error=Err/normal.%A_%a.err

## Submit 10 instances (array task IDs 0-9) of the commands listed below
#SBATCH --array=0-9

## For notification purposes. Use your Emory email address only!
#SBATCH --mail-user=[netid]@emory.edu # Replace [netid] with your NetID
#SBATCH --mail-type=BEGIN,END,FAIL # Send mail when the job begins, ends, or fails

## The following are your commands
conda init bash > /dev/null 2>&1 # Initialize Conda for the Bash shell
source ~/.bashrc
conda activate myenv
python example.py
Rscript /home/<user>/normal.R # Replace <user> with your username
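Once a job script exists, it is submitted and monitored from a login node with the standard SLURM commands sbatch, squeue, sacct, and scancel. The sketch below is an illustration under assumptions: it writes a tiny array job to a temporary directory (the file name hello_array.sh is hypothetical, and the account and partition lines are omitted for brevity, so the cluster may require you to add them), checks its syntax, and submits it only if sbatch is available.

```shell
#!/bin/bash
# Sketch: write a minimal array job, check it, and submit it if SLURM is present.
job_dir=$(mktemp -d)
job_script="$job_dir/hello_array.sh"   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello_array
#SBATCH --array=0-2
#SBATCH --time=00:05:00
#SBATCH --output=Out/hello.%A_%a.out
echo "array task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
EOF

mkdir -p "$job_dir/Out"                  # SLURM does not create output directories
bash -n "$job_script" && syntax_ok=yes   # syntax check only; nothing is executed

if command -v sbatch > /dev/null 2>&1; then
    cd "$job_dir" || exit 1
    # --parsable prints just the job ID (optionally followed by ;cluster)
    job_id=$(sbatch --parsable "$job_script" | cut -d';' -f1)
    squeue -u "$USER"      # list your pending and running jobs
    sacct -j "$job_id"     # accounting record for this job
    # scancel "$job_id"    # uncomment to cancel the job
else
    echo "sbatch not found: run this on a cluster login node"
fi
```

Each array task sees its own index in SLURM_ARRAY_TASK_ID, which is the usual way to select a different input file or parameter per task.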

Best Practices and Limitations for Environment Setup

Additional Resources