diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index 8977551..95f4def 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -26,10 +26,11 @@
   - [Challenges](./chapter2/challenges.md)
 
 - [M3](./chapter3/chapter3.md)
+  - [Getting Started](./chapter3/start.md)
   - [Logging In](./chapter3/login.md)
   - [Linux Commands](./chapter3/linux-cmds.md)
-  - [Compiling](./chapter3/compiling.md)
   - [M3's Shared Filesystem](./chapter3/shared-fs.md)
+  - [Software and Tooling](./chapter3/software-tooling.md)
   - [Bash Scripts](./chapter3/bash.md)
   - [Job batching & SLURM](./chapter3/slurm.md)
   - [Challenges](./chapter3/challenges.md)
diff --git a/src/chapter3/bash.md b/src/chapter3/bash.md
index 6700959..aada975 100644
--- a/src/chapter3/bash.md
+++ b/src/chapter3/bash.md
@@ -1 +1,42 @@
 # Bash Scripts
+
+Bash is both a command line interface and a scripting language. The commands you type into a Linux terminal are generally run by Bash. A Bash script is a series of commands that are executed in order. Bash scripts are useful for automating tasks that you do often, or for running a series of commands that you don't want to type out every time. In our case, Bash scripts are used for running jobs on M3.
+
+In terms of use, Bash can encapsulate any command you would normally run in the terminal into a script that can be easily reused. For example, you could have a script that automatically navigates to a directory and activates a virtual environment, or a script that compiles and runs a C program.
+
+The basic syntax of a bash file is as follows:
+
+```bash
+#!/bin/bash
+
+# This is a comment
+
+echo "Hello World"
+```
+
+We can save this file as `hello.sh` and run it using the following command: `source hello.sh`. This will print `Hello World` to the terminal.
+
+Let's walk through the file. The first line is `#!/bin/bash`. This is called a shebang, and it tells the system to treat this file as a bash script. The second line is a comment, and is ignored by the system. The third line is the actual command that we want to run.
In this case, we are using the `echo` command to print `Hello World` to the terminal.
+
+Bash can do a lot more, including basic arithmetic, if statements, loops, and functions, but these are not really necessary for what we are doing. If you want to learn more about bash, you can find a good tutorial [here](https://linuxconfig.org/bash-scripting-tutorial).
+
+For our use, the main things we need to be able to do are to run executables and files. We do this the exact same way as if manually running them in the terminal. For example, if we want to run a python script, we can do the following:
+
+```bash
+#!/bin/bash
+
+# This will run hello.py using the python3 executable
+python3 hello.py
+```
+
+If we want to compile and then run a C program, we can do the following:
+
+```bash
+#!/bin/bash
+
+# This will compile hello.c and then run it
+gcc hello.c -o hello
+./hello
+```
+
+Using bash scripts not only saves a lot of time and effort, but it also makes it easier to run jobs on M3 using SLURM. We will go over how to do this in the next section.
diff --git a/src/chapter3/challenges.md b/src/chapter3/challenges.md
index 9358534..21677c7 100644
--- a/src/chapter3/challenges.md
+++ b/src/chapter3/challenges.md
@@ -1 +1,15 @@
 # Challenges
+
+## Challenge 1
+
+Something simple to start off. Create a bash script called `hello.sh` that prints "Hello World" to the screen. Submit this job to the queue using `sbatch`. Check the status of the job using `squeue`. Once the job has finished, check the output using `cat`. You can find the output file in the directory you submitted the job from.
+
+## Challenge 2
+
+Something a bit more involved. Clone your [challenges repository](https://github.com/MonashDeepNeuron/HPC-Training-Challenges.git) into your personal folder in the scratch directory.
Then, in this same directory, create a submission script that will install Python 3.10 using miniconda, create a virtual environment, install the necessary dependencies, and clone and run the `alexnet_stl10.py` script in the M3 section. Remember, don't load Python directly using `module`; follow the instructions in the [software tooling](./software-tooling.md#python) chapter.
+Once completed, commit and push your changes as well as the output.
+
+## Challenge 3
+
+A continuation of challenge 2. Edit your submission script so that you get a GPU node, and run the script using the GPU.
+Commit and push your changes as well as the output.
diff --git a/src/chapter3/chapter3.md b/src/chapter3/chapter3.md
index 3152926..16097dd 100644
--- a/src/chapter3/chapter3.md
+++ b/src/chapter3/chapter3.md
@@ -1 +1,7 @@
 # M3
+
+[M3](https://docs.massive.org.au/M3/index.html) is part of [MASSIVE](https://www.massive.org.au/), which is a High Performance Computing facility for Australian scientists and researchers. Monash University is a partner of MASSIVE, and provides a majority of the funding for it. M3 is made up of multiple different types of servers, with a total of 5673 cores, 63.2TB of RAM, 5.6PB of storage, and 1.7 million CUDA cores.
+
+M3 utilises the [Slurm](https://slurm.schedmd.com/) workload manager, which is a job scheduler that allows users to submit jobs to the cluster. We will learn a bit more about this later on.
+
+This book will take you through the basics of connecting to M3, submitting jobs, transferring data to and from the system and some other things. If you want to learn more about M3, you can read the [M3 documentation](https://docs.massive.org.au/M3/index.html). This will give you a more in-depth look at the system, and how to use it.
diff --git a/src/chapter3/compiling.md b/src/chapter3/compiling.md
deleted file mode 100644
index 6d6a98c..0000000
--- a/src/chapter3/compiling.md
+++ /dev/null
@@ -1 +0,0 @@
-# Compiling
diff --git a/src/chapter3/imgs/aaf.png b/src/chapter3/imgs/aaf.png
new file mode 100644
index 0000000..b68eaa8
Binary files /dev/null and b/src/chapter3/imgs/aaf.png differ
diff --git a/src/chapter3/imgs/gurobi.png b/src/chapter3/imgs/gurobi.png
new file mode 100644
index 0000000..25a8047
Binary files /dev/null and b/src/chapter3/imgs/gurobi.png differ
diff --git a/src/chapter3/imgs/gurobi2.png b/src/chapter3/imgs/gurobi2.png
new file mode 100644
index 0000000..3e5c4de
Binary files /dev/null and b/src/chapter3/imgs/gurobi2.png differ
diff --git a/src/chapter3/imgs/hpcid.png b/src/chapter3/imgs/hpcid.png
new file mode 100644
index 0000000..be747b6
Binary files /dev/null and b/src/chapter3/imgs/hpcid.png differ
diff --git a/src/chapter3/imgs/join_project.png b/src/chapter3/imgs/join_project.png
new file mode 100644
index 0000000..070d055
Binary files /dev/null and b/src/chapter3/imgs/join_project.png differ
diff --git a/src/chapter3/imgs/putty_key_not_cached.png b/src/chapter3/imgs/putty_key_not_cached.png
new file mode 100644
index 0000000..6bd2ed3
Binary files /dev/null and b/src/chapter3/imgs/putty_key_not_cached.png differ
diff --git a/src/chapter3/imgs/putty_start.png b/src/chapter3/imgs/putty_start.png
new file mode 100644
index 0000000..9043566
Binary files /dev/null and b/src/chapter3/imgs/putty_start.png differ
diff --git a/src/chapter3/linux-cmds.md b/src/chapter3/linux-cmds.md
index b148553..7057ccc 100644
--- a/src/chapter3/linux-cmds.md
+++ b/src/chapter3/linux-cmds.md
@@ -1 +1,47 @@
 # Linux Commands
+
+Even if you are already familiar with Linux, please read through all of these commands, as some are specific to M3.
+
+## Basic Linux Commands
+
+| Command | Function |
+| --- | --- |
+| `pwd` | prints current directory |
+| `ls` | prints list of files / directories in current directory (add a `-a` to list everything, including hidden files/directories) |
+| `mkdir` | makes a directory |
+| `rm ` | deletes *filename*. Add `-r` to delete a directory. Add `-f` to force deletion (be really careful with that one) |
+| `cd ` | changes the current directory |
+| `vim` or `nano` | bring up a text editor |
+| `cat ` | prints contents of file to terminal |
+| `echo` | prints whatever you put after it |
+| `chmod ` | changes permissions of file |
+| `cp` | copy a file or directory |
+| `mv ` | move or rename file or directory |
+
+> Note: `.` and `..` are special directories. `.` is the current directory, and `..` is the parent directory. These can be used when using any command that takes a directory as an argument. Similar to these, `~` is the home directory, and `/` is the root directory. For example, if you wanted to copy something from the parent directory to the home directory, you could do `cp ../ ~/`, without having to navigate anywhere.
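As a rough illustration of the note above, the sketch below uses `.`, `..` and `~` inside a throwaway directory. The `sandbox`, `project` and `notes.txt` names are made up for this example and are not part of M3.

```bash
#!/bin/bash

# Hypothetical sandbox so that no real files are touched
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/project/src"
echo "some notes" > "$sandbox/project/notes.txt"

cd "$sandbox/project/src"

# ".." is the parent directory, "." is the current directory
cp ../notes.txt .
ls -a

# "~" expands to your home directory
echo ~

# Move up one level, back into the project directory
cd ..
pwd
```

None of these commands needed a full path, which is the main convenience of the special directories.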
+
+## Cluster Specific Commands
+
+| Command | Function | Flags |
+| --- | --- | --- |
+| `show_job` | prints information about your jobs | |
+| `show_cluster` | prints information about the cluster | |
+| `user_info` | prints information about your account | |
+| `squeue` | prints information about your jobs | `-u ` to print information about a specific user |
+| `sbatch ` | submit a job to the cluster | |
+| `scontrol show job ` | prints information about a specific job | |
+| `scancel ` | cancel a job | |
+
+## M3 Specific Commands
+
+| Command | Function |
+| --- | --- |
+| `module load ` | load a module |
+| `module unload ` | unload a module |
+| `module avail` | list available modules |
+| `module list` | list loaded modules |
+| `module spider ` | search for a module |
+| `module help ` | get help for a module |
+| `module show ` | show details about a module |
+| `module purge` | unload all modules |
+| `module swap ` | swap two modules |
\ No newline at end of file
diff --git a/src/chapter3/login.md b/src/chapter3/login.md
index 40f7307..df7ad61 100644
--- a/src/chapter3/login.md
+++ b/src/chapter3/login.md
@@ -1 +1,77 @@
 # Logging In
+
+First you will need to ssh into a login node in the cluster. You do this by doing the following:
+
+## Windows
+
+If you are using Windows, the best way to ssh into M3 is by using [PuTTY](https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html).
+
+Once installed and opened, you will see a page like this:
+
+![PuTTY config page](./imgs/putty_start.png)
+
+Type in your M3 username followed by `@m3.massive.org.au` and press enter or the Open button.
+
+If it is the first time accessing M3 from this client then you may see something like this:
+
+![PuTTY auth page](./imgs/putty_key_not_cached.png)
+
+Just click Accept, and PuTTY will add the cluster's ssh fingerprint to cache.
+
+## Mac / Linux
+
+On macOS or Linux, ssh is built into the terminal, so just copy the following into your shell, substituting username for your username.
+
+```bash
+ssh username@m3.massive.org.au
+```
+
+You may get a similar warning to the above image about the server identity. Just type `yes` to accept it and add the ssh key to cache.
+
+Everything from now on will be the same across whatever computer you are using to access the cluster.
+
+The first thing to pop up will be a request for a password. Don't worry if you don't see your cursor moving while typing; this is just for security, and your password is still being entered.
+
+Once you have logged in, you will come to a page that looks like this:
+
+```txt
++----------------------------------------------------------------------------+
+| Welcome to MASSIVE M3 |
+| |
+| For assistance please contact help@massive.org.au or (03) 9902 4845 |
+| The MASSIVE User Guide https://docs.massive.org.au |
++----------------------------------------------------------------------------+
+
+ - Useful Slurm Commands:
+ squeue
+ sbatch
+ scontrol show job
+ scancel
+
+ - Massive User Scripts:
+ show_job
+ show_job
+ show_cluster
+ user_info
+
+ - Slurm Sample Scripts are Here:
+ /usr/local/hpcusr/latest/training/samples/slurm/
+
+ - We recommend using smux to compile and test code on compute nodes.
+ - How to use smux: https://docs.massive.org.au/M3/slurm/interactive-jobs.html
+
+ For more details, please see:
+ https://docs.massive.org.au/M3/slurm/slurm-overview.html
+------------------------------------------------------------------------------
+
+Use MASSIVE Helpdesk to request assistance with MASSIVE related computing
+questions and problems. Email to help@massive.org.au and this will generate
+a ticket for your issue.
+
+------------------------------------------------------------------------------
+
+
+[jasparm@m3-login2 ~]$
+```
+
+Once you are done and want to logout, just type `exit`. This will close the connection.
diff --git a/src/chapter3/shared-fs.md b/src/chapter3/shared-fs.md
index 9861660..aae9384 100644
--- a/src/chapter3/shared-fs.md
+++ b/src/chapter3/shared-fs.md
@@ -1 +1,58 @@
 # M3's Shared Filesystem
+
+When we talk about a shared filesystem, what we mean is that the filesystem that M3 uses allows multiple users or systems to access, manage, and share files and directories over a network, concurrently. It enables users to collaborate on projects, share resources, and maintain a unified file structure across different machines and platforms. In addition to this, it enables the many different compute nodes in M3 to access data from a single source which users also have access to, simplifying the process of running jobs on M3.
+
+Very simply, the way it works is that the home, project and scratch directories are mounted on every node in the cluster, so they are accessible from any node.
+
+M3's filesystem has three main parts that matter to you.
+
+## Home Directory
+
+This is each user's personal directory, which only they have access to. This has a ~10GB allocation, and should store any hidden files, configuration files, or other files that you don't want to share with others. This is backed up nightly.
+
+## Project Directory
+
+This is the shared project directory, for all users in MDN to use. This has a ~1TB allocation, and should be used only for project specific files, scripts, and data. This is also backed up nightly, so in the case that you accidentally delete something important, it can be recovered.
+
+## Scratch Directory
+
+This is also shared with all users in MDN, and has more allocation (~3TB). You may use this for personal projects, but keep your usage low. In general it is used for temporary files, larger datasets, and should be used for any files that you don't need to keep for a long time. This is not backed up, so if you delete something, it's gone forever.
+
+## General Rules
+
+- Keep data usage to a minimum.
If you have a large amount of data, consider moving it to the scratch directory. If it is not necessary to keep it, consider deleting it.
+- Keep your home directory clean.
+- In general, it is good practice to make a directory in the shared directory for yourself. Name this your username or name, to make it easily identifiable. This is where you should store your files for small projects or personal use.
+- The project directory is not for personal use. Do not store files in the project directory that are not related to MDN. Use the scratch directory instead.
+
+## Copying files to and from M3
+
+### Using scp
+
+You can copy files to M3 using the `scp` command. This is a command line tool that is built into most linux distributions, and is available on Windows through [PuTTY](https://www.putty.org/).
+
+#### Linux / Mac
+
+To copy a file to M3, use the following command:
+
+```bash
+scp @m3.massive.org.au:
+```
+
+For example, if I wanted to copy a file called `test.txt` to my home directory on M3, I would use the following command:
+
+```bash
+scp test.txt jasparm@m3.massive.org.au:~
+```
+
+To copy a file from M3 to your local machine, use the following command:
+
+```bash
+scp @m3.massive.org.au:
+```
+
+So, to bring that same file back to my local machine, I would use the following command:
+
+```bash
+scp jasparm@m3.massive.org.au:~/test.txt .
+```
diff --git a/src/chapter3/slurm.md b/src/chapter3/slurm.md
index 8c10c4e..d6b623d 100644
--- a/src/chapter3/slurm.md
+++ b/src/chapter3/slurm.md
@@ -1 +1,70 @@
 # Job batching & SLURM
+
+Launching and running jobs on M3 is controlled by [SLURM](https://slurm.schedmd.com/). You don't really need to know a lot about it in order to use it, so this section will take you through the basics of what you will need for what we are doing.
+
+If you want a complete guide on SLURM in M3, you can find it [here](https://docs.massive.org.au/M3/slurm/slurm-overview.html).
+
+## Submitting simple jobs
+
+As we discussed in the previous section we use bash scripts to run jobs on M3. We can submit these jobs using the `sbatch` command. For example, if we have a bash script called `hello.sh` that contains the following:
+
+```bash
+#!/bin/bash
+
+#SBATCH --ntasks=1
+#SBATCH --mem=1MB
+#SBATCH --time=0-00:01:00
+#SBATCH --job-name=hello
+#SBATCH --partition=m3i
+#SBATCH --mail-user=jmar0066@student.monash.edu
+#SBATCH --mail-type=BEGIN,END,FAIL
+
+echo "Hello World"
+```
+
+We can submit this job using the following command:
+
+`sbatch hello.sh`
+
+This will submit the job to the queue, and you will get an email when the job starts, finishes, or fails. You can also check the status of your job using the `squeue` command.
+
+## Options
+
+You might have noticed the `#SBATCH` lines in the bash script. These are called options, and they tell SLURM how to run the job. The options we used in the example above are:
+
+- `ntasks`: The number of tasks or processes to run.
+- `mem`: The amount of memory to allocate to the job.
+- `time`: The maximum amount of time the job can run for.
+- `job-name`: The name of the job. Up to 15 characters.
+- `partition`: The partition to run the job on.
+- `mail-user`: The email address to send job status emails to.
+- `mail-type`: The types of emails to send.
+
+> Note: In the case of M3, a task is essentially the same as a process. This is **not** the same as a cpu core. You can have a task that uses one or multiple cores. You can also have multiple tasks comprising the same job, each with one or multiple cores being utilised. It can get quite confusing, so if you are unsure about what you need, just ask. There is also more information in the M3 docs.
+
+There are a lot more options that you can use, and you can find a more complete list [here](https://docs.massive.org.au/M3/slurm/simple-batch-jobs.html).
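To tie the options above together, here is a minimal sketch of a submission script. It reuses the `m3i` partition from the earlier example. The `--output` option and the `SLURM_JOB_ID` variable are standard Slurm features that this chapter has not introduced, and the output file name pattern is only an illustration.

```bash
#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --mem=1MB
#SBATCH --time=0-00:01:00
#SBATCH --job-name=hello
#SBATCH --partition=m3i
# Assumption: write output to hello-<jobid>.out (%j expands to the job id)
#SBATCH --output=hello-%j.out

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "local" elsewhere
message="Hello World from job ${SLURM_JOB_ID:-local} on $(hostname)"
echo "$message"
```

Because the `#SBATCH` lines are ordinary comments to Bash, the script can also be run directly with `bash hello.sh` as a quick check before submitting it with `sbatch hello.sh`.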
+
+In particular, if you want to run multithreading or multiprocessing jobs, or you need a gpu, there are more options you need to configure.
+
+## Interactive jobs
+
+Sometimes you might want to actually connect to the node that you are running your job on, in order to see what is happening or to set it up before running the job. You can do this using the `smux` command. Similar to regular batch jobs, you can set options when you start the interactive session. An example of this is:
+
+`smux new-session --ntasks=1 --time=0-00:01:00 --partition=m3i --mem=4GB`
+
+This will start an interactive session on a node with 1 cpu, a time limit of 1 minute, and 4GB of memory. There are again other options available, and you can find a more complete explanation [here](https://docs.massive.org.au/M3/slurm/interactive-jobs.html).
+
+### Connecting to interactive jobs
+
+Typically, an interactive job will not start immediately. Instead, it will be queued, and once it has started you will need to connect to it. You can do this by running `smux a`, which will attach you to the session. To disconnect from the session but leave it running, press `Ctrl + b` followed by `d`. You can reconnect to it later using `smux a`. To kill the session, run `exit` if you are connected to it; otherwise, from a login node, run `scancel `. You can find the job id using `show_job`.
+
+## Checking the status of jobs, finding out job IDs, and killing jobs
+
+A couple of useful commands for general housekeeping are:
+
+- `squeue`: This will show you the status of all jobs currently running on M3.
+- `show_job`: This will show you the status of all jobs you have submitted.
+- `squeue -u `: This will show you the status of all jobs submitted by a particular user currently running.
+- `scancel `: This will kill a job with a particular job id.
+- `scancel -u `: This will kill all jobs submitted by a particular user.
+- `show_cluster`: This will show you the status of the cluster, including any nodes that are offline or in maintenance.
diff --git a/src/chapter3/software-tooling.md b/src/chapter3/software-tooling.md
new file mode 100644
index 0000000..b832d3f
--- /dev/null
+++ b/src/chapter3/software-tooling.md
@@ -0,0 +1,91 @@
+# Software and Tooling
+
+Software and development tooling is handled a little differently on M3 than you might be used to. In particular, because M3 is a shared system, you do not have access to `sudo`, and you cannot install software on the system manually. Instead, you will need to use the `module` command to load software and development tools.
+
+## Module
+
+The `module` command is a kind of alternative to package managers like `apt` or `yum`, except that the available software is managed by the M3 team. It allows you to load software and development tools into your environment. To see a comprehensive list of commands go [here](./linux-cmds.md#m3-specific-commands).
+
+In general, however, you will only really need to use `module load` and `module unload`. These commands are used to load and unload software and development tools into your environment.
+
+For most of the more popular software packages, like gcc, there are multiple different versions available. You will need to specify which version you want to load based on your needs.
+
+## C
+
+### GCC
+
+To load GCC, you can run the following command:
+
+```bash
+module load gcc/10.2.0
+```
+
+This will load GCC 10.2.0 into your environment, and you can use it to compile C/C++ programs as described in the [Intro to C](../chapter2/intro-to-c.md) chapter. To unload GCC, you can run the following command:
+
+```bash
+module unload gcc/10.2.0
+```
+
+## Python
+
+Python is a bit of a special case on M3.
This is because of how many different versions there are, as well as how many different packages are available. To make things easier, it is recommended that you use miniconda or anaconda to manage your python environments instead of using the system python.
+
+These instructions are based off the M3 docs, which can be found [here](https://docs.massive.org.au/M3/software/pythonandconda/pythonandconda.html#pythonandconda).
+
+### Miniconda
+
+#### Installing Miniconda
+
+To install Miniconda on M3, there is a dedicated install script that you can use. This will install miniconda into your default scratch space, i.e. `/vf38_scratch//miniconda3`. To install miniconda, run the following command:
+
+```bash
+module load conda-install
+
+# To install miniconda to the default location
+conda-install
+
+# To install miniconda to a custom location
+conda-install your/install/location
+```
+
+#### Activating Miniconda
+
+To activate the base conda environment, run the following command:
+
+```bash
+source your/install/location/miniconda/bin/activate
+```
+
+You will notice that once activated, `(base)` will appear in the prompt before your username.
+
+To create and activate Python environments within Miniconda, follow these steps:
+
+```bash
+# Create a new environment
+# Change env-name to whatever you want to call your environment
+conda create --name env-name python=3.10
+
+# Activate the environment
+conda activate env-name
+```
+
+#### Managing Python packages
+
+Use the following commands to install and manage Python packages:
+
+```bash
+# Install a package
+conda install package-name
+
+# Update a package
+conda update package-name
+
+# You can also change the version of packages by adding a = and the version number
+
+# Remove a package
+conda remove package-name
+```
+
+#### Deactivating Miniconda
+
+To deactivate the conda environment you are in, run `conda deactivate`. To exit conda entirely run `conda deactivate` again.
You will know you have fully exited conda when `(base)` is no longer in the prompt.
diff --git a/src/chapter3/start.md b/src/chapter3/start.md
new file mode 100644
index 0000000..ec39953
--- /dev/null
+++ b/src/chapter3/start.md
@@ -0,0 +1,31 @@
+# Getting Started
+
+## Request an account
+
+In order to access M3, you will need to request an account. To do this, follow this link: [HPC ID](https://hpc.erc.monash.edu.au/karaage/aafbootstrap). This should take you to a page like this: ![HPC ID](./imgs/aaf.png)
+
+Type in Monash, as you can see here. Select Monash University, and tick the Remember my organisation box down the bottom. Once you continue to your organisation, it will take you to the Monash Uni SSO login page. You will need to log in with your Monash credentials.
+
+You should now see something like this: ![HPC ID System](./imgs/hpcid.png)
+
+Once you are here, there are a couple of things you will need to do. The first, and most important, is to set your HPC password. This is the password you will use to log in to M3. To do this, go to Home, then click on Change Linux Password. This will take you through the steps of setting your password.
+
+Once you have done this, you can move on to requesting access to the MDN project and getting access to gurobi.
+
+## Add to project
+
+To request to join the MDN project, again from the Home page click on Join Existing Project. You should see a screen like this: ![Join Project](./imgs/join_project.png)
+
+In the text box type `vf38` and click search. This is the project code for MDN. Then select the project and click submit. You will now have to wait for the project admins to approve your request. Once they have done this, you will be able to access the project. This should not take longer than a few days, and you will get an email telling you when you have access.
+
+## Access gurobi
+
+As part of the work that we do, you will need access to [Gurobi](https://www.gurobi.com).
M3 has an agreement with Gurobi, but you need to specifically request access to use it.
+
+To do this, in HPC ID, click on Software Agreements on the left, then Add Software. This will bring up a list of all available software on M3. Scroll down until you find gurobi, and click on it. This will bring up a page like this: ![Gurobi](./imgs/gurobi.png)
+
+Except that, instead of saying accepted at the bottom, it will ask you to tick a box saying that you agree to the TOS: ![TOS](./imgs/gurobi2.png)
+
+Click I accept, and follow the steps. It should tell you that you will have to wait for approval to access gurobi. This should not take longer than a few days, and you will get an email telling you when you have access.
+
+Once you have access to everything, you are ready to get started with M3. Good job!!