Wings for NONMEM on the NeSI HPC3

 


 

Last Updated: 2 February 2026

 

Contents

History
Prerequisites
Managing Files on HPC3 and Checking on HPC3 Job Progress
Using NeSI ondemand to Access Your NeSI Home Directory
Setting Up Symbolic Links
Using Slurm To Show The Status Of Running Jobs
Keeping Track of HPC Job Files
Advanced RJM Tools Options
Using WFN with NeSI
Using nmenv.bat
Slurm Commands
Ondemand Dashboard Apps
How To Retrieve Results From A Failed Job
Connection Used To Submit The Job Has Stopped
Job Times Out On HPC3

 

 

History

An earlier system for accessing NeSI computing was called the NeSI grid.

HPC3 is High Performance Computing System 3. On 1 July 2025, the roles, services and technologies of New Zealand eScience Infrastructure (NeSI) were integrated into the Crown-owned company Research and Education Advanced Network New Zealand (REANNZ). The REANNZ logo appears on some output screens. NeSI remains the system that provides access to HPC3 and runs jobs on it.

Prerequisites

You will need to have installed WFN on a Windows computer. WFN is a command line tool for using NONMEM.

Wings for NONMEM (WFN) has links to the New Zealand eScience Infrastructure (NeSI) high performance computing resources (HPC3). Within WFN these links are primarily used to run NONMEM jobs with parallel processing on multiple CPUs on HPC3.

 

The link between WFN running on a Windows computer and HPC3 is managed by the Remote Job Management (RJM) system. RJM is a set of tools created by NeSI and most recently maintained by Chris Scott. The RJM tools have been implemented primarily to work with WFN. In order to use the RJM system you will need to run rjm_configure.

 

IMPORTANT
NeSI HPC3 is only accessible after applying for an account and registering each individual user.

 

Apply for access at https://www.nesi.org.nz/researchers/apply-access-our-services before trying to run rjm_configure. This only needs to be done once for each user.

 

rjm_configure needs to be run on each computer you plan to connect to HPC3. rjm_configure is located in the wfn7\bin directory. You can run it by opening a WFN window and typing

rjm_configure

You will be prompted to provide a NeSI user name (the one used to apply for access to NeSI) and a NeSI project name (uoa00106 for Auckland Pharmacometrics Group users). rjm_configure will suggest a directory name for storing NONMEM related files on HPC3. You should accept the suggested name.

 

During the process of configuration you will be prompted to provide credentials for access to NeSI. The images below show examples. You should accept the defaults.

 

 

 

[Screenshot of a web page]

[Screenshot of a computer]

[Screenshot of a computer]

 

Once you complete rjm_configure you can start using WFN to submit NONMEM jobs to HPC3.

 

Managing Files on HPC3 and Checking on HPC3 Job Progress

 

Using NeSI ondemand to Access Your NeSI Home Directory

Open NeSI ondemand

1.     https://ondemand.nesi.org.nz/public/

2.    Click on Login (you may be prompted for a two-factor authentication code from Google Authenticator).

3.    The ondemand system opens at the Dashboard page, which shows the available options.

4.    Open a File manager window by clicking on Files then Home Directory so you can see the directories and files stored in your NeSI home directory.

5.    Return to the Dashboard page (back-arrow). Open an HPC3 NeSI terminal window by clicking on Clusters then _NeSI HPC Shell Access

Or use this link:

https://ondemand.nesi.org.nz/pun/sys/shell/ssh/login.hpc.nesi.org.nz

This terminal window gives you command line access to your files stored on the HPC3 system and to the Slurm workload manager commands.

Ask for a long directory listing by typing the command ls -l

Setting Up Symbolic Links

In your NeSI home directory you can create two symbolic links, MY_PROJECT and MY_RJM, that provide a simple way to access the uoa00106 project directory and your personal set of rjm-jobs files. It is recommended to create these links by copying and pasting the following commands into the terminal window. Be sure to change nesiname to your NeSI user name:

ln -s /nesi/nobackup/uoa00106 ~/MY_PROJECT

ln -s /nesi/nobackup/uoa00106/nesiname/rjm-jobs/ ~/MY_RJM

The ~ before the symbolic link name refers to your NeSI home directory. This is the location where the symbolic link will be created.
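To check that the links exist and point where you expect, you can list them with ls -ld (the -d option shows the link entries themselves rather than the contents of the directories they point to):

ls -ld ~/MY_PROJECT ~/MY_RJM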

Using Slurm To Show The Status Of Running Jobs

Slurm is a workload manager (job scheduler) used to manage jobs from the many users with access to the HPC3 system of compute nodes. Slurm may be used to query how jobs are running on HPC3.

Use the squeue command to see the status of running jobs, e.g.

squeue -A uoa00106

The -A uoa00106 option shows all jobs associated with members of our NeSI group (uoa00106).

The -u nesiname option shows the jobs for a specific user e.g. my nesiname is nhol004.

squeue -u nhol004

The CPUS column in the output is currently two times the number of CPUs requested when using WFN. This is because most of the physical CPUs can perform a second set of tasks using hyperthreading. There are Slurm commands to enable hyperthreading, but by default hyperthreading is not enabled.
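If you want a more compact listing, squeue also accepts an output format string. This is a sketch using standard squeue format codes (%i job id, %j name, %T state, %C CPUs, %M elapsed time); substitute your own NeSI user name for nesiname and adjust the field widths to taste:

squeue -u nesiname -o "%.10i %.20j %.10T %.5C %.10M"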

Keeping Track of HPC Job Files

Use the ondemand dashboard page to open a session app. I prefer to use the VS Code (Visual Studio Code) interactive app. Jupyter Lab offers similar functionality but some features such as refreshing the file list do not seem to work. See the section Ondemand Dashboard Apps below for more information. The following images show how to use the VS Code app to keep track of HPC job files.

 

Clicking on VS Code will open the following window. Currently the maximum number of hours is 8, which means you will need to open a new VS Code app session after 8 hours:

After launching the session app you will see a window indicating the app is starting. This may take 10-30 seconds.

Once started, the window shows the app is running.

 

 

Use the Connect button to open the app. When the VS Code app opens the upper right corner of the opening page looks like this:

 

You may need to click on your user name then the explorer symbol (overlapping pages) to open the user home page.

 

Scroll down to find a folder you are interested in, then click to open it, e.g. use MY_RJM to see the folders associated with recent NONMEM runs. If the job has completed successfully the folder contains an output.zip file with the results that are transferred to WFN. If the job is still running, click on the job folder name and look for the OUTPUT file.


Click on OUTPUT and a window should open showing the OUTPUT file contents.

 

 

Click on the OUTPUT window and scroll down to the bottom to find the results of the latest iteration. On the right hand side of the window is a map of the lines in the file which can be useful to find a section that interests you.

 

To refresh the EXPLORER window, hover your mouse over the file folder then click on the revolving arrow symbol. This refreshes the folder contents list and should also refresh the OUTPUT page.

 

If you click on the OFV.TXT file name you can see a list of objective function values (OFV).

 

 

When jobs have finished you should consider deleting the run folder to release storage resources. To do this, open a terminal window by clicking on the menu icon (3 bars) at the upper right of the app window.

The terminal window opens in your NeSI home folder. You can list the contents with the ls -l command typed into the terminal window.

You will need to change directory to MY_RJM by typing the cd MY_RJM command, followed by ls -l to list the folders and files:

You can delete a folder and its contents with a command such as:

rm -R 062*

 

Folders need to be removed recursively (rm -R), but you should be cautious using the “*” wildcard because it will delete all folders and files whose names start with “062”. This wildcard is useful if you have 100 similar folders created by a bootstrap run with 100 replicates.

To delete just one folder it is best to specify the full directory name. This can be done by double-clicking on the folder name, which copies the name to the clipboard, then typing “rm -R ” followed by a right-click of the mouse, which should paste the full name after the partially typed command. Check that it looks correct then press “Enter” to remove the folder.
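A cautious pattern is to preview what a wildcard matches before deleting anything; for example (the 062 prefix is illustrative, as above):

ls -d 062*   # preview the folders the wildcard matches
rm -R 062*   # then remove them recursively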

It is not essential to delete folders, but doing so helps you to identify the folders you are interested in, and it is “good citizenship” not to consume resources once you have finished with them. Eventually the folder will be deleted by NeSI under their automated deletion policy.

 

Once you have finished with the app you should return to the ondemand dashboard session page, cancel the app, then delete the saved copy of the app in order to release computing and storage resources.


 

Advanced RJM Tools Options

The environment variable rjmdir may be set to point to a different directory for running jobs on HPC3, e.g.

set rjmdir=/projects/myProject/myName/rjm_PK

You should change myProject to your project and myName to your nesiname. This can be useful if you are running batches of jobs that you want to be able to identify more easily.

Using WFN with NeSI

 

1.     Open a WFN window. You should now be able to use NeSI by calling nmgog:

 

nmgog theopd

2.    The nmgog command will start the job on HPC3. When it finishes you should see the usual results that are displayed by WFN.

 

3.    The nmgog, nmbsg, nmbsig, nmrtg and nmgosimg commands work similarly to nmgo, nmbs, nmbsi, nmrt and nmgosim but submit NONMEM runs to the cluster. The number of CPUs is set by default to 4 and the walltime to 4:0:0 (4 hours) for cluster jobs. These defaults can be changed by setting the CPUS and WALLTIME environment variables before running the WFN commands.

set CPUS=24

Note that if you ask for a lot of CPUs your job may be put into a wait queue until there are enough CPUs available.

The WALLTIME variable is specified in the format hh:mm:ss. It controls the total run time allowed for your job. You might estimate it from a run on a typical Windows machine and divide by 2 (the job should be at least 2 times faster with 4 CPUs).
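For example (an illustrative estimate only): a model that takes about 8 hours on a Windows machine might be expected to finish within 4 hours on 4 CPUs, so a request with some safety margin could be:

set WALLTIME=6:00:00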

The default time for checking that the job is finished is 10 seconds. You may set the BATCHWAIT variable to a more suitable time if you have long jobs.

Request up to 24 hours for the job to run:

set WALLTIME=24:00:00

 

Check every 60 seconds to see if jobs have finished

 

set BATCHWAIT=60

4.    The default memory requested is 500 megabytes. If a run fails, check stderr.txt in the WFN results folder; it may indicate that there was not enough memory. Try increasing the memory request in steps of 500 megabytes using the NMMEM variable. An error such as “compiler failure” also suggests increasing the memory.

The memory request can be changed with the NMMEM environment variable. Note that the memory size must be specified as an integer with an M (megabytes) or G (gigabytes) suffix.

Request 500 megabytes of memory:

set NMMEM=500M

Request 2 gigabytes of memory:

set NMMEM=2G
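Putting these settings together, a session for a larger job might look like this (a sketch; the model name run1 and the values shown are illustrative):

set CPUS=8
set WALLTIME=12:00:00
set BATCHWAIT=60
set NMMEM=1G
nmgog run1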

Using nmenv.bat

The Windows environment options such as CPUS and NMMEM may need to be tailored to specific NONMEM projects. The nmenv.bat script is useful for specifying these environment variables when opening a WFN window. nmenv.bat should be placed in the directory where the WFN window opens. This is typically the NONMEM directory used to store NM-TRAN control streams for the project. The title command in nmenv.bat is useful for showing key environment variables.

@echo off

rem nmenv.bat

set args7=

set CPUS=20

set NMMEM=2G

title Project CPUS=%CPUS%

The args7 variable is by default set to -prdefault in order to use pre-compiled PREDPP code. When the NM-TRAN control stream specifies changes to SIZES, the args7 variable should be unset. It is often useful to put this in nmenv.bat. Note that if you wish to run nmgo at the command line it is important to unset the CPUS variable; the CPUS variable should only be set when using WFN commands that use HPC3.
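For example (a minimal sketch using the theopd example model):

rem Clear args7 when the control stream changes SIZES
set args7=
rem Clear CPUS before running locally with nmgo
set CPUS=
nmgo theopd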

Slurm Commands

 

1.     Memory usage for jobs on a particular date can be obtained using this command at the login node command prompt (use MobaXterm) with a suitable user id and date:

sacct -u nesiname -S 2018-12-10

 

or, if you have a jobid (e.g. from using squeue while the job is running), this command will show details of a running or completed job. The completed job stats show the maximum memory usage (MaxRSS). This may be useful in estimating the requested memory size (see NMMEM).

sacct -j jobid
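sacct also accepts a --format option to select just the fields of interest. A sketch using standard sacct field names:

sacct -j jobid --format=JobID,JobName,State,Elapsed,TotalCPU,ReqMem,MaxRSS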

2.     Job status can be obtained using this command at the login node command prompt with a suitable user id:

squeue -u nesiname

3.    The job status of all users in the Auckland Pharmacometrics Group project can be displayed using:

squeue -A uoa00106

4.     The squeue command shows job numbers which can be used with sview. The sview command can be used to find information about each job but is rather clumsy to use when trying to find a particular job.

sview

 

5.    Since mid-February 2021 WFN has maintained a log of jobs run using NeSI. WFN includes an smrg command that retrieves job statistics derived from the job numbers and merges the output of nm_seff and sacct to show CPU, memory and walltime efficiency.

smrg

The results of smrg are collected into a file called smrg_stat.csv. This can be viewed using Excel and specific rows selected using the Excel data filter. Each set of job statistics starts with the date and time of the run and the job ID (slurm job number).

This example shows 3 rows selected with Date containing ‘04-06’. The results for each job include the user name, the state of the job when it finished, and 4 sections with statistics describing the efficiency of requested CPU, memory and walltime.

The CPU section shows the number of CPUs requested and the CPU efficiency. When CPUeff% is less than 2% it typically means the job finished with an error detected by NM-TRAN or NONMEM. Models similar to Job number 18921115 which ran successfully had a CPUeff% of 99%. For reasons currently unexplained jobs running with NM7.5.0 report a State of ‘FAILED’ even though the NONMEM job completed normally. The theopd test job ran successfully but with low CPU efficiency because each individual has only a few observations.

Date                         Job ID    User     State      CPUS  CPUeff%
2021-04-06_17.51.03.219209   18921115  jmor616  COMPLETED  24    1.25
2021-04-06_08.22.38.430877   18924529  nhol004  FAILED     4     16.67
2021-04-06_08.44.33.746999   18925119  nhol004  COMPLETED  1     29.58

The memory section shows the requested memory (ReqMem). By default this is 250Mc for NONMEM jobs. The ‘c’ suffix indicates the memory is requested per core (similar to per CPU). The user can request more memory by changing the NMMEM environment variable, which is also shown in this section. The main memory demand for NONMEM is during job compilation, and the memory required is similar for both small and large NONMEM models. The Memory value shows the memory used by all the parallel tasks; this increases in proportion to the number of CPUs. The MaxRSS statistic is hard to interpret. One definition is "Maximum individual resident set size out of the group of resident set sizes associated with all tasks in job." When CPUeff% is high it is usually several times bigger than Memory; it is not clear how it can be smaller than Memory. The MEMeff% statistic is somehow related to ReqMem and Memory or MaxRSS, but it is not clear how.

Date                         Job ID    ReqMem  NMMEM  Memory   MaxRSS  MEMeff%
2021-04-06_17.51.03.219209   18921115  250Mc   250M   11.72GB  24K     0.16
2021-04-06_08.22.38.430877   18924529  250Mc   250M   1.95GB   759K    0.14
2021-04-06_08.44.33.746999   18925119  20Mc    20M    40.00MB  94K     15.94

The walltime section shows the Elapsed time the job ran (d-hh:mm:ss), the TotalCPU time (approximately the Elapsed time multiplied by the number of CPUs), and Walltime (the total run time requested for the job, e.g. via the WALLTIME variable). The WallEff% is the percentage of the requested Walltime taken by Elapsed.

Date                         Job ID    Elapsed  TotalCPU  Walltime    WallEff%
2021-04-06_17.51.03.219209   18921115  0:00:10  00:00.0   1-00:00:00  0.01
2021-04-06_08.22.38.430877   18924529  0:00:03  00:07.1   4:00:00     0.1
2021-04-06_08.44.33.746999   18925119  0:03:33  0:00:00   0:05:00     71
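As a worked example from the third row: the Elapsed time of 0:03:33 is 213 seconds and the requested Walltime of 0:05:00 is 300 seconds, giving WallEff% = 213/300 ≈ 71%.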

The final section indicates the NONMEM version (NMVER) and the job name. For NONMEM jobs this will be the same as the model file name with a suffix indicating the Windows shell command counter for that job (‘_cmd1’). WFN cluster commands that run multiple jobs in parallel such as nmbsg are filtered to show only the first shell command.

 

Date                         Job ID    NMVER  Name
2021-04-06_17.51.03.219209   18921115  744    sevo_fixPKPD_popM_popP_popR_fixed_fixE0_d_cmd1
2021-04-06_08.22.38.430877   18924529  750    theopdg_cmd1
2021-04-06_08.44.33.746999   18925119  750    smrg_cmd1

The smrg command may be run at any time. It takes just under 4 minutes to complete with a 2 month collection of jobs. If someone else is running smrg you will get a warning message and the most recent previous version of the smrg_stat.csv file will be copied to the directory from which you called smrg.

6.    Other commands to find out about jobs are described here:

https://support.nesi.org.nz/hc/en-gb/articles/360000205215-Useful-Slurm-Commands

 

Ondemand Dashboard Apps

The ondemand dashboard provides several apps that might be useful. These apps require the user to configure some settings before launching. After you launch an app it takes a minute or so to set itself up; once it is running, click on Connect to open it. You can use the dashboard to cancel a running app so that it no longer consumes HPC3 resources.

[Screenshot of a computer]

[Screenshot of a computer]

The Virtual Desktop looks like this:

[Screenshot of the Virtual Desktop]

The Virtual Desktop offers a File System view. By clicking on your NeSI user name you can see the directories and files in your home directory. This is an alternative to the Files view offered in the ondemand dashboard.

It also offers a terminal view.

 

The Jupyter Lab app offers a combined file system and terminal view.

It is a matter of personal preference which app you use to manage your files and give commands to the operating system.

How To Retrieve Results From A Failed Job

Jobs may fail to retrieve results if the connection used to submit the job has stopped or if the job times out on HPC3.

Connection Used To Submit The Job Has Stopped

After restarting the connection, if you can still see the job results folder (the model name with a _cmd1.reg suffix), there is a file in that folder containing information about the job that failed to return results. If you run nmgog again with exactly the same model it may be possible to retrieve the results, and nmgog should finish normally by showing you the usual summary of the NONMEM job and parameter estimates.
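For example, if the original run was submitted with nmgog mymodel then simply repeat the same command (mymodel is the illustrative model name used in the next section):

nmgog mymodel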

Job Times Out On HPC3

If the job times out then it is possible to download files from HPC3 and use them to extract the latest parameter estimates. This requires several steps, which are shown here using the model “mymodel” submitted with the control stream “mymodel.ctl”:

1.     Open a NeSI app session using VS Code or Jupyter Lab

2.    Open the mymodel* folder. The folder name will start with the model name followed by a string showing the time the job was submitted, represented here by the “*”.

3.    Find the files INTER, OUTPUT and mymodel_cmd1.lst

4.    Right click on each file in turn and download the file to the mymodel_cmd1.reg run folder on your computer.

5.    Use the Windows File Explorer to locate the mymodel.ctl file and the run folder mymodel_cmd1.reg.

6.    Look in mymodel_cmd1.reg and confirm that the INTER, OUTPUT and mymodel_cmd1.lst files are in that folder. Remove any extension such as “.txt” that may have been added to the INTER and OUTPUT files.

7.    Return to the folder containing mymodel_cmd1.reg.

8.    Rename mymodel.ctl to mymodel_cmd1.ctl

9.    Run the WFN nmctl command with the “i” option. Using the “i” option is very important. If you forget it, the nmctl command will “hang”. Use Ctrl-C to stop the nmctl command, then try again with the “i” option.

nmctl mymodel_cmd1 i

10.  Use the WFN nmrunmv command to rename the mymodel_cmd1 control stream and folder

nmrunmv mymodel_cmd1 mymodel

11.   The mymodel.ctl file should now have the parameter initial estimates updated with the latest results from the NeSI run.