Wings for NONMEM on the NeSI HPC3
Last Updated: 26 June 2025
You will need to have installed WFN on a Windows computer. WFN is a command line tool for using NONMEM.
Wings for NONMEM (WFN) links to the New Zealand eScience Infrastructure (NeSI) high performance computing system (HPC3). WFN uses this link primarily to run NONMEM jobs with parallel processing on multiple CPUs (‘nodes’) on HPC3.
The link between your Windows computer and HPC3 is managed by the Remote Job Management (RJM) system. RJM is a set of tools created by NeSI and most recently maintained by Chris Scott. To use the RJM system you will first need to run rjm_configure.
IMPORTANT
NeSI HPC3 is accessible only to individually registered users who hold a NeSI account.
Apply for access at https://www.nesi.org.nz/researchers/apply-access-our-services
before trying to run rjm_configure. This only needs to be done once for each user.
rjm_configure needs to be run on each computer you plan to connect to HPC3. It is located in the wfn7\bin directory. You can run it by opening a WFN window and typing
rjm_configure
You will be prompted to provide a NeSI user name (the one used to apply for access to NeSI) and a NeSI project name (uoa00106 for Auckland Pharmacometrics Group users). rjm_configure will suggest a directory name for storing NONMEM related files on HPC3. You should accept the suggested name.
During configuration you will be prompted to provide credentials for access to NeSI. The images below show examples. You should accept the defaults.
Once you have completed rjm_configure you can start using WFN to submit NONMEM jobs to HPC3.
Open ondemand
1. https://ondemand.nesi.org.nz/public/
2. Click on Login (you may be prompted to enter a 2 factor authentication code shown on Google Authenticator).
3. The ondemand system opens in the Dashboard page with the following options:
4. Open a File manager window by clicking on Files so you can see the directories and files stored in your NeSI home directory.
5. Open an HPC3 NeSI terminal window by clicking on Clusters then NeSI HPC Shell Access or use this link:
https://ondemand.nesi.org.nz/pun/sys/shell/ssh/login.hpc.nesi.org.nz
This terminal window gives you command line access to your files stored on the HPC3 system. It is mainly used to query how jobs are running on HPC3 using Slurm commands. Slurm is a workload manager (job scheduler) that shares the HPC3 compute nodes among multiple users.
Ask for a long directory listing with the command ls -l
This is my NeSI home directory. There are two symbolic links, MY_PROJECT and MY_RJM, that provide a simple way to access the uoa00106 project directory and your personal set of rjm-job files. It is recommended to create these links by copying and pasting the following commands into the NeSI terminal window. Be sure to change nesiname to your NeSI user name.
ln -s /nesi/nobackup/uoa00106 ~/MY_PROJECT
ln -s /nesi/nobackup/uoa00106/nesiname/rjm-jobs/ ~/MY_RJM
The ~ before the symbolic link name refers to your NeSI
home directory. This is the location where the symbolic link will be created.
The environment variable rjm_dir
may be set to point to a different directory for running jobs on HPC3 e.g.
set rjmdir=/projects/myProject/myUPI/rjm_PK
You should change myProject to your project and myUPI
to your personal identifier. This can be useful if you are running batches of
jobs that you want to be able to easily identify.
1. Open a WFN window. You should now be able to use NeSI by calling nmgog:
nmgog theopd
2. The nmgog command will start the job on HPC3. When it finishes you should see the usual results that are displayed by WFN.
3. The nmgog, nmbsg, nmbsig, nmrtg and nmgosimg commands work similarly to nmgo, nmbs, nmbsi, nmrt and nmgosim but submit NONMEM runs to the cluster. For cluster jobs the number of CPUs defaults to 4 and the walltime to 4:0:0 (4 hours). These defaults can be changed by setting the CPUS and WALLTIME environment variables before running the WFN commands.
set CPUS=24
Note that if you ask for a lot of CPUs your job may be put into a wait queue until there are enough CPUs available.
The WALLTIME variable is specified in the format hh:mm:ss. It sets the maximum total run time allowed for your job. You might estimate this from a run on a typical Windows machine and divide by 2 (it should be at least 2 times faster with 4 CPUs).
The default time for checking that the job is finished is 10 seconds. You may set the BATCHWAIT variable to a more suitable time if you have long jobs.
Request up to 24 hours for the job to run
set WALLTIME=24:00:00
Check every 60 seconds to see if jobs have finished
set BATCHWAIT=60
4. The default memory requested is 500 megabytes. If a run fails then check stderr.txt in the WFN results folder. This may indicate not enough memory. An error such as “compiler failure” also suggests increasing the memory. Try increasing the memory request in steps of 500 megabytes using the NMMEM variable.
The memory request can be changed with the NMMEM environment variable. Note that memory size must be specified as an integer with M or G suffix.
Request 500 megabytes of memory:
set NMMEM=500M
Request 2 gigabytes of memory:
set NMMEM=2G
Windows environment variables such as CPUS and NMMEM may need to be tailored to specific NONMEM projects. The nmenv.bat script is useful for setting these variables when a WFN window opens. Place nmenv.bat in the directory where the WFN window opens. This is typically the NONMEM directory used to store NM-TRAN control streams for the project. The title command in nmenv.bat is useful for showing key environment variables in the window title.
@echo off
rem nmenv.bat
set args7=
set CPUS=20
set NMMEM=2G
title Project CPUS=%CPUS%
The args7 variable is by default set to -prdefault in order to use pre-compiled PREDPP code. When the NM-TRAN control stream specifies changes to $SIZES the args7 variable should be unset. It is often useful to put this in nmenv.bat.
Note that if you wish to run nmgo at the command line it is important to unset the CPUS variable. The CPUS variable should only be set when using WFN commands that use HPC3.
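The settings described above can be checked before they are written into nmenv.bat. The following Python sketch is not part of WFN. The function names are hypothetical, and it assumes only the rules stated above: WALLTIME in hh:mm:ss format, NMMEM as an integer with an M or G suffix, and a walltime estimate equal to the Windows run time divided by 2.

```python
import math
import re


def estimate_walltime(windows_seconds, speedup=2.0):
    """Windows run time divided by `speedup`, formatted as hh:mm:ss.

    No safety margin is added; request more time if in doubt.
    """
    total = max(1, math.ceil(windows_seconds / speedup))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def is_valid_walltime(value):
    """True for hh:mm:ss strings such as 24:00:00 or 4:0:0."""
    return re.fullmatch(r"\d+:\d{1,2}:\d{1,2}", value) is not None


def is_valid_nmmem(value):
    """True for an integer followed by M or G, such as 500M or 2G."""
    return re.fullmatch(r"[1-9]\d*[MG]", value) is not None


def nmenv_lines(cpus, walltime, nmmem):
    """Return the 'set' lines for nmenv.bat, or raise ValueError."""
    if not isinstance(cpus, int) or cpus < 1:
        raise ValueError(f"CPUS must be a positive integer: {cpus!r}")
    if not is_valid_walltime(walltime):
        raise ValueError(f"WALLTIME must look like hh:mm:ss: {walltime!r}")
    if not is_valid_nmmem(nmmem):
        raise ValueError(f"NMMEM must be an integer with M or G: {nmmem!r}")
    return [f"set CPUS={cpus}", f"set WALLTIME={walltime}", f"set NMMEM={nmmem}"]


if __name__ == "__main__":
    # A run that took 2 hours on Windows, submitted with 20 CPUs and 2G memory.
    for line in nmenv_lines(20, estimate_walltime(2 * 3600), "2G"):
        print(line)
```

The printed lines can be pasted into nmenv.bat. A value such as 1.5G is rejected because the memory size must be an integer.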
1. Memory usage for jobs on a particular date can be obtained using this command at the login node command prompt (use MobaXterm) with a suitable user id and date:
sacct -u nesiname -S 2018-12-10
or if you have a jobid (e.g. from using squeue while the job is running) this will show details of a running or completed job. The completed job stats show the maximum memory usage (MaxRSS). This may be useful in estimating the requested memory size (see NMMEM).
sacct -j jobid
2. Job status can be obtained using this command at the login node command prompt with a suitable user id:
squeue -u nesiname
3. The job status of all users in the Auckland Pharmacometrics Group project can be displayed using:
squeue -A uoa00106
4. The squeue command shows job numbers which can be used with sview. The sview command can be used to find information about each job but is rather clumsy to use when trying to find a particular job.
sview
5. From mid-February 2021 WFN maintains a log of jobs run using NeSI. WFN includes a smrg command that retrieves job statistics derived from the job numbers and merges the output of nm_seff and sacct to show CPU, memory and walltime efficiency.
smrg
The results of smrg are
collected into a file called smrg_stat.csv. This can be viewed using Excel and
specific rows selected using the Excel data filter. Each set of job statistics
starts with the date and time of the run and the job ID (slurm job number).
This example shows 3 rows selected with Date containing ‘04-06’. The results for each job include the user name, the state of the job when it finished, and 4 sections: statistics describing efficiency of requested CPU, memory and walltime, and a final section identifying the NONMEM version and job name.
The CPU section shows the number of CPUs requested and the CPU efficiency. When CPUeff% is less than 2% it typically means the job finished with an error detected by NM-TRAN or NONMEM. Models similar to job number 18921115 that ran successfully had a CPUeff% of 99%. For reasons currently unexplained, jobs running with NM7.5.0 report a State of ‘FAILED’ even though the NONMEM job completed normally. The theopd test job ran successfully but with low CPU efficiency because each individual has only a few observations.
Date | Job ID | User | State | CPUS | CPUeff%
2021-04-06_17.51.03.219209 | 18921115 | jmor616 | COMPLETED | 24 | 1.25
2021-04-06_08.22.38.430877 | 18924529 | nhol004 | FAILED | 4 | 16.67
2021-04-06_08.44.33.746999 | 18925119 | nhol004 | COMPLETED | 1 | 29.58
The memory section shows
the requested memory (ReqMem). By default this is 250Mc for NONMEM jobs. The
‘c’ suffix indicates this is memory requested per core (similar to per CPU).
The user can request more memory by changing the NMMEM environment variable
which is also shown in this section. The main memory demand for NONMEM is
during job compilation and the memory required is similar for both small and
large NONMEM models. The Memory value shows the memory used by all the
parallel tasks. This increases in proportion to the number of CPUs. The MaxRSS
statistic is hard to interpret. One definition is "Maximum
individual resident set size out of the group of resident set sizes associated
with all tasks in job." When CPUeff% is high it is usually several times
bigger than Memory. It is not clear how it can be smaller than Memory. The
MEMeff% statistic is somehow related to ReqMem and Memory or MaxRSS but it is
not clear how.
Date | Job ID | ReqMem | NMMEM | Memory | MaxRSS | MEMeff%
2021-04-06_17.51.03.219209 | 18921115 | 250Mc | 250M | 11.72GB | 24K | 0.16
2021-04-06_08.22.38.430877 | 18924529 | 250Mc | 250M | 1.95GB | 759K | 0.14
2021-04-06_08.44.33.746999 | 18925119 | 20Mc | 20M | 40.00MB | 94K | 15.94
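In the three example rows, the Memory value can be reproduced by multiplying the per-core ReqMem by the CPUS value and by 2 (for example 250M x 24 x 2 = 12000M, which is 11.72GB). The Python sketch below checks that arithmetic. The factor of 2 is only an observation from these three rows, not documented behaviour; it might reflect two logical CPUs per core, but that is an assumption. The function names are hypothetical.

```python
def parse_mem_mb(text):
    """Convert '250Mc', '250M' or '2G' to megabytes (1G = 1024M)."""
    value = text[:-1] if text.endswith("c") else text  # drop per-core suffix
    number, unit = int(value[:-1]), value[-1]
    if unit == "G":
        return number * 1024
    if unit == "M":
        return number
    raise ValueError(f"unsupported memory unit in {text!r}")


def total_request_mb(reqmem, cpus, factor=2):
    """Per-core request times CPUS times an assumed factor of 2."""
    return parse_mem_mb(reqmem) * cpus * factor


def format_mem(megabytes):
    """Format megabytes in the style of the Memory column."""
    if megabytes >= 1024:
        return f"{megabytes / 1024:.2f}GB"
    return f"{megabytes:.2f}MB"


if __name__ == "__main__":
    # (ReqMem, CPUS) pairs taken from the example rows.
    for reqmem, cpus in [("250Mc", 24), ("250Mc", 4), ("20Mc", 1)]:
        print(reqmem, cpus, format_mem(total_request_mb(reqmem, cpus)))
```

If this relationship holds more generally, Memory describes the total memory reserved for the job rather than the memory the job actually used.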
The walltime section shows the Elapsed time the job ran (d-hh:mm:ss), the TotalCPU time which is approximately Elapsed time multiplied by the number of CPUs, and Walltime (the maximum clock time requested for the job, see WALLTIME). The WallEff% is the percent of Walltime taken by Elapsed.
Date | Job ID | Elapsed | TotalCPU | Walltime | WallEff%
2021-04-06_17.51.03.219209 | 18921115 | 0:00:10 | 00:00.0 | 1-00:00:00 | 0.01
2021-04-06_08.22.38.430877 | 18924529 | 0:00:03 | 00:07.1 | 4:00:00 | 0.1
2021-04-06_08.44.33.746999 | 18925119 | 0:03:33 | 0:00:00 | 0:05:00 | 71
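WallEff% can be recomputed from the Elapsed and Walltime values. The Python sketch below assumes the times use the Slurm style [d-]hh:mm:ss shown in the table and that WallEff% is simply Elapsed as a percent of Walltime, as described above. The function names are hypothetical and are not part of WFN.

```python
def slurm_time_to_seconds(text):
    """Convert [d-]hh:mm:ss (or mm:ss) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in text.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zero
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def wall_eff_percent(elapsed, walltime):
    """Elapsed time as a percent of the Walltime value."""
    return 100.0 * slurm_time_to_seconds(elapsed) / slurm_time_to_seconds(walltime)


if __name__ == "__main__":
    # Elapsed and Walltime for job 18925119 in the table.
    print(round(wall_eff_percent("0:03:33", "0:05:00"), 2))
```

For job 18925119 this gives 213 seconds out of 300 seconds, which is the 71 shown in the table. A very low WallEff% suggests the WALLTIME request could be reduced for similar jobs.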
The final section
indicates the NONMEM version (NMVER) and the job name. For NONMEM jobs this
will be the same as the model file name with a suffix indicating the Windows
shell command counter for that job (‘_cmd1’). WFN cluster commands that run
multiple jobs in parallel such as nmbsg are filtered to show only the first
shell command.
Date | Job ID | NMVER | Name
2021-04-06_17.51.03.219209 | 18921115 | 744 | sevo_fixPKPD_popM_popP_popR_fixed_fixE0_d_cmd1
2021-04-06_08.22.38.430877 | 18924529 | 750 | theopdg_cmd1
2021-04-06_08.44.33.746999 | 18925119 | 750 | smrg_cmd1
The smrg command may be run at any time. It takes just under 4 minutes to complete with a 2 month collection of jobs. If someone else is running smrg you will get a warning message and the most recent previous version of the smrg_stat.csv file will be copied to the directory you used to call smrg.
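As an alternative to the Excel data filter, rows of smrg_stat.csv can be selected with a short script. The Python sketch below is an illustration only: it assumes the file is comma separated with a header row containing a Date column, as the tables above suggest, and the real file layout may differ. The select_rows function and the last sample row are hypothetical.

```python
import csv
import io

# Sample text in the assumed layout. The first three rows come from the
# tables above; the last row is hypothetical and only shows a non-match.
SAMPLE = """Date,Job ID,User,State,CPUS,CPUeff%
2021-04-06_17.51.03.219209,18921115,jmor616,COMPLETED,24,1.25
2021-04-06_08.22.38.430877,18924529,nhol004,FAILED,4,16.67
2021-04-06_08.44.33.746999,18925119,nhol004,COMPLETED,1,29.58
2021-04-07_09.00.00.000000,00000000,example,COMPLETED,4,50.00
"""


def select_rows(csv_text, column, needle):
    """Return the rows whose `column` value contains `needle`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if needle in (row.get(column) or "")]


if __name__ == "__main__":
    for row in select_rows(SAMPLE, "Date", "04-06"):
        print(row["Job ID"], row["State"], row["CPUeff%"])
```

To use a real file, read its text with open("smrg_stat.csv", newline="") and pass the contents to select_rows.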
6. Other commands to find out about jobs are described here:
https://support.nesi.org.nz/hc/en-gb/articles/360000205215-Useful-Slurm-Commands
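The MaxRSS reported by sacct can be pulled out of its output with a short script. The Python sketch below assumes the output was produced with the sacct options --format=JobID,JobName,MaxRSS,Elapsed and -P (pipe separated fields), for example sacct -j jobid --format=JobID,JobName,MaxRSS,Elapsed -P. The sample text is hypothetical, not real output from HPC3, and the function names are not part of WFN.

```python
# Hypothetical pipe-separated sacct output for one job and its steps.
SAMPLE = """JobID|JobName|MaxRSS|Elapsed
18925119|smrg_cmd1||00:03:33
18925119.batch|batch|94K|00:03:33
18925119.extern|extern|2K|00:03:33
"""

UNIT_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def mem_to_mb(text):
    """Convert a value such as '94K', '250M' or '2G' to megabytes."""
    return float(text[:-1]) * UNIT_TO_MB[text[-1]]


def max_rss_mb(sacct_text):
    """Largest MaxRSS over all job steps, in megabytes (0.0 if none)."""
    lines = [line for line in sacct_text.strip().splitlines() if line]
    column = lines[0].split("|").index("MaxRSS")
    values = []
    for line in lines[1:]:
        field = line.split("|")[column]
        if field:  # the job summary line usually has an empty MaxRSS
            values.append(mem_to_mb(field))
    return max(values) if values else 0.0


if __name__ == "__main__":
    print(f"largest MaxRSS: {max_rss_mb(SAMPLE):.3f} MB")
```

The result can be compared with the NMMEM request for the same job when deciding whether to ask for more or less memory.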
The ondemand dashboard provides several apps that
might be useful.
These apps require the user to configure some settings
for the app. Once launched the app takes a minute or so to set itself up. Once it
is running you can then click on Launch to get it started. You can use the dashboard
to cancel a running app so that it no longer consumes HPC3 resources.
The Virtual Desktop looks
like this:
The Virtual Desktop offers
a File System view. By clicking on your nesiname you can see directories and
files in your home directory. This is an alternative to using the Files view
offered in the ondemand dashboard.
It also offers a terminal
view.
The JupyterLab app offers a combined File system and terminal view.
It is a matter of personal
preference which app you use to manage your files and give commands to the
operating system.