Wings for NONMEM on the NeSI Grid

Last Updated: 16 January 2016

IMPORTANT
Access to the NeSI grid requires an approved project application (which grants login rights) and registration of each individual user at https://identity.bestgrid.org/registration.

(University of Auckland Pharmacometrics users have the NeSI Project ID uoa00106.)

Installation to run WFN on the PAN cluster

1.     Install WFN version 732 (or later) and check that it works by running nmgo theopd in the %WFNHOME%\run directory.
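A minimal sketch of this local check, typed in a WFN command window and assuming %WFNHOME% is defined there (as the step above implies):

```bat
rem Change to the WFN run directory (/d also switches drive if needed)
cd /d "%WFNHOME%\run"
rem Run the theopd example locally to confirm that WFN and NONMEM work
nmgo theopd
```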

2.     Jobs are submitted to the PAN cluster, and their results downloaded, using a set of Remote Job Management (RJM) tools.

3.     Download the Remote Job Management tools zip archive from http://cluster.ceres.auckland.ac.nz/rjm/windows/rjm_executables.zip and extract the files into your %WFNHOME%\bin directory.
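To confirm that the extracted tools can be found, you could run the following in a WFN command window. This is a sketch that assumes the WFN command window has %WFNHOME%\bin on its PATH; the rjm_* file name pattern is taken from the tool names used on this page.

```bat
rem List the RJM tools extracted into the WFN bin directory
dir /b "%WFNHOME%\bin\rjm_*"
rem Show which rjm_configure will run when typed in this window
where rjm_configure
```

If where reports that it cannot find rjm_configure, the files were probably extracted into a different directory.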

4.     Open a WFN command window.

5.     Type the command rjm_configure, press Enter and follow the instructions to set up an RJM passphrase and enter your NeSI login name and password. This needs to be done only once by each user of a particular machine. The RJM passphrase may be the same as your University of Auckland net password. University of Auckland Pharmacometrics users have the NeSI Project code uoa00106.

6.     Open a WFN window. You should now be able to use the grid by calling nmgog:

nmgog theopd

7.     You will be asked for the RJM passphrase by a program called Pageant that sits in the Windows system tray. The passphrase is remembered for all runs until you log out (or restart your computer).

8.     The nmgog command starts the job on the PAN cluster. When the job finishes, you should see the usual results displayed by WFN.

9.     The nmgog, nmbsg, nmbsig, nmrtg and nmgosimg commands work like nmgo, nmbs, nmbsi, nmrt and nmgosim, but submit the NONMEM runs to the grid. For grid jobs the number of CPUs defaults to 4 and the walltime to 1:0:0 (1 hour); you can change these by setting the cpus and walltime environment variables before calling these batch files.
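For example, to request more CPUs and a longer walltime for one grid run, set the variables in the same WFN command window before calling the grid command. The values 8 CPUs and 2 hours are illustrative only.

```bat
rem Request 8 CPUs instead of the default 4
set cpus=8
rem Allow up to 2 hours (hh:mm:ss) instead of the default 1:0:0
set walltime=2:0:0
rem Submit the theopd example to the grid with these settings
nmgog theopd
```

These settings stay in effect for later commands in the same window. Setting a variable to nothing (e.g. set cpus=) removes it, which should bring back the default.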

10.   The walltime variable is specified in the format hh:mm:ss and sets the total run time allowed for your job. You might estimate it from the run time on a typical Windows machine divided by 2 (a grid run with 4 CPUs should be at least twice as fast). By default, WFN checks every 60 seconds whether the job has finished; if you have long jobs, you may set the batchwait variable (in seconds) to a more suitable interval.

rem Allow 24 hours for the job to finish

set walltime=24:0:0

rem Check every 300 seconds whether the jobs have finished

set batchwait=300
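The walltime estimate in step 10 is simple arithmetic. As a sketch, the following hypothetical batch file (setwall.bat is an illustrative name, not part of WFN) halves a measured Windows run time given in minutes and sets walltime in the h:m:s form used above. Run it from the WFN command window so that walltime stays set there. The 300-minute value is only an example; since walltime limits the total run time of the job, rounding up is safer than rounding down.

```bat
@echo off
rem Hypothetical helper (setwall.bat, not part of WFN): derive a grid walltime
setlocal
rem Measured run time on a typical Windows PC, in minutes (example value)
set /a pc_minutes=300
rem Assume the 4-CPU grid run is about 2 times faster, as suggested in step 10
set /a grid_minutes=pc_minutes/2
rem Split the minutes into whole hours and remaining minutes
set /a hh=grid_minutes/60
set /a mm=grid_minutes%%60
rem Keep only walltime after endlocal; hh and mm are expanded before endlocal runs
endlocal & set walltime=%hh%:%mm%:0
echo walltime=%walltime%
```

With the example value of 300 minutes, this sets walltime=2:30:0.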

11.   Note that if you request a lot of CPUs, your job may wait in a queue until enough CPUs are available.

12.   Checking job status: you can check the status of your jobs at this URL:

https://web.ceres.auckland.ac.nz/portal/#/portal/hpc/cgi-bin/noheader/summary.cgi

Select your UPI to see a list of running jobs and jobs scheduled to run.

13.   Maintenance: you will need to be able to clean up (delete) your files on the PAN cluster. You can do this with the MobaXterm utility.

14.   Install MobaXterm, an X terminal client that lets you log in and manage your files:

https://wiki.auckland.ac.nz/display/CER/Access+and+data+transfer#Accessanddatatransfer-Preparingtousessh

15.   Log in to login.uoa.nesi.org.nz with MobaXterm using your grid identifiers, e.g. your UoA net password.

16.   You can use the left-hand pane of MobaXterm to view your job directories.

17.   Enter the following path in the directory name box at the top of the left-hand pane. If your project number is not uoa00106, use your own project number instead.

/gpfs1m/projects/uoa00106

18.   Click the directory named with your UPI. Inside it you will see a directory called rjm_jobs.

19.   Click rjm_jobs to explore your job directories.

20.  To clean up all your files, select rjm_jobs and delete it (right-click and choose delete, or click the X delete icon).

21.   Advanced RJM tools options: detailed information about the RJM tools can be found at the URL below. WFN users may occasionally want to use commands such as rjm_batch_cancel. These commands must be run from the WFN command window in the directory where you started a grid job or batch of jobs, because they use the *dirs.txt file created by WFN. If this file is missing, or if the directories listed in it are missing, the RJM tool commands will not work.

https://wiki.auckland.ac.nz/display/CER/Lightweight+Remote+Job+Management
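Because these tools rely on the *dirs.txt file, it can help to confirm that the file is present before using them. A minimal sketch, run in the WFN command window from the directory where the grid job was started:

```bat
rem Look for the job list file(s) that WFN wrote when the grid job was submitted
if exist *dirs.txt (
    echo Job list files found:
    dir /b *dirs.txt
) else (
    echo No *dirs.txt file in %CD% - run the RJM tools from the directory where the grid job was started
)
```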

22.  The environment variable rjmdir may be set to point to a different directory for running jobs on the cluster, e.g.:

set rjmdir=/projects/myProject/myUPI/rjm_PK

Replace myProject with your project and myUPI with your personal identifier (UPI).

23.  This is useful for batches of jobs that you want to identify easily, e.g. when looking at results on the cluster.
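A sketch of using a separate cluster directory for one batch of runs and then going back to the default. The path is the placeholder from step 22 and must be edited. That clearing the variable with set rjmdir= restores the default rjm_jobs directory is an assumption based on how the other WFN variables fall back to their defaults.

```bat
rem Put this batch of runs in its own cluster directory (edit project and UPI)
set rjmdir=/projects/myProject/myUPI/rjm_PK
rem Submit the run; its files on the cluster should appear under rjm_PK
nmgog theopd
rem Clear the variable so later runs use the default directory (assumed behavior)
set rjmdir=
```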

24.  The environment variable nmwaitonly may be set to y, e.g.:

set nmwaitonly=y

25.  This is rarely needed, but jobs may have run to completion without their results being downloaded (e.g. because you logged out or restarted your computer). If you set nmwaitonly to y and rerun the grid command (e.g. nmbsg), the rjm_batch_wait tool is restarted to download the results and no new job is submitted.

26.  To restore the default behavior, unset nmwaitonly or set it to n.
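For example, to recover the results of an nmgog theopd run whose download was interrupted, you might use the sequence below. This is a sketch; it assumes you run it from the directory where the job was originally submitted, since the RJM tools read the *dirs.txt file there.

```bat
rem Only wait for and download results of jobs that were already submitted
set nmwaitonly=y
rem Rerun the same grid command; no new job is submitted
nmgog theopd
rem Unset the variable to restore the default behavior for future runs
set nmwaitonly=
```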

27.  The nmstopg command cancels all jobs started by rjm_batch_submit (e.g. via nmgog or nmbsg), using the *localdirs.txt file(s) in the current directory to identify the submitted jobs.
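A minimal sketch of cancelling grid jobs, assuming they were submitted from %WFNHOME%\run (substitute the directory you actually submitted from):

```bat
rem Go to the directory where the grid jobs were submitted (example path)
cd /d "%WFNHOME%\run"
rem Show the job list files that nmstopg will read
dir /b *localdirs.txt
rem Cancel every job listed in those files
nmstopg
```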