TIFR - PORTABLE BATCH SYSTEM
Specifying Job Resource Requirements
Specifying Per-Job CPU Time Limit
Specifying Per-Job Memory Size Limit
Specifying a Time For Submitting Batch Job
Submitting PBS Jobs With Input From a File
Submitting a PBS Job With Input From the Terminal
The qsub command submits a series of commands to be executed as a batch job.
The qsub Command Format
The syntax for the qsub command is:
qsub [ option ] [ script-file ]
where option is one or more of the options to the qsub command, and script-file is the name of the file that contains the shell script (if one is used) for the job submission. If you do not specify a script-file, the commands are taken directly from the terminal (standard input). In that case, the end of the job is indicated by typing control-d on a blank line. Submitting a script file is the preferred method, as it avoids errors caused by typing mistakes.
The option field is usually of the format:
-flag [ param ]
Example:
-A g12345
Some options take multiple parameters, which may be entered as a comma-separated list in the following format:
-flag [ param=value[,param=value,...] ]
Example:
-l mem=50mw,cput=02:00:00
Note that there are no spaces on either side of the comma.
This may also be expressed as individual instances of the same option:
-flag param=value -flag param=value
Example:
-l mem=50mw -l cput=02:00:00
The most frequently used options for the qsub command are listed in the table below.
Options for the qsub command may be entered on the command-line, in the forms shown above, or embedded as comments at the top of the script file before the first executable command using the identifying string #PBS followed by the desired options. For example:
#PBS -l cput=2:30:00
...
# first executable command comes next
#PBS directives encountered after the first executable command will be considered comments. PBS options entered on the command line take precedence over those found in the script file.
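The directive rule above can be modelled with a short POSIX shell function, for example to check which options a script will actually pass to qsub. This is a simplified sketch, not how qsub itself parses scripts: the function name pbs_directives is hypothetical, and it assumes that any non-blank line not starting with # is the first executable command.

```shell
# pbs_directives SCRIPT  (hypothetical helper)
# Print the options of every "#PBS" line that appears before the first
# executable command in SCRIPT. "#PBS" lines after that point are plain
# comments and are not printed.
# Simplification: any non-empty line not starting with "#" is treated as
# the first executable command.
pbs_directives() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#PBS'*)
                opts=${line#'#PBS'}
                opts=${opts#' '}          # drop the separating space
                printf '%s\n' "$opts" ;;
            '#'* | '')
                ;;                        # comment or blank line: keep scanning
            *)
                break ;;                  # first executable command
        esac
    done < "$1"
}
```

For example, pbs_directives runjob prints the directive options of a script named runjob, one per line.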
Explanation of qsub Options
| Option | Action |
| -a date-time | Specifies the earliest date and/or time at which PBS can run the request. |
| -A account | Causes the job to be executed under the account specified by account. |
| -C directive_prefix | Defines the prefix that declares a directive to the qsub command within the script file. |
| -e path | Directs the standard error output produced by the request to the stated file path. |
| -h | Specifies that a user hold be applied to the job at submission time. |
| -I | Declares that the job is to be run "interactively". |
| -j join | Declares that the standard output and standard error streams of the job should be merged (joined). Values for join: oe (both streams merged into the standard output file) or eo (both streams merged into the standard error file). |
| -k keep | Defines which (if either) of the standard output or standard error will be retained on the host where the request was executed. Values for keep: o (standard output stream only), e (standard error stream only), oe (both streams, kept in the standard output file) or eo (both streams, kept in the standard error file). |
| -l resource_list | Specifies the resources required by the job and sets a limit on the amount of each resource that can be consumed. See the Frequently Used Job Resource (-l) Options table below for the available resources. |
| -m mail_options | Defines the set of conditions under which the execution server sends a mail message about the job. |
| -N name | Declares a name for the job. |
| -o path | Directs the standard output produced by the request to the stated file path. |
| -p priority | Defines the priority of the job. |
| -r y, -r n | Declares whether the job can be rerun. The default is y (yes, the job can be rerun). |
| -c when | Specifies whether and when the batch request will be checkpointed. Values for when: n (no checkpointing is performed), s (checkpoint at system shutdown), c (checkpoint at the default minimum time interval) or c=mins (checkpoint every mins minutes). |
| -q queue | Defines the destination of the job. If this option is not used, the job is submitted to the default batch queues. The -q option identifies the complex, or class of queues, the job should be considered for; however, the job's memory and time requirements determine the exact queue selected. Possible values for queue are small, medium, long and verylong. |
| -W additional_attributes | Allows the specification of additional job attributes. |
| -v variable_list | Expands the list of environment variables that are exported to the job. |
| -S shell-name | Specifies the UNICOS shell that interprets the request. |
| -u user-name | Runs the request under the specified user name. |
| -V | Declares that all environment variables in the qsub command's environment are exported to the batch job. |
| -z | Directs qsub not to write the job identifier assigned to the job to its standard output. |
Specifying Job Resource Requirements
One of the most important aspects of creating a PBS batch job is to specify the system resource requirements of the job accurately. Accurate requests, rather than, for example, requests for the maximums, allow the scheduler to make the best use of the system.
The -l flag, described in the table above, is used to request the system resources the job needs. Multiple resources may be requested in a single instance of the flag with a comma-separated list, or multiple instances of the flag may be used.
For example:
% qsub -l cput=1:00:00,mem=40MW runjob
or
% qsub -l cput=1:00:00 -l mem=40MW runjob
will produce the same result of requesting 1 hour of CPU time and 40 megawords of memory for the job that is contained in a script file named runjob.
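When a wrapper script builds the -l request from several pieces, the pieces have to be joined by commas with no spaces around them. A minimal sketch in POSIX shell; pbs_resource_list is a hypothetical helper, not part of PBS:

```shell
# pbs_resource_list NAME=VALUE...  (hypothetical helper)
# Join resource requests with commas and no surrounding spaces, giving a
# single argument suitable for "qsub -l".
pbs_resource_list() {
    (IFS=,; printf '%s\n' "$*")   # the subshell keeps the IFS change local
}
```

With it, qsub -l "$(pbs_resource_list cput=1:00:00 mem=40MW)" runjob sends the same request as the comma-separated form above.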
The following table lists most of the resource request parameters recognized by the scheduler for the qsub -l flag.
Frequently Used Job Resource (-l) Options
| qsub -l Option | Action |
| cput=HH:MM:SS | Specifies the maximum amount of CPU time the job may use. Normally specified as hours, minutes and seconds, but may also be specified as seconds only. The default is 300 seconds. |
| pcput=HH:MM:SS, pcput=SSSSS | Specifies the maximum CPU time an individual process may use. Specifications are the same as for cput. The default is whatever cput was set to. |
| mem=NNMW, mem=NNNKw, mem=NNNNmb | Specifies the maximum amount of memory the job is expected to use. Memory may be specified in words (a Cray word is 8 bytes) or in bytes. The unit abbreviations, which are case-insensitive, are gw, mw, kw, w, gb, mb, kb and b. |
| pmem=NNMW, pmem=NNNKw, pmem=NNNNmb | Specifies the maximum amount of memory a single process may use. Memory specifications are the same as for mem. |
| srfs_big=NNN | Identifies the amount of high-speed disk storage space on the /big file system required by the job. Storage specifications for NNN are the same as for memory (mw, mb, etc.). As the parameter name indicates, this file system is controlled by the Session Reservable File System (SRFS), which means the space is guaranteed to be available, up to the requested limit, before the job is allowed to start. The variable $BIGDIR holds the path to a directory owned by you. |
| srfs_fast=NNN | Identifies the amount of storage space on the /fast file system required by the job. The /fast file system resides on the Solid State Device (SSD) and provides faster access for well-formed reads and writes (exact multiples of 512 words). The specifications for NNN are the same as for srfs_big. This file system is also controlled by SRFS, so the space is guaranteed to be available, up to the requested limit, before the job is allowed to start. The variable $FASTDIR holds the path to a directory owned by you. |
| ncpus=NN | Identifies the number of CPUs the job will need and sets the environment variable NCPUS to NN. Setting this parameter allows the scheduler to schedule the CPU resource better. |
Specifying Per-Job CPU Time Limit
The -lcput option is used with the qsub command to specify a per-job maximum CPU time limit. If the combined CPU time of all processes associated with the job exceeds this limit, the job is terminated. The syntax of qsub -lcput is:
qsub -lcput=time-limit [ script-file ]
where time-limit has the format:
[[hours:]minutes:]seconds[.milliseconds]
For example:
-lcput=4:30:00
PBS reads a specification starting from its smallest unit (the last field is always seconds), meaning, for example, that:
-lcput=4:30:00
-lcput=4:30
are not equivalent. The former reads: four hours, thirty minutes, while the latter reads: four minutes, thirty seconds.
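Because the rightmost field is always seconds, a time limit can be checked by reading the fields from the right. The sketch below is a hypothetical POSIX shell helper, cput_to_seconds, that applies this rule to a cput or pcput value; it simply discards any .milliseconds part.

```shell
# cput_to_seconds SPEC  (hypothetical helper)
# Convert [[hours:]minutes:]seconds[.milliseconds] to whole seconds.
# Fields are read from the right, so the last one is always seconds:
# "4:30:00" is 4 h 30 min, but "4:30" is 4 min 30 s.
# The ( ) body runs in a subshell, so IFS and the variables stay local.
cput_to_seconds() (
    IFS=:
    set -- ${1%%.*}             # drop ".milliseconds", split on ":"
    total=0
    for field in "$@"; do
        # Strip leading zeros so "08" is not read as an octal number.
        while [ "${field#0}" != "$field" ] && [ -n "${field#0}" ]; do
            field=${field#0}
        done
        total=$(( total * 60 + field ))
    done
    printf '%s\n' "$total"
)
```

Here cput_to_seconds 4:30:00 gives 16200 while cput_to_seconds 4:30 gives 270, matching the two readings described above.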
The run-time of any program may vary with the system load at the time of the run, so be sure to include a little extra time to account for these variations.
There is also a per-process time limit parameter, pcput, which has the same format but places a limit on the amount of time any single process may use, as opposed to the limit for the whole job. This parameter should always be used in conjunction with the "job" time limit (cput). If cput is omitted, the default time of 5 minutes is set. Any job requiring more than this amount will abort even if the "process" time limit (pcput) has been set to a higher value.
Specifying Per-Job Memory Size Limit
The -lmem flag is used with the qsub command to specify a per-job maximum memory size limit for all processes required to complete the batch job. If any process, or combination of concurrent processes, in the job exceeds the size limit, the job is terminated. The syntax of qsub -lmem is:
qsub -lmem=size-limit [ script-file ]
where size-limit is the maximum amount of memory allowed for the job. The default unit is bytes, although other units may be specified. Acceptable memory size units include: words (w), kilobytes (kb), kilowords (kw), megabytes (mb), megawords (mw), gigabytes (gb), and gigawords (gw).
The following example specifies the size limit in megawords:
qsub -lmem=16mw -lcput=1200 myjob
There is also a per-process memory limit, pmem, which follows the same format but places a limit on any one process, as opposed to the whole job. Failing to specify the "job" memory limit results in the default value, 4 MW. This will cause a job requiring more than the default to fail regardless of whether the "process" memory limit has been set to something higher.
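To compare requests written in different units, a size can be converted to bytes. This POSIX shell sketch (mem_to_bytes is a hypothetical name) assumes that a word is the 8-byte Cray word described in the resource table above and that the k, m and g prefixes are powers of 1024; both assumptions should be checked against the local PBS documentation.

```shell
# mem_to_bytes SIZE  (hypothetical helper)
# Convert a PBS memory size such as 16mw, 40MW or 512kb to bytes.
# Units are case-insensitive and a bare number means bytes.
# Assumptions: a word (w) is an 8-byte Cray word, and the k, m and g
# prefixes are powers of 1024.
mem_to_bytes() (
    size=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    num=${size%%[!0-9]*}        # leading digits
    unit=${size#"$num"}         # the unit suffix that follows them
    [ -n "$num" ] || { echo "mem_to_bytes: no number in '$1'" >&2; exit 1; }
    case $unit in
        '' | b) mult=1 ;;
        w)      mult=8 ;;
        kb)     mult=1024 ;;
        kw)     mult=$(( 8 * 1024 )) ;;
        mb)     mult=$(( 1024 * 1024 )) ;;
        mw)     mult=$(( 8 * 1024 * 1024 )) ;;
        gb)     mult=$(( 1024 * 1024 * 1024 )) ;;
        gw)     mult=$(( 8 * 1024 * 1024 * 1024 )) ;;
        *)      echo "mem_to_bytes: unknown unit '$unit'" >&2; exit 1 ;;
    esac
    printf '%s\n' "$(( num * mult ))"
)
```

Under these assumptions, the 4 MW default corresponds to 33554432 bytes and the 16mw request above to 134217728 bytes.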
Specifying a Time For Submitting Batch Job
qsub -a is used to specify the date and time after which the batch job should be submitted. This does not mean that the job will start at this time. If your job is still queued when the specified time arrives, it will remain queued and will not run until PBS initiates it. If it is designated by PBS to run before the specified time, then it will go into a wait state until the specified time arrives. The format of qsub -a is:
qsub -a date-time [ script-file ]
or in a script:
#PBS -a date-time
where date-time corresponds to a particular date and time. The date-time argument has the following form:
[[[[CC]YY]MM]DD]hhmm[.SS]
The following are examples of valid date and time specifications:
qsub -a 0800            Submit at the next 8:00 A.M.
qsub -a 199604010000    Submit at 12:00 A.M. on April 1, 1996
qsub -a 09301800        Submit at 6:00 P.M. on September 30
Without the -a date-time option, the current date and time are assumed (that is, the job is submitted immediately). The job is also submitted immediately if the date and time specified have already passed.
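The date-time format is easiest to read from the right: the last four digits are always hhmm, and each optional two-digit field to the left adds DD, MM, YY and finally CC. The hypothetical helper below sketches that reading in POSIX shell; it only splits the string and does not check that the values form a valid date.

```shell
# split_pbs_datetime SPEC  (hypothetical helper)
# Split a "qsub -a" value of the form [[[[CC]YY]MM]DD]hhmm[.SS] into named
# fields. Digits are taken two at a time from the right, so the last four
# are always hhmm; fields that were omitted are shown as "-".
split_pbs_datetime() (
    case $1 in
        *.*) digits=${1%%.*}; ss=${1#*.} ;;
        *)   digits=$1;       ss=- ;;
    esac
    case $digits in
        *[!0-9]*) echo "split_pbs_datetime: not all digits: $1" >&2; exit 1 ;;
    esac
    case ${#digits} in
        4 | 6 | 8 | 10 | 12) ;;
        *) echo "split_pbs_datetime: bad length: $1" >&2; exit 1 ;;
    esac
    out=
    for name in mm hh DD MM YY CC; do
        if [ -n "$digits" ]; then
            val=${digits#"${digits%??}"}   # last two digits
            digits=${digits%??}
        else
            val=-
        fi
        out="$name=$val $out"              # prepend, so CC ends up first
    done
    printf '%sSS=%s\n' "$out" "$ss"
)
```

For example, split_pbs_datetime 09301800 prints CC=- YY=- MM=09 DD=30 hh=18 mm=00 SS=-, matching the September 30, 6:00 P.M. reading above.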
Specifying Job Dependency and File Staging With the -W Flag
The -W option provides two categories of options: one controls the order in which your jobs run, and the other identifies specific files to be copied to temporary locations during job execution. The first category is designated by the depend= string and provides dependency controls based on the PBS job identifier and the state of the job or jobs identified. The second category is designated by the strings stagein= and stageout=; these identify files to be copied to temporary locations before execution (stagein) for use during job processing, and to be copied out to permanent storage after job completion (stageout).
The following table describes some of the dependency options. These options make it possible to control the order in which your jobs are run, or to make one job dependent on the successful completion or failure of another job.
Job Dependency Options (-W depend=option)
| qsub -W depend=option | Description |
| depend=after:jobid[:jobid] | Schedule this job to run after job(s) jobid[:jobid] have started running. |
| depend=afterok:jobid[:jobid], depend=afternotok:jobid[:jobid] | Schedule this job to run only after job(s) jobid[:jobid] have completed without error or, conversely (afternotok), with errors. |
| depend=afterany:jobid[:jobid] | Schedule this job to run after job(s) jobid[:jobid] have terminated, with or without errors. |
| depend=on:count | The on dependency is used with the before dependencies. count indicates the number of jobs with dependencies on this job. NOTE: count must be accurate, or the job will either be released to run early or be held forever waiting for the remaining dependency. |
| depend=before:jobid[:jobid] | Start this job before job(s) jobid[:jobid]. This means you have to submit this job before any of the jobs specified have started. |
| depend=beforeok:jobid[:jobid], depend=beforenotok:jobid[:jobid] | The job(s) specified will be run only if this job terminates without errors or, conversely (beforenotok), with errors. |
| depend=beforeany:jobid[:jobid] | The job(s) specified with jobid[:jobid] may be started after this job has terminated for any reason. |
The following example illustrates one way to use the dependency options. In this example, the first job in a series is submitted to the TIFR C7 machine through its PBS server, tifrc7. The second job is then submitted, specifying that it may run only after the first job completes successfully.
tifrc7% qsub job_01
4500.tifrc7.tifr.res.in
tifrc7% qsub -W depend=afterok:4500@tifrc7 job_02
4502.tifrc7.tifr.res.in
tifrc7%
A few other dependency options make it possible to indicate that certain jobs should run at the same time. For more information on the -W depend= options, see the qsub man page.
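Rather than copying the job identifier by hand, a script can capture the identifier that qsub writes to its standard output and pass it to the next submission. A minimal POSIX shell sketch; submit_after is a hypothetical name, and it assumes the server accepts the full identifier printed by qsub (for example 4500.tifrc7.tifr.res.in) in place of the 4500@tifrc7 form used above.

```shell
# submit_after FIRST_SCRIPT SECOND_SCRIPT  (hypothetical helper)
# Submit FIRST_SCRIPT, capture the job identifier qsub prints on standard
# output, then submit SECOND_SCRIPT so that it runs only after the first
# job completes without error (afterok).
submit_after() (
    first_id=$(qsub "$1") || exit 1
    qsub -W depend=afterok:"$first_id" "$2"
)
```

Running submit_after job_01 job_02 then performs the same two submissions as the session above.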
PBS provides several environment variables that may be used during job execution. These variables are created when the job begins to run, before the first command in the job script is executed. They are included in the job's global environment and may be used in the script for a variety of purposes, including user identification, locating the directory from which the job was submitted, finding the job's PBS ID, and other useful information.
The following is a table of the environment variables provided by PBS:
PBS Environment Variables
| Variable | Purpose |
| PBS_O_WORKDIR | Contains the full path to the directory from which the job was submitted. |
| PBS_O_HOST | Contains the name of the host on which the qsub command was executed. |
| PBS_O_QUEUE | Contains the name of the queue to which the job was submitted. |
| PBS_JOBID | Contains the PBS job identifier, which includes the name of the PBS server, e.g. 234.tifrc7.tifr.res.in. This variable may be used to name files associated with this job uniquely. |
| PBS_JOBNAME | Holds the job name supplied by the user. If none is supplied, PBS uses the name of the script file, or STDIN for a job read from standard input. |
| PBS_QUEUE | Contains the name of the queue in which the job is executed. |
| PBS_ENVIRONMENT | Set to PBS_BATCH by PBS. When the job session begins, this variable may be tested in the shell startup files (.cshrc, .login, .profile) to determine whether the current session is a PBS job. |
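The PBS_ENVIRONMENT test mentioned in the table might look like this in ~/.profile. It is a sketch in POSIX shell syntax, so it suits .profile only; .cshrc and .login need C shell syntax instead. The function name in_pbs_job is hypothetical.

```shell
# in_pbs_job  (hypothetical helper)
# Succeed (exit status 0) when the current session is a PBS batch job,
# judged by the PBS_ENVIRONMENT variable that PBS sets to PBS_BATCH.
in_pbs_job() {
    [ "${PBS_ENVIRONMENT:-}" = PBS_BATCH ]
}

# Example for ~/.profile: do terminal-only setup outside PBS jobs only.
if ! in_pbs_job; then
    :   # stty settings, prompts or login banners would go here
fi
```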
An alternative to PBS_O_WORKDIR is to use /R/machine_name/u/userid/pwd as the full path to the directory from which the job was submitted.
A job is submitted to the batch queues by executing the qsub command. This may be done by entering commands interactively through standard input (your terminal), or by specifying a script file on the qsub command-line. In general, a script is considered safer. For both cases, the submitted commands are stored in a temporary file so that later changes will not affect previously queued batch jobs. Immediately after you submit a PBS request, the system returns a confirmation message specifying the identification number of the job. After completing the batch job execution, PBS returns standard output and the standard error to the directory from which the qsub command was invoked or as directed by qsub directives (-e path -o path).
Whether the mode of input is from a terminal or a file, a job always starts in your home directory. Therefore, if the job requires access to a specific file not residing in your home directory, either change to the directory (cd) containing the file, as a part of the batch job, or give an explicit path to the file. PBS provides an environment variable, $PBS_O_WORKDIR, which is set to the directory from which the job is submitted.
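Putting these points together, a job script typically begins with its #PBS directives and then changes to $PBS_O_WORKDIR before running anything else. The sketch below is illustrative: the job name, resource values and log-file name are placeholders, and the ${PBS_O_WORKDIR:-.} fallback is an addition that only lets the script be tried outside PBS.

```shell
#!/bin/sh
#PBS -N runjob
#PBS -l cput=1:00:00,mem=40mw
#PBS -j oe
#PBS -S /bin/sh

# Jobs start in the home directory, so move to the directory qsub was run
# from. The ":-." fallback is not a PBS feature; it only lets the script
# be tried outside PBS, where PBS_O_WORKDIR is unset.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# PBS_JOBID is unique for every job, so it makes a safe file-name suffix.
log="run.${PBS_JOBID:-local}.log"
echo "job ${PBS_JOBID:-local} running on $(uname -n) in $(pwd)" > "$log"
```

In a real job, the echo line would be replaced by the job's own commands.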
Second, keep in mind which shell PBS uses to interpret the submitted batch commands. The default shell is determined by the first character of the script, unless you use the -S option to specify a particular shell. You can specify /bin/sh (POSIX shell), /bin/csh (C shell), or /bin/ksh (Korn shell).
The following two sections describe how to submit a job, first using a script file, and secondly, by entering input at the terminal.
Submitting PBS Jobs With Input From a File
In The qsub Command Format above, the qsub command format was described as:
qsub [ option ] [ script-file ]
In order to provide job input from a file the script-file field must be specified on the command line. For example:
% qsub runjob
where runjob is the name of a file located in the current directory, containing comments and executable commands. The option field is used to supply resource and other requirements for the job. #PBS directives specified in the script are overridden by options specified on the command-line. A common practice is to set up a script that includes all the directives necessary to run the job. On occasion, these directives will be overridden by entering the particular option on the command-line.
When the qsub command completes, it returns a PBS job identifier for tracking purposes. For example:
% qsub runjob
9012.tifrc7.tifr.res.in
A copy of the script is saved by PBS for later execution, so the original script can be modified without affecting the job you just submitted. When the job is scheduled to run, the commands in the script are executed by the designated command shell. At job completion, or job failure, the standard output and standard error of the job are returned to the owner. These files (or file, depending on the qsub options used) are named with the first 15 characters of the job name, followed by a dot (.), the letter "o" for standard output or "e" for standard error, and the numeric part of the job identifier. For example, running ls in the directory from which the job was submitted lists these output files:
% ls
job.e9012 job.o9012
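The naming rule can be written out as a small POSIX shell function, which is handy for locating the output of a given job from a script. pbs_output_files is a hypothetical name; the sketch assumes you already know the job name (set with -N; PBS normally uses the script's file name when -N is not given) and the identifier that qsub returned.

```shell
# pbs_output_files JOBNAME JOBID  (hypothetical helper)
# Print the standard output and standard error file names PBS returns:
# the first 15 characters of the job name, a dot, "o" or "e", and the
# numeric part at the front of the job identifier.
pbs_output_files() (
    name=$(printf '%s' "$1" | cut -c1-15)   # first 15 characters only
    seq=${2%%.*}                             # 9012.tifrc7... -> 9012
    printf '%s.o%s %s.e%s\n' "$name" "$seq" "$name" "$seq"
)
```

For instance, pbs_output_files job 9012.tifrc7.tifr.res.in prints job.o9012 job.e9012, the two names in the listing above.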
Submitting a PBS job with input from a file allows you to submit a number of commands and options to be processed collectively. This speeds up the submission of multiple commands and saves repetitive typing. In the event of an unsuccessful run, it also allows you to check for errors made during job submission, or in the commands executed.
Submitting a PBS Job With Input From the Terminal
The second method of submitting a job to the batch queues is to type it in at the terminal. This method is most useful for short scripts that require minimal typing. By its very nature, this method is prone to typographical errors and is not recommended for regular use. Saving a script in a file makes it easier to debug should problems arise.
The steps for submitting a job via terminal input are:
1. Enter the qsub command and press return (enter).
   You may include any of the PBS options, but you must not specify a job script. When the qsub command does not find a script file, it prompts you for input from the keyboard.
2. Enter shell commands one line at a time.
   Anything valid in a script may be entered. The only problem is that you cannot go back and fix a typographical error once you have pressed return.
3. Enter control-d on a blank line.
   At this point, PBS returns the job identifier, just as it does when submitting a script file.
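Because qsub reads the job from standard input when no script file is named, the same input can also be supplied non-interactively with a shell here-document, which avoids the typing problem noted above. A sketch in POSIX shell, using a 10MW memory limit and a 100-second CPU time limit; submit_testjob is a hypothetical name, and the cd line is added because jobs start in the home directory.

```shell
# submit_testjob  (hypothetical helper)
# Give qsub the job commands on standard input, as if they had been typed
# at the terminal and ended with control-d.
# Quoting 'EOF' keeps $PBS_O_WORKDIR unexpanded until the job runs.
submit_testjob() {
    qsub -l mem=10mw,cput=100 <<'EOF'
cd "$PBS_O_WORKDIR"
./testjob
EOF
}
```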
The following example submits a short job to the long queue with a memory limit of 10MW and a job CPU time limit of 100 seconds.
% qsub -l mem=10MW,cput=100
./testjob
246.tifrc7.tifr.res.in
% qstat -au xyz
-------------------------------------------------------------------
PBS Batch Request Summary
-------------------------------------------------------------------
PBSid  Jobname  Username  Queue  NDS  TSK  SID  REQMEM  REMTIME   S
------ -------- --------- ------ ---- ---- ---- ------- --------- -
246    test     xyz       long   -    -    0    10mw    00:01:25  R
As with submitting a script file, the standard output and standard error files are returned to the directory from which the job was submitted unless otherwise modified by qsub options or #PBS directives.