TIFR - PORTABLE BATCHING SYSTEM

 

Table of Contents

The qsub Command

Specifying Job Resource Requirements

Specifying Per-Job CPU Time Limit

Specifying Per-Job Memory Size Limit

Specifying a Time For Submitting Batch Job

PBS Environment Variables

Submitting a Job

Submitting PBS Jobs With Input From a File

Submitting a PBS Job With Input From the Terminal

The qsub Command

The qsub command submits a series of commands to be executed as a batch job.

The qsub Command Format

The syntax for the qsub command is:

qsub [ option ] [ script-file ]

where option is one or more of the options to the qsub command, and script-file is the name of the file that contains the shell script (if one is used) for the job submission. If you do not specify a script-file, the commands are taken directly from the terminal (standard input). When the job commands are entered at the terminal, the end of the job is indicated by typing control-d on a blank line. Submitting a script file is the preferred method because it avoids errors due to typing mistakes.

The option field is usually of the format:

-flag [ param ]

Example:

-A g12345

Some options have multiple parameters which may be entered using a comma separated list in the following format:

-flag [ param=value[,param=value,...] ]

Example:

-l mem=50mw,cput=02:00:00

Note that there are no spaces on either side of the comma.

This may also be expressed as individual instances of the same option:

-flag param=value -flag param=value

Example:

-l mem=50mw -l cput=02:00:00

The most frequently used options for the qsub command are listed in the table Explanation of qsub Options below.

Options for the qsub command may be entered on the command-line, in the forms shown above, or embedded as comments at the top of the script file before the first executable command using the identifying string #PBS followed by the desired options. For example:

#PBS -l cput=2:30:00
...
# first executable command comes next

#PBS directives encountered after the first executable command will be considered comments. PBS options entered on the command line take precedence over those found in the script file.
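The rule above, that only #PBS lines before the first executable command count as directives, can be illustrated with a small shell sketch. The function name pbs_directives is hypothetical and is not part of PBS. It only approximates how directives are recognized, assuming the default #PBS prefix and treating blank lines and ordinary comments as non-executable.

```shell
# pbs_directives FILE
# Hypothetical helper (not a PBS command): print the option text of every
# "#PBS" line that appears before the first executable command in FILE.
# Assumes the default "#PBS" directive prefix.
pbs_directives() {
    awk '
        # A directive: strip the prefix and print the options that follow.
        /^#PBS[ \t]/       { sub(/^#PBS[ \t]+/, ""); print; next }
        # Other comments and blank lines do not end the directive area.
        /^#/ || /^[ \t]*$/ { next }
        # Anything else is taken to be the first executable command: stop.
                           { exit }
    ' "$1"
}
```

Running it on a script file before submission is one way to check which embedded options are likely to be picked up. A #PBS line placed after the first command does not appear in the output.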

Explanation of qsub Options

Option
    Action

-a date-time
    Specifies the earliest date and/or time at which PBS can run the request.

-A account
    Causes the job to be executed under the account specified by account.

-C directive_prefix
    Defines the prefix that declares a directive to the qsub command within the script file.

-e path
    Directs the standard error output produced by the request to the stated file path.

-h
    Specifies that a user hold be applied to the job at submission time.

-I
    Declares that the job is to be run "interactively".

-j join
    Declares that the standard output and error streams of the job should be merged (joined). The values for join can be:
        oe    standard output and error streams are merged in the standard output file
        eo    standard error and output streams are merged in the standard error file

-k keep
    Defines which (if either) of the standard output or standard error will be retained at the host where the request was executed. Possible values for keep include:
        o     standard output stream only
        e     standard error stream only
        oe    both standard output and error streams kept in the standard output file
        eo    both standard error and output streams kept in the standard error file

-l resource_list
    Specifies the resources that are required by the job and establishes a limit on the amount of each resource that can be consumed. See Table 3.? for a list of available resources.

-m mail_options
    Defines the set of conditions under which the execution server will send a mail message about the job.

-N name
    Declares a name for the job.

-o path
    Directs the standard output produced by the request to the stated file path.

-p priority
    Defines the priority of the job.

-r y|n
    Declares whether the job can be rerun. The default is y (yes, the job can be rerun).

-c when
    Specifies whether the batch request will be checkpointed and, if so, when. Options for -c:
        n         No checkpointing is to be performed.
        s         Checkpoint at system shutdown.
        c         Checkpoint at default minimum time.
        c=mins    Checkpoint every mins minutes.

-q queue
    Defines the destination of the job. If this option is not used, the job is submitted to batch queues. The -q option identifies the complex, or class of queues, the job should be considered for; however, the job's memory and time requirements determine the exact queue selected. Possible values for queue are: small, medium, long and verylong.

-W additional_attributes
    Allows for the specification of additional job attributes.

-v variable_list
    Expands the list of environment variables that are exported to the job.

-S shell-name
    Specifies the UNICOS shell to interpret the request.

-u user-name
    Runs the request under the specified user name.

-V
    Declares that all environment variables in the qsub command's environment are to be exported to the batch job.

-z
    Directs qsub not to write the job identifier assigned to the job to the command's standard output.
 


Specifying Job Resource Requirements

One of the most important aspects of creating a PBS batch job is to specify the job's system resource requirements accurately. Requesting what the job actually needs, rather than, for example, asking for the maximums, allows the scheduler to maximize system usage.

The -l flag, described in the table above, is used to request the system resources the job needs. Multiple resources may be requested in a single instance of the flag with a comma-separated list, or multiple instances of the flag may be used.

For example, both of the following commands:

% qsub -l cput=1:00:00,mem=40MW runjob
% qsub -l cput=1:00:00 -l mem=40MW runjob

request 1 hour of CPU time and 40 megawords of memory for the job contained in a script file named runjob.

The following table lists most of the resource request parameters recognized by the scheduler for the qsub -l flag.

Frequently Used Job Resource (-l) Options

qsub -l Option
    Action

cput=HH:MM:SS
cput=MM:SS
cput=SSSSS
    Specifies the maximum amount of CPU time the job may use. Normally specified as hours, minutes, and seconds, but may also be specified as seconds only. Default is 300 seconds.

pcput=HH:MM:SS
pcput=SSSSS
    Specifies the maximum amount of CPU time an individual process may use. Specifications are the same as for cput. The default is whatever cput was set to.

mem=NNMW
mem=NNNKw
mem=NNNNmb
    Specifies the maximum amount of memory the job is expected to use. Memory may be specified in words (Cray words are 8 bytes) or in bytes. The two-letter abbreviations, independent of case, include: gw, mw, kw, w, gb, mb, kb, b.

pmem=NNMW
pmem=NNNKw
pmem=NNNNmb
    Specifies the maximum amount of memory a process may use. Memory specifications are the same as for mem.

srfs_big=NNN
    Identifies the amount of high speed disk storage space on the /big file system required by the job. Storage specifications for NNN are the same as for memory, i.e., mw, mb, etc. As the parameter name indicates, this file system is controlled by the Session Reservable File System (SRFS). This means that the space is guaranteed to be available, up to the requested limit, before the job is allowed to start. A variable, $BIGDIR, holds the path to a directory owned by you.

srfs_fast=NNN
    Identifies the amount of storage space on the /fast file system required by the job. The /fast file system resides on the Solid State Device (SSD) and provides faster access for well-formed (exact multiples of 512 words) reads and writes. The specifications for NNN are the same as for srfs_big. As the parameter name indicates, this file system is controlled by the Session Reservable File System (SRFS). This means that the space is guaranteed to be available, up to the requested limit, before the job is allowed to start. A variable, $FASTDIR, holds the path to a directory owned by you.

ncpus=NN
    Identifies the number of CPUs the job will need. This sets the environment variable NCPUS to NN. Setting this parameter allows the scheduler to better schedule the CPU resource.
 


Specifying Per-Job CPU Time Limit

The -lcput option is used with the qsub command to specify a per-job maximum CPU time limit. If the combined total time of all processes associated with the job exceeds the time limit set by this parameter, the job will terminate. The syntax of qsub -lcput is:

qsub -lcput=time-limit [ script-file ]

where time-limit is of the format:

[[hours:]minutes:]seconds[.milliseconds]

For example:

-lcput=4:30:00

PBS reads a time specification from the right, starting with the smallest unit (seconds), meaning, for example:

-lcput=4:30:00
-lcput=4:30

are not equivalent. The former reads: four hours, thirty minutes, while the latter reads: four minutes, thirty seconds.
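To make the difference concrete, the following sketch converts a time-limit string in the [[hours:]minutes:]seconds[.milliseconds] format into whole seconds. The function name cput_to_seconds is hypothetical and is not part of PBS. It performs no validation beyond counting the colon-separated fields, and it truncates any fractional seconds.

```shell
# cput_to_seconds SPEC
# Hypothetical helper (not a PBS command): convert a time limit written as
# [[hours:]minutes:]seconds[.milliseconds] into whole seconds.
cput_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        # Accept one, two, or three colon-separated fields only.
        NF < 1 || NF > 3 { exit 1 }
        {
            secs = 0
            # Each field to the left is worth 60 times the one to its right.
            for (i = 1; i <= NF; i++)
                secs = secs * 60 + $i
            printf "%d\n", secs
        }'
}
```

With this helper, cput_to_seconds 4:30:00 prints 16200, while cput_to_seconds 4:30 prints 270, which is the distinction described above.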

The run time of any program may vary with the system load at the time of the run. Therefore, be sure to request a little extra time to allow for these variations.

There is also a per-process time limit parameter, pcput, which has the same format, but places a limit on the amount of time any single process may use, as opposed to the limit for the whole job. This parameter should always be used in conjunction with the "job" time limit (cput). If cput is omitted, the default time of 5 minutes is set. Any job requiring more than this amount will abort even if the "process" time limit (pcput) has been set to a higher value.
 


Specifying Per-Job Memory Size Limit

The -lmem flag is used with the qsub command to specify a per-job maximum memory size limit for all processes required to complete the batch job. If any process, or combination of concurrent processes, in the job exceeds the size limit, the request terminates. The syntax of qsub -lmem is:

qsub -lmem=size-limit [ script-file ]

where size-limit refers to the maximum amount of memory allowed for the job. The default unit is bytes, although it is possible to specify other units. Acceptable memory size units include: words (w), kilobytes (kb), kilowords (kw), megabytes (mb), megawords (mw), gigabytes (gb), and gigawords (gw).

The following example specifies the size limit in megawords:

qsub -lmem=16mw -lcput=1200 myjob

There is also a per-process memory limit, pmem, which follows the same format, but places a limit on any one process, as opposed to the whole job. Failing to specify the "job" memory limit results in the default value, 4 MW. This will cause a job requiring more than the default to fail regardless of whether the "process" memory limit has been set to something higher.
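Because limits can be given in either words or bytes, it can help to compare them in a single unit. The sketch below converts a size-limit string to bytes. The function name mem_to_bytes is hypothetical and is not part of PBS. It uses the 8-byte Cray word mentioned earlier, and it assumes that the k, m, and g prefixes are binary multiples (1024, 1024 squared, 1024 cubed), which should be checked against the local PBS documentation.

```shell
# mem_to_bytes SPEC
# Hypothetical helper (not a PBS command): convert a memory size such as
# 16mw, 50MW, 200kb or 2048 into bytes.
# Assumptions: 1 word = 8 bytes; k/m/g are powers of 1024.
mem_to_bytes() {
    printf '%s\n' "$1" | awk '
        {
            spec = tolower($0)                    # units are case-independent
            if (match(spec, /^[0-9]+/) == 0) exit 1
            count = substr(spec, 1, RLENGTH)      # leading digits
            unit  = substr(spec, RLENGTH + 1)     # trailing unit, if any
            if (unit == "") unit = "b"            # bare number means bytes

            k = 1024; word = 8
            scale["b"]  = 1
            scale["kb"] = k
            scale["mb"] = k * k
            scale["gb"] = k * k * k
            scale["w"]  = word
            scale["kw"] = word * k
            scale["mw"] = word * k * k
            scale["gw"] = word * k * k * k

            if (!(unit in scale)) exit 1          # unknown unit
            printf "%.0f\n", count * scale[unit]
        }'
}
```

Under these assumptions, the 16mw request in the example above corresponds to 134217728 bytes.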


Specifying a Time For Submitting Batch Job

qsub -a is used to specify the date and time after which the batch job may be run. This does not mean that the job will start at this time. If your job is still queued when the specified time arrives, it remains queued and does not run until PBS initiates it. If PBS selects the job to run before the specified time, the job goes into a wait state until the specified time arrives. The format of qsub -a is:

qsub -a date-time [ script-file ]

or in a script:

#PBS -a date-time

where date-time corresponds to a particular date and time. The date-time argument is in the following form:

[[[[CC]YY]MM]DD]hhmm[.SS]

The following are examples of valid date and time specifications:

qsub -a 0800            Submit at the next 8:00 A.M.
qsub -a 199604010000    Submit at 12:00 A.M. on April 1, 1996
qsub -a 09301800        Submit at 6:00 P.M. on September 30

Without the -a date-time option, the current date and time are assumed (that is, the job is submitted immediately). The job is also submitted immediately if the date and time specified have already passed.
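A quick way to catch a malformed date-time before submitting is to check its shape against the [[[[CC]YY]MM]DD]hhmm[.SS] form. The sketch below does only that. The function name valid_pbs_datetime is hypothetical and is not part of PBS, and the check covers the number of digits only, not whether the month, day, hour, or minute values are in range.

```shell
# valid_pbs_datetime SPEC
# Hypothetical helper (not a PBS command): succeed if SPEC has the shape
# [[[[CC]YY]MM]DD]hhmm[.SS], that is, 2 to 6 two-digit groups followed by
# an optional ".SS". Field ranges are NOT checked.
valid_pbs_datetime() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{2}){2,6}(\.[0-9]{2})?$'
}
```

A possible use is to guard a submission, for example: valid_pbs_datetime "$when" && qsub -a "$when" runjob, where when is a shell variable holding the requested time.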
 

Specifying Job Dependency and File Staging With the -W Flag

The -W option provides two categories of options: one controls the order in which your jobs run, and the other identifies specific files to be copied to temporary locations during job execution. The first category is designated by the depend= string and provides dependency controls based on the PBS job identifier and the state of the job or jobs identified. The second category is designated by the strings stagein= and stageout= and identifies files to copy to temporary locations before execution (stagein), for use during job processing, and files to copy out to permanent storage after job completion (stageout).

The following table describes some of the dependency options. These options make it possible to control the order in which your jobs are run, or to make one job dependent on the successful completion or failure of another job.

Job Dependency Options (-W depend=option)

qsub -W depend=option
    Description

depend=after:jobid[:jobid]
    Schedule this job to run after job(s) jobid[:jobid] have started running.

depend=afterok:jobid[:jobid]
depend=afternotok:jobid
    Schedule this job to run only after job(s) jobid[:jobid] have completed without error or, conversely (afternotok), with errors.

depend=afterany:jobid[:jobid]
    Schedule this job to run after job(s) jobid[:jobid] have terminated.

depend=on:count
    The on dependency is used with the before dependencies. count indicates the number of jobs with dependencies on this job. NOTE: count must be accurate or the job will either be released to run early or be held forever waiting for the remaining dependency.

depend=before:jobid[:jobid]
    Start this job before job(s) jobid[:jobid]. This means you have to submit this job before any of the jobs specified have started.

depend=beforeok:jobid[:jobid]
depend=beforenotok:jobid
    The job(s) specified will be run only if this job terminates without errors or, conversely (beforenotok), with errors.

depend=beforeany:jobid[:jobid]
    The jobs specified with jobid[:jobid] may be started after this job has terminated for any reason.

The following example illustrates one way to use the dependency options. In this example the first job in a series is submitted to the machine TIFR C7 through the PBS server, TIFR C7. The second job is then submitted, specifying that it may run only after the successful completion of the first job.

tifrc7% qsub job_01
4500.tifrc7.tifr.res.in
tifrc7% qsub -W depend=afterok:4500@tifrc7 job_02
4502.tifrc7.tifr.res.in
tifrc7%

A couple of other dependency options provide the ability to indicate that you want certain jobs to run at the same time. For more information on the -W depend= options, see the man page on qsub.
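When several jobs are chained from a script, the depend= string can be assembled from the job identifiers that qsub writes to standard output. The following is a minimal sketch. The function name depend_arg is hypothetical and is not part of PBS, and passing the full identifier exactly as printed by qsub (rather than the shorter 4500@tifrc7 form used in the example above) is an assumption.

```shell
# depend_arg KIND JOBID [JOBID ...]
# Hypothetical helper (not a PBS command): build the argument for
# "qsub -W", e.g. depend=afterok:ID1:ID2, from a dependency kind and one
# or more job identifiers.
depend_arg() {
    kind=$1
    shift
    arg="depend=$kind"
    for id in "$@"; do
        arg="$arg:$id"
    done
    printf '%s\n' "$arg"
}

# Intended use (commented out because it needs a working PBS server):
#   first=$(qsub job_01)
#   qsub -W "$(depend_arg afterok "$first")" job_02
```

Capturing the identifier this way avoids retyping it by hand, which matters for depend=on:count chains, where a mistake can leave a job held indefinitely.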
 


PBS Environment Variables

PBS provides several environment variables that may be used during job execution. These variables are created when the job begins to run, prior to the execution of the first command in the job script. They are included in the job's global environment and may be used in the script for a variety of purposes, including user identification, the location of the directory from which the job was submitted, the job's PBS ID, and other useful information.

The following is a table of the environment variables provided by PBS:

PBS Environment Variables

Variable
    Purpose

PBS_O_WORKDIR
    Contains the full path to the directory from which the job was submitted.

PBS_O_HOST
    Contains the name of the host on which the qsub command was executed.

PBS_O_QUEUE
    Represents the queue to which the job was submitted.

PBS_JOBID
    Contains the PBS job identifier. This ID includes the name of the PBS server, for example 234.tifrc7.tifr.res.in. This variable may be used to uniquely name files associated with this job.

PBS_JOBNAME
    Holds the job name supplied by the user. If none is supplied, the value is STDIN.

PBS_QUEUE
    Contains the name of the queue in which the job is executed.

PBS_ENVIRONMENT
    This variable is set to PBS_BATCH by PBS. When the job session begins, this variable may be tested in the shell startup files (.cshrc, .login, .profile) to determine if the current session is a PBS job.

An alternative to PBS_O_WORKDIR is to use /R/machine_name/u/userid/pwd as the full path to the directory from which the job was submitted.
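The sketch below shows two ways these variables might be used from a POSIX shell: testing PBS_ENVIRONMENT to detect a batch session, and using PBS_JOBID to build a unique file name. The function names in_pbs_batch and job_log_name, the run. prefix, and the interactive fallback are all hypothetical choices made for illustration.

```shell
# in_pbs_batch
# Hypothetical helper: succeed only when the session is a PBS batch job,
# based on the PBS_ENVIRONMENT value described in the table above.
in_pbs_batch() {
    [ "${PBS_ENVIRONMENT:-}" = "PBS_BATCH" ]
}

# job_log_name
# Hypothetical helper: print a log file name that is unique per job by
# embedding PBS_JOBID; falls back to "interactive" outside of PBS.
job_log_name() {
    printf 'run.%s.log\n' "${PBS_JOBID:-interactive}"
}

# Intended use inside a job script:
#   cd "$PBS_O_WORKDIR" || exit 1
#   ./myprog > "$(job_log_name)" 2>&1
```

In a startup file such as .profile, a test like in_pbs_batch can be used to skip terminal setup commands that make no sense in a batch job. Here myprog stands for whatever program the job runs.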


Submitting a Job

A job is submitted to the batch queues by executing the qsub command. This may be done by entering commands interactively through standard input (your terminal), or by specifying a script file on the qsub command line. In general, a script is considered safer. In both cases, the submitted commands are stored in a temporary file so that later changes will not affect previously queued batch jobs. Immediately after you submit a PBS request, the system returns a confirmation message specifying the identification number of the job. After the batch job has finished executing, PBS returns the standard output and the standard error to the directory from which the qsub command was invoked, or as directed by qsub directives (-e path, -o path).

Whether the input comes from a terminal or a file, a job always starts in your home directory. Therefore, if the job requires access to a specific file not residing in your home directory, either change to the directory containing the file (cd) as part of the batch job, or give an explicit path to the file. PBS provides an environment variable, $PBS_O_WORKDIR, which is set to the directory from which the job is submitted.

You must also remember which shell PBS is using to interpret the submitted batch commands. The default shell is determined by the first character of the script, unless you use the -S option to specify a particular shell. You can specify /bin/sh (POSIX shell), /bin/csh (C shell), or /bin/ksh (Korn shell).

The following two sections describe how to submit a job, first using a script file, and secondly, by entering input at the terminal. 


Submitting PBS Jobs With Input From a File

In The qsub Command Format above, the qsub command format was described as:

qsub [ option ] [ script-file ]

In order to provide job input from a file the script-file field must be specified on the command line. For example:

% qsub runjob

where runjob is the name of a file located in the current directory, containing comments and executable commands. The option field is used to supply resource and other requirements for the job. #PBS directives specified in the script are overridden by options specified on the command-line. A common practice is to set up a script that includes all the directives necessary to run the job. On occasion, these directives will be overridden by entering the particular option on the command-line.

When the qsub command completes, a PBS job identifier is returned for tracking purposes. For example:

% qsub runjob
9012.tifrc7.tifr.res.in

A copy of the script is saved by PBS for later execution. The originating script can then be modified without affecting the job you just submitted. When the job is scheduled to run, the commands in the script will be executed by the designated command shell. At job completion, or job failure, the standard output and standard error of the job are returned to the owner. These files (or file, depending on the qsub options used) are named with the first 15 characters of the job name, followed by a dot (.), followed by the letter "o" for standard output or the letter "e" for standard error, followed by the job number. For example, running ls in the directory from which the job was submitted will list these output files:

job.e9012 job.o9012

Submitting a PBS job with input from a file allows you to submit a number of commands and options to be processed collectively. This speeds up the submission of multiple commands and saves repetitive typing. In the event of an unsuccessful run, it also allows you to check for errors made during job submission, or in the commands executed.
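The naming rule for the returned files can be sketched as a small shell function, which may be useful when a wrapper script needs to know where to look for a job's output. The function name output_file_names is hypothetical and is not part of PBS. The job number argument corresponds to the numeric part of the job identifier, as in the job.o9012 example above.

```shell
# output_file_names JOBNAME JOBNUMBER
# Hypothetical helper (not a PBS command): print the expected names of the
# standard output and standard error files for a job, using the first 15
# characters of the job name as described above.
output_file_names() {
    name=$(printf '%.15s' "$1")     # truncate the job name to 15 characters
    printf '%s.o%s %s.e%s\n' "$name" "$2" "$name" "$2"
}
```

For a job named job with job number 9012, this prints job.o9012 job.e9012. The -o, -e, and -j options change where the streams are written, so the sketch applies only when those options are not used.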


Submitting a PBS Job With Input From the Terminal

The second method of submitting a job to the batch queues is by entering it by hand at the terminal. This method is most useful for short scripts that require minimal typing. By its very nature, this method is subject to typographical errors, and is not recommended for regular use. Saving a script in a file makes it easier to debug should problems arise.

The steps for submitting a job via terminal input are:

1. Enter the qsub command and press return (enter).

   You may include any of the PBS options, but you must not specify a job script. When the qsub command does not find a script file, it will prompt you for input from the keyboard.

2. Enter shell commands one line at a time.

   Anything valid in a script may be entered. The only problem is that you cannot go back and fix a typographical error once you have hit return.

3. Enter control-d on a blank line.

   At this point, PBS will return the job identifier, just as it does when submitting a script file.

The following example submits a short job to the long queue with a memory limit of 10MW and a job CPU time limit of 100 seconds.

% qsub -l mem=10MW,cput=100
./testjob
246.tifrc7.tifr.res.in

% qstat -au xyz

PBS Batch Request Summary

PBSid  Jobname    Username Queue      NDS TSK SID   REQMEM REMTIME  S
------ ---------- -------- ---------- --- --- ----- ------ -------- -
246    test       xyz      long       -   -   0     10mw   00:01:25 R

As with submitting a script file, the standard output and standard error files are returned to the directory from which the job was submitted unless otherwise modified by qsub options or #PBS directives.
 
