The SUN Grid Engine is a batch system. Users submit jobs, which are placed in queues and executed depending on the system load, the opening hours of the queues and the job priority.
It is recommended that all jobs running longer than a couple of minutes be enqueued. This allows a system administrator to suspend them if, for some reason, another user needs the machine for timing experiments. In the present configuration, users can still log on to Albireo and execute code at any time. However, we strongly recommend that users leave Albireo idle at night and on weekends so that enqueued jobs can take full advantage of the machine. If this policy does not work, other measures will be taken to give enqueued jobs exclusive use of the machine.
It is also suggested that long jobs (in terms of execution time) be given a low priority so that short jobs can pass them.
The system has several queues defined, but for normal usage only two are open: one for MPI jobs and one for multithreaded or serial jobs.
If a job is still running and the queue closes, the system will suspend the job until the queue opens again.
albireo$ qsub -help
CODINE 5.2
usage: qsub [options]
   [-a date_time]               request a job start time
   [-clear]                     skip previous definitions for job
   [-cwd]                       use current working directory
   [-C directive_prefix]        define command prefix for job script
   [-e path_list]               specify standard error stream path(s)
   [-h]                         place user hold on job
   [-help]                      print this help
   [-j y|n]                     merge stdout and stderr stream of job
   [-m mail_options]            define mail notification events
   [-now y[es]|n[o]]            start job immediately or not at all
   [-M mail_list]               notify these e-mail addresses
   [-N name]                    specify job name
   [-o path_list]               specify standard output stream path(s)
   [-p priority]                define job's relative priority
   [-pe pe-name slot_range]     request slot range for parallel jobs
   [-q destin_id_list]          bind job to queue(s)
   [-v variable_list]           export these environment variables
   [-V]                         export all environment variables
   [-@ file]                    read commandline input from file
   [{script|-} [script_args]]

date_time        [[CC]YY]MMDDhhmm[.SS]
destin_id_list   queue[ queue ...]
job_id_list      job_id[,job_id,...]
mail_address     username[@host]
mail_list        mail_address[,mail_address,...]
mail_options     `e' `b' `a' `n' `s'
path_list        [host:]path[,[host:]path,...]
priority         -1023 - 1024
slot_range       [n[-m]|[-]m] - n,m > 0
variable_list    variable[=value][,variable[=value],...]

Examples:
albireo$ qsub -q weekend -V -p -23 super_code.sh
Submits the script super_code.sh to the weekend queue, exports all environment variables (-V) and sets the job priority to -23 (the valid range is [-1023,1024]).
albireo$ qsub -a 10300445 -cwd -q night -pe mpi 2-12 -m e -M henrikl@tdb.uu.se -V super_mpi_code.sh
Submits the script super_mpi_code.sh to the night queue (-q night) using the parallel environment mpi with 2-12 processors, with the following extras: start the job at 04:45 on 30 October (-a 10300445, in the MMDDhhmm format), use the current working directory (-cwd), mail me (-M henrikl@tdb.uu.se) when the job ends (-m e) and export all login environment variables (-V).
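According to the date_time format in the usage text, [[CC]YY]MMDDhhmm[.SS], month, day, hour and minute are each written as two digits, optionally followed by .SS for seconds. The helper below is a minimal sketch that builds such a string; the name qsub_datetime is hypothetical and not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): build the MMDDhhmm[.SS]
# part of the date_time argument accepted by "qsub -a".
# Usage: qsub_datetime MONTH DAY HOUR MINUTE [SECOND]
# Pass plain numbers such as 4, not 04 (printf may read 08 as bad octal).
qsub_datetime() {
  # Zero-pad each field to two digits: MM DD hh mm
  stamp=$(printf '%02d%02d%02d%02d' "$1" "$2" "$3" "$4")
  if [ -n "$5" ]; then
    # Optional seconds are appended as .SS
    stamp=$(printf '%s.%02d' "$stamp" "$5")
  fi
  printf '%s\n' "$stamp"
}

# 04:45 on 30 October
qsub_datetime 10 30 4 45
```

The result can be passed directly, e.g. qsub -a $(qsub_datetime 10 30 4 45) ... in a shell where the function is defined.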
These flags can also be given in the job script file itself, on lines that begin with the sentinel #$. See the example script below.
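Because lines starting with #$ are ordinary comments to /bin/sh, the shell ignores them while qsub reads them as options. A minimal sketch of a helper that lists the embedded options of a script, assuming the default #$ prefix (the usage text shows it can be changed with -C); the function name, the demo file and the executable path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the embedded qsub options of a job script,
# i.e. every line that starts with the default sentinel "#$".
show_directives() {
  sed -n 's/^#\$[[:space:]]*//p' "$1"
}

# Demo job script equivalent to the first qsub example above
# (/home/user/bin/super_code is a made-up path).
cat > /tmp/demo_job.sh <<'EOF'
#!/bin/sh
#$ -q weekend
#$ -p -23
#$ -V
/home/user/bin/super_code
EOF

show_directives /tmp/demo_job.sh
```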
albireo$ qsub script.sh
To avoid mishaps, always make sure that the full path to your executable is supplied. Alternatively, use the flag -V to export all your environment variables (including PATH) to the job.
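If you are unsure of the full path of a program, a small helper such as the sketch below turns a relative path into an absolute one before you paste it into the execution line of the job script. The name abs_path is hypothetical; the helper only rewrites the path and does not check that the file exists.

```shell
#!/bin/sh
# Hypothetical helper: convert a (possibly relative) path into an
# absolute path, so the job does not depend on its start directory.
abs_path() {
  # Resolve the directory part by changing into it in a subshell
  dir=$(cd "$(dirname "$1")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Example: run in the directory that contains supersimulation
# abs_path ./supersimulation
```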
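The mail options can be combined (e.g. -m bes) and mean:
b   mail is sent at the beginning of the job
e   mail is sent at the end of the job
a   mail is sent when the job is aborted or rescheduled
s   mail is sent when the job is suspended
n   no mail is sent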
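A combined -m argument is simply a string of these letters. A minimal sketch of a check that a string only uses the letters listed under mail_options in the usage text; the function name is hypothetical and qsub does its own validation.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is non-empty and
# consists of the mail option letters b, e, a, s and n.
valid_mail_opts() {
  case "$1" in
    "" | *[!beasn]*) return 1 ;;  # empty, or contains another character
    *) return 0 ;;
  esac
}

valid_mail_opts bes && echo "bes: ok"
valid_mail_opts bx  || echo "bx: invalid"
```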
This sample script should cover the basic needs; just edit the template below.
#!/bin/sh
#
# (c) 2000 Sun Microsystems, Inc.
#
# All commands use the sentinel #$
#
# ---------------------------
# User needs to customize the following items
# enclosed by <>
#
#$ -N SuperSimulation
#$ -S /bin/sh
#$ -o super.output
#$ -e super.error
#$ -M samuel@tdb.uu.se
#$ -m es
# ---------------------------
#
# Execute the job from the current working directory
#$ -cwd
#
# Parallel environment request
# ---------------------------
# User needs to customize the following items
# enclosed by <>
#
#$ -l cre
#
# CPU_Numbers_requested, use n or n-m
#
# Example: 2 or 4-22 where the latter gives 4 to 22 CPUs
# depending on the amount of idle CPUs at
# execution time
#
#$ -pe mpi 8
# ---------------------------
#
# All resources are defined here
#
# Choose your queue
#
#$ -q albireo.cre
#
# Job priority
#
#$ -p 0
#
# ---------------------------
#
# Put compilations here
#
# ---------------------------
#
# Execution
#
# ---------------------------
#
# User needs to customize the following items
# enclosed by <>
#
/opt/codine/mpi/MPRUN -np $NSLOTS -Mf $TMPDIR/machines ./supersimulation
# ---------------------------
Make sure the script is executable:
albireo$ chmod u+x script_name
qstat monitors the queues. By default it lists all jobs without any queue status information. If you supply the flag -f, you will also see the queue status and the pending jobs.
qdel deletes jobs from the queues. The user must supply the job_id given at submission or reported by qstat.
qhost monitors the system, for example:
albireo$ qhost
HOSTNAME   ARCH     NPROC  LOAD   MEMTOT  MEMUSE  SWAPTO  SWAPUS
--------------------------------------------------------------------------------
global     -        -      -      -       -       -       -
albireo    solaris  30     17.04  7.5G    2.8G    4.0G    32.0M
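Before choosing a slot range for a parallel job it can help to estimate how many CPUs are idle. The sketch below treats NPROC minus LOAD as a rough count of idle CPUs, which is an approximation (load is an average, not a CPU count); it assumes the column layout of the qhost output shown above, and the function name is hypothetical.

```shell
#!/bin/sh
# Rough idle-CPU estimate from qhost-style text on standard input:
# NPROC (column 3) minus LOAD (column 4), skipping the two header
# lines and hosts without values (such as "global").
idle_cpus() {
  awk 'NR > 2 && $3 != "-" { printf "%s %.2f\n", $1, $3 - $4 }'
}

# Example with the qhost output shown above
idle_cpus <<'EOF'
HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS
--------------------------------------------------------------------------------
global - - - - - - -
albireo solaris 30 17.04 7.5G 2.8G 4.0G 32.0M
EOF
```

On Albireo this would be used as qhost | idle_cpus.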
SUN Grid Engine software is a batch system in which jobs (formulated as shell scripts) are put into queues and executed when their resource requirements are fulfilled. Jobs are sorted by priority and, within the same priority, in FIFO (first-in, first-out) order. An ordinary user can only lower the priority of a job. Jobs that are not eligible for execution are placed in the pending job pool. Jobs are also sorted by equal-share scheduling, which means that within each priority level the jobs of different users are interleaved. This prevents a user from "pushing" other users' jobs down the queue by submitting a series of jobs (for example from a shell script).
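The ordering rules above can be illustrated with a toy model. This is not the actual Grid Engine scheduler: it assumes a hypothetical input of one "job_id user priority" line per job in submission order, sorts by priority (highest first), then by how many earlier jobs the same user has at that priority (equal share), and finally by submission order (FIFO).

```shell
#!/bin/sh
# Toy model of priority + equal-share + FIFO ordering (illustration only).
# Input lines (hypothetical format), in submission order:
#   job_id user priority
equal_share_order() {
  awk '{
    key = $2 SUBSEP $3
    rank = seen[key]++        # earlier jobs of this user at this priority
    print $3, rank, NR, $1    # priority, rank, submission index, job_id
  }' |
  sort -k1,1nr -k2,2n -k3,3n |
  awk '{ print $4 }'
}

# alice submits three jobs before bob submits one; carol has lowered
# her priority. Prints 101 104 102 103 105, one per line.
printf '101 alice 0\n102 alice 0\n103 alice 0\n104 bob 0\n105 carol -10\n' |
  equal_share_order
```

Bob's job 104 overtakes alice's later jobs even though it was submitted last at that priority level, which is the effect equal-share scheduling is meant to have.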
When a job has ended, the console output of the script is put into files in the user's home directory. Each file name is composed of the job script file name, a dot, an "o" for the stdout file or an "e" for the stderr file, and finally the unique job ID. These files can be merged and placed in other locations by supplying the right flags (-j, -o and -e in the qsub usage text). So if a user submits the job "simple.sh", the system will answer: your job 231 ("simple.sh") has been submitted. When the job has been executed, the output files will be called "simple.sh.o231" and "simple.sh.e231".
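The naming rule can be sketched as a small helper that reads the reply printed by qsub and prints the default output file names. It assumes the reply has exactly the form quoted in the text; other versions may word it differently, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: from a qsub reply such as
#   your job 231 ("simple.sh") has been submitted
# print the default stdout and stderr file names.
output_files() {
  reply=$1
  # Job ID: the number after "your job"
  job_id=$(printf '%s\n' "$reply" | sed -n 's/^your job \([0-9][0-9]*\) .*/\1/p')
  # Script name: the text inside ("...")
  script=$(printf '%s\n' "$reply" | sed -n 's/.*("\(.*\)").*/\1/p')
  printf '%s.o%s\n%s.e%s\n' "$script" "$job_id" "$script" "$job_id"
}

output_files 'your job 231 ("simple.sh") has been submitted'
```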
The system has some, still limited, support for parallel programs through so-called parallel environments. A queue can be defined as a parallel queue containing a number of slots, and a parallel job requests a fixed number of them (e.g. -pe mpi 8).
You can also request a range of slots (e.g. 3-13). The job then starts once at least 3 slots are free in the queue and is given between 3 and 13 slots, depending on how many slots are free.
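The rule for a slot range n-m can be written as a toy model (not Grid Engine code): with fewer than n free slots the job stays pending, otherwise it receives the free slots, capped at m. The function name and its three-argument interface are hypothetical.

```shell
#!/bin/sh
# Toy model of a slot-range request n-m.
# Usage: granted_slots MIN MAX FREE  -> prints the slot count or "pending"
granted_slots() {
  min=$1 max=$2 free=$3
  if [ "$free" -lt "$min" ]; then
    echo pending          # not enough free slots yet
  elif [ "$free" -gt "$max" ]; then
    echo "$max"           # never more than the upper bound
  else
    echo "$free"          # take all free slots
  fi
}

granted_slots 3 13 2    # pending
granted_slots 3 13 8    # 8
granted_slots 3 13 20   # 13
```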
General information on SUN Grid Engine (formerly CODINE 5.1 by Gridware) is available at http://www.sun.com/gridware. The
Grid Engine software is aimed at controlling the computational
resources in a heterogeneous environment. For our purposes, we use the
software as a batch system incorporating only one host (Albireo).
Manuals
The user manual is also available
Please direct specific questions to me at henrikl@tdb.uu.se. Use the same address if you would like to discuss the policies, extensions or changes to the system.