What is PBS?

PBS stands for Portable Batch System. TORQUE is an open-source resource manager that provides control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project.

Getting Started

You need a server and some clients. The server can also have the client package installed, so that users can log in to the server (via ssh) and submit jobs to queues. On Ubuntu 10.04 it is recommended to manually download and install the packages from the Maverick repository, because the Lucid packages are old and problematic.

Set up your Server

  1. Install the required packages
    sudo aptitude install torque-server torque-scheduler torque-client

    The rest should be installed as dependencies.
    NOTE: there is a bug in the Maverick torque-scheduler package. Check that the proper PIDFILE is set in /etc/init.d/torque-scheduler:

    PIDFILE=/var/spool/torque/sched_priv/sched.lock
  2. Set the server and queue parameters:
    • start server in 'create' mode
    • set up server settings (scheduling = True, query_other_jobs = True, and later default_queue)
    • define nodes
    • create and set up queues (queue_type,resources_max,started,enabled)

    You can do this with the qmgr command, either interactively or non-interactively (with the -c option). The only exception is the initial 'create' step, which is done when starting the server.
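The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the queue name (batch), the node name (node01), the core count and the walltime limit are placeholders that do not come from this page, and the node is defined through qmgr here, although listing it in /var/spool/torque/server_priv/nodes is the other common way. By default the script only prints the commands; set DRY_RUN=0 and run it as root to execute them.

```shell
#!/bin/bash
# Sketch of the server setup steps. All names and limits are placeholders.
QUEUE="batch"             # hypothetical queue name
NODE="node01"             # hypothetical compute node host name
NP=8                      # hypothetical number of cores on that node
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Run a plain command, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run one qmgr directive, or just print it in dry-run mode.
q() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "qmgr -c \"$1\""
    else
        qmgr -c "$1"
    fi
}

setup_server() {
    # First start only: initialise the server database ('create' mode).
    run pbs_server -t create

    # Server settings.
    q "set server scheduling = True"
    q "set server query_other_jobs = True"

    # Define a compute node and its core count.
    q "create node $NODE"
    q "set node $NODE np = $NP"

    # Create the queue and set it up.
    q "create queue $QUEUE"
    q "set queue $QUEUE queue_type = Execution"
    q "set queue $QUEUE resources_max.walltime = 72:00:00"
    q "set queue $QUEUE started = True"
    q "set queue $QUEUE enabled = True"

    # Now that the queue exists, make it the default.
    q "set server default_queue = $QUEUE"
}

setup_server
```

The default queue is set last because the queue has to exist before the server can refer to it.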

Set up Clients

  1. Install the required packages
    sudo aptitude install torque-client torque-mom
  2. Set up the node: set the default server in /var/spool/torque/server_name

  3. Check that pbs_mom starts automatically (/etc/rcXd...; alternatively, you can put the pbs_mom command in the /etc/rc.local file)
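As a rough sketch of steps 2 and 3, the helper functions below write the server name and report whether pbs_mom is currently running. The function names and the host name "headnode" in the usage note are hypothetical. The check only looks at running processes, so it does not prove that pbs_mom will start at boot.

```shell
#!/bin/bash
# Sketch: point a compute node at the Torque server (run as root).
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"

# Write the server host name into the server_name file.
set_torque_server() {
    local server="$1"
    echo "$server" > "$TORQUE_HOME/server_name"
}

# Report whether a pbs_mom process is currently running.
mom_status() {
    if pgrep -x pbs_mom > /dev/null 2>&1; then
        echo "pbs_mom is running"
    else
        echo "pbs_mom is not running"
    fi
}

# When a server name is given as the first argument, apply it and report.
if [ "$#" -ge 1 ]; then
    set_torque_server "$1"
    mom_status
fi
```

Saved as, for example, setup-node.sh, it could be called as: sudo bash setup-node.sh headnode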

Using the Torque PBS system

Basics of PBS scripts

  • #! sha-bang line

  • -l PBS parameters, including number of CPUs (cores), memory and time needed

  • -q name of the queue you want to submit to

  • -m notification settings

  • -V load user environment variables

#!/bin/bash
#PBS -l nodes=1:ppn=8,mem=1g,walltime=72:00:00
#PBS -q YourQueue_name_here
#PBS -m abe
#PBS -V

cd /to_some_directory

some_commands and_parameters
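Instead of a hard-coded directory, a job script can use the environment variables that Torque sets for each job. The variant below is a minimal sketch: PBS_O_WORKDIR (the directory qsub was called from) and PBS_JOBID are standard Torque variables, while the resource values and the job_label variable are made up for the illustration. The fallbacks let the script also run outside the batch system for a quick test.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=2,mem=1g,walltime=00:10:00
#PBS -q YourQueue_name_here
#PBS -m abe
#PBS -V

# Go to the directory the job was submitted from;
# fall back to the current directory when run outside Torque.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Use the job ID assigned by Torque, or "local" when run by hand.
job_label="${PBS_JOBID:-local}"

echo "job $job_label running on $(hostname) in $PWD"
```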

Submitting, viewing and managing jobs

Basic tasks:

  • submit a job to the queue: qsub yourfile.pbs

  • check which jobs are running on which nodes: qstat -an

  • remove a job from the queue: qdel XXX.host
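The three commands can be combined in a short helper, sketched below. The function name submit_and_show is hypothetical. It relies on qsub printing the new job ID (in the XXX.host form) on standard output, which is then passed to qstat.

```shell
#!/bin/bash
# Sketch: submit a PBS script and show where the new job is placed.
submit_and_show() {
    local script="$1" jobid
    # qsub prints the job ID (for example XXX.host) on success.
    jobid=$(qsub "$script") || return 1
    echo "Submitted job: $jobid"
    # -a: all jobs format, -n: list the nodes allocated to the job.
    qstat -an "$jobid"
}

# When a script name is given as the first argument, submit it.
if [ "$#" -ge 1 ]; then
    submit_and_show "$1"
fi
```

To remove the job again, pass the printed ID to qdel.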

This is only a stub - to be continued soon...

Resources

http://www.clusterresources.com/products/torque-resource-manager.php



TorquePbsHowto (last edited 2011-01-31 12:17:00 by janusz-mordarski)