Java MPI

To use Java with MPI, you must use the "LAM" implementation of MPI (not MPICH); see LAM/MPI. LAM requires two extra steps: you must first boot a "virtual machine" containing all the cluster nodes that SGE has assigned to your job, and you must then halt this virtual machine when your job script is done.

Quick Example:

#!/bin/tcsh
#$ -S /bin/bash -cwd
#$ -j y
#$ -pe low-all 4
lamboot -H -s $TMPDIR/machines
lam-mpirun C /usr/bin/java my_java_prog
lamhalt -H

The unusual-looking command-line arguments '-H', '-s', and 'C' (with no dash) are important: they tell the LAM programs how to start up.

  • lamboot - this program starts up the virtual machine that future LAM runs will operate on
    • The $TMPDIR/machines argument is an SGE-provided list of machines that your job has been allocated for this run
  • lam-mpirun - this runs the actual program, in this case /usr/bin/java
    • Note that your Java program is passed as a command-line argument to the Java interpreter
    • The 'C' (capital C with no dash) is a command-line argument that tells lam-mpirun to run on all machines in the virtual machine, i.e. to use all the compute power that SGE assigned to you.
  • lamhalt - this shuts down the virtual machine and cleans up after the LAM runs are finished

You should have all of your Java class files installed in the directory that you run the lam-mpirun command from (or otherwise set the Java class path). You can use the '-cwd' option to SGE (shown in the first '#$' comment line) to start your SGE job from the directory where you issued 'qsub', or you can simply issue a 'cd' command from within the shell script itself.
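
As a sanity check before launching, a job script could confirm that the class file really is in the current directory. The following is only a minimal sketch: the helper name have_class is hypothetical, and it assumes a class in the default package stored as a plain .class file (not inside a jar).

```shell
#!/bin/bash
# Hypothetical helper: succeed only if <name>.class exists in the current
# directory. Assumes a class in the default package, stored as a plain
# .class file rather than inside a jar.
have_class() {
    [ -f "./$1.class" ]
}
```

In the job script this could be called before lamboot, for example as 'have_class my_java_prog || exit 1', so that the job stops early instead of failing on every node.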


PLEASE NOTE: it is VERY IMPORTANT that you run the lamhalt command when your job is finished. Otherwise several LAM 'daemon' processes may linger after your job is over and could drain resources from other, future jobs.
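
One way to make sure lamhalt is never skipped is to wrap the three LAM steps in a small shell function. This is a sketch under stated assumptions, not part of LAM or SGE: the function name run_under_lam is hypothetical, and it assumes the same lamboot, lam-mpirun and lamhalt arguments as the quick example above. It halts LAM even when the program fails, and returns the program's exit status.

```shell
#!/bin/bash
# Hypothetical helper (not part of LAM or SGE): boot the LAM virtual
# machine, run the given command on it, and always halt LAM afterwards.
run_under_lam() {
    local status

    # Boot LAM on the hosts SGE assigned to this job; give up if that fails.
    lamboot -H -s "$TMPDIR/machines" || return 1

    # Run the command (for example: /usr/bin/java my_java_prog) and
    # remember its exit status.
    lam-mpirun C "$@"
    status=$?

    # Halt LAM even if the program failed, then report the program's status.
    lamhalt -H
    return "$status"
}
```

With this function defined in the job script, the lamboot, lam-mpirun and lamhalt lines could be replaced by a single call such as 'run_under_lam /usr/bin/java my_java_prog'. Note that this does not help if the job is killed outright before the function returns.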


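ALSO NOTE: you may need to specify an architecture (bit size) to SGE so that all of the LAM daemons run on the same type of machine. Add one of the following lines to your job script (64-bit or 32-bit, respectively):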

#$ -l arch=lx26-amd64
#$ -l arch=lx26-x86