We have the usual GNU compilers installed: g77, the GNU Fortran compiler; gcc, the GNU C compiler; and g++, the GNU C++ compiler. The GNU compilers are located in /usr/bin. The standard BLAS and LAPACK (linear algebra) libraries are in /usr/lib. To compile MPICH programs, use mpicc and mpif77; they will automatically find the correct headers and libraries.
To turn on the usual compiler optimizations, you should compile with:
gcc -O3 -o myprog myprog.c
or
g77 -O3 -o myprog myprog.f
GCC accepts levels higher than "-O3", such as "-O5", but treats them the same as "-O3". There are also specific optimizations that you can turn on separately (the "-O" flag just turns on a known-good set of other options). See the gcc man page for more details.
Using the Intel Compilers
Additionally, we have the Intel compilers installed in /opt/intel. These are high-performance compilers created by Intel and optimized for Intel's chips. The Fortran compiler, ifort, supports the F77, F90, and F95 standards, as well as some compatibility with Cray-Fortran and DEC-Fortran extensions. Note that by default it expects F95 syntax. The C compiler, icc, supports both the C and C++ languages. To turn on the usual compiler optimizations, compile with:
icc -O3 -o myprog myprog.c
or
ifort -O3 -o myprog myprog.f95
There are other specific optimizations that you can turn on separately (the "-O" flag just turns on a known-good set of other options); "-O3" is the highest level. To tune the code for Xeon chips only, try "-tpp7 -xW -march=pentium4 -mcpu=pentium4". Other options to try are "-ipo" for inter-procedural optimizations and "-unroll" for loop unrolling. See the icc and ifort man pages for more details (they are in /opt/intel/cc/9.0/man and /opt/intel/fc/9.0/man).
Note: for the 64-bit machines, the compilers are in /opt/intel/cce and /opt/intel/fce.
For F77 fixed-format code (the standard layout where columns 1-6 have special meaning and statements occupy columns 7-72):
ifort -FI -O3 -o myprog myprog.f77
The "-FI" option indicates fixed-format source. There are a number of other options to turn on various vendor-specific extensions:
| -stack_temps or -nostack_temps | indicates whether to put temporary arrays on the stack or heap, default is -nostack_temps which means those arrays are placed on the heap |
| -fpp | forces the use of the Fortran pre-processor; sometimes used with F77 code since by default F77 does NOT use a pre-processor |
| -i2, -i4, -i8 | default size for integer variables (2, 4, or 8 bytes) |
| -r8, -r16 | default size for real variables (8 or 16 bytes) |
| -72, -80, -132, -extend_source | number of columns to parse for fixed-format source code; -extend_source is 132 columns |
| -dps, -nodps | enable (default) or disable DEC PARAMETER statement |
| -1, -onetrip | force all DO loops to be executed at least once |
| -auto, -auto_scalar | make all variables AUTOMATIC, or just all scalar variables |
| -save | make all variables SAVE'd (static allocation), except those within a recursive subroutine |
| -u, -implicitnone | set IMPLICIT NONE by default |
| -vms | enable VMS and DEC extensions |
| -zero | initialize all variables to zero |
| -nus | do not append an underscore to the names of external subroutines |
| -lowercase, -uppercase | force all routine names to lower- or upper-case |
| -posixlib | link with the POSIX library |
| -Vaxlib | link with the portability library |
The "-Vaxlib" option seems to be quite useful, but you may want to play with the other libraries in the /opt/intel/fc/lib directory. Additionally, we have recompiled the BLAS and LAPACK libraries for use with the Intel compilers. To link against them, use:
ifort -FI -o myprog myprog.f -L/opt/intel/fc/lib -llapack -lblas -Vaxlib
Additionally, the Intel compilers support OpenMP, which is often a quick way to add parallelism to existing code. The programmer identifies which loops can be done in parallel, and the compiler then creates the necessary parallel code. To use this feature, first insert the OpenMP directive (comment) lines into your source, then compile with:
ifort -FI -O3 -openmp -o myprog myprog.f77
You may want to experiment with:
ifort -O3 -tpp7 -xW -o myprog myprog.f
The "-xW" and "-tpp7" options turn on specialized code for the Pentium 4 and Xeon chips. Also:
ifort -O3 -tpp7 -xW -opt_report -o myprog myprog.f
The "-opt_report" option generates an optimization report on the screen. The report goes to stderr rather than stdout, so a plain ">" will not capture it; to send it to a file, redirect both streams (the ">&" form below works in csh, tcsh, and bash):
ifort -O3 -tpp7 -xW -opt_report -o myprog myprog.f >& report.out
One last experimental thing to try is the Intel auto-parallelizer, which should work something like OpenMP; it will try to determine which loops make sense to parallelize:
ifort -O3 -parallel -o myprog myprog.f
32- versus 64-Bit
In general, the machine you compile on will determine the bit-size of the executable. While there are compiler flags you can set to change this, you may also need different libraries which are not installed on the opposite system. To keep things simple, compile 32-bit applications on cluster1 and cluster2; compile 64-bit applications on cluster3 and cluster4.
Again, keep in mind that 32-bit applications can run on 32- or 64-bit machines (and hence can run anywhere in the cluster); 64-bit applications can only run on 64-bit machines. To allow your job to run on the largest number of machines in the cluster, you may want to compile all applications for 32-bit.
See the Bit Size page for more info on 32- and 64-bit issues.