I'm running ROMS 2.1 for an embayment with time-dependent river discharge (point sources) and external salinity. Time stamps for the discharge and b.c. data coincide (monthly).
When I run this setup on a cluster with MPI, the river discharge gets correctly interpolated between the past and the future values, but the external salinity seems to get interpolated between the past(?) and zero. This recurs after each boundary condition update, so that the boundary salinity value in time looks like a sawtooth.
This problem is not seen with single-thread and OpenMP executables running an identical setup on the same computer and on others. Our cluster is an Xserve G5 with IBM xlf; I've tried both LAM 7.1.2 and OpenMPI 1.1 and see the same problem with both.
Superficially, it looks as if the time indexing got stuck, so that the updated value would always go into one of the two time records while the other record would never get updated; but I don't understand (1) why this happens to the b.c. and not to the river discharge, since both are read in with get_ngfld; and (2) why it happens only with MPI.
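To illustrate what I mean by the two time records and the sawtooth, here is a minimal, self-contained sketch (plain Fortran, not actual ROMS code; the names and numbers are invented for illustration). Setting stuck = .true. emulates the symptom I see: the "future" record never receives the field and keeps a value of zero, so the interpolated boundary salinity ramps toward zero over each month and jumps back at the next update.

Code:

program sawtooth_demo
  implicit none
  real, parameter :: s_file(4) = (/ 30.0, 31.0, 32.0, 31.5 /)  ! monthly salinity
  real, parameter :: t_file(4) = (/ 0.0, 30.0, 60.0, 90.0 /)   ! record times (days)
  logical, parameter :: stuck = .true.   ! .true. reproduces the reported symptom
  real    :: s_past, s_futr, t_past, t_futr, t, fac, s_bry
  integer :: irec, iday

  ! prime the two in-memory records from the first two file records
  t_past = t_file(1);  s_past = s_file(1)
  t_futr = t_file(2);  s_futr = merge(0.0, s_file(2), stuck)
  irec = 2

  do iday = 0, 85, 5
    t = real(iday)
    if (t > t_futr .and. irec < size(s_file)) then
      ! rotate records: the old "future" time becomes the new "past" time,
      ! and the corresponding (correct) value is taken from the file
      t_past = t_futr
      s_past = s_file(irec)
      irec   = irec + 1
      t_futr = t_file(irec)
      ! in the broken runs the new field value apparently never lands in
      ! the "future" slot, which therefore keeps its initial value of zero
      s_futr = merge(0.0, s_file(irec), stuck)
    end if
    ! linear interpolation between the two records
    fac   = (t - t_past) / (t_futr - t_past)
    s_bry = (1.0 - fac)*s_past + fac*s_futr
    print '(a,f5.1,a,f7.2)', ' day ', t, '   boundary salinity ', s_bry
  end do
end program sawtooth_demo

With stuck = .false. the same loop gives the correct piecewise-linear interpolation of the monthly values.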
Thank you for any help and pointers - Mitsuhiro.
Time-dependent salinity boundary condition and MPI
Update - Thanks to David Darr, I had an opportunity to compile and run the setup on a Linux/Opteron cluster with a PGI compiler and LAM. The model ran without the problem described above.
I'm suspecting this is an IBM compiler problem; next, I will try the Absoft Pro Fortran on our Apple cluster.
If anyone's doing a similar setup (time-dependent tracer boundary condition from netCDF file, Apple cluster, IBM compiler, LAM or OpenMPI) I'd appreciate hearing from you. Even if only "we are seeing no such problem here".
Mitsuhiro
We have similar problems using MPI. The sequential and MPI results differ, especially at the open boundary, and the results also change when we use a different number of partitions with MPI. We used the IBM cpp compiler and mvapich 1.2.6.
We have a 20-node Linux cluster; if you don't mind, I would like to try your runs on my machine and send back the results to see whether the same problems occur! Please write back to me at wenlong@hpl.umces.edu
Thanks,
Wen Long
Oh thanks! And sorry for the late reply, it's summer.
I was intrigued to hear that your results depended on the domain partitioning. In our case, I tried several different domain decompositions and the results did not depend on them: the problem with the boundary values persisted no matter how I divided up the domain (within what the grid allows).
Some of the files needed to run our model are rather large (~1 GB), although not all of the information in them is used. Maybe I could extract the relevant portions and send them to you along with our cppdefs.h and code modifications.
Mitsuhiro.
To Wen Long: A note on tiling -
We take extreme measures to ensure that the model behaves identically whether the domain is tiled or not. A simulation tiled with one configuration should yield identical results to a simulation tiled in a different manner (or with only 1 tile). If it does not, then there is something wrong.
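One quick way to check is to difference two history files directly. The sketch below is only an illustration under assumed names (the files ocean_his_1tile.nc and ocean_his_mpi.nc and a 4-D salt variable) using the netCDF Fortran 90 interface; for runs that are truly tile-independent the reported maximum absolute difference should be exactly zero. The same comparison can of course be done with any other tool that reads netCDF.

Code:

program compare_runs
  use netcdf
  implicit none
  integer :: nc1, nc2, vid1, vid2, ndims, i
  integer :: dimids(nf90_max_var_dims), lens(nf90_max_var_dims)
  real, allocatable :: a(:,:,:,:), b(:,:,:,:)

  call check( nf90_open('ocean_his_1tile.nc', nf90_nowrite, nc1) )
  call check( nf90_open('ocean_his_mpi.nc',   nf90_nowrite, nc2) )
  call check( nf90_inq_varid(nc1, 'salt', vid1) )
  call check( nf90_inq_varid(nc2, 'salt', vid2) )

  ! assume salt is 4-D: (xi_rho, eta_rho, s_rho, ocean_time)
  call check( nf90_inquire_variable(nc1, vid1, ndims=ndims, dimids=dimids) )
  if (ndims /= 4) stop 'expected a 4-D salt variable'
  do i = 1, ndims
    call check( nf90_inquire_dimension(nc1, dimids(i), len=lens(i)) )
  end do
  allocate( a(lens(1), lens(2), lens(3), lens(4)) )
  allocate( b(lens(1), lens(2), lens(3), lens(4)) )

  call check( nf90_get_var(nc1, vid1, a) )
  call check( nf90_get_var(nc2, vid2, b) )
  print *, 'max |difference| in salt between the two runs:', maxval(abs(a - b))

  call check( nf90_close(nc1) )
  call check( nf90_close(nc2) )

contains

  subroutine check(status)
    integer, intent(in) :: status
    if (status /= nf90_noerr) then
      print *, trim(nf90_strerror(status))
      stop
    end if
  end subroutine check

end program compare_runs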
If the MPI and serial simulations differ, especially at open boundaries, then there might be a problem with the way the boundary information is being applied. Rivers can be tricky.
If you can reproduce the problem with a simple example, that would be helpful.
-john
Resolved
Forgot to mention that this problem has been resolved. It was indeed an IBM compiler issue, specifically the -O4 optimization flag (D'oh!). Downgrading to -O3 eliminated the problem.
I thank Bohyun Bahng for alertly spotting this. Mitsuhiro