I turned on floating-point exception trapping in the ifort compiler and found that pm is zero at (0,Mm) and (21,Mm), i.e. (0,67) and (21,67) for this grid. This produces a divide-by-zero in metrics.F when running with 2x2 tiling on 4 processors. It looks like information is not being transferred properly in Utility/get_grid.F. Perhaps there is a problem in the new mp_exchange2d routine when NSperiodic is true. The same error likely occurs with EW_PERIODIC, but I haven't had time to check that yet.
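One way to test the mp_exchange2d suspicion is to compare what each tile holds after get_grid.F against a serial reference fill of the same field. The sketch below is only an illustration in Python, not ROMS code: the helper names are hypothetical, and it assumes that interior rows run j = 1..Mm with period Mm (row j and row j + Mm are the same physical row), with each field stored as a list of rows offset so that ghost rows j = 1-nghost .. Mm+nghost fit.

```python
"""Hypothetical serial reference for a north-south periodic halo fill.

Assumptions for this sketch: interior rows run j = 1..Mm, the period is Mm,
and a field is a list of rows indexed [j + nghost - 1][i], where i is the
list index within a row (i = 0 is the first RHO column).
"""


def ns_periodic_fill(field, mm, nghost):
    """Copy interior rows into the ghost rows at both ends, in place."""
    def k(j):
        return j + nghost - 1  # model row index j -> list index

    for g in range(nghost):
        field[k(-g)] = list(field[k(mm - g)])         # j = 0, -1, ... <- Mm, Mm-1, ...
        field[k(mm + 1 + g)] = list(field[k(1 + g)])  # j = Mm+1, ... <- 1, 2, ...
    return field


def zero_points(field, nghost):
    """Return model (i, j) of every exact zero, e.g. pm entries about to be inverted."""
    return [(i, kk - nghost + 1)
            for kk, row in enumerate(field)
            for i, v in enumerate(row) if v == 0.0]


def diff_points(tile_field, reference, nghost):
    """Return model (i, j) where a tile's copy disagrees with the serial reference."""
    return [(i, kk - nghost + 1)
            for kk, (a, b) in enumerate(zip(tile_field, reference))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Usage with a tiny made-up field (Mm = 4, three columns, two ghost rows per side).
Mm, Lp, nghost = 4, 3, 2
pm = [[0.0] * Lp for _ in range(Mm + 2 * nghost)]
for j in range(1, Mm + 1):
    for i in range(Lp):
        pm[j + nghost - 1][i] = 1.0 / (1000.0 + 10 * j + i)
print(zero_points(pm, nghost))  # ghost rows j = -1, 0, Mm+1, Mm+2 are still zero
ns_periodic_fill(pm, Mm, nghost)
print(zero_points(pm, nghost))  # []
```

Dumping pm from one tile after the exchange and passing it to diff_points against the serial reference would show whether the zeros come from the exchange or from what was read out of the grid file.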
I would attach the grid file I created, but this forum does not seem to allow uploading files with the .nc extension. The output from the run is:
------------
Process Information:
Node # 0 (pid= 8430) is active.
Node # 2 (pid= 8428) is active.
Node # 1 (pid= 8427) is active.
Node # 3 (pid= 8429) is active.
Model Input Parameters: ROMS/TOMS version 3.2
Thursday - February 5, 2009 - 9:37:48 PM
-----------------------------------------------------------------------------
River Plume Test
Operating system : Linux
CPU/hardware : x86_64
Compiler system : ifort
Compiler command : /usr/local/mpich2-1.0.5p4/bin/mpif90
Compiler flags : -heap-arrays -g -fpe0 -traceback -free
Input Script : Apps/RIVERPLUME1/ocean_riverplume1.in
SVN Root URL : https://www.myroms.org/svn/omlab/branches/kate
SVN Revision :
Local Root : /heim/paul/roms_3.1k_912
Header Dir : Apps/RIVERPLUME1
Header file : riverplume1.h
Analytical Dir: /heim/paul/roms_3.1k_912/ROMS/Functionals
Resolution, Grid 01: 0039x0067x013, Parallel Nodes: 4, Tiling: 002x002
Physical Parameters, Grid: 01
=============================
21600 ntimes Number of timesteps for 3-D equations.
120.000 dt Timestep size (s) for 3-D equations.
20 ndtfast Number of timesteps for 2-D equations between
each 3D timestep.
1 ERstr Starting ensemble/perturbation run number.
1 ERend Ending ensemble/perturbation run number.
0 nrrec Number of restart records to read from disk.
T LcycleRST Switch to recycle time-records in restart file.
360 nRST Number of timesteps between the writing of data
into restart fields.
1 ninfo Number of timesteps between print of information
to standard output.
T ldefout Switch to create a new output NetCDF file(s).
360 nHIS Number of timesteps between the writing fields
into history file.
1 ntsAVG Starting timestep for the accumulation of output
time-averaged data.
360 nAVG Number of timesteps between the writing of
time-averaged data into averages file.
0.0000E+00 tnu2(01) Horizontal, harmonic mixing coefficient (m2/s)
for tracer 01: temp
0.0000E+00 tnu2(02) Horizontal, harmonic mixing coefficient (m2/s)
for tracer 02: salt
5.0000E-06 Akt_bak(01) Background vertical mixing coefficient (m2/s)
for tracer 01: temp
5.0000E-06 Akt_bak(02) Background vertical mixing coefficient (m2/s)
for tracer 02: salt
5.0000E-06 Akv_bak Background vertical mixing coefficient (m2/s)
for momentum.
3.0000E-04 rdrg Linear bottom drag coefficient (m/s).
3.0000E-03 rdrg2 Quadratic bottom drag coefficient.
2.0000E-02 Zob Bottom roughness (m).
1 lmd_Jwt Jerlov water type.
3.0000E+00 theta_s S-coordinate surface control parameter.
4.0000E-01 theta_b S-coordinate bottom control parameter.
50.000 Tcline S-coordinate surface/bottom layer width (m) used
in vertical coordinate stretching.
1025.000 rho0 Mean density (kg/m3) for Boussinesq approximation.
0.000 dstart Time-stamp assigned to model initialization (days).
0.00 time_ref Reference time for units attribute (yyyymmdd.dd)
0.0000E+00 Tnudg(01) Nudging/relaxation time scale (days)
for tracer 01: temp
0.0000E+00 Tnudg(02) Nudging/relaxation time scale (days)
for tracer 02: salt
0.0000E+00 Znudg Nudging/relaxation time scale (days)
for free-surface.
0.0000E+00 M2nudg Nudging/relaxation time scale (days)
for 2D momentum.
0.0000E+00 M3nudg Nudging/relaxation time scale (days)
for 3D momentum.
0.0000E+00 obcfac Factor between passive and active
open boundary conditions.
4.000 T0 Background potential temperature (C) constant.
32.000 S0 Background salinity (PSU) constant.
-1.000 gamma2 Slipperiness variable: free-slip (1.0) or
no-slip (-1.0).
T Hout(idFsur) Write out free-surface.
T Hout(idUbar) Write out 2D U-momentum component.
T Hout(idVbar) Write out 2D V-momentum component.
T Hout(idUvel) Write out 3D U-momentum component.
T Hout(idVvel) Write out 3D V-momentum component.
T Hout(idWvel) Write out W-momentum component.
T Hout(idOvel) Write out omega vertical velocity.
T Hout(idTvar) Write out tracer 01: temp
T Hout(idTvar) Write out tracer 02: salt
T Hout(idDano) Write out density anomaly.
T Hout(idVvis) Write out vertical viscosity coefficient.
T Hout(idTdif) Write out vertical T-diffusion coefficient.
T Hout(idSdif) Write out vertical S-diffusion coefficient.
T Hout(idHsbl) Write out depth of surface boundary layer.
T Hout(idHbbl) Write out depth of bottom boundary layer.
Output/Input Files:
Output Restart File: ocean_rst.nc
Output History File: ocean_his.nc
Output Averages File: ocean_avg.nc
Input Grid File: Apps/RIVERPLUME1/riverplume1_grid.nc
IO Variable Information File: ROMS/External/varinfo.dat
Number of tracers: 2
Tile partition information for Grid 01: 0039x0067x0013 tiling: 002x002
tile Istr Iend Jstr Jend Npts
0 1 20 1 34 8840
1 21 39 1 34 8398
2 1 20 35 67 8580
3 21 39 35 67 8151
Tile minimum and maximum fractional grid coordinates:
(interior points only)
tile Xmin Xmax Ymin Ymax grid
0 0.50 20.50 0.50 34.50 RHO-points
1 20.50 39.50 0.50 34.50 RHO-points
2 0.50 20.50 34.50 67.50 RHO-points
3 20.50 39.50 34.50 67.50 RHO-points
0 1.00 20.50 0.50 34.50 U-points
1 20.50 39.00 0.50 34.50 U-points
2 1.00 20.50 34.50 67.50 U-points
3 20.50 39.00 34.50 67.50 U-points
0 0.50 20.50 1.00 34.50 V-points
1 20.50 39.50 1.00 34.50 V-points
2 0.50 20.50 34.50 67.00 V-points
3 20.50 39.50 34.50 67.00 V-points
Maximum halo size in XI and ETA directions:
HaloSizeI(1) = 93
HaloSizeJ(1) = 141
TileSide(1) = 41
TileSize(1) = 1025
Activated C-preprocessing Options:
RIVERPLUME1 River Plume Test
ANA_BSFLUX Analytical kinematic bottom salinity flux.
ANA_BTFLUX Analytical kinematic bottom temperature flux.
ANA_INITIAL Analytical initial conditions.
ANA_PSOURCE Analytical point sources and sinks.
ANA_SMFLUX Analytical kinematic surface momentum flux.
ANA_SRFLUX Analytical kinematic shortwave radiation flux.
ANA_SSFLUX Analytical kinematic surface salinity flux.
ANA_STFLUX Analytical kinematic surface temperature flux.
ASSUMED_SHAPE Using assumed-shape arrays.
AVERAGES Writing out time-averaged fields.
AVERAGES_AKS Writing out time-averaged vertical S-diffusion.
AVERAGES_AKT Writing out time-averaged vertical T-diffusion.
DJ_GRADPS Parabolic Splines density Jacobian (Shchepetkin, 2002).
DOUBLE_PRECISION Double precision arithmetic.
EASTERN_WALL Wall boundary at Eastern edge.
LMD_BKPP KPP bottom boundary layer mixing.
LMD_CONVEC LMD convective mixing due to shear instability.
LMD_MIXING Large/McWilliams/Doney interior mixing.
LMD_NONLOCAL LMD convective nonlocal transport.
LMD_RIMIX LMD diffusivity due to shear instability.
LMD_SKPP KPP surface boundary layer mixing.
MASKING Land/Sea masking.
MIX_GEO_TS Mixing of tracers along geopotential surfaces.
MPI MPI distributed-memory configuration.
NONLINEAR Nonlinear Model.
NONLIN_EOS Nonlinear Equation of State for seawater.
NS_PERIODIC North-South periodic boundaries.
POWER_LAW Power-law shape time-averaging barotropic filter.
PROFILE Time profiling activated .
!RST_SINGLE Double precision fields in restart NetCDF file.
SALINITY Using salinity.
SOLVE3D Solving 3D Primitive Equations.
SPLINES Conservative parabolic spline reconstruction.
TS_A4HADVECTION Fourth-order Akima horizontal advection of tracers.
TS_A4VADVECTION Fourth-order Akima vertical advection of tracers.
TS_DIF2 Harmonic mixing of tracers.
TS_PSOURCE Tracers point sources and sinks.
UV_ADV Advection of momentum.
UV_COR Coriolis term.
UV_U3HADVECTION Third-order upstream horizontal advection of 3D momentum.
UV_C4VADVECTION Fourth-order centered vertical advection of momentum.
UV_QDRAG Quadratic bottom stress.
UV_PSOURCE Mass point sources and sinks.
VAR_RHO_2D Variable density barotropic mode.
WESTERN_WALL Wall boundary at Western edge.
INITIAL: Configuring and initializing forward nonlinear model ...
Vertical S-coordinate System:
level S-coord Cs-curve at_hmin over_slope at_hmax
13 0.0000000 0.0000000 0.000 0.000 0.000
12 -0.0769231 -0.0253369 -1.154 -3.371 -5.588
11 -0.1538462 -0.0568884 -2.308 -7.285 -12.263
10 -0.2307692 -0.0971871 -3.462 -11.965 -20.469
9 -0.3076923 -0.1484861 -4.615 -17.608 -30.600
8 -0.3846154 -0.2119251 -5.769 -24.313 -42.856
7 -0.4615385 -0.2867031 -6.923 -32.010 -57.096
6 -0.5384615 -0.3700543 -8.077 -40.457 -72.836
5 -0.6153846 -0.4585665 -9.231 -49.355 -89.480
4 -0.6923077 -0.5502087 -10.385 -58.528 -106.671
3 -0.7692308 -0.6456884 -11.538 -68.036 -124.534
2 -0.8461538 -0.7485087 -12.692 -78.187 -143.681
1 -0.9230769 -0.8642669 -13.846 -89.470 -165.093
0 -1.0000000 -1.0000000 -15.000 -102.500 -190.000
Time Splitting Weights: ndtfast = 20 nfast = 29
Primary Secondary Accumulated to Current Step
1-0.0009651193358779 0.0500000000000000-0.0009651193358779 0.0500000000000000
2-0.0013488780126037 0.0500482559667939-0.0023139973484816 0.1000482559667939
3-0.0011514592651645 0.0501156998674241-0.0034654566136461 0.1501639558342180
4-0.0003735756740661 0.0501732728306823-0.0038390322877122 0.2003372286649003
5 0.0009829200513762 0.0501919516143856-0.0028561122363360 0.2505291802792859
6 0.0029141799764308 0.0501428056118168 0.0000580677400948 0.3006719858911028
7 0.0054132615310267 0.0499970966129953 0.0054713292711215 0.3506690825040981
8 0.0084687837865133 0.0497264335364439 0.0139401130576348 0.4003955160405420
9 0.0120633394191050 0.0493029943471183 0.0260034524767397 0.4496985103876603
10 0.0161716623600090 0.0486998273761630 0.0421751148367487 0.4983983377638233
11 0.0207585511322367 0.0478912442581626 0.0629336659689855 0.5462895820219859
12 0.0257765478740990 0.0468533167015507 0.0887102138430845 0.5931428987235365
13 0.0311633730493854 0.0455644893078458 0.1198735868924699 0.6387073880313823
14 0.0368391158442262 0.0440063206553765 0.1567127027366961 0.6827137086867587
15 0.0427031802506397 0.0421643648631652 0.1994158829873358 0.7248780735499240
16 0.0486309868367617 0.0400292058506332 0.2480468698240975 0.7649072794005571
17 0.0544704302037592 0.0375976565087951 0.3025173000278567 0.8025049359093522
18 0.0600380921294286 0.0348741349986072 0.3625553921572853 0.8373790709079594
WPB:metrics:pm=0 at i,j = 21 67
19 0.0651152103984763 0.0318722303921357 0.4276706025557617 0.8692513013000951
20 0.0694434033194840 0.0286164698722119 0.4971140058752457 0.8978677711723070
21 0.0727201499285570 0.0251442997062377 0.5698341558038027 0.9230120708785448
22 0.0745940258796570 0.0215082922098099 0.6444281816834597 0.9445203630883546
23 0.0746596950216180 0.0177785909158270 0.7190878767050777 0.9622989540041816
24 0.0724526566618460 0.0140456061647461 0.7915405333669236 0.9763445601689278
25 0.0674437485167025 0.0104229733316538 0.8589842818836262 0.9867675335005817
26 0.0590334053485719 0.0070507859058187 0.9180176872321981 0.9938183194064003
27 0.0465456732896125 0.0040991156383901 0.9645633605218106 0.9979174350447904
28 0.0292219798521904 0.0017718319739095 0.9937853403740009 0.9996892670186999
29 0.0062146596259991 0.0003107329813000 1.0000000000000000 0.9999999999999998
ndtfast, nfast = 20 29 nfast/ndtfast = 1.45000
WPB:metrics:pm=0 at i,j = 0 67
Centers of gravity and integrals (values must be 1, 1, approx 1/2, 1, 1):
1.000000000000 1.060707743385 0.530353871693 1.000000000000 1.000000000000
Power filter parameters, Fgamma, gamma = 0.28400 0.14200
exit status of rank 3: killed by signal 6
rank 2 in job 11 valborg_34829 caused collective abort of all ranks
exit status of rank 2: killed by signal 6
------------------------------------------------------------------------------------------
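To see which tile owns the points flagged by the WPB:metrics lines, the tile partition table in the output can be reproduced from Lm x Mm x N = 39 x 67 x 13 and the 2x2 tiling. This is a sketch under an assumed splitting rule (ceiling division of the interior points, tiles numbered with i varying fastest); it is chosen because it matches the printed Istr/Iend/Jstr/Jend and Npts, not because it is necessarily the exact ROMS get_tile logic. The function names are hypothetical.

```python
"""Reproduce the 2x2 tile partition from the log (sketch, assumed splitting rule)."""


def split_1d(n, ntiles):
    """Split interior points 1..n into ntiles contiguous 1-based (start, end) chunks."""
    chunk = -(-n // ntiles)  # ceiling division: 39 -> 20, 67 -> 34 for 2 tiles
    return [(1 + t * chunk, min((t + 1) * chunk, n)) for t in range(ntiles)]


def tile_table(lm, mm, n, ntile_i, ntile_j):
    """Rows of (tile, Istr, Iend, Jstr, Jend, Npts); tiles numbered i-fastest."""
    rows = []
    for tj, (jstr, jend) in enumerate(split_1d(mm, ntile_j)):
        for ti, (istr, iend) in enumerate(split_1d(lm, ntile_i)):
            npts = (iend - istr + 1) * (jend - jstr + 1) * n  # interior points x N levels
            rows.append((ti + tj * ntile_i, istr, iend, jstr, jend, npts))
    return rows


def owner(i, j, rows):
    """Tile whose interior holds (i, j), or None for a boundary or halo point."""
    for tile, istr, iend, jstr, jend, _ in rows:
        if istr <= i <= iend and jstr <= j <= jend:
            return tile
    return None


rows = tile_table(39, 67, 13, 2, 2)  # Lm, Mm, N, NtileI, NtileJ from the log
for r in rows:
    print(*r)                        # same Istr/Iend/Jstr/Jend/Npts as the log table
print(owner(21, 67, rows))           # 3: first interior column (Istr) of tile 3
print(owner(0, 67, rows))            # None: i = 0 is outside every tile interior
```

So (21,Mm) is the Istr column of tile 3 and (0,Mm) is in the western boundary column next to tile 2; both lie on the northernmost interior row j = Mm.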
And the runtime error output (produced with the ifort options -g -fpe0 -traceback) is:
------------------------------------------------------------------------------------------
/heim/paul/roms_3.2_trunk % mpiexec -n 4 oceanG Apps/RIVERPLUME1/ocean_riverplume1.in > riverplume1.out
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
oceanG 00000000005780D2 metrics_mod_mp_me 197 metrics.f90
oceanG 00000000005758F6 metrics_mod_mp_me 57 metrics.f90
oceanG 00000000004053ED initial_ 129 initial.f90
oceanG 0000000000404867 Unknown Unknown Unknown
oceanG 000000000040462D MAIN__ 97 master.f90
oceanG 00000000004044EA Unknown Unknown Unknown
libc.so.6 0000003DA8B1C40B Unknown Unknown Unknown
oceanG 000000000040442A Unknown Unknown Unknown
forrtl: error (73): floating divide by zero
Image PC Routine Line Source
oceanG 00000000005780D2 metrics_mod_mp_me 197 metrics.f90
oceanG 00000000005758F6 metrics_mod_mp_me 57 metrics.f90
oceanG 00000000004053ED initial_ 129 initial.f90
oceanG 0000000000404867 Unknown Unknown Unknown
oceanG 000000000040462D MAIN__ 97 master.f90
oceanG 00000000004044EA Unknown Unknown Unknown
libc.so.6 0000003DA8B1C40B Unknown Unknown Unknown
oceanG 000000000040442A Unknown Unknown Unknown
[cli_0]: aborting job:
Fatal error in MPI_Wait: Other MPI error, error stack:
MPI_Wait(140).............................: MPI_Wait(request=0x7fbfffe010, status0x96a414) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=4,errno=104:Connection reset by peer)
[cli_1]: aborting job:
Fatal error in MPI_Wait: Other MPI error, error stack:
MPI_Wait(140).............................: MPI_Wait(request=0x7fbfffe010, status0x96a414) failed
MPIDI_CH3_Progress_wait(212)..............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(413):
MPIDU_Socki_handle_read(633)..............: connection failure (set=0,sock=4,errno=104:Connection reset by peer)
10 /heim/paul/roms_3.2_trunk %
--------------------------------------------------------------------
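With -fpe0 the run stops at the first zero divide, and the traceback gives only a source line (metrics.f90, line 197), not the grid indices. A check run before the division can report every offending point at once. The Python sketch below illustrates the idea; ZeroMetricError, safe_reciprocal and the [j][i] row layout with i0/j0 offsets are hypothetical, and an equivalent Fortran loop over pm could be placed just ahead of the division.

```python
class ZeroMetricError(ValueError):
    """Raised when a grid metric that is about to be inverted contains zeros."""


def safe_reciprocal(field, name="pm", i0=0, j0=0):
    """Return 1/field elementwise, or raise listing every (i, j) where field == 0.

    field is a list of rows indexed [j][i]; i0 and j0 give the model indices
    of field[0][0] (a layout chosen for this sketch).
    """
    bad = [(i + i0, j + j0)
           for j, row in enumerate(field)
           for i, v in enumerate(row) if v == 0.0]
    if bad:
        raise ZeroMetricError(f"{name}=0 at (i,j) = {bad}")
    return [[1.0 / v for v in row] for row in field]


# Usage: a 2x2 patch whose lower-left corner is model point (20, 66).
patch = [[1.0e-3, 1.0e-3], [0.0, 1.0e-3]]
try:
    safe_reciprocal(patch, i0=20, j0=66)
except ZeroMetricError as err:
    print(err)  # pm=0 at (i,j) = [(20, 67)]
```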
I used to be able to run MPI with the NS_PERIODIC option and a grid file, but I'm not sure when this problem arose. I will try some earlier versions to pin down when it first cropped up.
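If an older revision is known to run cleanly and the current one fails, a binary search over the revisions in between needs only about log2(n) test runs. A minimal sketch, assuming the failure persists in every revision after it first appears; is_bad is a hypothetical callback that would check out and build a revision, run the 2x2 NS_PERIODIC case with -fpe0, and report whether it fails.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision for which is_bad() is True.

    Assumes revisions are sorted, revisions[0] is known good, revisions[-1]
    is known bad, and the bug never disappears once introduced.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Made-up revision numbers and a stand-in check, only to exercise the search.
revs = list(range(1, 101))
print(first_bad_revision(revs, lambda r: r >= 37))  # 37
```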