Post-compilation problems with ROMS

Report or discuss software problems and other woes

Moderators: arango, robertson

lolhsson
Posts: 23
Joined: Wed Jun 02, 2010 9:07 pm
Location: UC Berkeley

Post-compilation problems with ROMS

#1 Unread post by lolhsson »

Hi all,

I've successfully compiled and run ROMS many times, but have never encountered a problem like this. I hadn't recompiled in the last several releases; when I finally needed to, it took a little troubleshooting: I had to revert Linux-ifort.mk to an older version that had my netCDF paths set up properly (when I tried putting them into the newest version, ROMS couldn't find netCDF and wouldn't compile). Once that had been done, compilation proceeded as normal.

However, when I actually executed oceanM, I got error messages I'd never seen before, not within the ROMS logfile but within the job error script, as follows:
[n0000:11863] *** Process received signal ***
[n0000:11863] Signal: Segmentation fault (11)
[n0000:11863] Signal code: Address not mapped (1)
[n0000:11863] Failing at address: 0x150
[n0000:11859] *** Process received signal ***
[n0000:11859] Signal: Segmentation fault (11)
[n0000:11859] Signal code: Address not mapped (1)
[n0000:11859] Failing at address: 0x150
[n0000:11863] [ 0] /lib64/libc.so.6 [0x2b06180932d0]
[n0000:11863] [ 1] oceanM(wclock_on_+0x97) [0x483baf]
[n0000:11863] [ 2] oceanM(distribute_mod_mp_mp_bcasti_0d_+0x24) [0x49c2de]
[n0000:11863] [ 3] oceanM(inp_par_+0x284) [0x42acf2]
[n0000:11863] [ 4] oceanM(ocean_control_mod_mp_roms_initialize_+0xbb) [0x423e9f]
[n0000:11863] [ 5] oceanM(MAIN__+0xfd) [0x423991]
[n0000:11863] [ 6] oceanM(main+0x2a) [0x423882]
[n0000:11863] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b0618080994]
[n0000:11863] [ 8] oceanM [0x4237a9]
[n0000:11863] *** End of error message ***
[n0000:11855] *** Process received signal ***
[n0000:11855] Signal: Segmentation fault (11)
[n0000:11855] Signal code: Address not mapped (1)
[n0000:11855] Failing at address: 0x150
[n0000:11855] [ 0] /lib64/libc.so.6 [0x2b640b8602d0]
[n0000:11855] [ 1] oceanM(wclock_on_+0x97) [0x483baf]
[n0000:11855] [ 2] oceanM(distribute_mod_mp_mp_bcasti_0d_+0x24) [0x49c2de]
[n0000:11855] [ 3] oceanM(inp_par_+0x284) [0x42acf2]
[n0000:11855] [ 4] oceanM(ocean_control_mod_mp_roms_initialize_+0xbb) [0x423e9f]
[n0000:11855] [ 5] oceanM(MAIN__+0xfd) [0x423991]
[n0000:11855] [ 6] oceanM(main+0x2a) [0x423882]
[n0000:11855] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b640b84d994]
[n0000:11855] [ 8] oceanM [0x4237a9]
[n0000:11855] *** End of error message ***
[n0000:11857] *** Process received signal ***
[n0000:11857] Signal: Segmentation fault (11)
[n0000:11857] Signal code: Address not mapped (1)
[n0000:11857] Failing at address: 0x150
[n0000:11857] [ 0] /lib64/libc.so.6 [0x2b9f2187d2d0]
[n0000:11857] [ 1] oceanM(wclock_on_+0x97) [0x483baf]
[n0000:11857] [ 2] oceanM(distribute_mod_mp_mp_bcasti_0d_+0x24) [0x49c2de]
[n0000:11857] [ 3] oceanM(inp_par_+0x284) [0x42acf2]
[n0000:11857] [ 4] oceanM(ocean_control_mod_mp_roms_initialize_+0xbb) [0x423e9f]
[n0000:11857] [ 5] oceanM(MAIN__+0xfd) [0x423991]
[n0000:11857] [ 6] oceanM(main+0x2a) [0x423882]
[n0000:11857] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b9f2186a994]
[n0000:11857] [ 8] oceanM [0x4237a9]
[n0000:11857] *** End of error message ***
[n0000:11865] *** Process received signal ***
[n0000:11865] Signal: Segmentation fault (11)
[n0000:11865] Signal code: Address not mapped (1)
[n0000:11865] Failing at address: 0x150
[n0000:11865] [ 0] /lib64/libc.so.6 [0x2b049d2342d0]
[n0000:11865] [ 1] oceanM(wclock_on_+0x97) [0x483baf]
[n0000:11865] [ 2] oceanM(distribute_mod_mp_mp_bcasti_0d_+0x24) [0x49c2de]
[n0000:11865] [ 3] oceanM(inp_par_+0x284) [0x42acf2]
[n0000:11865] [ 4] oceanM(ocean_control_mod_mp_roms_initialize_+0xbb) [0x423e9f]
[n0000:11865] [ 5] oceanM(MAIN__+0xfd) [0x423991]
[n0000:11865] [ 6] oceanM(main+0x2a) [0x423882]
[n0000:11865] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b049d221994]
[n0000:11865] [ 8] oceanM [0x4237a9]
[n0000:11865] *** End of error message ***
[n0000:11862] *** Process received signal ***
[n0000:11862] Signal: Segmentation fault (11)
[n0000:11862] Signal code: Address not mapped (1)
[n0000:11862] Failing at address: 0x150
[n0000:11862] [ 0] /lib64/libc.so.6 [0x2aecd73df2d0]
[n0000:11862] [ 1] oceanM(wclock_on_+0x97) [0x483baf]
[n0000:11862] [ 2] oceanM(distribute_mod_mp_mp_bcasti_0d_+0x24) [0x49c2de]
[n0000:11862] [ 3] oceanM(inp_par_+0x284) [0x42acf2]
[n0000:11862] [ 4] oceanM(ocean_control_mod_mp_roms_initialize_+0xbb) [0x423e9f]
[n0000:11862] [ 5] oceanM(MAIN__+0xfd) [0x423991]
[n0000:11862] [ 6] oceanM(main+0x2a) [0x423882]
[n0000:11862] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aecd73cc994]
[n0000:11862] [ 8] oceanM [0x4237a9]
[n0000:11862] *** End of error message ***
Several more of these seg faults, long enough that I won't paste here, and then:
[n0000.hadley:11853] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[n0000.hadley:11853] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 572
[n0000.hadley:11853] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpirun noticed that job rank 0 with PID 11855 on node n0000.hadley exited on signal 11 (Segmentation fault).
[n0000.hadley:11853] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[n0000.hadley:11853] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 603
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
[n0000.hadley:11854] OOB: Connection to HNP lost
I've never had things go so wrong that the ROMS logfile itself couldn't elucidate why things might be blowing up, but clearly something isn't right. (For the record, the ROMS logfile gets about two lines in and gives me a READ_PHYPAR (can't find Ngrids) error but that's presumably a product of oceanM's failure to communicate.)

I went ahead and repeated this process (compiling and attempting to execute) with several old, previously perfectly functional build scripts and input files, and all of them seemed to compile, and then came back with similar oceanM executable segmentation faults when run.

Please, does anyone have any ideas what's going wrong? Is the difference in formatting in the Linux-ifort.mk file in the new version that critical?

The new one looks like:

Code:

ifdef USE_NETCDF4
        NC_CONFIG ?= nc-config
    NETCDF_INCDIR ?= $(shell $(NC_CONFIG) --prefix)/include
             LIBS := $(shell $(NC_CONFIG) --flibs)
else
    NETCDF_INCDIR ?= /usr/local/include
    NETCDF_LIBDIR ?= /usr/local/lib
             LIBS := -L$(NETCDF_LIBDIR) -lnetcdf
endif
While the old one looks like (with my specific pathways filled in, this is the ROMS default):

Code:

ifdef USE_NETCDF4
    NETCDF_INCDIR ?= /opt/intelsoft/netcdf4/include
    NETCDF_LIBDIR ?= /opt/intelsoft/netcdf4/lib
      HDF5_LIBDIR ?= /opt/intelsoft/hdf5/lib
else
    NETCDF_INCDIR ?= /opt/intelsoft/netcdf/include
    NETCDF_LIBDIR ?= /opt/intelsoft/netcdf/lib
endif
It may be as simple as figuring out where my old pathways go in the new version (I do use the netcdf4 option) -- the mapping isn't 1:1. What is NC_CONFIG?

Best,
Liz

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Post-compilation problems with ROMS

#2 Unread post by kate »

You mention trying old build scripts and input files. Do you have a copy of the old code? That's what I would try next to make sure that still works. If not, you can ask the system guys if they changed anything.

Depending on which updates you did, you might need to change your input files.

nc-config is an executable that comes with the most recent versions of NetCDF. Can you do "which nc-config"?

Code:

pacman2 432% which nc-config
/usr/local/pgi/bin/nc-config
The build with USE_NETCDF4 is now looking for this executable.
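
To see what the new makefile fragment would receive, the same two queries can be run by hand. This is only a sketch: the `NC_CONFIG` default mirrors the makefile quoted above, and `report_netcdf` is a made-up helper name, not part of ROMS or netCDF.

```shell
#!/bin/sh
# Mirror the makefile default: use $NC_CONFIG if set, else plain nc-config.
NC_CONFIG="${NC_CONFIG:-nc-config}"

# report_netcdf is a hypothetical helper name, not part of ROMS.
report_netcdf() {
  tool="$1"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "$tool not found; set NC_CONFIG or fix PATH"
    return 1
  fi
  # These are the two queries the USE_NETCDF4 branch of the makefile makes.
  echo "NETCDF_INCDIR = $("$tool" --prefix)/include"
  echo "LIBS          = $("$tool" --flibs)"
}

report_netcdf "$NC_CONFIG" || true
```

If the first message appears, the build is in the same situation: it cannot run nc-config, so the include directory and library list it asks for come back empty.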

If I get a seg fault, I recompile with USE_DEBUG with its check for array bounds. If the seg fault is not about array bounds, I use a debugger.
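
A rough sketch of that first step, assuming the usual build.bash layout in which options are exported as environment variables. The `exe` variable is only for illustration; oceanG is the debug executable name that comes up later in this thread.

```shell
# In build.bash: turn on the debugging flags before rebuilding.
export USE_DEBUG=on

# With USE_DEBUG set, the build produces oceanG rather than oceanM.
if [ -n "${USE_DEBUG:-}" ]; then
  exe=oceanG
else
  exe=oceanM
fi
echo "run ./$exe after rebuilding"
```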

BTW, Ngrids used to be set in the makefile and is now set in ocean.in.

lolhsson
Posts: 23
Joined: Wed Jun 02, 2010 9:07 pm
Location: UC Berkeley

Re: Post-compilation problems with ROMS

#3 Unread post by lolhsson »

Hi Kate -- thanks for the response.

It actually occurred to me that I haven't updated my .in file in a long time; I know that a variety of things have changed, like how boundary conditions are now defined in .in instead of .h. So that's something to do, but I feel like it doesn't really explain the segmentation faults in oceanM (or would it?). My old oceanM still runs fine with my old ocean.in.

As an aside, I am curious -- now that .in holds boundary conditions, do you have to recompile every time you edit them? Or was avoiding that necessity the point of moving them from .h to .in?

'which nc-config' comes up empty, but I feel like which isn't looking in the right places:
/usr/bin/which: no nc-config in (/global/software/centos-5.x86_64/modules/openmpi/1.2.8-intel/bin:/global/software/centos-5.x86_64/modules/intel/cce/10.1.018/bin:/global/software/centos-5.x86_64/modules/intel/fce/10.1.018/bin:/usr/kerberos/bin:/clusterfs/ohana/software/bin:/usr/local/bin:/bin:/usr/bin:/global/sched/centos-5.x86_64/gold/sbin:/global/sched/centos-5.x86_64/gold/bin:/global/software/centos-5.x86_64/moab/sbin:/global/software/centos-5.x86_64/moab/bin:/global/home/groups/allhands/bin:/global/home/users/lolhsson/bin:/global/software/centos-5.x86_64/modules/ncl/5.1.0-intel)
None of those directories is a netCDF 4.0 folder.

I tried turning USE_DEBUG on and running oceanG; came up with similar errors to oceanM. I'm not sure what I should be looking for in terms of array bounds -- would this show up in the logfile, the error file?

Thanks,
Liz

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: Post-compilation problems with ROMS

#4 Unread post by kate »

You really have to have an ocean.in that is consistent with your version of the code when running ROMS.
kate wrote: "BTW, Ngrids used to be set in the makefile and is now set in ocean.in."
For the nc-config thing, do you have ncdump in your path? That's an older executable that comes with the NetCDF library. You want to have both on your path. A web search on "unix path" will turn up instructions for setting it.
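
As a minimal sketch of putting a directory on the path in an sh-style shell: the directory below is a guess based on the NETCDF_LIBDIR quoted earlier in the thread, and `prepend_path` and `NETCDF_BIN` are hypothetical names used only for this example.

```shell
#!/bin/sh
# Assumed location, guessed from the NETCDF_LIBDIR quoted earlier in the thread.
NETCDF_BIN="${NETCDF_BIN:-/opt/intelsoft/netcdf4/bin}"

# prepend_path is a hypothetical helper: add a directory to PATH only once.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present, leave PATH alone
    *) PATH="$1:$PATH" ;;
  esac
}

prepend_path "$NETCDF_BIN"
export PATH

# Both tools should now resolve if the directory guess is right.
for tool in ncdump nc-config; do
  command -v "$tool" || echo "$tool: still not on PATH"
done
```

Putting the same lines in a shell startup file would make the change persist across logins.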

The boundary conditions were moved for the future when each of our Ngrids wants different options. With cpp, that would be rather a challenge.

rduran
Posts: 152
Joined: Fri Jan 08, 2010 7:22 pm
Location: Theiss Research

Re: Post-compilation problems with ROMS

#5 Unread post by rduran »

lolhsson wrote: "do you have to recompile every time you edit them?"
It seems you should at least stick to the same boundary conditions when you are restarting:

viewtopic.php?f=17&t=2411&hilit=NLM_LBC

But running different experiments with the same executable and different BCs in the .in file seems to be OK, which is a nice thing.

lolhsson
Posts: 23
Joined: Wed Jun 02, 2010 9:07 pm
Location: UC Berkeley

Re: Post-compilation problems with ROMS

#6 Unread post by lolhsson »

Yeah, I really enjoy not having to do so anymore -- especially since all this arose from the challenge of recompiling after editing BCs, since last time I'd checked that was still necessary (as BCs were stored in .h)!

Thanks again for the responses, you two. :) I do appreciate your time.

m.hadfield
Posts: 521
Joined: Tue Jul 01, 2003 4:12 am
Location: NIWA

Re: Post-compilation problems with ROMS

#7 Unread post by m.hadfield »

Hi Liz

You say you have netCDF libraries here?

Code:

    NETCDF_INCDIR ?= /opt/intelsoft/netcdf4/include
    NETCDF_LIBDIR ?= /opt/intelsoft/netcdf4/lib
Then nc-config should be in /opt/intelsoft/netcdf4/bin.

To help the ROMS make system find it, you can put that directory on the path or specify the following in build.bash:

Code:

export NC_CONFIG=/opt/intelsoft/netcdf4/bin/nc-config
nc-config was introduced a couple of years ago, IIRC. If your netCDF4 installation lacks it, you should consider updating. Alternatively, it's not that hard to take a copy of nc-config from a newer version and adapt it for your installation. Install it wherever you like and point to it with NC_CONFIG.
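
For illustration only, a stripped-down stand-in might look like the sketch below. It is not the real nc-config: it answers just the two options the ROMS makefile fragment quoted earlier asks for, the directories are the ones from the old Linux-ifort.mk in the first post, and the library list is an assumption that would need checking against the actual installation.

```shell
#!/bin/sh
# Hypothetical stand-in for nc-config, NOT the real script shipped with netCDF.
# Both directories come from the old Linux-ifort.mk quoted in this thread.
prefix=/opt/intelsoft/netcdf4
hdf5_libdir=/opt/intelsoft/hdf5/lib

nc_config_standin() {
  case "$1" in
    --prefix)
      echo "$prefix" ;;
    --flibs)
      # Assumed library list; adjust to whatever your installation provides.
      echo "-L$prefix/lib -lnetcdf -L$hdf5_libdir -lhdf5_hl -lhdf5 -lz" ;;
    *)
      echo "unsupported option: $1" >&2
      return 1 ;;
  esac
}

# When saved as an executable script, answer whatever option was passed in.
if [ "$#" -gt 0 ]; then
  nc_config_standin "$@"
fi
```

Saved under a name of your choosing and made executable, a script like this is what NC_CONFIG would point to.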
