two solutions for sst with the same configuration

xiaocongM
Posts: 12
Joined: Mon Mar 23, 2020 8:16 pm
Location: Xiamen University

two solutions for sst with the same configuration

#1 Post by xiaocongM

Dear ROMS users,
I ran my application on different platforms. Since I am using the same configuration (cpp options, initial, boundary, and forcing files), I expected the solutions to be the same, or at least close to each other. However, I got two very different solutions. The only difference between the two cases is the number of tiles. The relevant information is in the following pictures. I would very much appreciate any guidance.
The first figure shows SST; the others show information about the platforms.
temp.png
fuzhou.png
xuexiao.png

pmaccc
Posts: 74
Joined: Wed Oct 22, 2003 6:59 pm
Location: U. Wash., USA

Re: two solutions for sst with the same configuration

#2 Post by pmaccc

The image on the right looks like it is having problems getting open boundary information into the domain. Are you using exactly the same open boundary conditions in the dot-in, and the same values in the boundary file? Also if you are using nudging to climatology, is the climatology the same, and is the file of nudging timescales the same?
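One way to check the dot-in files is to compare them parameter by parameter rather than by eye. The following is a minimal sketch, assuming the usual `KEY == value` layout with `!` comments; `parse_dot_in` and `diff_dot_in` are hypothetical helper names, the tile counts are ignored on purpose, and continuation lines are not handled.

```python
import re

# Keys expected to differ between the two runs (an assumption: only the tiling changed).
IGNORED_KEYS = {"NtileI", "NtileJ"}


def parse_dot_in(text):
    """Parse 'KEY = value' or 'KEY == value' lines, dropping '!' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # remove trailing comment
        match = re.match(r"^([A-Za-z_][\w()%,]*)\s*={1,2}\s*(.*)$", line)
        if match:
            key, value = match.groups()
            params[key] = " ".join(value.split())  # normalize whitespace
    return params


def diff_dot_in(text_a, text_b, ignored=IGNORED_KEYS):
    """Return {key: (value_a, value_b)} for every parameter that differs."""
    a, b = parse_dot_in(text_a), parse_dot_in(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if key not in ignored and a.get(key) != b.get(key)
    }
```

Reading both files with `open(path).read()` and passing the two strings to `diff_dot_in` returns an empty dictionary when only the tiling differs; any remaining entry, such as a different `LBC(...)` line, is worth a closer look.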

ckharris
Posts: 40
Joined: Wed Nov 03, 2004 4:37 pm
Location: VIMS

Re: two solutions for sst with the same configuration

#3 Post by ckharris

I had a similar problem several years ago where we got different solutions depending on how many tiles we used. It turned out to be a bug in the compiler that we were using at the time. The differences went away when we compiled without optimization (that is, without -O3 in the compiler command), but then the model ran very slowly. In our case, switching compilers fixed the problem.

In your case:
The two "compiler commands" are different, one is mpif90 and the other is mpiifort. Have you tried testing the tiling but using the same compiled code?
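A minimal sketch of how the result of such a tiling test could be quantified, assuming the SST from the two runs has already been read into flat lists of floats (for instance with a NetCDF reader); `field_difference` is a hypothetical helper, not part of ROMS.

```python
import math


def field_difference(field_a, field_b):
    """Return (max absolute difference, RMS difference) of two equal-length
    sequences of floats, skipping points where either value is NaN."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of points")
    max_abs = 0.0
    sum_sq = 0.0
    count = 0
    for a, b in zip(field_a, field_b):
        if math.isnan(a) or math.isnan(b):
            continue  # e.g. masked land points stored as NaN
        diff = abs(a - b)
        max_abs = max(max_abs, diff)
        sum_sq += diff * diff
        count += 1
    rms = math.sqrt(sum_sq / count) if count else 0.0
    return max_abs, rms
```

If the same executable gives differences that are zero or very close to it for two tile counts, the tiling itself is unlikely to be the cause, and the compiler or its flags become the main suspect.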

Good luck.
Courtney Harris
Professor
Virginia Institute of Marine Sciences
http://www.vims.edu/about/directory/fac ... ris_ck.php

xiaocongM
Posts: 12
Joined: Mon Mar 23, 2020 8:16 pm
Location: Xiamen University

Re: two solutions for sst with the same configuration

#4 Post by xiaocongM

Thanks for your prompt reply! Yes, I have checked all the files you mentioned, and they are the same.
pmaccc wrote: Sun Oct 15, 2023 10:52 pm The image on the right looks like it is having problems getting open boundary information into the domain. Are you using exactly the same open boundary conditions in the dot-in, and the same values in the boundary file? Also if you are using nudging to climatology, is the climatology the same, and is the file of nudging timescales the same?
Thanks for your kind reply. Yes, I ran the same case with different numbers of tiles, but the issue still exists. A teacher told me that the shared nodes, which are available to many users, may be the problem. I will try private nodes to see whether that helps.
ckharris wrote: Mon Oct 16, 2023 5:11 pm I had a similar problem several years ago where we got different solutions depending on how many tiles we used. It turned out to be a bug in the compiler that we were using at the time. The differences went away when we did not use the optimization in the compiler command (the -O3) but then the model ran very slowly. For us: we switched compilers and that fixed the problem.

In your case:
The two "compiler commands" are different, one is mpif90 and the other is mpiifort. Have you tried testing the tiling but using the same compiled code?
Good luck.

jivica
Posts: 172
Joined: Mon May 05, 2003 2:41 pm
Location: The University of Western Australia, Perth, Australia

Re: two solutions for sst with the same configuration

#5 Post by jivica

The problem is not caused by running on different nodes of the cluster.
Nodes either work or crash the model (for example, because of a memory problem).

You need to recompile your model with less aggressive optimization flags (i.e. do not use -O3) and then try again with exactly the same inputs.
Different compilers (gfortran, ifort, cray) have different flags, so be careful not to mix and match them.

For example, I had a problem when compiling ARPACK/PARPACK (needed when you run 4D-VAR) with the -O3 flag.
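Before recompiling, it can help to list where aggressive flags are set in the compiler makefile. A minimal sketch, assuming the flags are appended to a variable called `FFLAGS`; the helper name `find_aggressive_flags` and the choice of which flags count as "aggressive" are assumptions for illustration only.

```python
import re

# Flags treated as "aggressive" here; this list is an assumption, adjust it per compiler.
AGGRESSIVE_FLAGS = {"-O3", "-Ofast", "-ffast-math", "-fast"}


def find_aggressive_flags(makefile_text):
    """Return (line_number, flag) pairs for aggressive flags set on FFLAGS lines."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop makefile comments
        if not re.match(r"\s*FFLAGS\s*[:+]?=", code):
            continue  # only look at FFLAGS assignments
        for token in code.split():
            if token in AGGRESSIVE_FLAGS:
                hits.append((number, token))
    return hits
```

Each reported line is a candidate for a milder setting (for example -O2 or lower) before rebuilding and rerunning with the same inputs.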

It is still striking to me that the difference is so large after such a short simulation.

Cheers,
Ivica

xiaocongM
Posts: 12
Joined: Mon Mar 23, 2020 8:16 pm
Location: Xiamen University

Re: two solutions for sst with the same configuration

#6 Post by xiaocongM

Thanks for your kind guidance. I tried several cpp flags, but none of them worked. I gave up working on this server. Apologies for the late reply.
