MPI 1 by 1 tile configuration --segmentation error
Dear all,
I'm trying to test whether ROMS MPI parallel runs and serial runs give different results.
In my case I found that an MPI 1 by 8 run (NtileI=1, NtileJ=8) compiled with mpich 1.2.6 gives somewhat different results compared to a serial run compiled with ifort. Since the two compilers are different, I wondered whether the difference is due to the compilers.
Then I tried to run the MPI code with 1 by 1 tiling (using "mpirun -np 1 ./oceanM mytoms.in") so that all my runs use the same compiler, but it gives a segmentation fault immediately. I know that we are not expected to run the MPI code with 1 by 1 tiling (only one processor), but we should be able to do that, right?
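For reference, the tiling being compared is set in the ROMS standard input file; a minimal sketch of the relevant lines, assuming the usual ocean.in parameter names, with the values from the 1 by 8 case:
    NtileI == 1        ! I-direction partitioning
    NtileJ == 8        ! J-direction partitioning (set both to 1 for the 1 by 1 test)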
Here are the run details:
> mpirun -np 1 ./oceanM ./toms_Nz30_kw.in
*****....* (I omitted some stuff here)
INITIAL: Configurating and initializing forward nonlinear model ...
bash: line 1: 20616 Segmentation fault /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=node2 MPIRUN_PORT=32780 MPIRUN_PROCESSES='master:' MPIRUN_RANK=0 MPIRUN_NPROCS=1 MPIRUN_ID=15931 ./oceanM ./toms_Nz30_kw.in
It seems that the run failed immediately in the initialization phase. Do you know why?
I will update with some other tests such as the upwelling case.
Thanks,
Wen
I have certainly run MPI ROMS on one processor. You are likely running out of memory because the initialization does a lot of memory allocation. Things to check:
1. "ulimit -a" or "limit" for system limits on your resource use. This could be memoryuse or datasize or both. You might want to increase your stacksize too, just in case.
2. compile-time options. It could be that one eighth of your job fits in under 2 GB while the whole thing is larger than that. BENCHMARK3 is an example of that, using 5-6 GB total. Running it on eight processors doesn't require 64-bit address space while running it on one does. On the IBM, there is also a link-time option that needs to be set for large memory, with the 32-bit default being 256 MB.
3. Queue limits - are you running in a batch queue with a memory limit?
1. "ulimit -a" or "limit" for system limits on your resource use. This could be memoryuse or datasize or both. You might want to increase your stacksize too, just in case.
2. compile-time options. It could be that one eighth of your job fits in under 2 GB while the whole thing is larger than that. BENCHMARK3 is an example of that, using 5-6 GB total. Running it on eight processors doesn't require 64-bit address space while running it on one does. On the IBM, there is also a link-time option that needs to be set for large memory, with the 32-bit default being 256 MB.
3. Queue limits - are you running in a batch queue with a memory limit?
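A minimal sketch of that first check, assuming a bash shell (the segmentation-fault line above came from bash) and the same oceanM invocation; the heap-related limits (datasize, memoryuse) are the ones most likely to matter here, with stacksize raised just in case:
    > ulimit -a                   # show the current soft limits
    > ulimit -d unlimited         # data segment size (heap)
    > ulimit -m unlimited         # max memory size
    > ulimit -s unlimited         # stack size, just in case
    > mpirun -np 1 ./oceanM ./toms_Nz30_kw.in
Note that a soft limit can only be raised up to the hard limit without root privileges.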
We certainly also use MPI, often with only 1 processor for our simple test cases. On my new Windows dual-core laptop, I use Cygwin as a Unix emulator, MPICH2 for Windows (really easy to install and link against), and the Intel ifort 9.0x compiler. Of course we run the 'real' jobs on the 72-node Linux cluster, but it is also handy to test/debug MPI 'on the road' with the laptop.
I would be very interested to know if there are any MPI tiling issues and would also help to pursue a solution.
Is there a way for users to ftp data / model setups to the discussion board?
-john
Thanks to Kate and John for the discussion.
I believe it is, as Kate said, due to a memory limit. I have set unlimited stack size, and it still doesn't work for my case (100*150*30 grid). Yet when I tested the upwelling case, it was okay.
I wonder whether Kate and John know what compiler options I should use to reduce the memory requirements, or to use disk swap for virtual memory.
Wen
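Two hedged notes on the compiler-option question, as a sketch rather than a verified fix: disk swap is handled automatically by the operating system's virtual memory, so no compiler option is needed for that. For statically sized data larger than 2 GB on x86_64, recent ifort versions accept a medium memory model; whether a given ifort release supports these flags is an assumption to check against its documentation, and a 100*150*30 grid should not need them:
    # Hypothetical additions to the ifort compile/link flags (check your version):
    FFLAGS += -mcmodel=medium -shared-intel
(-shared-intel is the newer spelling; older ifort releases used -i-dynamic for the same purpose.)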
Your problem should certainly fit within the 2 GB 32-bit address space. You might be hitting up against 256 MB. You mention unlimited stacksize, but we're allocating memory on the heap, not the stack. Did you ask for unlimited datasize and memoryuse?
What system is this and how much memory does it have?
Hi Kate, thanks for the detailed questions,
Here are my settings for the limits:
> ulimit -SHacdflmnpstuv
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 26617
virtual memory (kbytes, -v) unlimited
Here is the system I'm using
> uname -sa
Linux master 2.6.12-ck2-suse93-osmp #134 SMP Thu Jul 14 12:16:10 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
It has 9 nodes, each with 2 processors.
The total memory for each node is about 3346476 kB (roughly 3.3 GB), with about 1.8 GB free.
I can run my case with the serial build; that is, I can run the oceanS executable without any problem, but it is not compiled with MPICH: plain Intel Fortran is used for the serial compile. It could be that the MPI build requires more memory per process.
Wen
Plenty of memory, in other words. I don't know. I'd try searching your compiler documentation for things like:
-bmaxdata=<bytes>
    Specifies the maximum amount of space to reserve for the program data segment, for programs where the size of these regions is a constraint. The default is -bmaxdata=0.
which is the IBM limit and at least used to be 256 MB.
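On an IBM system that would go on the link line, along these lines (a sketch only; the value and the exact spelling of the option are assumptions to check against the local xlf/ld documentation, and this does not apply to the Linux cluster above):
    # Hypothetical link flag on AIX, reserving 2 GB for the data segment:
    LDFLAGS += -bmaxdata=0x80000000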
I was getting similar problems on an HP Itanium cluster. What happened there was that the limits in the user shell were not transferred by default to the MPI job. The problem was fixed in that case by adding the
-inherit_limits option on the mpiexec command line. Then the unlimited stacksize setting was transferred to all the processors.
Paul
Actually, -inherit_limits is an option for mpirun/mpimon. The point is that you should make sure that the limits you have set in your user shell are actually transmitted to the MPI processes. If they are not, then if you are using ifort, for example, you will likely get a segmentation fault because the default stacksize allocation is too small. To date, I have only found this problem to occur with mpimon from Scali, but it could be something to check in your installation.
Paul
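For the Scali mpirun mentioned above, the launch would look something like this (a sketch only; the option placement follows the usual launcher-options-before-the-executable convention, and other MPI launchers propagate limits differently, often via the shell startup files on the compute nodes):
    > mpirun -inherit_limits -np 1 ./oceanM ./toms_Nz30_kw.in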
I have had much success on Linux machines after increasing the stack size by applying
ulimit -s unlimited
I think the heap is typically unlimited (or very large) by default on most linux distributions. The BENCHMARK cases are good to see how well you can deal with very large domains.
Unfortunately on my (intel) Mac, it appears that ifort does not allow you to have arbitrarily large domains. I am still looking into this, but I think the issue is ultimately with the compiler, and I have seen some postings online that confirm this.