large workload for benchmarking

Post ROMS benchmark results

Moderators: arango, robertson

gnayar

large workload for benchmarking

#1 Post by gnayar

Hi,
I have been using the benchmark workload (2048x256 grid) for optimization work on Intel Xeon Phi. It turns out to be too small for scaling up to a cluster: each node adds 120 ranks, and the grid cannot be divided usefully among that many ranks.

Does anyone have suggestions for other workloads from ROMS that we could try?

-gopal

wilkin
Posts: 922
Joined: Mon Apr 28, 2003 5:44 pm
Location: Rutgers University

Re: large workload for benchmarking

#2 Post by wilkin

Just double the grid size parameters Lm and Mm yet again.

If you diff the benchmark 2 and 3 cases you will see:

queequeg:External wilkin$ diff ocean_benchmark3.in ocean_benchmark2.in
4c4
< !svn $Id: ocean_benchmark3.in 751 2015-01-07 22:56:36Z arango $
---
> !svn $Id: ocean_benchmark2.in 751 2015-01-07 22:56:36Z arango $
67c67
<        TITLE = Benchmark Test, Idealized Southern Ocean, Large Grid
---
>        TITLE = Benchmark Test, Idealized Southern Ocean, Medium Grid
94,95c94,95
<           Lm == 2048          ! Number of I-direction INTERIOR RHO-points
<           Mm == 256           ! Number of J-direction INTERIOR RHO-points
---
>           Lm == 1024          ! Number of I-direction INTERIOR RHO-points
>           Mm == 128           ! Number of J-direction INTERIOR RHO-points

You might need to decrease DT for stability.
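
As a rough illustration of the doubling and the DT caveat, here is a minimal sketch. It assumes the physical domain stays fixed, so refining the grid by some factor shrinks the grid spacing by the same factor, and that the stable time step scales linearly with spacing. The helper `scale_case` and the value `base_dt` are hypothetical; `base_dt` is a placeholder, not the DT actually set in ocean_benchmark3.in.

```python
def scale_case(lm, mm, dt, factor):
    """Return (Lm, Mm, DT) for a grid refined by an integer factor.

    Assumption: the domain extent is unchanged, so grid spacing shrinks by
    `factor`, and the stable time step shrinks in proportion to spacing.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return lm * factor, mm * factor, dt / factor


if __name__ == "__main__":
    base_dt = 100.0  # hypothetical placeholder, NOT the benchmark3 value
    lm, mm, dt = scale_case(2048, 256, base_dt, 2)
    # Print in the same "KEY == value" style used by the ROMS input file.
    print(f"Lm == {lm}")
    print(f"Mm == {mm}")
    print(f"DT == {dt}")
```

Treat the scaled DT as a starting guess only; whether the run is actually stable has to be checked by running it.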
John Wilkin: DMCS Rutgers University
71 Dudley Rd, New Brunswick, NJ 08901-8521, USA. ph: 609-630-0559 jwilkin@rutgers.edu

gnayar

Re: large workload for benchmarking

#3 Post by gnayar

John, that is what I did when taking benchmark timings in symmetric mode, with MPI ranks split between the host and the accelerator. With 120 ranks from the Xeon Phi and 96 ranks from the host, the benchmark3 grid was too small to scale, so I had to bump it up.

Now I can either multiply it still further or look for another ROMS workload that can serve as a cluster benchmark, hence the question.
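
To put numbers on "too small to scale", here is a minimal sketch that enumerates the ways 216 ranks (120 + 96) can tile the 2048x256 benchmark3 grid and reports the largest tile for each. It assumes one tile per MPI rank, with the rank count equal to the product of the ROMS tiling parameters NtileI and NtileJ. The helper names `tilings` and `best_tiling` are made up for illustration, and the sketch ignores halo points and any speed difference between host and Xeon Phi ranks.

```python
def tilings(lm, mm, nranks):
    """Yield (ntile_i, ntile_j, pts_i, pts_j) for every factor pair of nranks.

    pts_i and pts_j are the interior points along I and J in the largest
    tile, which is the tile that sets the per-rank load.
    """
    for ntile_i in range(1, nranks + 1):
        if nranks % ntile_i == 0:
            ntile_j = nranks // ntile_i
            pts_i = -(-lm // ntile_i)  # ceiling division
            pts_j = -(-mm // ntile_j)  # ceiling division
            yield ntile_i, ntile_j, pts_i, pts_j


def best_tiling(lm, mm, nranks):
    """Return the tiling whose largest tile holds the fewest interior points."""
    return min(tilings(lm, mm, nranks), key=lambda t: t[2] * t[3])


if __name__ == "__main__":
    for ni, nj, pi, pj in tilings(2048, 256, 216):
        print(f"NtileI={ni:3d} NtileJ={nj:3d} largest tile {pi} x {pj}")
    print("smallest largest-tile:", best_tiling(2048, 256, 216))
```

Running the same functions with doubled Lm and Mm shows how much interior work each rank gains relative to its tile edges, which is the motivation for a larger case.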

Thank you John for your reply.

-gopal
