[Beowulf] Poor bandwidth from one compute node
Gus Correa
gus at ldeo.columbia.edu
Thu Aug 17 11:40:22 PDT 2017
On 08/17/2017 12:35 PM, Joe Landman wrote:
>
>
> On 08/17/2017 12:00 PM, Faraz Hussain wrote:
>> I noticed an mpi job was taking 5X longer to run whenever it got the
>> compute node lusytp104 . So I ran qperf and found the bandwidth
>> between it and any other nodes was ~100MB/sec. This is much lower than
>> ~1GB/sec between all the other nodes. Any tips on how to debug
>> further? I haven't tried rebooting since it is currently running a
>> single-node job.
>>
>> [hussaif1 at lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
>> tcp_lat:
>> latency = 17.4 us
>> tcp_bw:
>> bw = 118 MB/sec
>> [hussaif1 at lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
>> tcp_lat:
>> latency = 20.4 us
>> tcp_bw:
>> bw = 1.07 GB/sec
>>
>> This is a separate issue from my previous post about a slow compute
>> node. I am still investigating that per the helpful replies. Will post
>> an update about that once I find the root cause!
>
> Sounds very much like it is running over gigabit ethernet vs
> Infiniband. Check to make sure it is using the right network ...
Hi Faraz
As others have said in reply to your previous posting about Infiniband:
- Check whether the node is configured the same way as the other nodes;
in the case of Infiniband, whether the MTU is the same,
whether it is using connected or datagram mode, etc.
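For instance (a sketch, assuming Mellanox hardware and that the IPoIB
interface is called ib0 -- adjust names for your site), compare these on
lusytp104 and on a healthy node:

ibv_devinfo | grep -E 'state|active_mtu'   # IB port state and active MTU
cat /sys/class/net/ib0/mode                # IPoIB mode: connected or datagram
ip link show ib0 | grep mtu                # IPoIB MTU (e.g. 2044 vs. 65520)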
**
Besides, for Open MPI you can force it at runtime not to use tcp:
--mca btl ^tcp
or with the syntax in this FAQ:
https://www.open-mpi.org/faq/?category=openfabrics#ib-btl
If that node has an Infiniband interface with a problem,
this should at least give a clue.
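A minimal sketch of what that would look like (the executable name and
the host list are just placeholders):

# run one rank on the suspect node and one on a good node, IB only
mpirun --mca btl ^tcp -np 2 -host lusytp104,lusytp114 ./your_mpi_app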
**
In addition, check the limits in the node.
That may be set by your resource manager,
or in /etc/security/limits.conf
or perhaps in the actual job script.
The memlock limit is key to Open MPI over Infiniband.
See FAQ 15, 16, 17 here:
https://www.open-mpi.org/faq/?category=openfabrics
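A quick way to check (the limits.conf lines below are just an example of
a common setting, not necessarily what your site should use):

ulimit -l     # run as the user, ideally inside a batch job;
              # Open MPI over IB generally wants "unlimited" here

# typical entries in /etc/security/limits.conf:
*   soft   memlock   unlimited
*   hard   memlock   unlimited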
**
Moreover, check if the mlx4_core.conf (assuming it is Mellanox HW)
is configured the same way across the nodes:
/etc/modprobe.d/mlx4_core.conf
See FAQ 18 here:
https://www.open-mpi.org/faq/?category=openfabrics
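For example (the parameter values shown are only illustrative; see FAQ 18
for how to size them for your memory):

cat /etc/modprobe.d/mlx4_core.conf
# e.g.:  options mlx4_core log_num_mtt=20 log_mtts_per_seg=4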
**
To increase the btl diagnostic verbosity (which goes to STDERR, IIRC):
--mca btl_base_verbose 30
That may point out which interfaces are actually being used, etc.
See this FAQ:
https://www.open-mpi.org/faq/?category=all#diagnose-multi-host-problems
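Combining the two flags, a sketch (again with placeholder names) would be:

mpirun --mca btl ^tcp --mca btl_base_verbose 30 -np 2 \
    -host lusytp104,lusytp114 ./your_mpi_app 2> btl_debug.log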
**
Finally, as John has suggested before, you may want to
subscribe to the Open MPI mailing list,
and ask the question there as well:
https://www.open-mpi.org/community/help/
https://www.open-mpi.org/community/lists/
There you will get feedback from the Open MPI developers +
user community, and that often includes insights from
Intel and Mellanox IB hardware experts.
**
I hope this helps.
Gus Correa