This article is a quick review of HPC cluster components and setup. I hope it helps.
- Windows HPC Server 2008
- Things to Remember in an InfiniBand network
- What to expect from InfiniBand:
- Software Architecture for Windows 2008 HPC server
- Common issues with HPC Server 2008.
Windows HPC Server 2008
Windows HPC Server 2008 is composed of a cluster of servers that includes a single head node and one or more compute nodes. The head node controls and mediates all access to the cluster resources and is the single point of management, deployment, and job scheduling for the compute cluster.
Note: You can install the HPC Pack client on Windows XP or Windows Server 2003 family machines to manage the HPC cluster.
Hardware Architecture of an HPC cluster
Best practice is to implement an HPC cluster on 3 different networks:
- Application network (InfiniBand or 10 Gigabit Ethernet, carrying MPI traffic)
- Private network (used to manage the HPC cluster: PXE boot, deployment, and management traffic)
- Enterprise network (connected to the corporate network)
Things to Remember in an InfiniBand network
- For ultimate performance, customers should have a non-blocking fabric (that is, no contention over the InfiniBand network).
- The subnet manager is one of the most important components of an InfiniBand environment. This software can be installed on the switch or on a standalone machine.
- The subnet manager is responsible for discovering all paths between InfiniBand HCAs (Host Channel Adapters).
- If the subnet manager is not installed or enabled, the switch ports will not initialise in the first place.
- To verify that the subnet manager is enabled and working, check that the port status is Up (in the switch GUI) and that the blue status light at the back of the switch has changed from blinking to steady.
- To get logs from the switch, telnet to the switch and run the 'Capture' command.
- The first step in troubleshooting InfiniBand switch issues is to analyse the port status and symbol errors; there should be no more than 1 or 2 symbol errors per port.
- Installing the subnet manager requires a separate licence key. A customer who has bought InfiniBand components receives a CD with a serial number. The customer calls the QLogic support line and provides that serial number, from which QLogic generates a licence number for the subnet manager.
- Drivers for the InfiniBand HCA: there are 2 types of drivers available for these HCAs:
Mellanox (WinOF)
QLogic
- From industry experience, many people recommend installing the WinOF driver rather than the QLogic driver for optimal performance. The standard way of measuring IB performance in an HPC cluster is to measure MPI throughput and latency. Windows HPC Server 2008 has built-in tools to measure this, e.g. MPI ping-pong.
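As a rough illustration of what a ping-pong measurement reports, the sketch below turns a timed run into one-way latency and throughput. It is not the built-in tool's code: the function name and the sample numbers are hypothetical, and it assumes the timer wraps a loop in which each iteration sends a message to the peer node and waits for it to come back.

```python
def pingpong_metrics(message_bytes, iterations, elapsed_seconds):
    """Return (one-way latency in microseconds, throughput in MB/s).

    Assumes each iteration is one full round trip, so the message
    crosses the network 2 * iterations times in elapsed_seconds.
    """
    if iterations <= 0 or elapsed_seconds <= 0:
        raise ValueError("iterations and elapsed_seconds must be positive")
    one_way_seconds = elapsed_seconds / (2 * iterations)
    latency_us = one_way_seconds * 1e6
    throughput_mb_s = message_bytes / one_way_seconds / 1e6
    return latency_us, throughput_mb_s


# Hypothetical run: 1000 round trips of a 1,000,000-byte message in 2 seconds.
latency, throughput = pingpong_metrics(1_000_000, 1000, 2.0)
print(f"one-way time {latency:.1f} us, throughput {throughput:.1f} MB/s")
```

Small messages are the ones to look at for latency and large messages for throughput, so a real test usually sweeps a range of message sizes.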
What to expect from InfiniBand:
- IB uses 2.5 Gb/s per link (lane) at single data rate (SDR); double data rate (DDR) doubles this to 5 Gb/s
- SDR - 4 links - 10 Gb/s
- DDR - 4 links - 20 Gb/s
- DDR should translate to 900/1350 MB/s bandwidth in a non-blocking configuration
- An IB switch has no routing capability of its own (you need the subnet manager for that)
- Can configure LAGs (rare); no zoning or VLANs
- No MAC address or World Wide Port Name; uses a LID (Local Identifier) instead
- The subnet manager contains the pathing logic
- Can be activated on the switch (enter licence key)
- Can be configured on a host machine for larger-scale clusters
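The link-rate figures in this list can be reproduced with a little arithmetic. The sketch below assumes the standard per-lane signalling rates (2.5 Gb/s for SDR, 5 Gb/s for DDR) and the 8b/10b line encoding used by SDR and DDR InfiniBand, which leaves 8 data bits for every 10 bits on the wire. The function and constant names are my own.

```python
# Per-lane signalling rate in Gb/s (assumed standard InfiniBand values).
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0}


def link_rates(speed, lanes=4):
    """Return (signalling Gb/s, data Gb/s, data MB/s) for a link."""
    signalling = LANE_RATE_GBPS[speed] * lanes
    data = signalling * 8 / 10          # 8b/10b encoding overhead
    data_mb_per_s = data * 1000 / 8     # Gb/s -> MB/s (decimal units)
    return signalling, data, data_mb_per_s


for speed in ("SDR", "DDR"):
    signalling, data, mb = link_rates(speed)
    print(f"{speed} 4x: {signalling:g} Gb/s signalling, "
          f"{data:g} Gb/s data, {mb:g} MB/s")
```

These are theoretical ceilings for the link itself; bandwidth measured at the MPI level will come in lower.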
Software Architecture for Windows 2008 HPC server
Windows HPC Server 2008 installation involves installing the operating system on the head node, joining it to an Active Directory domain, and then installing HPC Pack 2008.
Windows HPC Server 2008 supports five different network topologies:
- Compute nodes isolated on a private network
- All nodes on both enterprise and private networks
- Compute nodes isolated on private and application networks
- All nodes on enterprise, private, and application networks
- All nodes only on an enterprise network
The application network is a high-speed network (Gigabit Ethernet, InfiniBand) used for MPI communication.
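To make the five topologies easier to compare, here is a small sketch that records which networks the compute nodes sit on in each one and picks out the topologies that give MPI traffic a dedicated application network. The numbering simply follows the order of the list above, and the dictionary and function names are my own.

```python
# Networks the compute nodes are attached to in each topology,
# numbered in the order the topologies are listed above.
COMPUTE_NODE_NETWORKS = {
    1: {"private"},
    2: {"enterprise", "private"},
    3: {"private", "application"},
    4: {"enterprise", "private", "application"},
    5: {"enterprise"},
}


def topologies_with_application_network():
    """Return the topology numbers that include an application network."""
    return sorted(
        number
        for number, networks in COMPUTE_NODE_NETWORKS.items()
        if "application" in networks
    )


print(topologies_with_application_network())
```

In the remaining topologies, MPI traffic has to share the private or enterprise network with other traffic.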
- What is MPI? Message Passing Interface.
- MS-MPI is a high-speed networking interface that runs over Gigabit Ethernet, InfiniBand, Myrinet, or any network that provides a Winsock Direct, NetworkDirect, or TCP/IP interface.
- MPI is used as the communication protocol for application traffic. Fundamentally, MPI is the interconnection between nodes on an HPC cluster. MPI ties nodes together.
- What are NetworkDirect and Winsock Direct?
- NetworkDirect (ND) is a new Remote Direct Memory Access (RDMA) network interface providing dramatic latency improvements for MPI applications running over high-speed fabrics.
- The NetworkDirect protocol bypasses Windows Sockets (Winsock) and the TCP/IP stack, using Remote Direct Memory Access (RDMA) on supported hardware to improve latency and reduce CPU overhead.
- [Diagram from the Microsoft site illustrating NetworkDirect]
- When you install the HCA driver, you need to install the NetworkDirect or Winsock Direct components separately. Please refer to the driver manual for instructions.
- If you are using the Mellanox driver package, the 'installsp -L' command lists whether NetworkDirect or Winsock Direct is installed on the system. If neither component is present, HPC will use the OS TCP/IP stack for communication, which greatly reduces performance.
- To verify that NetworkDirect is being used, check during the network configuration wizard that the network interface used for the application network is detected as TRUE.
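The fallback behaviour described above can be written down as a tiny decision function. This is only a toy model of the rule "no NetworkDirect and no Winsock Direct means plain TCP/IP", not MS-MPI's actual selection code; the function name and the component labels are hypothetical.

```python
# Labels for the optional fast-path components (illustrative names only).
FAST_PATH_COMPONENTS = ("NetworkDirect", "WinsockDirect")


def expected_transports(installed_components):
    """Return the fast-path components present, or TCP/IP if none are."""
    present = [c for c in FAST_PATH_COMPONENTS if c in installed_components]
    return present if present else ["TCP/IP"]


print(expected_transports({"NetworkDirect"}))
print(expected_transports(set()))
```

If a check like this comes back with only TCP/IP on a node that has an InfiniBand HCA, the driver components are the first thing to revisit.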
Common issues with HPC Server 2008.
Depending on the problem description, gather the appropriate logs.
Performance issue:
- My CPU is 100% utilized, but I am not getting the performance I expected.
Check which driver they are using (Mellanox WinOF or QLogic) and whether they are using NetworkDirect, Winsock Direct, or TCP/IP.
- Why is my CPU not being used 100%?
This may happen if there is contention in the MPI (application) network, so a common place to start is the underlying infrastructure configuration. Analyse the application network design first, then troubleshoot switch issues, and follow that up with driver verification.
- How do I inject my HCA driver and also install the NetworkDirect component via the node template generation wizard?
You will need the INF files for the driver to do both at the same time; this operation cannot be done via the MSI setup.
Hope this helps
Huzeifa Bhai