Project Description
The client required a turnkey, high-performance platform for training and inference of neural networks, capable of handling large language models (LLMs), generative AI, and heavy HPC workloads.
- Client objective: maximum computational density in limited rack space, with industrial-grade 24/7 reliability.
- Key requirement: support for high-speed cluster networking over InfiniBand NDR, with full subsystem compatibility out of the box.
What was done
- Delivery of the server in fully factory-ready condition (rails, cables, cooling system)
- Verification that the configuration is fully operational
- Commissioning (optionally on the client’s site, with setup and testing)
Tasks Solved and Applications
- Training large language models (LLMs) and generative networks
Unified memory across 8 HGX H200 GPUs (141 GB per accelerator) allows models with hundreds of billions of parameters to be placed and trained on a single node without complex parallelization, while high throughput within the HGX platform accelerates gradient synchronization (see the memory sketch after this list).
- Real-time inference
With 2 TB of RAM and high-frequency Intel Xeon Platinum CPUs, the server handles peak loads for AI services (chatbots, recommendation systems, content generation) with minimal latency.
- Scientific and engineering calculations (HPC)
The Hopper architecture, with FP64 support and the Transformer Engine, delivers high performance in simulations, computational fluid dynamics, molecular modeling, and other resource-intensive tasks.
- Working as part of an AI cluster
NVIDIA ConnectX‑7 NDR 400G and BlueField‑3 adapters allow the server to be integrated into a high-speed InfiniBand network, enabling horizontal scaling by combining multiple nodes into a single distributed training cluster (see the initialization sketch after this list).
- Enterprise analytics and virtualization
2 TB of memory and powerful CPUs make the platform suitable for workload consolidation, large dataset processing, and deployment of high-load databases.
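To make the single-node claim concrete, here is a back-of-the-envelope sketch of how aggregate HBM relates to model size. The bytes-per-parameter figures are common rules of thumb rather than measurements from this deployment, and the helper functions are hypothetical; real usage also depends on precision, activations, and framework overhead.

```python
# Rough GPU memory estimates for the 8x H200 node (rules of thumb only;
# helper names are illustrative, not taken from any framework).

HBM_PER_GPU_GB = 141
NUM_GPUS = 8
TOTAL_HBM_GB = HBM_PER_GPU_GB * NUM_GPUS  # 1128 GB aggregate

def inference_weights_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Weight memory for FP16/BF16 inference (~2 bytes per parameter)."""
    return params_billions * bytes_per_param

def training_state_gb(params_billions: float) -> float:
    """Mixed-precision training with Adam: ~16 bytes per parameter for
    weights, gradients, and optimizer states, before activations."""
    return params_billions * 16.0

for n in (70, 180, 400):
    print(f"{n}B params: ~{inference_weights_gb(n):.0f} GB to serve, "
          f"~{training_state_gb(n):.0f} GB of training state "
          f"(aggregate HBM: {TOTAL_HBM_GB} GB)")
```

On these rough numbers, models with hundreds of billions of parameters fit in aggregate HBM for serving, while training the largest models on one node typically also relies on lower precision (e.g., FP8), activation checkpointing, or optimizer-state sharding.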
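For the cluster scenario, the sketch below shows what joining a multi-node data-parallel job looks like from this server's side, assuming PyTorch with the NCCL backend, which uses InfiniBand transports automatically when the fabric is present. The model and tensor sizes are placeholders.

```python
# Minimal multi-node DDP sketch (placeholder model; assumes launch via
# torchrun, which sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR/PORT).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # NCCL handles intra-node NVLink and inter-node InfiniBand transparently.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4096, 4096).cuda(), device_ids=[local_rank])

    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).sum()
    loss.backward()  # gradients all-reduced across every GPU in the job

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched on each node with, e.g., `torchrun --nnodes=2 --nproc_per_node=8 train.py`, two such servers behave as a single 16-GPU training job over the NDR fabric.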
Why This Configuration Was Chosen
The client required an all-in-one solution with balanced computational power, memory bandwidth, storage speed, and network capabilities. The Lenovo ThinkSystem SR680a V3 with 8× HGX H200 is an NVIDIA-certified AI and HPC platform, ensuring stable operation of drivers, frameworks (PyTorch, TensorFlow, Megatron), and orchestration tools (Kubernetes with GPU support).
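On the orchestration side, GPUs on a node like this are exposed to Kubernetes by the NVIDIA device plugin and requested like any other resource. Below is a minimal sketch using the official Kubernetes Python client; the image name and pod details are hypothetical placeholders.

```python
# Sketch: requesting all 8 GPUs of the node for a training pod
# (assumes the NVIDIA device plugin is installed in the cluster).
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="llm-train"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="registry.example.com/llm-trainer:latest",  # placeholder
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    # GPUs are a schedulable resource via the device plugin.
                    limits={"nvidia.com/gpu": "8"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```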
Result for the Client
A ready-to-use platform was delivered, reducing time-to-production for AI services. Models that previously required multi-node clusters now fit on a single server, and the platform is prepared for scaling: the NDR 400G network subsystem supports cluster integration without hardware replacement.
About Our AMPERE Server Line
In addition to supplying ready-made solutions from leading vendors, we develop our own AMPERE server line for AI, HPC, and enterprise infrastructure. AMPERE servers are based on the same key components as NVIDIA HGX, Intel, and AMD reference platforms but offer several advantages:
- Configuration flexibility — servers are built precisely for client needs: number of GPUs (1–8), accelerator types (H200, H100, A100, L40S, RTX), memory size and speed, disk subsystem composition, network adapters (InfiniBand, Ethernet, RoCE).
- Optimal delivery times — our own manufacturing base allows faster delivery than the long supply chains of major vendors.
- Full quality control — each server undergoes extended burn-in testing, including GPU, memory, NVMe, and network interface checks under conditions close to real operation.
- Adaptation to client infrastructure — we can implement custom power, cooling, and form factors, and supply servers without extraneous components (e.g., no brand stickers) for integration into existing environments.
The AMPERE line includes models from compact single-CPU inference platforms to powerful 8-GPU nodes, fully equivalent in performance and reliability to enterprise-grade solutions. All servers come with warranty, technical support, and post-warranty service options.
Thus, the client has a choice: use a certified Lenovo solution (as in this case) or rely on our in-house AMPERE line for unique configurations, shorter delivery times, or specific integration requirements.
Need a similar AI infrastructure?
We deliver certified vendor servers and custom AMPERE solutions for AI, HPC, and enterprise workloads. Contact us to discuss your target configuration and deployment timeline.




