How to choose an AI server in 2026: Architecture, hardware, and power estimation
Key takeaways:
- The choice of AI server depends primarily on the workload type: for Training, interconnect bandwidth (NVLink) and VRAM capacity are critical; for Inference, low latency and tensor performance (TOPS/TFLOPS) matter most.
- Graphics processing units (GPUs) are the foundation of AI. Selection revolves around the accelerator architecture (for example, NVIDIA Hopper/Blackwell or AMD Instinct) and how it interacts with the CPU.
- The storage bottleneck is solved only with NVMe SSD arrays on PCIe 5.0/6.0 that can continuously feed data to the GPUs.
- A single modern AI server (with 8x GPUs) can draw 10-12 kW, which calls for direct liquid cooling (DLC).
1. Define the task: Training vs Inference
Before selecting a configuration, define the type of AI workload. A server for building a neural network is fundamentally different from one for running it.
- Training: The process of creating a model (LLM, image generation). It requires massive compute power, large VRAM capacity for storing weights, and ultra-fast communication between multiple GPUs.
- Inference: Using a trained model to answer user requests. Here response latency, energy efficiency, and parallel processing of thousands of small requests matter more. For inference, servers with 1-4 less powerful GPUs (for example, NVIDIA L40S or RTX Ada Generation) are often sufficient.
2. Key AI server components
AI server architecture is built around graphics accelerators, but without a proper balance of the other components, expensive GPUs will sit idle waiting for data.
Graphics accelerators (GPU)
GPUs are the heart of artificial intelligence. Unlike a CPU with dozens of cores, a GPU has tens of thousands of small cores, ideal for parallel matrix computation.
- VRAM capacity: To run an open LLM (for example, Llama 3 70B), roughly 40-80 GB of VRAM is needed, depending on quantization. For training models with billions of parameters, look for servers with 640 GB or more of total GPU memory (for example, 8x NVIDIA H100 80GB or B200 configurations); a rough VRAM estimate is sketched after this list.
- Interconnect: If a server has more than two GPUs, standard PCIe communication becomes a bottleneck. Choose platforms with high-speed direct GPU-to-GPU interconnects (NVIDIA NVLink / NVSwitch or AMD Infinity Fabric).
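As a rough illustration of the VRAM figures above, here is a minimal sketch (in Python) that estimates the memory needed to serve a model at a given precision. The 20% overhead factor and the helper name estimate_inference_vram_gb are illustrative assumptions, not a vendor formula.

```python
# Minimal sketch: rough VRAM estimate for serving an LLM.
# Assumption: weights dominate, plus a flat ~20% overhead for the KV cache,
# activations, and runtime buffers; real requirements vary with context length.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_inference_vram_gb(params_billion: float, precision: str = "fp16",
                               overhead: float = 1.2) -> float:
    """Return an approximate VRAM requirement in GB for serving a model."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return weight_bytes * overhead / 1e9

if __name__ == "__main__":
    for prec in ("fp16", "int8", "int4"):
        print(f"Llama 3 70B @ {prec}: ~{estimate_inference_vram_gb(70, prec):.0f} GB")
```

Run as written this prints roughly 168, 84, and 42 GB, which is why the 40-80 GB range above assumes a quantized (int8/int4) model.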
Central processing unit (CPU)
In an AI server, the CPU handles data preprocessing, task orchestration, and network traffic routing.
- Requirements: Install two high-frequency processors (5th/6th-generation Intel Xeon Scalable or 4th/5th-generation AMD EPYC) with a large number of PCIe lanes for direct connections to NVMe storage and network adapters.
System memory (RAM)
- Sizing rule: System RAM should be at least twice the total VRAM of all installed graphics cards. If the server has 8 GPUs with 80 GB each (640 GB of VRAM in total), plan for 1.5-2 TB of DDR5 system memory with ECC; the sketch below applies this rule.
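A minimal sketch of that rule of thumb, assuming a plain 2x multiplier; the helper name suggested_system_ram_tb is hypothetical.

```python
# Minimal sketch: apply the "system RAM >= 2x total VRAM" rule of thumb.
def suggested_system_ram_tb(gpu_count: int, vram_gb_per_gpu: int,
                            multiplier: float = 2.0) -> float:
    """Return the suggested system RAM in TB for a given GPU configuration."""
    total_vram_gb = gpu_count * vram_gb_per_gpu
    return total_vram_gb * multiplier / 1000

print(f"8x 80 GB GPUs -> at least ~{suggested_system_ram_tb(8, 80):.2f} TB of RAM")
# Prints ~1.28 TB; in practice round up to a standard DIMM layout (1.5-2 TB).
```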
Storage subsystem (Storage)
Machine learning requires continuous streaming of massive datasets. HDDs or slow SATA SSDs are unacceptable: the GPUs can spend most of their time idle, waiting for data.
- Standard: Only enterprise U.2/U.3 or E1.S/E3.S NVMe SSDs with PCIe 5.0 support. Aggregate array read speed should be in the 30-60 GB/s range; a quick way to estimate the bandwidth your pipeline actually needs is sketched below.
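A minimal sketch of how to estimate the sustained read bandwidth a training job actually demands, so it can be compared against what the array delivers. The sample rate, sample size, and the helper name required_read_gb_per_s are illustrative assumptions.

```python
# Minimal sketch: sustained read bandwidth needed to keep the GPUs fed.
# Assumption: the data loader must stream samples_per_sec * bytes_per_sample
# continuously; caching, prefetch, and compression change the real figure.

def required_read_gb_per_s(samples_per_sec: float, bytes_per_sample: float) -> float:
    """Return the sustained read bandwidth (GB/s) the storage array must deliver."""
    return samples_per_sec * bytes_per_sample / 1e9

# Example: 8 GPUs each consuming 2,000 images/s at ~600 KB per image.
demand = required_read_gb_per_s(8 * 2000, 600e3)
print(f"Required sustained read: ~{demand:.1f} GB/s")  # ~9.6 GB/s
```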
3. Comparison table: Choosing GPU for AI workloads (Current for 2026)
| GPU model | Memory capacity (VRAM) | Primary use case | Power consumption | Key features |
|---|---|---|---|---|
| NVIDIA B200 / GB200 | 192 GB (HBM3e) | Training heavy LLM (GPT-class) | 1000 W+ | Extreme bandwidth, liquid cooling. |
| NVIDIA H100 / H200 | 80 GB / 141 GB | Universal: Training and complex Inference | 700 W | Industry standard for data centers. |
| AMD Instinct MI300X | 192 GB | Training and Inference of large models | 750 W | Cost-effective alternative with massive memory buffer. |
| NVIDIA L40S | 48 GB (GDDR6) | Inference, video analytics, 3D rendering | 350 W | Does not require NVLink, fits perfectly into standard PCIe servers. |
4. Network infrastructure (Networking)
When a single server is not enough, servers are combined into AI clusters in the data center. For synchronizing neural network weights between servers, standard 10G/25G Ethernet is not sufficient.
You will need adapters (SmartNIC / DPU) supporting 400GbE or 800GbE, as well as InfiniBand or RoCE v2 (RDMA over Converged Ethernet) for direct access to other servers' memory, bypassing the CPU. A back-of-the-envelope illustration of why link speed matters is sketched below.
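To illustrate why 10/25G Ethernet is a non-starter for multi-node training, here is a minimal sketch comparing one gradient all-reduce over different links. The ring all-reduce cost model (each node transfers roughly 2x(N-1)/N of the gradient volume) and the helper name allreduce_seconds are simplifying assumptions; real frameworks overlap communication with compute.

```python
# Minimal sketch: approximate time for one gradient all-reduce over a given link.
# Assumption: ring all-reduce, where each node transfers ~2*(N-1)/N of the
# gradient buffer; ignores latency, overlap with compute, and compression.

def allreduce_seconds(params_billion: float, bytes_per_grad: int,
                      nodes: int, link_gbit_per_s: float) -> float:
    """Return approximate seconds to all-reduce gradients over the given link."""
    grad_bytes = params_billion * 1e9 * bytes_per_grad
    traffic_bytes = 2 * (nodes - 1) / nodes * grad_bytes   # per-node traffic
    return traffic_bytes / (link_gbit_per_s / 8 * 1e9)     # link speed in bytes/s

for name, gbps in [("25 GbE", 25), ("400 Gb/s InfiniBand/RoCE", 400)]:
    t = allreduce_seconds(70, 2, nodes=4, link_gbit_per_s=gbps)
    print(f"70B model, fp16 gradients, 4 nodes over {name}: ~{t:.1f} s per sync")
```

With these assumptions one sync takes roughly 67 seconds over 25 GbE versus about 4 seconds over a 400 Gb/s fabric, before RDMA even enters the picture.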
5. Cooling and power: Data center physical limits
A modern 4U-8U server equipped with eight top-tier graphics accelerators consumes from 8 to 12 kilowatts (kW) of electricity.
- Power: Make sure the data center rack can deliver that much power to a single chassis. High-capacity power supplies are required (3,000 W or more each, with N+N or N+1 redundancy); a simple power estimate is sketched after this list.
- Cooling: Classic air cooling (cold-aisle air) is no longer enough for flagship solutions (Blackwell class). Consider server platforms with factory-integrated DLC (Direct Liquid Cooling), which places cold plates directly on the CPU and GPU dies.
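Since the title promises power estimation, here is a minimal sketch that sums nominal component power for one server and checks it against a rack budget. The CPU TDP, the 1 kW overhead for fans, NICs, drives, and PSU losses, and the helper name estimate_server_power_kw are illustrative assumptions, not vendor figures.

```python
# Minimal sketch: rough full-load power estimate for one AI server.
# Assumptions: nameplate TDPs plus a flat overhead for fans, NICs, drives,
# and PSU losses; actual draw depends on the workload.

def estimate_server_power_kw(gpu_count: int, gpu_tdp_w: float,
                             cpu_count: int = 2, cpu_tdp_w: float = 350,
                             overhead_w: float = 1000) -> float:
    """Return an approximate full-load power draw in kW for one server."""
    total_w = gpu_count * gpu_tdp_w + cpu_count * cpu_tdp_w + overhead_w
    return total_w / 1000

server_kw = estimate_server_power_kw(gpu_count=8, gpu_tdp_w=1000)  # Blackwell-class GPUs
rack_budget_kw = 40  # example rack power budget
print(f"Estimated server draw: ~{server_kw:.1f} kW")
print(f"Servers per {rack_budget_kw} kW rack: {int(rack_budget_kw // server_kw)}")
```

With 700 W GPUs (H100 class) the same formula lands around 7-8 kW, consistent with the 8-12 kW range quoted above.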
Conclusion
Choosing a server for artificial intelligence is an exercise in finding bottlenecks. There is no point in buying eight of the most expensive NVIDIA or AMD accelerators if you skimp on the NVMe storage or network cards and they cannot feed data at the required rate. To get started and for inference testing, focus on servers with 1-4 mid-range cards (L40S / RTX 6000 Ada). For training foundation models, consider only HGX-class architectures with NVLink interconnect and cooling headroom.