MangoBoost, a provider of cutting-edge system solutions for maximizing compute efficiency and scalability, has validated the scalability and efficiency of large-scale AI training on AMD Instinct™ MI300X GPUs through its MLPerf Training v5.0 submission. This milestone, aimed at enterprise data centers that prioritize performance, flexibility, and cost-efficiency, demonstrates that state-of-the-art LLM training is now viable beyond traditional vendor-locked GPU platforms.
Using 32 AMD Instinct™ MI300X GPUs across four nodes, MangoBoost fine-tuned the Llama2-70B-LoRA model in just 10.91 minutes, the fastest multi-node MLPerf Training result recorded on AMD GPUs to date. The system achieved near-linear scaling efficiency (95–100%), demonstrating that MangoBoost’s stack can support practical, scalable LLM training in production environments.
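For context, scaling efficiency of this kind is conventionally defined as the measured speedup over a single-node baseline divided by the ideal linear speedup. The short sketch below illustrates that calculation only; the timing figures in it are hypothetical placeholders, not MangoBoost’s measured results, and the code is not part of MangoBoost’s stack.

```python
# Minimal sketch of the conventional scaling-efficiency calculation.
# The timings below are hypothetical placeholders, not MangoBoost's results.

def scaling_efficiency(single_node_minutes: float,
                       multi_node_minutes: float,
                       num_nodes: int) -> float:
    """Return measured speedup divided by ideal (linear) speedup."""
    speedup = single_node_minutes / multi_node_minutes
    return speedup / num_nodes

# Example: a job taking 40 min on 1 node and 10.5 min on 4 nodes
# works out to roughly 95% scaling efficiency.
print(f"{scaling_efficiency(40.0, 10.5, 4):.1%}")  # -> 95.2%
```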
Scalability and Efficiency for Enterprise Data Centers
The result showcases more than benchmark success: it underscores how enterprises can reliably scale LLM training across clusters without network bottlenecks or rigid infrastructure dependencies. Two MangoBoost technologies made this possible:
- Mango LLMBoost™: A full-featured MLOps software platform for large language models, supporting model parallelism, automatic tuning, batch scheduling, and advanced memory management.
- Mango GPUBoost™ RoCEv2 RDMA: Networking hardware optimized for low-latency, high-throughput GPU-to-GPU communication across nodes, sustaining line-rate performance across thousands of concurrent queue pairs (QPs).
These technologies together deliver predictable and efficient multi-node training, ideal for organizations operating their own AI infrastructure or deploying on public cloud.
Industry-First MLPerf Training on AMD MI300X GPUs
This is the first-ever MLPerf Training submission on AMD GPUs spanning multiple nodes. MangoBoost’s platform demonstrated robust performance on a 4-node, 32-GPU cluster and, in internal benchmarks, confirmed compatibility with additional model sizes and architectures, including Llama2-7B and Llama3.1-8B. These results show that MangoBoost’s platform generalizes beyond benchmark workloads to diverse production-scale use cases.
"I'm excited to see MangoBoost's first MLPerf Training results, pairing their LLMBoost AI Enterprise MLOps software with their RoCEv2-based GPUBoost DPU hardware to unlock the full power of AMD GPUs, demonstrated by their scalable performance from a single-node MI300X to 2- and 4-node MI300X results on Llama2-70B LoRA. Their results underscore that a well-optimized software stack is critical to fully harness the capabilities of modern AI accelerators." — David Kanter, Founder, Head of MLPerf, MLCommons
Vendor-Neutral AI Infrastructure Enabled by AMD Collaboration
The achievement was made possible through deep collaboration with AMD and seamless integration with the ROCm™ software ecosystem, enabling full utilization of the MI300X’s compute throughput, memory bandwidth, and memory capacity. Enterprises are now empowered to choose infrastructure based on business needs rather than vendor constraints.
"We congratulate MangoBoost on their MLPerf 5.0 training results on AMD GPUs and are excited to continue our collaboration with them to unleash the full power of AMD GPUs. In this MLPerf Training submission, MangoBoost has achieved a key milestone in demonstrating training results on AMD GPUs across 4 nodes (32 GPUs). This showcases how the AMD Instinct™ MI300X GPUs and ROCm™ software stack synergize with MangoBoost's LLMBoost™ AI Enterprise software and GPUBoost™ RoCEv2 NIC."
— Meena Arunachalam, Fellow, AI Performance Design Engineering, AMD
"At MangoBoost, we’ve shown that software-hardware co-optimization enables scalable, efficient LLM training without vendor lock-in. Our MLPerf result is a key milestone proving our technology is ready for enterprise-scale AI training with superior efficiency and flexibility," said CEO Jangwoo Kim.
MangoBoost continues to develop innovations in communication optimization, hybrid parallelism, topology-aware scheduling, and domain-specific acceleration to further scale performance in distributed AI workloads.
About MangoBoost
MangoBoost is a provider of cutting-edge, full-stack system solutions for maximizing compute efficiency and scalability. At the heart of these solutions is the MangoBoost Data Processing Unit (DPU), which ensures full compatibility with general-purpose GPUs, accelerators, and storage devices, enabling cost-efficient, standardized AI infrastructure. Founded in 2022 on a decade of research, MangoBoost is rapidly expanding its operations in the U.S., Canada, and Korea.
View source version on businesswire.com: https://www.businesswire.com/news/home/20250604585097/en/
Contacts
Minwoo Son
Strategy & Operations Manager
minwoo.son@mangoboost.io