AI Lab

ML systems engineering for more efficient training and inference on the latest
GPU architectures — co-engineered with our customers

Philosophy

We optimize for capability per FLOP, not brute-force scale: the same research philosophy behind DeepSeek, StepFun, and MiniMax.

Our AI Lab works at the ML systems engineering level as a natural extension of our AI infrastructure — hardware-aware optimizations for a given GPU architecture paired with algorithms research. The key question we aim to answer: "Given a fixed compute budget, how do we achieve the highest efficiency?" This applies to both training and inference.

By understanding problems from first principles, we go beyond surface-level tuning to achieve higher efficiency without sacrificing model capabilities. This understanding translates into elegant, simple ML systems engineering — built for fast iteration cycles and rapid validation of ablations.

We work across the model lifecycle — with our customers, open-source projects like SGLang and vLLM, and NVIDIA's stack.

Model research

What to train and how to design it — architecture, scaling, training recipes, and post-training RL.

ML systems & AI infrastructure

How to run it efficiently — distributed training, RL frameworks, inference, compilation.

Research topics

We target AI/ML systems engineering, from large-scale MoE inference for agentic and long-context workloads to AI compilers for efficiency and compute heterogeneity for distributed training.

01

Ultra-large-scale training cluster reliability

Improving the stability, reliability, and performance (e.g., tokens per second and Model FLOPs Utilization, MFU) of large-scale training clusters. Target techniques include cross-cluster, low-precision, fault-tolerant, and elastic training, as well as GPU observability with novel approaches such as eBPF.
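
As a rough illustration of how the two metrics named above relate, the sketch below estimates MFU from measured training throughput. It assumes the common approximation of about 6 FLOPs per parameter per token for dense transformer training (attention FLOPs ignored). The function name, model size, GPU count, and peak FLOPs figure are hypothetical values chosen for the example, not measurements from any real cluster.

```python
def model_flops_utilization(params, tokens_per_second, num_gpus, peak_flops_per_gpu):
    """Estimate MFU as achieved model FLOPs/s divided by peak hardware FLOPs/s.

    Assumption: training costs roughly 6 * params FLOPs per token
    (forward + backward pass, dense model, attention FLOPs ignored).
    """
    if num_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("num_gpus and peak_flops_per_gpu must be positive")
    achieved_flops_per_second = 6.0 * params * tokens_per_second
    peak_flops_per_second = num_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 7B-parameter dense model, 64 GPUs,
# an assumed 1e15 peak FLOPs/s per GPU, 400k tokens/s cluster-wide.
mfu = model_flops_utilization(
    params=7e9,
    tokens_per_second=400_000,
    num_gpus=64,
    peak_flops_per_gpu=1e15,
)
print(f"MFU: {mfu:.2%}")
```

Because MFU normalizes throughput by hardware peak, it allows comparisons across cluster sizes and GPU generations, which tokens per second alone does not.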

02

Model and hardware co-optimization

Conducting research into advanced architectures and training and inference paradigms, with models architected for native efficiency on NVIDIA's forthcoming rack-scale systems, starting with GB300 NVL72 and Vera Rubin with LPU-based acceleration.

03

Inference model parallelization

Exploring model parallelism layouts and the compute and memory-access bottlenecks that arise during inference, including large-scale prefill-decode disaggregation and multi-node MoE with wide expert parallelism (EP). The ongoing objective is to determine the optimal compute-communication overlap.
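
To make the overlap objective concrete, here is a toy cost model, not a description of any particular runtime. It assumes a single step with a fixed compute time and a fixed communication time, where some fraction of the communication can be hidden behind compute. The function name and the millisecond figures are hypothetical.

```python
def step_time_ms(compute_ms, comm_ms, overlap):
    """Toy model of one step with partially overlapped communication.

    overlap is the fraction (0.0 to 1.0) of the overlappable window,
    min(compute_ms, comm_ms), that is actually hidden behind compute.
    overlap=0.0 gives fully serialized execution (compute + comm);
    overlap=1.0 gives the ideal bound max(compute, comm).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0.0 and 1.0")
    hidden_ms = overlap * min(compute_ms, comm_ms)
    return compute_ms + comm_ms - hidden_ms


# Hypothetical step: 10 ms of compute, 4 ms of all-to-all communication.
for fraction in (0.0, 0.5, 1.0):
    print(f"overlap={fraction:.1f}: {step_time_ms(10.0, 4.0, fraction):.1f} ms")
```

In this simplified view, perfect overlap makes the step cost equal to whichever of compute or communication is larger, so the choice of parallelism layout matters mainly through how it balances the two.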

04

AI compilers & kernels

Focusing on kernel engineering within compiler development for end-to-end system optimization. This research area is especially relevant for emerging GPU architectures and novel deployments.

05

Agentic AI infrastructure

Researching the coordination layer required to run agentic systems efficiently and at scale. This includes agent runtimes, memory and context management, orchestration, evaluation harnesses, and observability. Our goal is to build the infrastructure for automated AI ablations targeting the above-mentioned tracks:

  • Closed-loop autoresearch
  • Evaluation-gated research systems capable of automated ablations
  • Bounded recursive self-improvement

How we can work with you

Public references of our co-research projects with customers and open-source support

Talent program

The Verda AI Lab talent program is a focused program for top-tier industry and university talent. We offer internships to recent Ph.D. graduates and promising candidates completing their research.

Our goal is to push the boundaries of AI efficiency by assembling a unique multi-disciplinary team of AI researchers and engineers, akin to the Top Seed Talent Program.

We offer research freedom to our team members. Everyone on the team sets their own research agenda, within the global alignment of our engineering targets and product development. The team as a whole maintains a portfolio of projects across different time horizons and levels of risk.
