
Serverless GPU containers for AI workloads

Deploy inference services or run batch compute on scalable GPU infrastructure, from scale-to-zero to multi-GPU priority deployments.

Deploy a container

Auto-scaling
Cost-efficient
Wide GPU selection

Serverless on Verda

Purpose-built for serverless deployments across the ML model lifecycle.

[Image: Verda Containers dashboard listing deployments with their status, image, compute, and replica health.]

Unified AI cloud

One platform for serverless deployments, batch jobs, and other use cases across the entire ML model lifecycle.

[Image: Scaling settings with replica limits and Instant, Balanced, and Cost saver queue-load policies.]

Tunable scaling sensitivity

Control how aggressively replicas scale in response to queue length. Tighten it for latency-sensitive services or relax it for cost-sensitive ones.
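To make the idea of scaling sensitivity concrete, here is a minimal sketch of queue-based scaling. The policy names mirror the Instant, Balanced, and Cost saver settings, but the per-replica queue thresholds, the `ScalingConfig` class, and the `desired_replicas` function are illustrative assumptions, not Verda's actual algorithm or API.

```python
import math
from dataclasses import dataclass

# Hypothetical queue depth each replica is allowed to carry under each policy.
# The numbers are illustrative assumptions, not Verda's real thresholds.
QUEUE_PER_REPLICA = {"instant": 1, "balanced": 4, "cost_saver": 16}


@dataclass
class ScalingConfig:
    min_replicas: int = 0   # 0 allows scale-to-zero
    max_replicas: int = 8
    policy: str = "balanced"


def desired_replicas(queue_length: int, cfg: ScalingConfig) -> int:
    """Return the target replica count for a given queue length."""
    if queue_length < 0:
        raise ValueError("queue_length must be non-negative")
    per_replica = QUEUE_PER_REPLICA[cfg.policy]
    wanted = math.ceil(queue_length / per_replica)
    # Clamp to the configured replica limits.
    return max(cfg.min_replicas, min(cfg.max_replicas, wanted))
```

With 10 queued requests, this sketch asks for 8 replicas under "instant" (capped by the limit), 3 under "balanced", and 1 under "cost_saver". A lower threshold reacts faster at higher cost; a higher one tolerates queueing to save money.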

[Image: GPU VRAM timeseries chart showing used and free memory with a hover tooltip.]

Real-time metrics

Replica count, GPU and CPU utilization, request rates, inference duration, and queue size — streamed to the UI or to your Prometheus or Loki stack.

[Image: Code window showing a fetch call to the Verda container-deployments API.]

API and SDK

Deploy and manage containers from the UI, CLI, API, or the native SDK. Pick the interface that fits your workflow.
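As a rough illustration of the API route, the sketch below builds (but does not send) an HTTP request describing a deployment. Everything specific here is a placeholder: the URL, the bearer-token auth scheme, and the payload field names (`name`, `image`, `compute`) are assumptions for illustration only, so check the API docs for the real endpoint and schema.

```python
import json
import urllib.request

# Placeholder endpoint: the real base URL, path, auth scheme, and field
# names are defined in the Verda API docs and are NOT reproduced here.
API_URL = "https://api.example.invalid/container-deployments"


def build_deployment_request(token: str, name: str, image: str,
                             gpu: str, gpu_count: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST request describing a deployment."""
    payload = {
        "name": name,
        "image": image,  # any registry, e.g. Docker Hub or GHCR
        "compute": {"gpu": gpu, "count": gpu_count},  # hypothetical schema
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_deployment_request(
    "YOUR_TOKEN", "my-model", "ghcr.io/acme/my-model:latest", "H100", 1
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it once the placeholders are replaced with real values. The same request shape could be issued from the CLI or SDK instead.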

Quickstart

Point Verda at your container image, pick a GPU, and we handle the rest. Works with any registry, any framework.

We've written a step-by-step migration guide that covers image compatibility, endpoint conventions, environment variables, and scaling equivalents. Your container, unchanged — just a different platform underneath.

You get an endpoint, per-replica metrics, real-time logs, and a bill that scales with your traffic. Nothing to configure, nothing to provision.

Read the API docs

Deployment types

Choose the service that suits your use case best

Property	Serverless (auto-scaling)	Batch
Runs indefinitely	While traffic exists	No
Scales to zero	Yes, when idle	After completion
Exposes endpoint	Yes	Optional
Typical duration	Milliseconds to minutes per request	Minutes to hours per job
Cold start on request	On first request after idle	On job dispatch
Best for	Interactive inference, user-facing APIs, bursty or unpredictable traffic	Long-running compute, offline inference over large datasets, periodic pipelines

Usage-based pricing

Per-replica billing in 10-minute intervals. Interruptible spot pricing at roughly 50% of on-demand. Multi-GPU configurations in 1×, 2×, and 4×.

Compute type	VRAM	GPUs	Price per GPU
B300 SXM6	268 GB	1x, 2x, 4x, 8x	$8.250/h
B200 SXM6	180 GB	1x, 2x, 4x, 8x	$6.721/h
H200 SXM5	141 GB	1x, 2x, 4x, 8x	$4.400/h
H100 SXM5	80 GB	1x, 2x, 4x, 8x	$3.575/h
RTX PRO 6000	96 GB	1x, 2x, 4x, 8x	$2.079/h
L40S	48 GB	1x, 2x, 4x, 8x	$1.507/h

CPU-only: AMD EPYC with 32–128 GB RAM, in 8x, 16x, or 32x CPU configurations, starting at $0.0614/h.
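The billing rules above can be turned into a quick estimate. The sketch below is a back-of-the-envelope calculator, not Verda's billing logic: it assumes active time is rounded up to whole 10-minute intervals (the page does not state the rounding rule) and treats spot as exactly 50% of on-demand, where the page says "roughly". The function name and parameters are hypothetical.

```python
import math

BILLING_INTERVAL_MIN = 10  # from the pricing text
SPOT_FACTOR = 0.5          # "roughly 50%"; the exact rate is an assumption


def estimate_cost(active_minutes: float, price_per_gpu_hour: float,
                  gpus: int = 1, replicas: int = 1, spot: bool = False) -> float:
    """Estimate cost in dollars, ASSUMING usage rounds up to whole intervals."""
    intervals = math.ceil(active_minutes / BILLING_INTERVAL_MIN)
    billed_hours = intervals * BILLING_INTERVAL_MIN / 60
    rate = price_per_gpu_hour * (SPOT_FACTOR if spot else 1.0)
    return round(billed_hours * rate * gpus * replicas, 4)


# One replica on 1x H100 SXM5 ($3.575/h), active for 25 minutes:
# 25 minutes rounds up to 3 intervals, i.e. 0.5 billed hours.
print(estimate_cost(25, 3.575))  # → 1.7875
```

Zero active minutes yields zero cost in this model, which matches the no-idle-charges claim for deployments that have scaled to zero.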


Zero setup

Deploy any container from any registry — Docker Hub, GitHub Container Registry, or your own — in a few clicks. Start, stop, or hibernate instantly from the UI, CLI, or API.

Controlled auto-scaling

Scale to zero when idle or up to hundreds of GPUs during traffic spikes. Adjustable scaling sensitivity and multi-GPU priority deployments give you precise control over how your workload responds to load.

Fast cold starts

Our in-house AI research team tunes container pulls, model loads, and GPU warmups so your first request doesn't wait around. Cold starts are a first-class engineering problem at Verda, not an afterthought.

Wide GPU selection

Run on cutting-edge NVIDIA compute across the Blackwell, Hopper, and Ampere generations — including B300 SXM6, H200, H100, A100, and RTX PRO 6000. Available in 1×, 2×, and 4× configurations.

Pay-per-use

Pay only for the compute you actively use, billed in 10-minute intervals. No idle charges, no commitments, no surprise bills. Spot pricing cuts the bill roughly in half for interruptible workloads.

Full observability

Real-time logs and detailed metrics on utilization, request rates, inference duration, and queue size — in the Verda console or as endpoints for your existing Prometheus, Loki, or Grafana stack.

Already on Runpod?

Most Runpod serverless deployments move to Verda in under an hour.

Read the migration guide

FAQs

How is this different from running my own Kubernetes cluster?

You bring a container image; we run it. There's no cluster to stand up, no node pools to size, no GPU operator to install, and no autoscaler to tune. Verda handles provisioning, scaling to zero, cold starts, and per-replica metrics — you get an endpoint and a bill that tracks your traffic.

How does Verda compare to Runpod or Modal?

Different priorities. Runpod and Modal are US-based; Verda runs on European data centers with renewable energy, which matters for teams with data-residency or sustainability requirements.

Our engineering team pairs directly with customer teams, whereas Runpod optimizes for self-service and Modal for code-first Python workflows.

On hardware, we bring early access to NVIDIA's latest silicon (B300 SXM6, GB300 NVL72) as an NVIDIA Preferred Partner. For interactive inference at production scale, all three can do the job; the fit depends on what you need from the team behind the platform.

Where are your data centers, and is my data sovereign?

Verda runs on European data centers powered by renewable energy. Your data and workloads stay in-region, which matters for teams with data-residency or sustainability requirements.

What's the difference between serverless deployments and batch jobs?

Serverless deployments auto-scale to live traffic, expose an endpoint, and scale to zero when idle — best for interactive inference and user-facing APIs. Batch jobs run to completion on dispatch — best for long-running compute, offline inference over large datasets, and periodic pipelines.

Do I pay when nothing is running?

No. Serverless deployments scale to zero when idle, and you're only billed for the compute you actively use, in 10-minute intervals — no idle charges.

How am I billed?

Per replica, in 10-minute intervals, for the compute you actively use. Spot pricing runs at roughly 50% of on-demand for interruptible workloads. No commitments, no surprise bills.

Can I use any container registry?

Yes — deploy any container from any registry, including Docker Hub, GitHub Container Registry, or your own private registry.

How fast are cold starts?

Our in-house AI research team tunes container pulls, model loads, and GPU warmups to keep first-request latency low. Cold starts are a first-class engineering problem at Verda, not an afterthought.

What's the SLA?

Production deployments are backed by our uptime SLA. Talk to sales for specifics on availability guarantees and enterprise terms.

Built in Europe, trusted globally

One platform, the full AI lifecycle

From rapid prototyping to foundation training and scalable inference — on a single full-stack AI cloud

Start building Talk to an expert
