Scaling sovereign inference for national AI initiatives

Background

Public AI is a nonprofit initiative built on a simple but radical belief that artificial intelligence should function as public infrastructure, much like electricity, water or the internet. Its mission is to make access to AI models open, reliable and globally equitable.

To demonstrate this vision, Public AI launched the Public AI Inference Utility, a service that allows citizens and developers to access open models in real time. The initiative unites several national programmes, including Swiss AI and the Barcelona Supercomputing Center (BSC), to prove how public funding for AI can translate into tangible, sovereign infrastructure.

The challenge

Public AI’s partners each faced the same fundamental question:

How can governments and research institutions deliver AI services to their citizens without depending on hyperscale, closed-source clouds?

Serving national-scale models such as Swiss AI’s Apertus 70B requires immense GPU power and flexible orchestration, yet public projects often lack the resources or time to build dedicated infrastructure. The challenge was to show that publicly funded AI could operate with enterprise-grade reliability and performance while remaining fully open and transparent.

The CUDO solution

CUDO Compute partnered with Public AI to deliver a sovereign inference pilot capable of serving tens of thousands of users during the launch of Swiss {ai} Weeks.

Leveraging CUDO’s on-demand bare-metal service, Public AI deployed the Apertus 70B model on a dedicated 8× NVIDIA H100 NVL 94 GB server hosted in Norway. The system was provisioned and operational within hours, with no complex multi-cluster setup or hidden orchestration overhead.

CUDO’s engineering team, notably Sean Berry and Nick Gardener, worked closely with Public AI to ensure smooth integration, helping configure network access, firewall settings and security protocols tailored for open but sovereign workloads.

This collaboration proved that national-scale AI inference could be achieved without hyperscaler dependence, using open tooling, transparent governance and a trusted compute provider.

Technical overview

  • Infrastructure: 8× NVIDIA H100 NVL 94 GB bare-metal server
  • Compute specs: 96 CPU cores | 2.5 TB memory | 25 TB storage | 752 GB total GPU VRAM
  • Framework: vLLM for inference serving, with NGINX for SSL termination and load balancing
  • Environment: Docker Compose deployment with tensor parallelism across 8 GPUs
  • Networking: HTTPS ingress via NGINX, internal Docker network for API services, egress locked down after model download
  • Users served: 20,000+ users, with up to 1,000 concurrent sessions
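The overview above maps onto a small Docker Compose stack. The sketch below is illustrative only: the image tags, model identifier, volume paths and service names are assumptions, not Public AI’s actual configuration.

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest        # vLLM's OpenAI-compatible server image
    command: >
      --model swiss-ai/Apertus-70B-Instruct-2509
      --tensor-parallel-size 8
    # Reserve all 8 GPUs on the bare-metal host for the inference engine
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 8
              capabilities: [gpu]
    volumes:
      - ./models:/root/.cache/huggingface  # cache weights locally so egress can be locked afterwards
    networks: [internal]                   # reachable only on the internal Docker network

  nginx:
    image: nginx:stable
    ports:
      - "443:443"                          # HTTPS ingress; SSL terminates here
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/nginx/certs:ro
    networks: [internal]

networks:
  internal:
    driver: bridge
```

With a file like this, the whole pipeline starts with a single `docker compose up -d`, which matches the simplicity the Public AI team describes below.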

This configuration enabled Public AI to operate the entire inference pipeline as a single unified system with direct NVLink connectivity, simplifying performance scaling while maintaining full control over data flows and compliance.
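Once running, a stack like this is reachable through vLLM’s OpenAI-compatible HTTP API. The client sketch below shows the general shape of a request; the endpoint URL, API key and model name are placeholders, not Public AI’s actual values.

```python
"""Minimal client sketch for an OpenAI-compatible vLLM endpoint.

The URL, key and model name are hypothetical placeholders.
"""
import json
import urllib.request

ENDPOINT = "https://inference.example.org/v1/chat/completions"  # placeholder URL
API_KEY = "CHANGE_ME"                                           # placeholder key


def build_chat_request(prompt: str, model: str = "apertus-70b") -> dict:
    """Build an OpenAI-style chat-completion payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }


def ask(prompt: str) -> str:
    """POST a chat request and return the model's reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, existing SDKs and tools built against that interface can point at the sovereign endpoint without code changes.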

Results and impact

The pilot delivered strong results. Within weeks of launch, more than 20,000 users and developers used the Apertus 70B model, generating public interest and validating the concept of AI as a shared resource.

Beyond technical success, the pilot sparked inbound interest from open-source model builders and public institutions in Spain and the United States, which are exploring how similar utilities could serve their national ecosystems.

Most importantly, the initiative showed that public infrastructure and high-performance AI are not mutually exclusive. They can coexist through open collaboration, transparent engineering and sovereign compute capacity.

Partner quotes

“CUDO’s bare metal 8× H100 NVL server made national scale AI inference remarkably straightforward. What could have been a complex multi cluster Kubernetes setup was literally just ‘docker compose up’ — and we were serving Switzerland’s Apertus 70B model to over 20,000 users. That simplicity is exactly what public AI infrastructure needs.” — Public AI Team

“So many people have been using Public AI to get an idea of the model. It’s been extremely valuable to get people to play around with it. A company in Zurich even built a tax question assistant on top of the API. It shows links, hallucinates very little and highlights what you can build with Apertus. I was surprised it worked so well out of the box.” — Imanol Schlag, Co-lead of Apertus, Swiss AI Initiative

Technical setup: under the hood

Below is a closer look at how the Public AI inference environment was deployed and managed on CUDO Compute.

  • Deployment: Docker Compose stack using vLLM and NGINX, with setup completed in minutes
  • Parallelism: Tensor-parallel distribution across all 8 GPUs, fully automated
  • Security: HTTPS ingress, strict egress control and API key authentication
  • Data sovereignty: Hosted in Norway to support European compliance and regional control
  • Maintenance: The bare-metal environment simplified network management and firewall rules, with no shared tenancy and full transparency
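The ingress and API-key layer described above can be sketched as an NGINX server block. This is an assumed configuration for illustration: the hostname, certificate paths and token are placeholders, and the real deployment’s rules are not published.

```nginx
# Illustrative HTTPS ingress in front of the internal vLLM service.
# Hostname, cert paths and the API key are placeholder assumptions.
server {
    listen 443 ssl;
    server_name inference.example.org;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location /v1/ {
        # Reject requests that do not carry the expected bearer token
        if ($http_authorization != "Bearer CHANGE_ME_API_KEY") {
            return 401;
        }
        # Forward to vLLM over the internal Docker network; the backend
        # is never exposed to the public internet directly
        proxy_pass http://vllm:8000;
        proxy_set_header Host $host;
        proxy_read_timeout 300s;   # allow long-running generations
    }
}
```

Keeping SSL termination and authentication in NGINX leaves the vLLM container focused purely on inference, which matches the single-responsibility layout of the Compose stack.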

About CUDO Compute

This collaboration with Public AI demonstrates how CUDO Compute’s infrastructure enables the next generation of open and sovereign AI. By combining high-performance GPU clusters with transparent bare-metal control, CUDO allows governments, research institutions and public initiatives to deploy AI systems that align with their national priorities for data sovereignty and accessibility.

As an official NVIDIA Cloud Partner, CUDO delivers the same technical advantage trusted by enterprises, with the flexibility and compliance needed for publicly funded projects. From high-density GPU clusters to sovereign cloud environments, CUDO provides the foundation for scalable, ethical and globally connected AI.

A proven model for sovereign AI infrastructure

The Public AI pilot underscores CUDO Compute’s commitment to building accessible and sovereign AI infrastructure that serves both innovation and the public good. By enabling open model inference at national scale, CUDO has shown how governments and research bodies can deliver advanced AI experiences without compromising transparency or control.

This proof of concept marks an important step toward a more inclusive AI future, where open collaboration and trusted infrastructure form the foundation for meaningful global impact.