Most analytical attention on the AI cycle has focused on the model layer — the foundation models being trained at extraordinary cost by a small number of frontier laboratories. This is understandable; the technical achievements are remarkable, the competition is highly visible, and the capital flows are enormous. We think it is the wrong place to be looking. The economics that will compound over the next decade are not in training these models. They are in running them — at scale, with low latency, near load, every day, for every application.

The shift from training to inference.

For most of the last five years, the dominant economic activity in AI has been training. Frontier models are trained once, at enormous expense, and then deployed. The capital flows have followed this profile: huge, lumpy, concentrated in a small number of compute clusters owned by a small number of firms.

Inference — the activity of running a trained model to produce useful output — has, until recently, been the smaller part of the economic picture. That is changing rapidly. As more applications are built on top of AI capabilities, the volume of inference is growing at rates that significantly exceed the growth rate of training. The cumulative inference workload of major model providers now exceeds, in compute terms, what was being used for training only two years ago.

The economic implication is that the dominant cost structure of AI is shifting from one-off, capital-intensive training cycles to continuous, operations-intensive inference workloads. The shape of the resulting economics is recognisable from a different industry: utilities.

Why the inference economy looks like utilities.

Capital intensity scaled to delivered output

Inference, like power generation, requires physical infrastructure proportional to the load it serves. More users, more applications, more queries — more compute, more power, more cooling, more interconnect bandwidth. The marginal cost of an additional unit of inference is non-trivial, and the aggregate cost scales roughly linearly with usage rather than in the large step functions that characterise training runs.
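
The contrast is easy to make concrete. The sketch below is a toy model: the training cost, per-query cost, and query volume are illustrative assumptions rather than estimates, and the point is the shape of the two cost curves, not the figures.

    # A toy cost model. All figures are hypothetical, chosen for illustration.
    TRAINING_RUN_COST = 100_000_000  # one-off cost of a frontier training run (assumed)
    COST_PER_QUERY = 0.002           # marginal cost of serving one query (assumed)
    QUERIES_PER_DAY = 50_000_000     # daily inference volume (assumed)

    def cumulative_cost(days: int) -> tuple[float, float]:
        """Cumulative training vs inference spend after `days` of operation."""
        training = TRAINING_RUN_COST                         # a step: paid once
        inference = COST_PER_QUERY * QUERIES_PER_DAY * days  # scales linearly with load
        return training, inference

    for days in (30, 365, 3 * 365):
        training, inference = cumulative_cost(days)
        print(f"day {days:>4}: training ${training:,.0f}   inference ${inference:,.0f}")

On these assumed figures, the cumulative inference bill passes the one-off training cost within three years, and it keeps climbing with load while the training step never repeats.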

Geographic distribution matters

Latency-sensitive inference workloads — which are most of them — need to run physically close to the load. This is why the major model providers have been building out regional data centre footprints rapidly, and why edge inference is becoming an industry of its own. Geographic distribution of inference capacity matters in a way that training compute does not.

Long-duration, contracted demand

The demand for inference is not project-based. It is continuous. Once an application is dependent on a model — and increasingly, every commercial software application is — the demand for inference is permanent. The contracts that govern this demand tend toward longer durations and more predictable economics. This is the most utility-like feature of the entire stack.

Regulatory permission

Sovereignty is becoming a major theme. European governments increasingly want AI workloads to run on European infrastructure. Financial services regulators want clear documentation of where regulated workloads execute. National security frameworks are placing similar constraints on government-related applications. This is a regulatory layer recognisable from telecoms and energy — and it favours infrastructure businesses with appropriate licensing.

"The marginal cost of training a model is collapsing. The aggregate cost of running it is not. That is the difference between software economics and utility economics."

The investment implications.

Compute infrastructure is the new transmission

In our energy work, we have written about the grid as the binding constraint of the next decade of energy investment. The analogue in AI is compute and inference infrastructure: data centres, networking, power supply, cooling, and the specialist hosting layer that runs at scale near load. The capital requirements are large, the time-to-build is years rather than months, and the economics, when properly underwritten, look like infrastructure rather than technology.

Several of our most attractive recent positions sit here. We have written separately about the way these positions interact with the energy transition: AI inference is the largest new load on the European grid in a generation. Capital that owns both the data centre and the renewable supply that powers it sits at a particularly attractive intersection.

The middle layer is more attractive than the model layer

The model layer is highly visible, highly capital-intensive, and structurally exposed to commoditisation. The infrastructure layer beneath it is less visible, less capital-intensive per unit of revenue, and structurally protected by physical and regulatory barriers. We are deploying meaningful capital at the infrastructure layer. We are not deploying any at the frontier model layer at current valuations.

Specialist services are emerging

A new layer of services is emerging between the raw infrastructure and the end application: model orchestration, fine-tuning platforms, inference optimisation, evaluation tooling, observability, governance. These businesses are not infrastructure in the traditional sense, but they sit close enough to the infrastructure layer that their economics share important features. We are selectively deploying here.

What we are underwriting.

Three categories that account for substantially all of our inference-related deployment:

Specialist data centre development with renewable offtake. We are an equity investor in two UK-based data centre projects designed specifically for inference workloads, each with contracted renewable power supply for ten years or more. The economics combine infrastructure cash yields with a demand profile growing at meaningful multi-year rates.
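
As a rough sketch of what that combination implies, the toy calculation below compounds a starting cash yield against contracted demand growth. Both inputs are hypothetical, chosen for illustration rather than drawn from the projects' terms.

    # A toy yield calculation. Both inputs are hypothetical, not contracted terms.
    initial_cash_yield = 0.07  # year-one cash yield on invested capital (assumed)
    demand_growth = 0.12       # annual growth in delivered inference load (assumed)

    for year in range(1, 11):
        yield_on_cost = initial_cash_yield * (1 + demand_growth) ** (year - 1)
        print(f"year {year:>2}: cash yield on cost {yield_on_cost:.1%}")

On those assumptions, a seven per cent starting yield compounds to roughly a nineteen per cent yield on cost by year ten. The shape of that curve, not the specific inputs, is the point.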

Inference-optimised compute hosting. One position in a specialist provider serving the inference layer of regulated industries — financial services, healthcare, defence — with sovereign-cloud guarantees. The customer concentration is high, but the contract durations and the regulatory moat justify the sizing.

Middle-layer specialist services. Two minority positions in the model orchestration and observability space, where the technology layer matters less than the integration with regulated workflows. Smaller positions, but compounding.

What we are not doing.

Three categories that, in our analysis, look like the inference economy but are not:

API-resold model wrappers without proprietary distribution. Many of the apparent winners of the current cycle are wrappers around the model layer with thin integration. As the underlying model layer commoditises, these wrappers commoditise faster.

Non-specialist hyperscale capacity at headline valuations. The major hyperscalers are building inference capacity at substantial scale and competing on price. Generic capacity providers competing in the same market are exposed to commoditisation risk that we do not believe is appropriately priced.

Speculative edge inference without applications. Edge inference will be a substantial market, but it requires real applications and real revenue. Several edge-inference platforms have raised on speculation about future application demand. We are watching, but we are not deploying.

The bigger point.

The first decade of internet infrastructure produced returns that compounded for the businesses that owned the wires, the data centres, and the regional points of presence — not the businesses that owned the websites. The first decade of mobile infrastructure produced returns that compounded for the businesses that owned the towers and the spectrum, not the businesses that owned the apps. We believe the first decade of AI infrastructure will produce returns of a similar shape. The model layer will commoditise. The infrastructure layer will compound. We are positioned accordingly.

Maximus Rogers, Portfolio Manager