[Paper Notes] Orion: Google’s Software-Defined Networking Control Plane
NSDI ’21, by Google
TL;DR Second-generation controller that replaces the monolithic Onix with a micro-service, intent-driven stack glued together by a pub-sub Network Information Base (NIB). It scales a single control paradigm from a rack ToR to the globe, pushes loss-free updates, and ships safely every two weeks.
1 Problem / Motivation
Onix’s tightly coupled threads could not keep up with 100× growth in B4 and Jupiter; a single bug or hot code path could stall the whole controller, and upgrades were slow. Google needed:
- Fast convergence & fine-grained traffic engineering across thousands of switches.
- Higher availability than distributed protocols (BGP/ISIS) yet safe evolution every two weeks.
- Smaller blast radius and a clean way to interoperate with legacy routers.
2 Key Ideas
Idea | Why it matters |
---|---|
Everything is intent, stored once in the NIB | Producers write “what” (e.g., RouteAdvert); consumers asynchronously materialise “how”. The NIB enforces a single global order, which makes debugging & replay trivial. |
Micro-services with per-app replication | Each app (Routing Engine, Flow Mgr, Topology Mgr, Raven BGP, Drain Agents…) runs its own process, three replicas, independent crash/restart. |
Hierarchical control domains + *supernode* abstraction | Supernode = one physical block (a collection of switches) = one Orion domain = one abstract router. Higher-level apps speak in supernode terms; the lower-level RE expands them into per-switch flows. |
Loss-free flow sequencing | Before an upstream switch points at a nexthop, RE first programs that nexthop; removal is reversed, eliminating transient loops/blackholes. |
Fail-static & hybrid control network | Only ToRs are managed in-band, which cuts control-plane network (CPN) cabling by ~40 %; a ToR falls back to a “last-heard-uplink” default if it loses its controller. |
Capability-graph boot & failover | Apps declare the state they provide and require; after a crash, the graph ensures switches are never wiped before authoritative intent is ready (a sketch follows this table). |
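
A minimal sketch of that capability-graph boot ordering, using Python’s standard `graphlib` and app/capability names taken from the breakdown below; the exact dependency edges here are my assumption, not the paper’s:

```python
# Sketch of capability-graph boot ordering (the dependency edges are illustrative).
from graphlib import TopologicalSorter

# Each app declares the NIB capabilities it provides and requires.
APPS = {
    "ConfigManager":   {"provides": {"Config"},         "requires": set()},
    "TopologyManager": {"provides": {"Topology"},       "requires": {"Config"}},
    "Raven":           {"provides": {"RouteAdvert"},    "requires": {"Config"}},
    "RoutingEngine":   {"provides": {"DesiredFlow"},    "requires": {"Topology", "RouteAdvert"}},
    "FlowManager":     {"provides": {"ProgrammedFlow"}, "requires": {"DesiredFlow", "Topology"}},
}

def boot_order(apps):
    """Return a start order in which every app's required capabilities are already
    provided, so a restarted controller never wipes switch state before the
    authoritative intent feeding it has been re-established."""
    provider = {cap: name for name, app in apps.items() for cap in app["provides"]}
    deps = {name: {provider[c] for c in app["requires"]} for name, app in apps.items()}
    return list(TopologicalSorter(deps).static_order())

print(boot_order(APPS))
# e.g. ['ConfigManager', 'Raven', 'TopologyManager', 'RoutingEngine', 'FlowManager']
```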
3 System Architecture
┌────────── Operator Intent / Config ───────────────────────┐
│ (Drain X, add link Y, advertise <IP/24, nexthop>, ...) │
└──────────────┬────────────────────────────────────────────┘
▼
Network Information Base (NIB)
┌────────────────┴────────────────┐
│ Core micro-services (per domain)│
│ • Routing Engine • Flow Mgr │
│ • Topology Mgr • Config Mgr │
│ • OF Front-Ends • Drain Agent │
└────────────────┬────────────────┘
writes flows │ reads acks
▼
OpenFlow switches
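
To make the diagram concrete, here is a toy model of the NIB’s publish/subscribe contract; the table names follow these notes, but the real NIB is a replicated in-memory store with schema’d tables, so this is only a sketch:

```python
# Toy NIB: an append-only, sequentially ordered pub-sub store (illustrative only).
from collections import defaultdict

class NIB:
    def __init__(self):
        self.log = []                        # single global order ("arrow of time")
        self.subscribers = defaultdict(list)

    def subscribe(self, table, callback):
        self.subscribers[table].append(callback)

    def write(self, table, row):
        seq = len(self.log)                  # sequence number = position in the global order
        self.log.append((seq, table, row))
        for cb in self.subscribers[table]:   # consumers observe updates in log order
            cb(seq, row)
        return seq

nib = NIB()

# Routing Engine consumes RouteAdvert intent and (here, trivially) emits DesiredFlow.
def routing_engine(seq, advert):
    nib.write("DesiredFlow", {"prefix": advert["prefix"], "nexthops": advert["nexthops"]})

nib.subscribe("RouteAdvert", routing_engine)
nib.subscribe("DesiredFlow", lambda seq, row: print(f"[{seq}] DesiredFlow: {row}"))

# An SDN app (e.g. Raven) publishes "what"; the core materialises "how".
nib.write("RouteAdvert", {"prefix": "10.0.0.0/24", "nexthops": ["supernode-7"]})
```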
- Domains & hierarchy – Jupiter: leaf aggregation/FBR domains + colored IBR-C virtual controllers for spine blocks (≤25 % capacity per color). B4: flat supernode-per-POP layout.
- Inter-domain routing – Raven speaks BGP/ISIS and converts best paths into RouteAdverts at supernode granularity (see the sketch after this list).
- Traffic engineering – TE-App writes tunnel intents; RE handles equal-cost (ECMP) and weighted (WCMP) splits plus fast-reroute inside each domain.
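
A rough sketch of how a supernode-granularity RouteAdvert might be expanded into per-switch flows by the domain-level RE; the membership map, switch names, and equal weights are invented (the real RE computes WCMP weights from topology and capacity):

```python
# Illustrative expansion of a supernode-level RouteAdvert into per-switch
# WCMP buckets inside one Orion domain (all names and weights are made up).
SUPERNODE_MEMBERS = {
    "supernode-7": ["sw-7-1", "sw-7-2", "sw-7-3", "sw-7-4"],
}

def explode_route_advert(advert, members=SUPERNODE_MEMBERS):
    """Turn one abstract-router nexthop list into per-switch forwarding entries."""
    flows = []
    for supernode in advert["nexthops"]:
        switches = members[supernode]
        weight = 1.0 / len(switches)   # equal split here; the real RE uses WCMP weights
        for sw in switches:
            flows.append({"prefix": advert["prefix"], "switch": sw, "weight": weight})
    return flows

advert = {"prefix": "10.0.0.0/24", "nexthops": ["supernode-7"]}
for flow in explode_route_advert(advert):
    print(flow)
```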
Detailed Breakdown
Layer | Key components (micro-service binaries, each 3-way replicated) | What they publish / consume in the NIB | Core duties |
---|---|---|---|
SDN Applications (“north-bound”) | Raven (BGP/ISIS speaker), TE App, Drain Conductor, etc. | RouteAdvert (prefix → nexthop list), TunnelIntent, DrainIntent, … | Express operator or global TE intent at supernode granularity. |
Orion Core | Routing Engine (RE) | Consumes RouteAdvert, topology; emits DesiredFlow rows | Shortest-/weighted-path calculation, load balancing, loss-free sequencing. |
Orion Core | Flow Manager (FM) | Consumes DesiredFlow; emits ProgrammedFlow, Ack | Diffs intended vs. on-switch state; issues OpenFlow ops. |
Orion Core | Topology Manager (TM) | Emits Port, Link, Node, PortStats | Learns LLDP & OF events; keeps a live view of the fabric. |
Orion Core | Config Manager (CM) | Writes/validates app-config tables via two-phase commit | Atomic, multi-app config pushes. |
Orion Core | OpenFlow Front-End (OFE) | None (pure proxy) | One TLS channel per switch; multiplexes to FM & TM. |
NIB (in-memory, replicated) | — | Central publish/subscribe store with sequential ordering | Provides the “single arrow of time” for every message. |
Data plane | SDN switches + on-box OpenFlow Agent | — | Forward packets; stream events & stats via OFE. |
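
The Routing Engine’s “loss-free sequencing” duty in the table above orders switch updates so that a rule is never referenced before it exists. A hedged sketch, with an invented `program`/`await_ack` interface standing in for the FM → OFE → switch path:

```python
# Sketch of loss-free flow sequencing (the flow-entry shapes and the program/ack
# API are invented for illustration; the real RE/FM work through NIB tables and OpenFlow).
def add_route(upstream_sw, downstream_sw, prefix, program, await_ack):
    # 1. Install the nexthop state on the downstream switch first...
    await_ack(program(downstream_sw, {"prefix": prefix, "action": "deliver"}))
    # 2. ...only then point the upstream switch at it, so traffic never arrives
    #    at a switch that cannot yet forward it (no transient blackhole).
    await_ack(program(upstream_sw, {"prefix": prefix, "nexthop": downstream_sw}))

def remove_route(upstream_sw, downstream_sw, prefix, program, await_ack):
    # Removal reverses the order: detach the upstream reference first,
    # then retire the downstream state, avoiding transient loops/blackholes.
    await_ack(program(upstream_sw, {"prefix": prefix, "nexthop": None}))
    await_ack(program(downstream_sw, {"prefix": prefix, "action": "remove"}))

# Trivial stubs so the sketch runs end to end.
def _program(switch, entry):
    print(f"program {switch}: {entry}")
    return switch

def _await_ack(switch):
    print(f"  ack from {switch}")

add_route("spine-1", "agg-3", "10.0.0.0/24", _program, _await_ack)
remove_route("spine-1", "agg-3", "10.0.0.0/24", _program, _await_ack)
```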
4 Why It Works
- Modularity → velocity: teams add features by introducing a new app & NIB tables; no global lock-step rollout.
- Deterministic ordering: the NIB’s append-only update stream gives every process the same “arrow of time,” aiding debugging and replay.
- Scalable safety: loss-free sequencing plus fail-static policies mean a controller crash rarely harms data-plane traffic.
- Blast-radius control: colored IBR-Cs, per-domain replicas, and ToR-only in-band management confine faults to ≤25 % of the fabric or a single rack.
One-Sentence Take-away
Takeaway: Orion turns Google’s entire fabric into a hierarchy of tiny “intent compilers,” each updating its switches loss-free through a shared, ordered database—scaling SDN while letting engineers ship like modern software teams.