Qianliang's blog

[Paper Notes] Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents

OSDI’24, by Stanford/Purdue et al.

TL;DR: Caravan is a control-plane stack that keeps in-network ML models fresh at runtime by (1) auto-labeling sampled traffic with labeling agents (heuristics/ACLs, DNNs, and foundation models), (2) watching a lightweight accuracy proxy for drift, and (3) retraining only when needed. In experiments it boosts F1 by ~30.3% over offline models and cuts GPU time by 61.3% vs continuous retraining; simple windowed triggers can save ~74.6% GPU time with negligible accuracy loss. It also runs at line rate on a Taurus FPGA testbed.

1 Goal & scope

Keep data-plane ML (switches/SmartNIC/FPGA) accurate under traffic and concept/data drift without relying on ground-truth labels at runtime. Target use cases include intrusion detection and IoT traffic classification; learning happens online while inference stays in the data plane.

2 Core ideas

3 Main workflow

  1. Sample recent flows and the model's predictions from the in-network device into a streaming DB (e.g., InfluxDB).
  2. Label the window via the agent (weak labels from fast sources; occasional LLM → rule cache).
  3. Validate with the accuracy proxy. If it drops or a rule/event fires, retrain on a class-balanced subset (using iCaRL) and update the in-network weights; otherwise, keep going.
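
The validate-then-retrain step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Caravan's implementation: the proxy is assumed to be plain agreement between model predictions and agent labels on the sampled window, the 0.9 threshold is arbitrary, and the class-balanced selection is simple random sampling standing in for iCaRL's exemplar selection. All function names are hypothetical.

```python
import random
from collections import defaultdict


def accuracy_proxy(predictions, agent_labels):
    """Assumed proxy: fraction of sampled flows where the in-network
    model's prediction agrees with the labeling agent's label."""
    if not predictions:
        return 1.0  # empty window: nothing to suggest drift
    agree = sum(p == y for p, y in zip(predictions, agent_labels))
    return agree / len(predictions)


def class_balanced_subset(samples, labels, per_class, seed=0):
    """Pick up to `per_class` samples per label at random.
    A simple stand-in for illustration, NOT iCaRL's herding selection."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    subset = []
    for y in sorted(by_class):
        xs = by_class[y]
        chosen = rng.sample(xs, min(per_class, len(xs)))
        subset.extend((x, y) for x in chosen)
    return subset


def process_window(samples, predictions, agent_labels,
                   threshold=0.9, per_class=2):
    """Return (proxy, retrain_set). retrain_set is None when the proxy
    stays at or above the (hypothetical) threshold, i.e. no retraining."""
    proxy = accuracy_proxy(predictions, agent_labels)
    if proxy >= threshold:
        return proxy, None
    return proxy, class_balanced_subset(samples, agent_labels, per_class)
```

In a real deployment the returned retrain set would feed a training job whose weights are then pushed to the data plane; that part is omitted here because the notes do not describe its interface.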

4 Key results

5 Techniques worth learning

6 One-liners to remember

Label via many imperfect sources, not one perfect one; use weak labels + cached rules; detect drift with an accuracy proxy; retrain selectively → higher F1, far less GPU, still line-rate.
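
The "weak labels + cached rules" idea might look roughly like the sketch below. It is only an illustration under assumptions: the `LabelingAgent` class, the flow fields (`src_ip`, `proto`, `dst_port`), and the choice of `(proto, dst_port)` as the cache key are all hypothetical, and the slow labeler is any expensive callable (an LLM call in the paper's setting) rather than a specific API.

```python
class LabelingAgent:
    """Try cheap labeling sources first; fall back to an expensive
    labeler and cache its answer so similar flows skip the slow path."""

    def __init__(self, fast_rules, slow_labeler):
        self.fast_rules = fast_rules      # callables: flow -> label or None
        self.slow_labeler = slow_labeler  # expensive callable: flow -> label
        self.rule_cache = {}              # cache key -> cached label
        self.slow_calls = 0               # how often the slow path ran

    @staticmethod
    def cache_key(flow):
        # Hypothetical key: flows sharing protocol and destination port
        # are assumed to share a label once the slow labeler has decided.
        return (flow["proto"], flow["dst_port"])

    def label(self, flow):
        # 1) Fast sources (heuristics/ACLs): first one with an opinion wins.
        for rule in self.fast_rules:
            result = rule(flow)
            if result is not None:
                return result
        # 2) Cached rule from an earlier slow-labeler call, if present;
        #    otherwise call the slow labeler once and remember the answer.
        key = self.cache_key(flow)
        if key not in self.rule_cache:
            self.slow_calls += 1
            self.rule_cache[key] = self.slow_labeler(flow)
        return self.rule_cache[key]
```

The point of the cache is cost: the expensive source is consulted once per key rather than once per flow, which is what keeps the slow labeler "occasional".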

7 Comments

This paper addresses an important and challenging problem: how to analyze (e.g., classify) a large volume of raw, unlabeled streaming data (e.g., network traffic) efficiently (e.g., at lower cost). One key idea is to use ML methods (the no-brainer choices, e.g., an ordinary DNN or an LLM) to generate a reusable ruleset. For deciding when to retrain, instead of insisting on an extremely accurate way to monitor performance, it uses a proxy to detect degradation, a smart way to sidestep a fundamentally hard problem.