WCET-aware SLOs: mapping worst-case execution time to production guarantees
Turn RocqStat pWCET numbers into enforceable SLOs, alerts, and deployment gates for production embedded systems. Practical patterns & CI examples for 2026.
Hook — Your timing guarantees stop being theoretical at deploy time
Safety-critical embedded teams spend months getting a sound WCET (worst-case execution time) on a function, then ship to production with little connection between that number and the runtime guarantees, alerts, and deployment gates that protect users. The result: fragile releases, missed deadlines, and audit headaches. In 2026 this disconnect is no longer acceptable — tooling (notably Vector’s 2026 acquisition of RocqStat and the planned integration into VectorCAST) makes it possible to close the loop. This article shows how to translate RocqStat WCET outputs into practical SLOs, alerts, and CI/CD deployment gates for embedded systems, with code examples, monitoring patterns, and decision rules you can adopt today.
Why map WCET to SLOs now (2026 context)
Late 2025 and early 2026 saw two important trends for timing safety in embedded systems:
- Statistical WCET tools matured and entered mainstream toolchains — Vector acquired RocqStat in January 2026, signaling consolidation and faster integration into test suites such as VectorCAST.
“Vector will integrate RocqStat into its VectorCAST toolchain to unify timing analysis and software verification” — Automotive World, Jan 16, 2026
- Operators moved from binary pass/fail timing gates to probabilistic, SLO-driven operational guarantees that are actionable in CI/CD and runtime monitoring.
The combination means teams can produce statistically-weighted WCET estimates (pWCET) and use them as the basis for measurable, enforceable production guarantees and automated deployment controls.
Key concepts (short)
- WCET / pWCET — a bound on execution time. RocqStat often provides a statistical (probabilistic) WCET: a time bound with an associated exceedance probability p.
- SLO — a production guarantee, usually expressed as allowable failure/exceedance rate over a time window.
- SLI — the observable metric (e.g., measured execution time per invocation) used to evaluate the SLO.
- Deployment gate — an automated CI/CD check (static or dynamic) that blocks release if timing guarantees are not met.
High-level pattern: WCET → SLI → SLO → Alerts → Gates
Implementing WCET-aware SLOs follows five repeatable steps:
- Ingest RocqStat pWCET outputs into your artefact store.
- Define SLIs that can be measured in production (execution time histograms, exceedance counters).
- Convert RocqStat exceedance probabilities (p) into SLO targets appropriate for your runtime activation rate.
- Create alerts and dashboards that detect drift and breaches early.
- Enforce deployment gates in CI/CD using RocqStat outputs and runtime acceptance tests.
Step 1 — Ingesting RocqStat outputs
RocqStat produces pWCET results and risk curves. The first practical step is to make those numbers machine-readable in your pipeline:
- Export RocqStat results as JSON (recommended) or CSV as part of your VectorCAST test run.
- Store the JSON artifact in your CI job artifacts or a timing results bucket (S3/GCS) keyed by commit and build ID.
- Attach metadata: hardware configuration, CPU frequency, compiler flags, cache and preemption model.
Example simplified RocqStat JSON (illustrative):
{
"function": "sensor_process",
"pwcet": [
{"p": 1e-3, "time_ms": 2.3},
{"p": 1e-6, "time_ms": 3.8},
{"p": 1e-9, "time_ms": 5.1}
],
"hardware": "ARM-Cortex-M7@200MHz",
"build": "commit-hash"
}
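As a minimal sketch of consuming such an artifact, the Python below parses the illustrative layout above and picks the bound to use for a required exceedance probability. The field names (`pwcet`, `p`, `time_ms`) come from the illustrative JSON only and are an assumption about the real export format; `parse_pwcet` and `bound_for` are hypothetical helper names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PwcetPoint:
    p: float        # exceedance probability per activation
    time_ms: float  # time bound reported at that probability


def parse_pwcet(text: str) -> list[PwcetPoint]:
    """Parse the illustrative JSON layout into points sorted by descending p."""
    doc = json.loads(text)
    points = [PwcetPoint(float(e["p"]), float(e["time_ms"])) for e in doc["pwcet"]]
    return sorted(points, key=lambda pt: pt.p, reverse=True)


def bound_for(points: list[PwcetPoint], p_required: float) -> PwcetPoint:
    """Return the loosest reported point whose p is still <= p_required.

    Raises ValueError when no reported point is strict enough. Interpolating
    between points is deliberately avoided, since the result would not be a
    bound the tool actually reported.
    """
    candidates = [pt for pt in points if pt.p <= p_required]
    if not candidates:
        raise ValueError(f"no pWCET point with p <= {p_required:g}")
    return max(candidates, key=lambda pt: pt.p)


# Same content as the illustrative artifact above.
SAMPLE = """
{"function": "sensor_process",
 "pwcet": [{"p": 1e-3, "time_ms": 2.3},
           {"p": 1e-6, "time_ms": 3.8},
           {"p": 1e-9, "time_ms": 5.1}],
 "hardware": "ARM-Cortex-M7@200MHz",
 "build": "commit-hash"}
"""

points = parse_pwcet(SAMPLE)
print(bound_for(points, 1e-6))
```

Choosing the nearest stricter point, rather than the nearest point, keeps the selection conservative: a requirement of p ≤ 5e-7 falls back to the 1e-9 bound, not the 1e-6 one.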
Step 2 — Define SLIs you can measure in production
WCET is a bound — your runtime SLI must be an observable that indicates when you’re approaching or violating that bound. Common SLIs for embedded timing:
- Invocation latency histogram (buckets in microseconds or cycles)
- Exceedance counter — count of invocations where measured time > pWCET threshold
- CPU budget utilization — fraction of CPU reserved vs used on a real-time core
Instrumentation choices for embedded devices:
- Cycle counters (DWT_CYCCNT on Cortex-M) or architected performance counters on Cortex-A.
- Hardware trace (ETM) or lightweight in-app timing with sampling to a remote metrics gateway.
- Aggregation at the edge — devices should send compact histograms or exceedance counts rather than raw traces to conserve bandwidth.
Example — ARM Cortex-M measure (C)
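#include <stdint.h>
// Assumes a CMSIS device header that defines DWT and CoreDebug is already included.

// Enable DWT_CYCCNT and measure cycles
static inline void cyc_reset(void) {
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // DWT is inactive until TRCENA is set
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             // bit 0: start the cycle counter
}
static inline uint32_t cyc_read(void) { return DWT->CYCCNT; }

void wrapped_sensor_process(void) {
    cyc_reset();
    uint32_t t0 = cyc_read();
    sensor_process();
    uint32_t t1 = cyc_read();
    uint32_t cycles = t1 - t0;  // unsigned arithmetic stays correct across one wrap
    // convert cycles -> ms: cycles / (cpu_freq_MHz * 1000)
    (void)cycles;  // hand off to your histogram / exceedance counter here
}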
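The edge aggregation described above can be sketched host-side in Python before porting it to firmware. This is an illustration under assumptions: the 200 MHz clock mirrors the example hardware string, the bucket edges are arbitrary, and `TimingAggregator` is a hypothetical name, not part of any toolchain.

```python
from bisect import bisect_left

CPU_FREQ_MHZ = 200  # assumption: matches the Cortex-M7@200MHz example
BUCKET_EDGES_MS = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]  # assumed upper bounds per bucket


def cycles_to_ms(cycles: int, cpu_freq_mhz: float = CPU_FREQ_MHZ) -> float:
    """cycles / (cpu_freq_MHz * 1000), the conversion noted in the C example."""
    return cycles / (cpu_freq_mhz * 1000.0)


class TimingAggregator:
    """Keeps a fixed-size histogram and an exceedance counter instead of raw traces."""

    def __init__(self, threshold_ms: float, edges_ms=BUCKET_EDGES_MS):
        self.threshold_ms = threshold_ms
        self.edges_ms = list(edges_ms)
        # One slot per edge plus a final overflow slot for samples above the last edge.
        self.counts = [0] * (len(self.edges_ms) + 1)
        self.exceedances = 0

    def record(self, cycles: int) -> None:
        ms = cycles_to_ms(cycles)
        # bisect_left gives the first edge >= ms, i.e. "less than or equal" buckets.
        self.counts[bisect_left(self.edges_ms, ms)] += 1
        if ms > self.threshold_ms:
            self.exceedances += 1

    def snapshot(self) -> dict:
        """Compact payload suitable for periodic upload."""
        return {
            "edges_ms": self.edges_ms,
            "counts": list(self.counts),
            "exceedances": self.exceedances,
        }


agg = TimingAggregator(threshold_ms=3.8)
for sample_cycles in (200_000, 800_000, 1_200_000):  # 1.0 ms, 4.0 ms, 6.0 ms
    agg.record(sample_cycles)
print(agg.snapshot())
```

The payload size is fixed by the number of buckets, so upload cost does not grow with activation rate.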
Step 3 — Convert pWCET to an SLO
This is the critical mapping. RocqStat gives you a pWCET: a time bound T_p that is exceeded with probability p per activation (p is often tiny, e.g., 1e-6). To create an SLO you must consider the activation rate (N) of the function and the allowable violations (V) over your target window.
Formulas
- Expected violations per window: E = N * p
- To ensure E ≤ V, require p ≤ V / N
- SLO as a probability: SLO_p = 1 - (V / N) — but for small probabilities, express SLO as an allowed violations rate instead of a percent.
Numeric example
Suppose sensor_process() runs once per second (N = 3600 activations/hour). The system-level requirement is ≤ 1 violation per year (~1 / 8766 hours ≈ 1.14e-4 violations/hour). Then:
- V/hour ≈ 1.14e-4. So p ≤ V/N = 1.14e-4 / 3600 ≈ 3.17e-8 per activation.
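- If RocqStat reports T only for p=1e-6, that p is too large: at 3600 activations/hour it implies about 31.6 expected violations per year. You must either use the bound reported at a smaller p (and budget for its larger T), tighten code, change partitioning, add buffering, or accept a higher violation rate.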
This illustrates why the raw exceedance probabilities from pWCET must be weighed against activation rate and system safety targets.
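The arithmetic above is easy to get wrong by a factor of hours or years, so a small helper is worth keeping next to the SLO definition. The sketch below reproduces the numeric example; the function names are hypothetical and the 8766 hours/year constant is the one used in the text.

```python
HOURS_PER_YEAR = 8766  # 365.25 days, as used in the text


def max_exceedance_probability(activations_per_hour: float,
                               allowed_violations_per_year: float) -> float:
    """p_max = V / N, with V and N both expressed per hour."""
    v_per_hour = allowed_violations_per_year / HOURS_PER_YEAR
    return v_per_hour / activations_per_hour


def expected_violations_per_year(activations_per_hour: float, p: float) -> float:
    """E = N * p, scaled from one hour to one year."""
    return activations_per_hour * HOURS_PER_YEAR * p


p_max = max_exceedance_probability(activations_per_hour=3600,
                                   allowed_violations_per_year=1)
print(f"p_max = {p_max:.3g}")  # p_max = 3.17e-08
print(f"violations/year at p=1e-6: {expected_violations_per_year(3600, 1e-6):.1f}")
```

Running the second function against each probability that RocqStat reports shows directly which pWCET points are usable for a given activation rate.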
Step 4 — Alerts and detection strategies
Once SLOs are defined you need actionable alerts that detect drift before safety margins are violated.
- Soft alert — measured execution time exceeds X% of T_p (e.g., 75% or 90%) for M consecutive windows. Use this for early warning and pre-emptive investigation.
- Hard alert — exceedance counter rate over last window exceeds V (allowed violations per window). This should trigger incident response and possibly rollbacks.
- Trend alert — the p95/p99 of execution times is shifting upwards for three consecutive deployments.
Prometheus-style alert rule (example)
groups:
  - name: wcet_alerts
    rules:
      - alert: WCETApproaching
        expr: histogram_quantile(0.90, rate(sensor_process_time_seconds_bucket[5m])) > (0.90 * ${T_p_seconds})
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "sensor_process latency approaching pWCET"
      - alert: WCETExceededRate
        expr: increase(sensor_process_exceedances_total[1h]) > ${allowed_violations_per_hour}
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "WCET exceedance rate too high"
Replace ${T_p_seconds} and ${allowed_violations_per_hour} with derived values from your mapping exercise.
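One way to do that substitution in a pipeline is Python's standard `string.Template`, which happens to use the same `${name}` placeholder syntax. The sketch below renders only the two expressions; `render_alert_exprs` is a hypothetical helper and the input values are illustrative.

```python
from string import Template

SOFT_EXPR = Template(
    "histogram_quantile(0.90, rate(sensor_process_time_seconds_bucket[5m]))"
    " > (0.90 * ${T_p_seconds})"
)
HARD_EXPR = Template(
    "increase(sensor_process_exceedances_total[1h]) > ${allowed_violations_per_hour}"
)


def render_alert_exprs(t_p_ms: float, allowed_violations_per_hour: float) -> tuple[str, str]:
    """Fill both placeholders; substitute() raises KeyError if one is missing."""
    values = {
        "T_p_seconds": f"{t_p_ms / 1000.0:g}",  # metric is in seconds, pWCET in ms
        "allowed_violations_per_hour": f"{allowed_violations_per_hour:g}",
    }
    return SOFT_EXPR.substitute(values), HARD_EXPR.substitute(values)


soft, hard = render_alert_exprs(t_p_ms=3.8, allowed_violations_per_hour=1.0)
print(soft)
print(hard)
```

Note that when the allowed rate is far below one violation per hour (as in the once-per-year example), a per-device one-hour window fires on any single exceedance. Aggregating across the fleet or lengthening the window is probably needed for the threshold to be meaningful.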
Step 5 — Deployment gates and CI integration
There are three types of gates you should add to a modern embedded pipeline:
- Static timing gate — fail the build if RocqStat pWCET for the target p is above the allowed time.
- Test-time acceptance gate — run on-target timing tests and reject if measured exceedances exceed limits.
- Runtime acceptance gate — allow staged rollouts and reject or rollback if runtime exceedance rate breaches SLO during canary periods. See a practical incident playbook for automated recovery and rollback responses: Incident Response Playbook.
CI example — GitHub Actions job that fails on RocqStat output
jobs:
  wcet-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build & run VectorCAST with RocqStat
        run: ./run_vectorcast.sh --wcet --output results/wcet.json
      - name: Set up Python
        uses: actions/setup-python@v4
      - name: Check pWCET
        run: |
          python tools/check_wcet.py results/wcet.json --p 1e-8 --max-ms 4.0
The helper script check_wcet.py would parse the RocqStat JSON and exit non-zero when constraints aren’t met. Standardize this step by creating a shared CI script or templated artifact that each team can adopt.
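A minimal sketch of what such a helper could look like follows. It assumes the illustrative JSON layout shown earlier (`pwcet` entries with `p` and `time_ms`), which may differ from the real export, and it treats a missing point at the required probability as a failure.

```python
import argparse
import json


def check(doc: dict, p_required: float, max_ms: float) -> tuple[bool, str]:
    """Pass only if a point with p <= p_required exists and its bound fits max_ms."""
    eligible = [e for e in doc["pwcet"] if float(e["p"]) <= p_required]
    if not eligible:
        return False, f"FAIL: no pWCET point reported at p <= {p_required:g}"
    # Use the loosest point that still satisfies the required probability.
    point = max(eligible, key=lambda e: float(e["p"]))
    t_ms, p = float(point["time_ms"]), float(point["p"])
    ok = t_ms <= max_ms
    verdict = "PASS" if ok else "FAIL"
    return ok, f"{verdict}: pWCET {t_ms:g} ms at p={p:g} (limit {max_ms:g} ms)"


def main(argv: list[str]) -> int:
    parser = argparse.ArgumentParser(description="Static timing gate (sketch)")
    parser.add_argument("results", help="path to the pWCET JSON artifact")
    parser.add_argument("--p", type=float, required=True,
                        help="required exceedance probability per activation")
    parser.add_argument("--max-ms", type=float, required=True,
                        help="maximum allowed time bound in milliseconds")
    args = parser.parse_args(argv)
    with open(args.results, encoding="utf-8") as fh:
        doc = json.load(fh)
    ok, message = check(doc, args.p, args.max_ms)
    print(message)
    return 0 if ok else 1
```

To use it as the CI step above, call `sys.exit(main(sys.argv[1:]))` under the usual `__main__` guard. With the illustrative artifact, `--p 1e-8 --max-ms 4.0` would fail the build, because the only point at or below 1e-8 is the 1e-9 bound of 5.1 ms.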
Runtime canary gate pattern
- Deploy new firmware to 1% of fleet.
- Collect exceedance counts for the canary window (e.g., 1 hour).
- If exceedance rate > allowed, automatically halt rollout and trigger rollback (see incident response patterns at Incident Response Playbook).
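The decision rule in the last step can be sketched as a pure function that a rollout controller calls at the end of the canary window. The names and the three-way outcome are assumptions for illustration; the threshold is simply the exceedance budget implied by p_max over the activations the canary actually observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryWindow:
    devices: int
    activations_per_device_per_hour: float
    hours: float
    exceedances: int  # summed across all canary devices


def canary_decision(window: CanaryWindow, p_max: float) -> dict:
    """Compare observed exceedances with the budget implied by p_max."""
    activations = (window.devices
                   * window.activations_per_device_per_hour
                   * window.hours)
    if activations <= 0:
        # No evidence either way: do not widen the rollout.
        return {"action": "hold", "activations": 0, "budget": 0.0}
    budget = activations * p_max  # expected exceedances if the SLO holds
    action = "rollback" if window.exceedances > budget else "continue"
    return {"action": action, "activations": activations, "budget": budget}


window = CanaryWindow(devices=100, activations_per_device_per_hour=3600,
                      hours=1, exceedances=0)
print(canary_decision(window, p_max=3.17e-8))
```

For tight targets the budget is far below one, so any observed exceedance triggers a rollback. The reverse does not hold: zero exceedances in a short canary only shows the absence of an obvious regression, not that the SLO is met.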
Practical tradeoffs and engineering guidance
Mapping pWCET to SLOs surfaces several tradeoffs — choose a principled approach:
- Conservative safety margin — for ASIL/DO-178C systems, prefer a multiplicative margin on top of RocqStat T_p to account for system-level jitter (interrupts, bus contention). Document the margin and rationale.
- Compositional timing — when multiple tasks run on the same core, use response-time analysis or CPU reservation to compose WCETs safely. Don’t add WCETs linearly without considering scheduling; for scheduling and temporal isolation patterns see work on edge scheduling and reservation.
- Operational cost — extremely tight p requirements increase development cost. Use activation-rate driven SLOs to focus effort where activations are frequent or safety-critical.
- Certifiable evidence — keep a chain of evidence from RocqStat run, hardware config, test vectors, and measured SLIs. This is often required for audits.
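To make the compositional-timing point concrete, here is a sketch of classic response-time analysis for fixed-priority preemptive scheduling on a single core. It assumes independent periodic tasks with deadline equal to period and no blocking or release jitter; the task set is hypothetical, with each execution budget standing in for a pWCET bound (in microseconds).

```python
from typing import Optional


def response_time(tasks: list[tuple[int, int]], index: int,
                  max_iter: int = 1000) -> Optional[int]:
    """Worst-case response time of tasks[index], or None if it misses its deadline.

    tasks: (wcet_us, period_us) pairs ordered from highest to lowest priority.
    Iterates R = C_i + sum(ceil(R / T_j) * C_j) over higher-priority tasks j
    until R stops changing.
    """
    c_i, t_i = tasks[index]
    higher = tasks[:index]
    r = c_i
    for _ in range(max_iter):
        # -(-a // b) is integer ceiling division, avoiding float rounding.
        r_next = c_i + sum(-(-r // t_j) * c_j for c_j, t_j in higher)
        if r_next > t_i:
            return None  # deadline (= period) exceeded
        if r_next == r:
            return r
        r = r_next
    return None  # did not converge within max_iter


# Hypothetical task set: (wcet_us, period_us), highest priority first.
TASKS = [(1000, 4000), (2000, 6000), (3000, 12000)]
for i in range(len(TASKS)):
    print(i, response_time(TASKS, i))
```

For the lowest-priority task the analysis gives 10000 µs, whereas naively summing the three budgets gives 6000 µs. Repeated preemption by the shorter-period tasks accounts for the difference, which is why linear addition is unsafe.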
Common implementation pitfalls
- Using raw pWCET time as an immediate binary pass/fail without considering activation rates and overall system risk.
- Not instrumenting production for exceedance counts — without telemetry, you can’t validate SLOs.
- Failing to version timing results — pWCET depends on compile-time options, so store per-commit timing artifacts.
- Ignoring platform-level effects (cache, prefetchers, multicore interference).
Case study (compact): automotive sensor pipeline
Background: an automotive sensing ECU processes sensor frames at 100 Hz (360k activations/hour). RocqStat (via VectorCAST) reports:
- T_p = 4.0 ms for p = 1e-9
- T_p = 3.0 ms for p = 1e-6
System safety requirement: no more than 1 timing violation per year. Compute p_max:
- V/year = 1 → V/hour ≈ 1 / 8766 ≈ 1.14e-4
- N/hour = 360k → p_max = V/N ≈ 1.14e-4 / 360000 ≈ 3.17e-10
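Conclusion: neither reported point meets the target. p=1e-9 (T_p=4.0ms) exceeds p_max by roughly 3× (360k × 8766 × 1e-9 ≈ 3.2 expected violations per year), and p=1e-6 is far outside it. Engineering options: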
- Reduce activation rate (batch frames) or add buffering to reduce safety-critical path.
- Refactor or simplify code to reduce WCET.
- Move non-critical work to a lower-priority background task.
- Adopt temporal isolation: dedicate core or use RTOS reservation.
Operationalizing at scale (teams & automation)
To scale this pattern across hundreds of functions and tens of device types:
- Standardize the RocqStat output format and required p-values per ASIL/criticality level.
- Ship a shared CI script that turns RocqStat results into SLO artifacts (YAML/JSON) checked into the repo.
- Automate telemetry ingestion: devices push compact histograms or exceedance counters to a central collector. Use a lightweight protocol (MQTT + gateway that converts to Prometheus remote_write).
- Provide a dashboard that maps per-function pWCET, SLI trends, and canary rollout status for release managers.
Auditability and regulatory alignment
Regulated domains require traceability. Keep:
- RocqStat run logs and versions.
- Hardware configuration snapshots and compiler flags.
- Measurement evidence from production (exceedance counts and histograms)
- Change logs tying code changes to shifts in pWCET
This chain of evidence is much easier when RocqStat is integrated into your test toolchain (VectorCAST roadmap for 2026 promises tighter integration).
Advanced strategies and future predictions (2026+)
Looking forward, expect these developments:
- Tighter toolchain integration — Vector’s acquisition of RocqStat will accelerate native timing checks inside mainstream verification flows (VectorCAST). Expect native CI plugins and standardized JSON outputs in 2026–2027.
- Runtime probabilistic monitoring — production telemetry will include probabilistic exceedance estimators that merge with pWCET evidence to provide online risk assessments.
- Policy engines — deployment orchestrators will support declarative timing policies (e.g., “block rollout if pWCET at p ≤ X is > Y ms”) enabling automated safety governance.
- AI-assisted root-cause — models that correlate code changes to observed timing drift and recommend hot-code paths to refactor.
Checklist — immediate actions to adopt WCET-aware SLOs
- Enable RocqStat runs in your VectorCAST pipeline and export machine-readable results per build.
- Define per-function activation rates and system-level allowable violations (V) based on safety targets.
- Derive p_max and map to required pWCET levels; document margins for system jitter.
- Instrument production for histograms and exceedance counters; implement compact telemetry export.
- Create Prometheus (or equivalent) alerts for soft/hard thresholds and automatic rollback gates for canaries.
- Version and store all timing artefacts for audit and root-cause analysis — keep them in a repo and reference them from your build artifacts; consider using compose.page integrations for small SLO artifact sites.
Closing — from analytic numbers to operational safety
WCET numbers from RocqStat are powerful, but only when they become operational guarantees. By mapping pWCET into measurable SLIs, choosing SLOs that reflect activation rates and acceptable risk, instrumenting production, and gatekeeping releases with automated CI/CD checks, safety-critical teams can turn static timing analysis into living safety policy.
Next step: Start a small pilot: integrate RocqStat JSON outputs into a CI job, instrument one hot path on a device for exceedance counts, and run a one-week canary with automated rollback. If you want a templated pipeline, alert rules, and example scripts tailored to your platform, reach out — we’ll help you convert timing proofs into production guarantees.
Resources & acknowledgements
- Automotive World — “Vector buys RocqStat to boost software verification” (Jan 16, 2026)
- ISO 26262 / DO-178C guidance on timing analysis and evidence
- VectorCAST product roadmap (2026) for RocqStat integration
Call to action
Don’t let WCET stay a lab artifact. Instrument a canary, add a RocqStat-based CI gate, and define SLOs that put timing guarantees front and center. If you’d like a 2-week workshop to produce the required scripts, alerts, and CI gates for your platform, contact us at deployed.cloud to schedule a pilot.