01SAP HANA workload triage from noisy telemetry
Problem
HANA already logs sessions, statements, and waits in detail. Operators still drowned in alerts, took too long to find root cause, and governance got shaky when people acted on a half-formed theory.
Solution
Python pipeline: SQL and SQLScript features feed hybrid rules with confidence-ranked root-cause hypotheses; impact is explicit; remediations only from an allowlist with human approval and rollback. Optional assistants may polish narrative text—they never decide safety or ranking.
Evaluation & metrics
- Offline eval: precision@k on root cause, false-alert checks, zero tolerance on allowlist violations, narrative rubric completeness
- scripts/eval_run.py writes baseline vs improved profiles (e.g. alert dedup) to reports/eval-baseline-vs-v2.md
Security & reliability
No open-ended auto-fix. High-risk steps need a human. Decisions and actions append to an audit log. Secrets stay in .env or BTP destinations, not in git.
System architecture
Telemetry (HANA / staging / CSV fixtures)
→ Ingest → store
→ Detect → rank (rules + confidence) → impact
→ Plan actions (allowlist YAML)
→ Safety (approval gate, rollback)
→ Narrative (template-first) → audit log