Research question(s): How can we build accurate, powerful traces of behaviors in desktop applications to aid with debugging anomalous behaviors?
Summary: Argus proposes causal tracing for desktop applications. This differs from the traditional notion of distributed tracing: instead of relying on context propagation, Argus builds traces from system-level logs, capturing causal relationships between segments of execution. The authors make minor changes to the Mach kernel to support trace generation, and they log critical system behaviors so that causality can be inferred after the fact. The trace model has three types of edges: strong, weak, and enhanced weak. A beam-search algorithm then finds the most important causal paths leading to an anomalous event in a trace. Finding the root cause often requires a ‘normal’ trace of the same behavior, which is compared against the problematic one to identify what differed on the causal path. The authors evaluated Argus on many different performance problems in desktop applications, most of which were previously unsolved, and identified the root cause of every one.
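To make the path-search step concrete, here is a minimal sketch of a backward beam search over a causal graph. It is an illustration only, not the paper's algorithm: the edge weights in `EDGE_WEIGHT`, the multiplicative scoring, the function name `beam_search_paths`, and the event names in the usage note are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical weights per edge type; these notes do not give the paper's actual scoring.
EDGE_WEIGHT = {"strong": 1.0, "enhanced_weak": 0.6, "weak": 0.3}


def beam_search_paths(edges, anomaly, beam_width=2, max_depth=10):
    """Walk backward from `anomaly`, keeping the best `beam_width` partial paths per step.

    edges: iterable of (src, dst, kind) tuples, with kind a key of EDGE_WEIGHT.
    Returns (score, path) pairs sorted by descending score; each path runs
    from the earliest cause found to the anomaly.
    """
    incoming = defaultdict(list)
    for src, dst, kind in edges:
        incoming[dst].append((src, kind))

    beam = [(1.0, [anomaly])]
    finished = []
    for _ in range(max_depth):
        candidates = []
        for score, path in beam:
            # Predecessors of the current path head, skipping nodes already on the path.
            preds = [(s, k) for s, k in incoming[path[0]] if s not in path]
            if not preds:
                finished.append((score, path))  # nothing left to extend
                continue
            for src, kind in preds:
                candidates.append((score * EDGE_WEIGHT[kind], [src] + path))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]  # prune to the beam width
    else:
        finished.extend(beam)  # depth limit reached; keep the partial paths

    finished.sort(key=lambda c: c[0], reverse=True)
    return finished
```

For example, with the made-up edges `("input_event", "main_thread", "strong")`, `("timer", "main_thread", "weak")`, `("main_thread", "spinning_cursor", "strong")`, and `("daemon", "spinning_cursor", "enhanced_weak")`, searching backward from `"spinning_cursor"` ranks the all-strong path through `main_thread` first. A narrower beam trades completeness for less search work.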
Argus is mainly suited to correctness problems because it assumes that a problem will appear as a structural difference in the trace. It also assumes that a normal execution of the problematic behavior is available for comparison.
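The structural-difference assumption can be illustrated with a small sketch that compares a causal path from a normal run against the path from an anomalous run and reports where they first diverge. This is a simplification for illustration: representing a path as a flat list of event labels, the function name `first_divergence`, and the labels in the usage note are assumptions, not details from the paper.

```python
def first_divergence(normal_path, anomalous_path):
    """Return (index, normal_event, anomalous_event) at the first structural difference.

    Each path is a list of event labels ordered from cause to effect.
    Returns None when the two paths are identical. If one path is a strict
    prefix of the other, the missing side is reported as None.
    """
    for i, (n, a) in enumerate(zip(normal_path, anomalous_path)):
        if n != a:
            return i, n, a
    if len(normal_path) != len(anomalous_path):
        i = min(len(normal_path), len(anomalous_path))
        n = normal_path[i] if i < len(normal_path) else None
        a = anomalous_path[i] if i < len(anomalous_path) else None
        return i, n, a
    return None
```

For instance, comparing `["click", "dispatch", "render"]` with `["click", "dispatch", "wait_on_lock", "render"]` returns `(2, "render", "wait_on_lock")`, pointing at the extra wait as the place to start looking. A problem that only changes timing, with no change in the sequence of events, would return `None` here, which is exactly the case the assumption above leaves uncovered.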
Opportunities for future work: It would be interesting to see how Argus could be combined with existing distributed tracing frameworks to collect detailed per-service traces. Argus is currently built for macOS, but the ideas seem generally applicable, although work is needed to generalize the specific system behaviors that must be logged to infer causality.
Presenter: Darby Huye