Research Question(s): 1) How can we quantitatively measure the similarity between two traces? 2) How can we sample traces such that we maximize the dissimilarity between the traces in the sampled subset?
Key Contributions: STEAM is a tail-based sampling framework that aims to maximize the coverage and entropy of the sampled traces. Coverage measures the unique behaviors in the system (trace shapes, latency, response codes, etc.), and entropy measures the additional knowledge gained by sampling a trace. Their main contributions are:
1) A way to measure the similarity between traces. They provide predefined similarity metrics that compare trace characteristics such as structure, latency, and span names. These metrics can be combined into logical clauses, which are used to form triplets (A, B, C) interpreted as "traces A and B are more similar to each other than either is to C."
2) A new way to train GNNs that takes trace triplets as input and aims to minimize the distance between A and B in vector space while maximizing their distance to C (see the first sketch after this list). The GNN implicitly learns the trace characteristics that imply trace similarity (from the domain knowledge encoded in the logical clauses), since it knows only that input traces A and B are more similar to each other than to C, without being explicitly told why.
3) A parallelized determinantal point process (DPP) that quickly selects the most mutually dissimilar traces (via their representative vectors) given a sampling budget (see the second sketch below).
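To make the triplet objective in contribution 2 concrete, here is a minimal sketch in plain PyTorch. The two-layer mean-aggregation encoder, the dimensions, the margin, and the random stand-in traces are all illustrative assumptions, not STEAM's actual architecture or data.

```python
# Minimal sketch of triplet training for a trace encoder (plain PyTorch).
# The encoder, dimensions, and random "traces" are illustrative stand-ins.
import torch
import torch.nn as nn

class TraceEncoder(nn.Module):
    """Embeds a trace graph (node features + adjacency) into one vector."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, out_dim)

    def forward(self, x, adj):
        # Simple message passing: average each node with its neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        x = torch.relu(self.lin1((adj @ x) / deg))
        x = self.lin2((adj @ x) / deg)
        return x.mean(dim=0)  # mean-pool nodes into one trace embedding

def random_trace(n=5, in_dim=16):
    """Stand-in for a real trace graph: random features and adjacency."""
    adj = ((torch.rand(n, n) > 0.5).float() + torch.eye(n)).clamp(max=1)
    return torch.rand(n, in_dim), adj

encoder = TraceEncoder(in_dim=16, hid_dim=32, out_dim=8)
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# One triplet (A, B, C): the logical clauses deem A and B more similar
# to each other than either is to C.
a, b, c = random_trace(), random_trace(), random_trace()
anchor = encoder(*a).unsqueeze(0)
positive = encoder(*b).unsqueeze(0)
negative = encoder(*c).unsqueeze(0)

opt.zero_grad()
loss = loss_fn(anchor, positive, negative)  # pull A,B together; push C away
loss.backward()
opt.step()
```

The key point is that the loss encodes only the relative ordering supplied by the logical clauses; the encoder is never told which characteristics made A and B similar.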
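For contribution 3, a hedged sketch of the selection step: greedy MAP inference for a DPP over trace embeddings, in the style of the standard fast greedy algorithm (Chen et al., NeurIPS 2018). STEAM's contribution is parallelizing this step; the serial loop, the dot-product kernel, and the full-rank assumption below are all simplifications.

```python
# Greedy DPP subset selection: pick k mutually dissimilar trace embeddings.
# Serial sketch of fast greedy MAP inference; STEAM parallelizes this step.
import numpy as np

def dpp_greedy(embeddings, k):
    """Select k rows of `embeddings` maximizing diversity under L = E @ E.T."""
    L = embeddings @ embeddings.T      # similarity kernel (assumed full rank)
    n = L.shape[0]
    c = np.zeros((k - 1, n))           # incremental Cholesky rows
    d = np.diag(L).copy()              # each item's remaining log-det gain
    j = int(np.argmax(d))              # start with the highest-gain item
    selected = [j]
    for t in range(k - 1):
        # Update every item's gain, conditioned on the item just selected.
        e = (L[j] - c[:t].T @ c[:t, j]) / np.sqrt(d[j])
        c[t] = e
        d = d - e ** 2
        d[selected] = -np.inf          # never re-pick a selected item
        j = int(np.argmax(d))
        selected.append(j)
    return selected

# Example: from 10 random 8-d trace embeddings, keep a budget of 4.
vecs = np.random.rand(10, 8)
print(dpp_greedy(vecs, k=4))
```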
Overall, they show that STEAM captures the most unique and representative behaviors in the system by maximizing coverage and entropy in the sampled traces. They also show that their solution is fast (it can process 15,000 traces in 4 seconds on a single processor) and that it outperforms all prior work they compare against.
Opportunities for future work: Interestingly, they claim that the standard approach to training GNNs, graph contrastive learning, which randomly drops nodes from the input graph to create additional similar graphs (see the sketch after this paragraph), is not suitable for this task. They argue these augmented graphs are unrealistic because they depict traces we would not expect to see in the system. However, this reasoning implicitly assumes that all observed traces are complete (i.e., there is no data loss). They adopt the trace-triplet approach to avoid this style of training, and later compare STEAM against it (labeled 'contrast'), showing that it underperforms both STEAM and other related work. The flip side is that STEAM may not be suitable in an environment that experiences data loss of any kind, since incomplete traces are treated as unrealistic. Future work should explore how to make STEAM more robust to messy input data. Additionally, future work should look into automatically generating logical clauses and potentially updating them dynamically for specific use cases.
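For reference, a minimal sketch of the node-dropping augmentation used in standard graph contrastive learning, assuming traces are stored as span-to-parent dictionaries (the representation and drop probability are illustrative). Its output has missing spans, which is exactly the incomplete-trace shape the authors reject as unrealistic, and also exactly what a lossy telemetry pipeline produces.

```python
# Node-dropping augmentation from standard graph contrastive learning.
# Traces are modeled as span -> parent dicts; this is an illustrative sketch.
import random

def drop_nodes(trace, drop_prob=0.2):
    """Return a copy of `trace` with each span independently dropped."""
    kept = {s for s in trace if random.random() > drop_prob}
    # An edge survives only if both endpoints do; orphans lose their parent.
    return {s: (p if p in kept else None) for s, p in trace.items() if s in kept}

# Example trace: span -> parent span (None marks the root).
trace = {"frontend": None, "auth": "frontend", "db": "auth", "cache": "frontend"}
print(drop_nodes(trace))  # e.g. {'frontend': None, 'db': None, 'cache': 'frontend'}
```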
Presenter: Darby Huye