The DivExplorer Project
Machine learning models may perform differently on different data subgroups. We propose the notion of divergence over itemsets (i.e., conjunctions of simple predicates) as a measure of how classification behavior on a data subgroup differs from behavior on the overall dataset, and we use frequent pattern mining techniques to identify divergent subgroups. We use Shapley values to quantify how much each attribute value contributes to divergence, which lets us identify both critical and peculiar behaviors of attributes.
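To make the two notions above concrete, here is a minimal sketch in plain Python. It is not the divexplorer package API: the toy records, the function names (false_positive_rate, divergence, shapley_contribution), and the choice of false positive rate as the metric are all assumptions made for illustration. Divergence is taken as the metric on the subgroup minus the metric on the whole dataset, and the contribution of an item is its Shapley value, with the divergence of sub-itemsets as the value function. The sketch skips frequent pattern mining and evaluates a single hand-picked itemset.

```python
from itertools import combinations
from math import factorial

# Made-up toy records: (attribute values, true label, predicted label).
ROWS = [
    ({"age": "<25", "sex": "M"}, 0, 1),
    ({"age": "<25", "sex": "M"}, 0, 1),
    ({"age": "<25", "sex": "F"}, 0, 0),
    ({"age": "<25", "sex": "F"}, 0, 1),
    ({"age": ">=25", "sex": "M"}, 0, 0),
    ({"age": ">=25", "sex": "M"}, 0, 1),
    ({"age": ">=25", "sex": "F"}, 0, 0),
    ({"age": ">=25", "sex": "F"}, 0, 0),
    ({"age": "<25", "sex": "M"}, 1, 1),
    ({"age": ">=25", "sex": "F"}, 1, 0),
]


def false_positive_rate(rows):
    """Fraction of true negatives that the classifier labeled positive."""
    negatives = [r for r in rows if r[1] == 0]
    if not negatives:
        raise ValueError("no negative instances in this subgroup")
    return sum(1 for r in negatives if r[2] == 1) / len(negatives)


def divergence(rows, itemset, metric=false_positive_rate):
    """Metric on the subgroup matching every (attribute, value) pair in
    `itemset`, minus the metric on the whole dataset.
    The empty itemset matches every row, so its divergence is 0."""
    subgroup = [r for r in rows if all(r[0].get(a) == v for a, v in itemset)]
    return metric(subgroup) - metric(rows)


def shapley_contribution(rows, itemset, item):
    """Shapley value of `item` within `itemset`, using the divergence of
    each sub-itemset as the value function."""
    others = [i for i in itemset if i != item]
    n = len(itemset)
    total = 0.0
    for size in range(len(others) + 1):
        # Standard Shapley weight for coalitions of this size.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            with_item = divergence(rows, subset + (item,))
            without_item = divergence(rows, subset)
            total += weight * (with_item - without_item)
    return total


# Hypothetical itemset: the conjunction age < 25 AND sex = M.
ITEMSET = (("age", "<25"), ("sex", "M"))

print(divergence(ROWS, ITEMSET))
for item in ITEMSET:
    print(item, shapley_contribution(ROWS, ITEMSET, item))
```

On this toy data the overall false positive rate is 0.5 and the rate on the subgroup is 1.0, so the divergence is 0.5. Each of the two items receives a contribution of 0.25, and the contributions sum to the divergence of the full itemset, as expected of Shapley values.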
Videos
5-minute introduction
20-minute in-depth
Web App and Python Package
You can use DivExplorer at divexplorer.org, where you can upload your datasets and analyze them directly online. For a quick demo, you can use the pre-processed, discretized COMPAS dataset, derived from the standard COMPAS dataset.
Alternatively, you can analyze your datasets using the divexplorer Python package, and you can look at its source code and documentation.
Papers
Looking for Trouble: Analyzing Classifier Behavior via Pattern Divergence. E. Pastor, L. de Alfaro, E. Baralis. In Proceedings of the 2021 ACM SIGMOD Conference, 2021.
How Divergent Is Your Data? E. Pastor, A. Gavgavian, E. Baralis, L. de Alfaro. In Proceedings of the 47th International Conference on Very Large Data Bases (VLDB), Demo Track, 2021.
Project Members