Data and predictive methods can give us a better understanding of why people do what they do; that goal is one thread linking data science and classical statistics. Understanding the results of predictive modeling can be challenging, however, especially as new, complex methods that leverage vast computational resources to make more accurate predictions are added to the data science toolbox. Analysts therefore need tools that help them interpret the results these methods produce efficiently.

Listen to What the Data Tell You 

At Fors Marsh Group (FMG), we support our clients by using our expertise to go beyond predictive model results, to understand what the data say and, critically, how that information affects our clients’ goals, needs, and missions. The data inform, but expert judgment makes it possible to extrapolate beyond them. As the predictive models we apply grow more complex, our team uses analytic decision aids and interpretation tools to inform our judgments about how the data and models we use translate into real-world impact.

Dominance analysis, also known as Shapley value decomposition, is an interpretive tool our team at FMG uses, and it has recently seen increasing application in modern data science. The method takes an experiment-like approach to the predictive model: it includes and excludes predictive factors and assesses how each change affects the model’s predictive quality on the data. The results of this “predictive model experiment” are then aggregated and averaged to partition the metric used to assess predictive quality, breaking it into components that sum to the whole, so that the predictive factors can be clearly ranked.
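
To make the include/exclude procedure concrete, here is a minimal Python sketch of general dominance statistics (equivalently, Shapley values) for the R² of an ordinary least squares regression. It is an illustration under stated assumptions, not `{domir}` or FMG’s own implementation: the helper names `r_squared` and `general_dominance` and the simulated data are hypothetical. The sketch fits the model on every subset of predictors, then averages each predictor’s R² increment over those subsets with Shapley weights, so the shares add up to the full-model R².

```python
from itertools import combinations
from math import factorial

import numpy as np

# Illustrative sketch only; helper names are hypothetical, not the {domir} API.


def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the columns in `cols`."""
    if not cols:
        return 0.0  # the intercept-only model explains no variance
    design = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def general_dominance(fit, p):
    """Split fit(all p factors) - fit(()) into p additive Shapley shares.

    `fit` maps a sorted tuple of factor indices to a fit metric.
    """
    # The "experiment": evaluate the model once for every subset of factors.
    values = {s: fit(s) for k in range(p + 1) for s in combinations(range(p), k)}
    shares = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):  # size of the subset that factor j joins
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for s in combinations(others, k):
                gain = values[tuple(sorted(s + (j,)))] - values[s]
                shares[j] += weight * gain
    return shares


rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
X[:, 1] += 0.5 * X[:, 0]  # correlated predictors make the split non-trivial
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=n)

shares = general_dominance(lambda cols: r_squared(X, y, cols), X.shape[1])
full = r_squared(X, y, (0, 1, 2))
print("shares:", np.round(shares, 3))
print("sum of shares:", round(shares.sum(), 6), "full R^2:", round(full, 6))
print("ranking (most to least important):", np.argsort(-shares))
```

Enumerating every subset costs 2^p model fits, which is why exact dominance analysis is usually limited to a modest number of predictors.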

Flexible Implementation Available to All 

To put this methodology into practice with real data, I introduce the `{domir}` software package. `{domir}` runs in R, a free and open-source environment for statistical computing that, built on contributions from the data science community, has rapidly become world-class predictive modeling software. Although `{domir}` is not the only implementation of dominance analysis/Shapley value decomposition in R, it is the most flexible: it supports effectively any R predictive modeling command, either directly or with a modest amount of programming. `{domir}` grew out of our need at FMG to apply dominance analysis to the wide range of predictive models we use. For example, no R package permitted dominance analysis of subpopulation differences in ordered-categorical survey responses for the HHS ASPA Public Education Campaign’s COVID-19 Attitudes and Beliefs Survey. `{domir}` was used to meet that need, helping us understand the differential impact of COVID-19 across key subpopulations.
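
Much of that flexibility comes from treating the fit metric as a plug-in: the decomposition only needs a function that returns a fit value for any subset of predictors. The hedged Python sketch below illustrates the idea; it does not reproduce `{domir}`’s R interface, which is documented on the package’s CRAN page. It swaps the OLS R² for McFadden’s pseudo-R² from a binary logistic regression fit by Newton-Raphson, a simpler stand-in for the ordered-categorical models mentioned above. All function names and the simulated data are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np

# Illustrative sketch only; helper names are hypothetical, not the {domir} API.


def general_dominance(fit, p):
    """Split fit(all p factors) - fit(()) into p additive Shapley shares."""
    values = {s: fit(s) for k in range(p + 1) for s in combinations(range(p), k)}
    shares = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            weight = factorial(k) * factorial(p - k - 1) / factorial(p)
            for s in combinations(others, k):
                shares[j] += weight * (values[tuple(sorted(s + (j,)))] - values[s])
    return shares


def logit_loglik(X, y, cols, max_iter=50):
    """Maximized log-likelihood of a logistic regression (Newton-Raphson)."""
    design = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
        grad = design.T @ (y - prob)
        hess = design.T @ (design * (prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
    return float(np.sum(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob)))


rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
eta = -0.3 + 1.2 * X[:, 0] + 0.6 * X[:, 1]  # the third column has no effect
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

ll_null = logit_loglik(X, y, ())


def mcfadden_r2(cols):
    """McFadden pseudo-R^2: 1 - LL(model) / LL(intercept-only model)."""
    return 1.0 - logit_loglik(X, y, cols) / ll_null


# Same decomposition, different model and metric: only the plug-in changed.
shares = general_dominance(mcfadden_r2, X.shape[1])
full = mcfadden_r2((0, 1, 2))
print("shares:", np.round(shares, 4), "sum:", round(shares.sum(), 6), "full:", round(full, 6))
```

Because the decomposition only calls `fit`, supporting another model means writing another small function that returns its fit metric; the subset enumeration and Shapley weighting stay the same.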

The Greater Impact 

FMG is a certified B Corp, and FMG and its employees are committed to serving our clients and the greater research community. We believe that providing this software to the research and data science community can help support decisions that improve scientific equity, assist decision-making on real-world problems, and speed research progress for the public good. To learn more about FMG’s mission and vision for the future, check out A Path to Lasting Social Change.

Want to learn more about dominance analysis methodology? Introductions to the subject matter can be found on the package’s GitHub and CRAN web pages, and additional materials are available from ResearchGate.

About the author

Joseph Luchman

Joseph Luchman, PhD, has more than 10 years of experience consulting on and managing behavioral science and statistics/data science projects for the federal government and for private industry clients. Joseph is an Accredited Professional Statistician (PStat®) certified by the American Statistical Association (ASA) and has expertise in many statistical and analytic techniques, including sampling and weighting/design-based inference, segmentation/clustering, generalized linear modeling, psychometrics and data reduction, and ensemble and machine/statistical learning methods.
