Navigating Data Epistemology to Turn Insights into Action
Epistemology is the area of philosophy concerned with knowledge:
How is knowledge obtained? How does it differ from belief or opinion? How do we know when we have it?
As an astrophysicist with a keen interest in philosophy, and one whose area of study was the entire Universe, I was deeply concerned with epistemology early in my career. Unlike the chemist or biologist, who can interact directly with their experiments in the lab, an astrophysicist must extrapolate across multiple levels of abstraction to produce scientific insights – from stars and galaxies to epically larger scales; from light that we can measure to dark matter, and maybe even dark energy; from equations we can only solve with supercomputers to theories about the real, living Universe.
Now that I build data systems and implement analytical solutions for the IRS, questions about how to extract insights from large datasets have gone from philosophical to practical. We transform data into insights and insights into action—actions in the real world that help the IRS fight fraud and protect taxpayer dollars. And when I have time to take a step back and think deep thoughts about what we do, it all goes back to data epistemology.
We learn about data so that we can learn from data and produce knowledge, and we build systems to help our clients generate knowledge from their data. While the work that we do with the IRS is always evolving, these key principles of data epistemology continuously inform how we do it:
Data are always incomplete and inaccurate.
While we use data to learn about the flow of money through a financial system, a fraudulently filed tax return, or other phenomena, data themselves result from complex processes and are subject to statistical and systematic sources of error. The saying, “Garbage in, garbage out,” misses the point: all data are garbage, just some more so than others.
Metadata (data about data) are crucially important.
To use data effectively, we must understand how data are gathered, processed, and stored so that we can identify any biases, noise, or other sources of error that might lead us to the wrong conclusions. For example, a particular version of the software used to file a tax return may have a minor bug that makes certain data elements unusable. When this data lineage is unknown, it becomes difficult, or impossible, to confidently make inferences and draw conclusions.
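To make the software-version example concrete, here is a minimal sketch in Python of how lineage metadata might be applied before analysis. Everything named here is hypothetical and for illustration only: the `software_version`, `dependent_count`, and `wages` fields, the version number, and the `KNOWN_BAD_FIELDS` registry are assumptions, not a description of any real system.

```python
# Hypothetical lineage registry: maps a filing-software version to the
# fields known to be unreliable in returns produced by that version.
KNOWN_BAD_FIELDS = {
    "2.3.1": {"dependent_count"},
}


def apply_lineage(record, known_bad=KNOWN_BAD_FIELDS):
    """Return a copy of `record` with unreliable fields blanked out.

    A `lineage_known` flag is added so that downstream analysis can
    treat records of unknown origin with extra caution.
    """
    version = record.get("software_version")
    cleaned = dict(record)  # never mutate the raw record
    cleaned["lineage_known"] = version is not None
    for name in known_bad.get(version, set()):
        if name in cleaned:
            cleaned[name] = None  # unusable, so mark it as missing
    return cleaned


raw = {"software_version": "2.3.1", "dependent_count": 7, "wages": 50000}
print(apply_lineage(raw))
```

The sketch marks an affected value as missing rather than trying to correct it, and it leaves the raw record untouched so the original evidence is preserved.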
We learn from error.
The ability to say, with some level of confidence, that a particular conclusion (Bilbo Baggins is laundering money) is supported by data involves testing a slew of “auxiliary” hypotheses to rule out alternative explanations (Bilbo has a legitimate reason to use gold and jewels as tender) and potential sources of error. Doing this well requires close collaboration with subject matter and data experts, the careful application of statistical methods, and a good dose of epistemic humility: constantly asking ourselves how we might be wrong and developing ways to test our assumptions. (Didn’t Bilbo Baggins just return from an Adventure?)
Models play a role in every step of the process.
Before even getting close to a predictive machine learning model or generative AI model, there is a hierarchy of statistical, computational, phenomenological, theoretical, and other models that contribute to the production of knowledge. In particular, a data model is a cleaned, corrected, and idealized version of raw data. For example, we standardize the formatting of dates, addresses, and other fields to enable more effective analytics. Data processing is an absolutely necessary step in data epistemology, but we must remember: all models are wrong; some are useful.
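As a small illustration of what standardizing a field can look like, the Python sketch below normalizes date strings to a single representation. The list of accepted formats is an assumption made for this example; a real pipeline would derive it from what is known about the data sources.

```python
from datetime import date, datetime
from typing import Optional

# Assumed input formats, tried in order. Reading "04/05/2023" as
# month-first is itself a modeling choice that may be wrong for some
# sources.
ASSUMED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def standardize_date(raw: str) -> Optional[date]:
    """Parse `raw` with the first matching format, or return None."""
    text = raw.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, so try the next one
    return None  # unparseable: leave it missing rather than guess


for value in ["2023-04-15", "04/15/2023", "20230415", "mid-April"]:
    print(repr(value), "->", standardize_date(value))
```

Returning `None` for an unrecognized value keeps the idealized data model honest about what it could not interpret, which is one small way to build in the epistemic humility described above.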
These principles capture the tension between the inescapable uncertainty of all knowledge claims and the need to extract insights from data with confidence. But by using the same methods that scientists use to learn about our awesome Universe, we can navigate this tension to produce data-driven insights for our clients and the communities they serve.
-Bridget Falck, PhD, Senior Analytics Manager
