Using Real-World Data and Artificial Intelligence to Advance Health Services Research Summary Background The field of health services research (HSR) can capitalize on In recent years, advances in computing power have enabled re- burgeoning sources of real-world data to parse new and perennial searchers to leverage complex, large, and novel data sources to reveal questions about health care costs, quality, and access, as well as new insights for decision-makers. While data science has spread into potentially increase the timeliness and relevance of research find- many sectors, HSR has been relatively slow to incorporate new data ings. But first, the field must prepare the infrastructure, including sources and analytics into the field's methodological toolbox. academia, research funding, and peer-reviewed publications, to deliver on the promise and avoid the pitfalls of greater and more so- The AcademyHealth Paradigm Project, supported by the Robert phisticated use of real-world data. While researchers have long used Wood Johnson Foundation, is a concerted, collaborative effort structured real-world data like claims to answer questions in policy to increase the relevance, timeliness, quality, and impact of HSR and practice, myriad new unstructured data sources, including free through innovation.1 AcademyHealth, through the Paradigm text in electronic health records (EHRs) and images from X-rays Project, convened a February 2021 meeting to explore using real- and other technologies, are emerging for exploration. Similarly, world data-also sometimes referred to as "big data"-and related new methodologies, such as machine learning and natural language artificial intelligence methods, such as machine learning and natu- processing, are being applied to real-world data to gain deeper ral language processing, to enhance HSR capabilities to improve insights about which care is right for which patients. This brief health and health care. 
Over the course of two afternoons, a group summarizes key points from a February 2021 meeting convened by of health services researchers and data experts discussed how AcademyHealth to examine greater use of real-world data in HSR real-world data and evidence can complement traditional HSR ap- and related issues, including safeguarding against the introduction proaches to identify and answer questions relevant to health policy of racial and other biases; addressing privacy concerns; establish- and practice. Among the topics explored, participants discussed: ing data standards; developing data resources as public goods; and helping researchers gain needed skills to design and conduct studies •Using nontraditional data sources and artificial intelligence meth- and interpret and disseminate findings. ods alongside traditional HSR approaches to answer questions related to health care costs, quality, and access. Genesis of this Brief: This brief is based on a meeting of researchers and research users that took place virtually on February 24-25, 2021. AcademyHealth convened the meeting as part of its Paradigm Project, a concerted, collaborative effort to increase the relevance, timeliness, quality, and impact of health services research (HSR). Funded by the Robert Wood Johnson Foundation, the project is ideating and testing new ways to ensure HSR realizes its full potential to improve health and the delivery of health care. The Paradigm Project is designed to push HSR out of its comfort zone-to ask what works now, what doesn't, and what might work in the future. Additional information may be found on the project's website at https://academyhealth.org/ParadigmProject. Using Real-World Data and Artificial Intelligence to Advance Health Services Research •Tapping real-world data sources to untangle the causal inference prove access," a participant said, adding that such analyses can help of policy and practice interventions on heterogeneous subgroups. 
counter "policymakers routinely saying to health services research- ers, we're too slow." •Preparing the HSR infrastructure, including academia, research funding, and peer-reviewed publications, to capitalize on the Volume. Velocity. Variety. Veracity. Value. promise and avoid the pitfalls of real-world data. The so-called five Vs offer a helpful context to grasp the concept of big data.4 As the volume, velocity, and variety of real-world data •Applying research findings to answer real-world policy and prac- increase-fed by the exponential growth of digital information tice questions related to improving health and health care. generated in the connected world we live in-ensuring the veracity This brief summarizes the February meeting discussion, including and unlocking the value of real-world health data falls squarely in using real-world data and analytics to move beyond average effects; the HSR wheelhouse. While the field has long used structured real- safeguarding against the introduction of racial, ethnic, and other world data like claims to answer research questions, myriad new biases in real-world data analysis; addressing privacy concerns; es- unstructured data sources, including free text in EHRs and images tablishing standards; developing data resources as public goods; and from X-rays and other technologies, are emerging for exploration. helping researchers gain needed skills to design and conduct studies Similarly, new methodologies, such as machine learning and natu- and interpret and disseminate findings. The brief also examines the ral language processing, are being applied to real-world data to gain implications of greater use of real-world data for research funders, deeper insights beyond traditional randomized controlled trials. academia, peer-reviewed journals, and other aspects of the HSR ecosystem, including AcademyHealth. 
Because the session was off A form of artificial intelligence, machine learning essentially the record, the brief conveys the general content of the meeting enables computers to learn and adapt by analyzing and drawing without attributing specific comments to particular participants. inferences from patterns in large datasets. Algorithms, or the step- The discussion was informed by existing research though neither by-step rules used in problem-solving calculations, are the fuel of the discussion nor this brief incorporates a systematic review of the machine learning. Similarly, natural language processing, or NLP, literature related to using real-world data in HSR. A bibliography of is another form of artificial intelligence that "helps computers un- some relevant, current literature is included at the end of the brief. derstand, interpret and manipulate human language."5 An offspring of linguistics, NLP enables computer software not only to read text Real-World Data in Health Care and hear speech but interpret language, measure sentiment, and The U.S. Food and Drug Administration defines real-world data determine importance. as "data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources."2 Along Moving Beyond Average Effects to Precision HSR with more traditional retrospective observational and survey data, Increasingly, researchers are using real-world data and machine real-world data can include information from EHRs, administrative learning to move beyond average effects captured by random- and claims data, registries, patient-reported outcomes and wearable ized controlled trials. 
The goal is to pin down causal inference for sensors, measures of social determinants of health, environmental subgroups that may respond differently to new drugs and medical exposures, and even clicks on a webpage, tweets, and geolocation devices post market, or perhaps even more importantly, identify- data from smartphones and other mobile devices. Common users ing more generally across practice which tests and medical proce- of real-world data and related research include pharmaceutical dures work best for which patients, according to participants at the companies, payers and purchasers, providers, policymakers, and Paradigm Project gathering. Similarly, recent studies using machine patients.3 learning to analyze large complex datasets have pinpointed patient- level differences related to physicians ordering low-value care and In one example of how real-world data may increase the timeliness the impact of increased patient cost sharing for prescription drugs. of HSR findings for policymakers, researchers are using geoloca- tion data from 45 million smartphones to understand health care In the first study, researchers created algorithmic predictions of utilization and social mobility during the COVID-19 pandemic the results of testing patients in emergency departments for heart by examining people's visits to hospitals, physician offices, dental attacks. Using traditional analytic approaches, the testing on average offices, and other health care sites. "This dataset has allowed us to appeared to be cost-effective. 
But the analysis powered by machine look at these utilization patterns and see how they're changing in learning that accounted for more granular patient-level differences real time, such that policymakers may be able to make closer to found that almost half of the tests should never have been ordered, real-time adjustments, whether in payment policy or trying to im- and even more importantly, many patients who should have been 2 Using Real-World Data and Artificial Intelligence to Advance Health Services Research tested were not.6 In the second study, researchers took advantage of participants, however, questioned that premise, pointing out that a quirk in Medicare's prescription drug benefit structure to conduct algorithms, instead of reinforcing structural disparities, can actually a natural experiment.7 At the time of the study, beneficiaries paid help dismantle disparities if constructed correctly. 25 percent out of pocket each year for prescription drugs until they reached $2,500 in spending, and then they paid 100 percent out of For example, a 2019 study examined a commercial algorithm pocket for the next drug. Spending thresholds, however, were not widely used to predict patients who would benefit from intensive pro-rated in beneficiaries' first calendar year of enrollment, and care management interventions.9 Researchers-who had access to enrollment eligibility began in the month beneficiaries turned 65. So the proprietary algorithm's inputs, outputs, and outcomes-found those born later in the year enrolled later in the year, and in turn had that the algorithm exhibited significant racial bias-not through ill less time to reach the spending threshold, so they faced lower prices intent, but through faulty use of past health care expenditures as a on average. Researchers trained an algorithm to identify patients proxy for future health care needs. 
who really needed certain drugs like statins and antihypertensives, finding that those exposed to higher cost sharing died at about a 30 At a given risk score, Black patients were considerably sicker than percent higher rate than those who didn't face higher drug costs. white patients-remedying the disparity increased the share of Black patients receiving additional help from 17.7 percent to 46.5 "We're used to thinking about averages…but patients are all differ- percent. Researchers reasoned that because Black people have ent and machine learning and access to data are letting us do justice unequal access to care and higher risk factors due to systemic rac- to those differences to where we can look for both high-value health ism, using their past health care costs to predict future health care care and low-value health care at the patient level," one participant needs introduced racial bias into the prediction. Such an approach said, adding, "So, it's precision health services research, if you want." is far from unique in the health sector where past claims data are relatively available and often used to predict future needs. Ulti- Automating Analysis Versus Generating Knowledge mately, researchers involved in the study retrained the algorithm More and more, machine learning and algorithms are being used to using an index variable that combined cost prediction with health develop new clinical diagnostic tools, such as using artificial intel- prediction, which reduced the racial bias substantially. "Just like any ligence to interpret X-rays to diagnose knee pain or scan retinas tool, [algorithms] can be a force for good, or they can be a force for for signs of diabetic retinopathy. But real-world data and machine evil, and which one it is, is kind of up to us when we build them," a learning in some cases can go beyond just reading an X-ray and participant concluded. generate new medical knowledge. 
In one recent study, for instance, researchers used machine learning to examine knee X-rays and Unlike research to determine causal inference, such as differences linked the images to patient-reported pain symptoms. Not only in treatment effects among heterogeneous populations, research could the algorithm do a better job than radiologists of explain- in a prediction world using machine learning is relatively straight- ing which patients felt pain, the algorithm also did a better job of forward. In a causal inference world, researchers' traditional focus explaining pain in Black patients, who historically have been under- on cleaning and correcting data grows exponentially because they treated for knee pain.8 might be working with 5 million variables in a complex dataset instead of five variables in a more traditional analysis. "If we want algorithms to help make headway on understanding and producing medical knowledge, we can't just have them spit "That's the bad news, the good news is that when we're working in a back out what a human would say about an image-that's good prediction world and not a causal inference world, we actually don't for health delivery purposes, where we want to make health care need to pay quite as much attention to all 5 million variables on the cheaper and more efficient and less error prone, but it's not going to right-hand side of the model because all we want from those vari- get us far in building medical knowledge," a participant said. ables is their ability to predict what's on the left-hand side-what's the dependent variable that we're interested in," according to a Addressing Racial and Other Inequities researcher experienced in using real-world data and methods. 
"But Despite their promise, machine learning and other artificial intel- the price of that is that we really need to make sure that that left- ligence tools are not a "magic bullet," to solve health care cost, hand side variable is perfect, so…there's also a lot of very important quality, and access problems, data experts at the meeting agreed, work in making sure that the thing we're predicting is exactly what because algorithms can do enormous harm-even "automate er- we think it is and that we're not using costs as a proxy for needs, rors"-if designed incorrectly. The conventional wisdom is that bi- because of the biases…because algorithms will key in on those dif- ased algorithms come from biased data, and biased data come from ferences and amplify them." bias in society, and the only solution is to fix bias in society. Some 3 Using Real-World Data and Artificial Intelligence to Advance Health Services Research In discussing how to prevent bias from creeping into research using Several participants, however, noted that the private sector already machine learning, meeting participants coined an acronym on the collects and uses tremendous amounts of supposedly anonymous spot-GAP, for good algorithmic practice-as well as the need to data from smartphones and other sources to analyze consumer establish standards for developing and using algorithms in HSR. behavior for advertising and other uses. 
But as The New York Times Participants also discussed the need to strengthen the inter- and showed in 2018, it's relatively easy to connect that blue geolocation multi-disciplinary nature of HSR, pointing out the need for "bilin- dot on your app screen to you and precisely track almost your every gual" researchers conversant in health, economics, and data science, step.10 In the case of industry, as one person said, "Frankly, they for example, who would spot the flaw in using past health care have a lot of incentive to not reveal the fact that they have access to spending as a proxy for future needs of underserved patients. this type of data, because it does look very creepy." Linking Diverse Data Sources at the Patient Level And while privacy concerns are real, at the same time, "there are Invoking the maxim that "all data are health data," participants huge risks to not having data," according to a participant who noted stressed the importance of linking data sources at the patient level how the COVID-19 pandemic has highlighted the "shambles" of and getting data directly from patients through surveys, wearable the nation's health data infrastructure to solve problems ranging medical devices, and sensors, including smartphones. from predicting the pandemic to coordinating hospital beds to distributing vaccines. Other participants stressed the need to create For example, linking real-time patient reports of how they are a "social license" for data access and raising public awareness and feeling in the moment and data from "wearables" to EHRs could acceptance of using data for the "greater good," with one framing give clinicians a fuller picture of health status and support shared the issue as: "How do we get people to internalize that their data decision making with patients. Potentially, clinicians could create can be used for good purpose without it feeling like a threat." 
near real-time feedback loops to engage patients through email, for instance, by prompting for patient-reported outcomes and replying One data expert discussed the idea of individuals donating their with an intervention. "Using ecological momentary assessments- data, saying, "I'm fascinated by conversations about data owner- very frequent prompts to patients to report on their subjective ship…and whether individuals could be compensated for the use of status at that point in time-reduces recall bias and also generates their data-whether they will eventually have some avatar working a lot of data, and data that's very granular and has a lot of temporal around them virtually that allows their data to be seeped out to richness to it," according to a physician researcher at the meeting. some uses and blocked from others." Unlocking the power of patient-level data in both research and 'Data is the New Oil' practice, however, is fraught with privacy concerns, making "patient Notwithstanding privacy concerns, the proliferation of data and the trust" a key issue in accessing and linking data, one participant said. potential to monetize new insights into human activities and behav- In contrast to the United States, other countries, particularly Den- ior have sparked comparisons of data as the new oil. Neither data mark and Sweden, have "rich" datasets linking population health nor oil has much value as a raw material-their value comes from information at the individual level down to biomarkers. "They've refining and breaking down the parts and creating something new.11 somehow managed in multiple countries to be able to share nation- For health services researchers, accessing large real-world datasets al level population-based data at the individual level for researchers can be expensive, literally millions of dollars. 
At the same time, not for essentially free," according to a researcher with knowledge of the all data are equal, and quality is important, with one participant say- Scandinavian datasets. But other participants questioned Ameri- ing, "I always think of these black box data products that are on the cans' willingness to embrace such transparency of health informa- market and being sold, and you don't have any insight as to how the tion, with one saying, "I think there are a lot of people who are very data is being constructed, what's in there, where does it come from." envious of the Scandinavian countries' datasets, but we are never going to go there…. maybe someday… never is a long time." Given the potential to monetize data, competitive issues can pre- vent researchers from accessing datasets, and several participants Do You Know Where Your Data Are? cited the need to break down competitive barriers and develop data Privacy concerns are and will likely remain a major barrier to resources as "public goods" rather than commercial commodities. channeling the power of big data to inform health policy and Building data repositories and reusing data could lower costs, and practice, with one data expert saying, "If we're talking about using there is a pressing need as well to standardize data collection, for government capabilities in any way, privacy really has to be at the example, across health care payers and states. The glaring gaps in forefront of the protections that we're envisioning, as well as clearly U.S. health data, for instance, are illustrated by missing race/ethnic- delineating what the value proposition and potential benefits of any ity data for almost half of U.S. adults who had received at least one research application might be." dose of a COIVD-19 vaccine by mid-March 2021.12 4 Using Real-World Data and Artificial Intelligence to Advance Health Services Research Several participants stressed the importance of developing "feder- care. 
Funders, for example, could take a role in pushing for greater ated," or centralized, approaches to data collection and documenta- transparency and public engagement in plugging data gaps, such as tion. Under a federated model, multiple data sources feed into one the paucity of patient-level racial/ethnicity data, to inform policy another and are managed and documented in a standard fashion. and practice. Or they could partner with industry to purchase bulk Another pressing need is to develop best practices for data docu- access to data for research. mentation, such as FAQs and publishing metadata-or data that describes and provides information about key aspects of other data. Unlike other fields such as computer science, where academics of- "We value data but we don't value the stewardship of that data," a ten have an entrepreneurial bent and form companies and partner participant observed. with industry, the discipline of HSR hasn't really promoted itself to industry, according to a participant, who added that technology Real-World Data and the HSR Ecosystem companies routinely recruit HSR graduate students with coding As researchers increasingly embrace machine learning and real- skills to work on projects. "Academia needs to figure out how to ex- world data, the surrounding ecosystem-research funders, aca- tract more value and how to partner better with industry [because demia, peer-reviewed journals, and the health care industry-also industry] very much realizes that we need academia in order to do must evolve if the relevance, timeliness, quality, and impact of HSR what we're doing right now," the participant said. is to increase. 
To some degree, the field, with the overt support of academia and the tacit support of funders, relies on investigator- Others observed that health services researchers are skittish of com- initiated research that too often isn't sufficiently grounded in real- mercial motives, with one recalling the field's existential crisis in the world problems, some participants observed. mid-1990s when the predecessor to the federal Agency for Health- care Research and Quality drew the ire of surgeons and nearly was "We have bought into the investigator-initiated model of health eliminated by Congress for overseeing guidelines questioning the services research, because that is the coin of the realm in academic appropriateness of spine surgery for uncomplicated low back pain.13 institutions," a participant said. "But we are an applied field… and "The value proposition for industry is only there as long as what I think we need to be responding to the priorities of policymakers, you're showing is that you can sell more," the participant said, "and health system leaders, etc., and so I think that's a disconnect in the not there when you are trying to show that you can actually sell basic incentive structure." less… who's paying for research showing things shouldn't be done?" For example, promotion and tenure policies in academia typically Nonetheless, the field needs to identify ways to break down the reward a track record of publishing in peer-reviewed journals not "firewall" between researchers and industry because industry has improving data linkages or devising better ways to document data. the data researchers need. In the case of EHRs, vendors like Cerner "For tenure, why do just publication's matter?" a participant asked. 
and Epic "obviously have a commercial intent, but we have to get "Why can't data assets or radically improved linkage approaches or past that as researchers," one participant said, adding that EPIC metadata … start counting, because they are just as important for founder Judy Faulkner created the EPIC Health Record Network, a knowledge building as the great publication using the data itself." public benefit corporation focused on research, because she doesn't "perceive that we want to partner." Similarly, researchers must forge new understandings with peer- reviewed journals, which will need to adopt data standards and Real-World Dissemination and Implementation identify qualified peer reviewers conversant in new methodologies. Similar firewall issues exist between researchers and the health As one participant said, "In the near term, we're going to struggle, care delivery system, with few researchers willing to get inside as we have these new methods, with dealing with reviewers in the health systems and provider organizations to understand how they journal space and making sure that they really understand how to operate. Most academic researchers "come to health systems and interpret the work that we're going to submit and put out into the say, 'Hey I just want your data,' you know kind of cut and run, and public space for interpretation and review." that doesn't help build meaningful lasting relationships where we're really trying to help these systems think differently and transform The field's relationship with funders and industry-both within care," according to a participant. health care and beyond to the technology companies that collect and build large, novel datasets-also must change if HSR is going Emerging models, such as embedding researchers in delivery to use novel data to inform solutions to the real-world conundrums systems, can help bridge the gap between research and practice, but of a U.S. 
health care system that costs too much, harms too many "you can't just throw them into a health system with a dataset and patients, and leaves too many marginalized people without needed say have fun." Researchers need to learn how health systems oper- 5 Using Real-World Data and Artificial Intelligence to Advance Health Services Research ate, and health system administrators need to learn how research often give "short shrift" to dissemination and communication of works. For many health system administrators, according to a findings "often because they've run out of money, and dissemina- researcher experienced in working in a health system, "The idea tion costs money and getting people to pay attention to what you've of research is that you have the right answers that I need to imple- learned… it really requires a different group of people to help you ment, and once I implement it, there's going to be rainbows and get that message out in a meaningful way." puppies and everything is going to be perfect. And the reason we think that is because we're bombarded by vendors who say exactly Implications for HSR and AcademyHealth that-that if you hire us, that if you use us, that if you purchase this, As the data universe keeps expanding-from zettabytes to yot- then everything will be perfect." tabytes and beyond-participants agreed that AcademyHealth can play an important role in helping the HSR field embrace new data On the flip side, rather than diving into the data in search of a sources and methods to identify solutions to real-world problems in problem to solve, researchers need to understand what operational policy and practice. problems health systems are facing and then identify what data might help solve the problem. "Our frontline people have the ques- "I think AcademyHealth is well positioned to play this role in really tions-they know what they want to learn about. And their issue is helping to educate health services researchers," a participant said. 
they don't know how to get to the answer, so they'll say something like we have this new quality indicator on asthma, as it turns out we're bad at it, we'd like to be better at it. So, we're going to do X. Is this a good idea? And then the researcher shows up and says well you're collecting all the wrong data and there's no way we can answer this question for you."

Researchers and administrators sometimes speak different languages, the researcher continued, recounting a story about how his team curated data and rolled it out to frontline staff as a "self-service portal." The result: "Everybody hated us. I got hate mail because self-serve means that they're doing it." After consulting with the marketing department, researchers deployed the same system, calling it on-demand data, and "everybody loved it, and it was just this change in language."

On the policymaking front, similar nomenclature and culture differences can complicate communication of research findings. "We have a way of thinking about uncertainty and communicating it and talking about it and living with it, that is pretty different from the way people who have to actually make decisions do," a participant said. "We see this with COVID all the time, right? I mean what should the standard be for whether we recommend mask wearing or not. You can't say like, 'Well, we sort of think this and maybe we're going to learn more.' That doesn't seem to be a successful public relations strategy."

More broadly, there is a need to set up infrastructure to communicate findings. One model is the State-University Partnership Learning Network, managed by AcademyHealth to support evidence-based state health policy and practice with a focus on transforming Medicaid-based health care. For a variety of reasons, researchers

"So, if you use me as sort of the average test case, I think there's a great lack of knowledge about nontraditional data sources: what's out there, how to use the data, how to get access to the data, what some of the analytic methods are in terms of analyzing big data and how to interpret the findings."

As the Paradigm Project continues to use human-centered design and other tools to innovate and identify ways to increase the relevance, timeliness, quality, and impact of HSR, integrating conversations about real-world data and community engagement and participation will be critical. Other areas where AcademyHealth can support the field in leveraging the use of real-world data include:

•Supporting standardization of data use and standards, including privacy protections, data documentation, model data use agreements, and other processes.
•Designing training and other educational programs to help researchers gain skills to use novel data and methods.
•Building relationships among health services researchers, academia, and industry, in both the technology sector and health care delivery.
•Working with funders and journals to align timescales to support near real-time publication of research findings based on real-time data that can support policy and practice.
•Building capacity to translate and communicate research results in accessible and actionable ways.
•Creating awards to recognize researchers using real-world data and methods to answer questions that inform policy and improve practice.

About the Author
Alwyn Cassil is a Principal at Policy Translation, LLC.

Endnotes
1. https://www.academyhealth.org/ParadigmProject.
2. Food and Drug Administration (FDA). Framework for FDA's Real-World Evidence Program. 2018. https://www.fda.gov/media/120060/download. Accessed March 10, 2021.
3. Rudrapatna VA, Butte AJ. Opportunities and challenges in using real-world data for health care. J Clin Invest. 2020;130(2):565-574. doi:10.1172/JCI129197.
4. Kalbandi, Ishwarappa & Anuradha, J. (2015). A Brief Introduction on Big Data 5Vs Characteristics and Hadoop Technology. Procedia Computer Science. 48. 319-324. 10.1016/j.procs.2015.04.188.
5. SAS. Natural Language Processing (NLP): What it is and why it matters. Accessed March 19, 2021, at https://www.sas.com/en_us/insights/analytics/what-is-natural-language-processing-nlp.html.
6. Mullainathan, S & Obermeyer, Z. Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care. National Bureau of Economic Research. Working Paper No. 26168. (2021). doi:10.3386/w26168. Accessed at https://www.nber.org/papers/w26168
7. Chandra, A, Flack, E & Obermeyer, Z. The Health Costs of Cost-Sharing. National Bureau of Economic Research. Working Paper No. 28439. (2021). doi:10.3386/w28439. Accessed at https://www.nber.org/papers/w28439
8. Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med. 2021;27(1):136-140. doi:10.1038/s41591-020-01192-7
9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453. doi:10.1126/science.aax2342
10. Valentino-DeVries, J, Singer, N, Keller, MH, Krolik, A. The New York Times. Your Apps Know Where You Were Last Night, and They're Not Keeping It Secret. (December 10, 2018). Accessed March 19, 2021, at https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html.
11. The phrase, data is the new oil, dates to 2006 and is widely attributed to Clive Humby, a British mathematician.
12. Ndugga, N, et al. Latest Data on COVID-19 Vaccinations Race/Ethnicity. Kaiser Family Foundation. (2021) accessed March 25, 2021, at https://www.kff.org/coronavirus-covid-19/issue-brief/latest-data-on-covid-19-vaccinations-race-ethnicity/.
13. Gray BH, Gusmano MK, Collins SR. AHCPR and the changing politics of health services research. Health Aff (Millwood).
2003;Suppl Web Exclusives:W3-307. doi:10.1377/hlthaff.w3.283

Appendix: Additional Reading

Abadie A. Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects. Journal of Eco Lit. Forthcoming.
Athey S. The Impact of Machine Learning on Economics. Unpublished Manuscript. 2018 Jan.
Athey S. Beyond prediction: Using big data for policy problems. Science. 2017 Feb 3; 355(6324):483-485.
Goldstein BJ, Rigdon J. Using Machine Learning to Identify Heterogeneous Effects in Randomized Clinical Trials – Moving Beyond the Forest Plot and into the Forest. JAMA Network Open Cardiology. 2019 March 8; 2(3).
Jarmin RS, O'Hara A. Big Data and the Transformation of Public Policy Analysis. JPAM. 2016 May 10; 35(3):715-721.
Jarmin RS, O'Hara A. Counterpoint to "Big Data for Public Policy: the Quadruple Helix." JPAM. 2016 May; 35(3):725-727.
Lane J. Big Data: The Role of Education and Training. JPAM. 2016 May 10; 35(3):722-724.
Mingle D. Healthcare: Moving Beyond Average. CIO Magazine. 2015 Oct 15.
Morgan FR, Wang D, Cebrian M, Rahwan I. The Evolution of Citation Graphs in Artificial Intelligence Research. Nature Machine Intelligence. 2019 Feb 11; 1:79–85.
McClelland R, Gault S. The Synthetic Control Methods as a Tool to Understand State Policy. The Urban Institute. 2017 March.
Mullainathan S, Obermeyer Z. On the Inequity of Predicting A While Hoping for B. AEA Papers and Proceedings. 2021 May. 111:37-42.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25; 366(6464):447-453.
Obermeyer Z, Emanuel EJ. Predicting the Future – Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016 Sept 29; 375:1216-1219.
Pavaloiu A, Kose U, Boz H. How to Apply Artificial Intelligence in Social Sciences. Unpublished Manuscript. 2017. Available at https://www.researchgate.net/profile/Hakan-Boz/publication/325398286_How_to_Apply_Artificial_Intelligence_in_Social_Sciences/links/5de3f853a6fdcc2837fd09eb/How-to-Apply-Artificial-Intelligence-in-Social-Sciences.pdf.
Schudde L. Heterogeneous Effects in Education: The Promise and Challenge of Incorporating Intersectionality Into Quantitative Methodological Approaches. Review of Research in Education. 2018 April 5; 42(1):72-92.
Sivarajah U, Kamal M, Irani Z, Weerakkody V. Critical analysis of Big Data challenges and analytical methods. J of Bus Res. 2016 Aug 10; 70:263-286.
Stetter C, Menning P, Sauer J. Going Beyond Average – Using Machine Learning to Evaluate the Effectiveness of Environmental Subsidies at Micro-Level. Contributed paper prepared for presentation at the 94th Annual Conference of the Agricultural Economics Society; 2020 April 15-17; KU Leuven, Belgium.
Thorlund K, Dron L, Park J, Mills E. Synthetic and External Controls in Clinical Trials – A Primer for Researchers. Clinical Epidemiology. 2020 May 8; 12:457-467.
Zou K, Li J, Imperato J, Potkar C, Sethi N, Edwards J, Ray A. Harnessing Real-World Data for Regulatory Use and Applying Innovative Applications. J Multidisciplinary Healthcare. 2020; 13:671-679.