Chapter 10: Ethics, Privacy, and Compliance

What You Will Learn

How to figure out which privacy regulations apply to your specific data, whether you work in clinical research, social science, education, or computational fields.
What actually happens to your data when you type it into an AI tool, and how to choose the right environment for your work.
Where bias enters AI-assisted research and how to catch it before it affects your conclusions.
When you need IRB approval, a data use agreement, or both, even if no one in your study was formally recruited as a participant.
How to run a ten-minute self-audit, included at the end of the chapter, on your current project.

The Question Researchers Actually Ask

Most researchers do not wake up wondering whether they are following the NIST AI Risk Management Framework [National Institute of Standards and Technology, 2023]. What they actually wonder is something more like: I have a dataset and I want to use an AI tool with it — am I allowed to do that, and what could go wrong?

That is the question this chapter is organized around. The compliance frameworks and ethical principles matter, but they matter because they help answer that practical question for your specific situation. Rather than starting with abstract principles, we start with your data and work outward from there.

What Kind of Data Do You Have?

The single most important factor in determining how you can use AI with your research data is whether that data involves people, and if so, in what form. The answer to that question changes significantly depending on your field.

If you work in clinical or health research, your data almost certainly includes protected health information (PHI) covered by HIPAA. That means electronic health records, imaging data, genomic sequences, insurance claims, or anything else that could be linked back to an individual patient. Treat uploading any of this to a public AI tool as off limits: disclosing PHI to a vendor that has no agreement with your institution is generally a HIPAA violation, and your own impression that the data is de-identified does not change that. Re-identification from supposedly anonymous records is a well-documented risk, and the legal standard for de-identification under HIPAA is specific and strict [National Institute of Standards and Technology, 2023].

If you work in education research, the relevant regulation is FERPA, which governs student educational records. If your dataset includes grades, enrollment data, course evaluations, or anything tied to an individual student at an institution receiving federal funding, FERPA applies. This catches many researchers off guard because the data often feels administrative rather than sensitive, but the protections are real.

If you work in social science, survey research, or behavioral science and any of your participants are located in the European Union, GDPR enters the picture. GDPR is notably broader than US regulations in what it considers personal data and imposes strict requirements around consent, data minimization, and the right of participants to have their data deleted. It applies based on where your participants are located, not where you are.

If you work in computational research, NLP, or machine learning and you are building or fine-tuning models on text data scraped from the web, you are operating in a different but equally real compliance context. Terms of service agreements, copyright law, and in some cases GDPR can all apply to web-scraped corpora, particularly if that text includes posts, comments, or other content generated by identifiable individuals [Bjelobaba et al., 2024].

If you work in physical, environmental, or engineering sciences and your data does not involve people at all, most of the regulatory landscape above does not apply to you directly. Your primary concerns are more likely around confidentiality of unpublished results, data use agreements with external partners or government agencies, and export controls if your work involves dual-use technology.

The point is not to memorize each regulation in detail — your institution’s research compliance office exists precisely for that. The point is to be able to identify which category your work falls into so you know when to ask.

What Happens to Your Data in an AI Tool

One of the most common questions researchers ask is whether AI tools share their inputs with other users or use them to train future models. The short answer is: it depends entirely on which tool you are using, and the differences matter a great deal.

There are three meaningfully different categories of AI tools, and choosing the right one for your situation is one of the most concrete decisions you can make for research compliance.

The first category is public commercial tools on their free tiers: the default version of ChatGPT, Gemini, Claude, and similar products when accessed without a paid or enterprise account. Some providers in this category may retain your inputs and use them to improve their models unless you explicitly opt out, and opt-out mechanisms vary by platform and can change over time. These tools do not deliberately share your content with other users, but your inputs may become part of training data in ways you cannot fully control, and language models have been shown to reproduce passages from their training data verbatim when prompted in the right way [Carlini and others, 2021]. Unpublished results, grant ideas, confidential collaborator information, and anything covered by the regulations above should not go into these tools.

The second category is enterprise or institutionally governed AI tools — products accessed through your institution’s agreements rather than a personal account. In these environments, your inputs are contractually not used for model training, data is stored within enterprise-controlled infrastructure, and the tools are governed by institutional agreements that include privacy protections. This is the appropriate environment for most research-related AI use, including working with draft manuscripts, grant materials, and research ideas.

The third category is local or self-hosted models — tools like LM Studio or Ollama running on your own machine, or on-premise LLMs deployed on secure institutional clusters. In these environments, nothing leaves your machine or your institution’s network. This offers the highest level of confidentiality and is the right choice when working with genuinely sensitive data that cannot go anywhere outside a controlled environment, even in an enterprise system.

A Practical Decision Rule

Before pasting anything into an AI tool, ask yourself: would I be comfortable if this text appeared in a training dataset used by a commercial AI company? If the answer is no — because it contains patient data, unpublished findings, confidential grant information, or personally identifiable information about study participants — use an institutionally governed tool or a local model instead of a public tool.
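The decision rule above can be written down as a small function, which is sometimes useful for lab onboarding material or a shared checklist. This is a minimal sketch, not an official policy: the `DataProfile` class, its field names, and the tier labels are hypothetical names chosen for illustration, and the four flags simply mirror the examples listed in the rule.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical flags describing what a piece of text contains."""
    patient_data: bool = False
    unpublished_findings: bool = False
    confidential_grant_info: bool = False
    participant_pii: bool = False


def acceptable_tiers(profile: DataProfile) -> list[str]:
    """Return the tool categories the decision rule leaves open.

    If any flag is set, the answer to "would I be comfortable seeing this
    in a commercial training dataset?" is no, so public tools are excluded.
    """
    sensitive = any([
        profile.patient_data,
        profile.unpublished_findings,
        profile.confidential_grant_info,
        profile.participant_pii,
    ])
    if sensitive:
        return ["institutional", "local"]
    return ["public", "institutional", "local"]


# Example: a draft containing unpublished results.
draft = DataProfile(unpublished_findings=True)
print(acceptable_tiers(draft))
```

The function only rules out public tools. It does not say which of the remaining categories is right for regulated data such as patient records; that question still belongs with your compliance office.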

If You’re at U-M

U-M’s institutionally governed tools include UM-GPT, Maizey, Microsoft Azure OpenAI through U-M’s enterprise agreement, and Google Gemini through U-M’s institutional subscription. For compliance-sensitive data, U-M provides secure computing environments including Armis2 for HIPAA-covered data and Great Lakes for broader sensitive research use. See AI Resources at the University of Michigan for details on each option and when to use them.

IRB, Data Use Agreements, and the “It’s Public Data” Question

A common assumption in computational and social science research is that publicly available data is automatically fair to use without IRB review or other oversight. This is often, but not always, correct, and the exceptions matter.

IRB review is triggered by research involving human subjects, which federal regulations define as living individuals about whom a researcher obtains data through intervention or interaction, or identifiable private information. The key word is identifiable. If you are working with a dataset that contains no information that could, even indirectly, be linked back to an individual person, you are likely outside IRB jurisdiction. But if your dataset contains text, behavioral traces, location data, or other information that could plausibly be tied to a specific person — even if names are not present — IRB review is worth seeking, if only to get a formal determination that your work is exempt.

This catches researchers working with social media data more often than almost any other context. A dataset of tweets or Reddit posts feels public because the original posts were visible to anyone. But those posts were written by identifiable individuals who may not have understood or consented to their words being used in research. Several disciplines have developed specific ethical frameworks for social media research, and IRB offices at most institutions have guidance on this. If you are using AI to analyze social media text at scale, it is worth a conversation with your IRB before you start rather than after.
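Before that conversation, it can help to know roughly how much directly identifying material a text corpus contains. The sketch below is a crude first-pass screen using Python's standard `re` module. The patterns and the `count_identifier_hits` function are illustrative assumptions, not a de-identification method: a corpus with zero hits can still be identifiable, for example because a quoted post can be found with a search engine.

```python
import re

# Illustrative patterns only; real corpora contain many other identifiers.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # An @ that does not directly follow a word character, so the
    # domain part of an email address is not counted as a handle.
    "handle": re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]{2,}"),
    "url": re.compile(r"https?://\S+"),
}


def count_identifier_hits(texts):
    """Count pattern matches across an iterable of strings."""
    counts = {name: 0 for name in PATTERNS}
    for text in texts:
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(text))
    return counts


# Made-up posts for illustration.
posts = [
    "Thanks @some_user, see https://example.org/post/1",
    "contact me at jane.doe@example.org",
    "no identifiers here",
]
print(count_identifier_hits(posts))
```

Nonzero counts are a reason to raise the question with your IRB early. Zero counts are not a reason to skip it.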

Data use agreements (DUAs) are a separate but related concern. Many datasets, particularly administrative datasets from government agencies, hospital systems, school districts, or commercial data providers, come with DUAs that specify how the data can be used, where it can be stored, and who can access it. If your DUA was written before large language models came into widespread use, as most were, it almost certainly does not address whether you can process the data through an AI tool. Silence in the agreement is not permission. When in doubt, contact the data provider. Getting explicit permission in writing may take a few days, and it protects you from significant risk.

Bias in AI-Assisted Research

Bias is a word that gets used in a lot of different ways, so it helps to be specific about where it actually enters AI-assisted research workflows and what the consequences are.

The most fundamental source of bias is training data. Large language models and other AI systems learn from whatever data they were trained on, and that data reflects the world as it has been documented, which is not the same as the world as it is. Models trained predominantly on English-language text from the Global North will perform better on text from those contexts than on text from other languages, dialects, or cultural settings. Models trained on published scientific literature will reflect whatever biases exist in that literature, including the historical underrepresentation of certain populations in research studies [Mehrabi et al., 2021].

A well-documented example from clinical research is the racial bias found in a widely used clinical risk-scoring algorithm, where the model systematically underestimated the severity of illness in Black patients because it used healthcare costs as a proxy for health needs — and Black patients had historically received less care for the same conditions [Obermeyer et al., 2019]. The developers did not deliberately encode race into the model. The bias came from the data.

Outside of clinical research, similar dynamics appear in different forms. NLP models used to analyze job applications or academic writing have been shown to perform differently on text written in African American Vernacular English than on standard American English, in ways that could disadvantage candidates or students if the outputs were used for evaluation [Mehrabi et al., 2021]. Sentiment analysis tools trained on Western social media corpora often produce unreliable results when applied to text from other cultural contexts [Mehrabi et al., 2021]. More broadly, computational models trained on historical data in any domain will tend to encode the patterns and inequities present in that history, because that is what they are learning from [Mehrabi et al., 2021].

For researchers, the practical implication is that AI outputs need to be interrogated, not just accepted. If you are using a model to classify text, identify patterns, or generate summaries, it is worth asking whether the model’s training data was representative of your specific study population, and whether there are subgroups in your data for which the model might behave differently. This does not require running a full fairness audit on every project, but it does require holding the output to the same critical standard you would apply to any other analytical tool.
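One lightweight way to ask the subgroup question is to hand-label a small sample and compare the model's accuracy across groups. The sketch below assumes you have such a sample as `(subgroup, true_label, predicted_label)` tuples. The function name and the sample records are made up for illustration, and accuracy is only one of several measures you might compare.

```python
from collections import defaultdict


def accuracy_by_subgroup(records):
    """Compute accuracy separately for each subgroup.

    Each record is a (subgroup, true_label, predicted_label) tuple.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subgroup, truth, predicted in records:
        total[subgroup] += 1
        if truth == predicted:
            correct[subgroup] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up illustrative records, not real results.
sample = [
    ("group_a", "pos", "pos"),
    ("group_a", "neg", "neg"),
    ("group_a", "pos", "pos"),
    ("group_a", "neg", "pos"),
    ("group_b", "pos", "neg"),
    ("group_b", "neg", "neg"),
    ("group_b", "pos", "neg"),
    ("group_b", "neg", "pos"),
]

for group, acc in accuracy_by_subgroup(sample).items():
    print(f"{group}: {acc:.2f}")
```

A large gap between groups on a sample this small proves little, but it is a signal to label more data for the weaker group before relying on the model's output there.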

Documentation and Transparency

Several of the preceding chapters have mentioned documenting your AI use. This chapter is the right place to say more concretely what that means and why it matters beyond just satisfying a journal requirement.

Good documentation of AI use serves three purposes. It makes your work reproducible — someone following your methods section should be able to understand what role AI played, with which tools, and at which stages. It protects you — if questions arise later about the provenance of your work, documentation is your evidence that you used AI appropriately. And it contributes to an emerging shared understanding of norms — as a field, we are still working out what responsible AI use in research looks like, and transparent documentation is how those norms get established.

At minimum, documentation should include which tools you used, for what purpose, at what stage of the research process, and what human review or verification you applied to AI outputs. Part IV of this handbook includes ready-to-use templates you can adapt for your own work, including an AI usage statement and a preprocessing log. Chapter 11 explains how to use the Card as a working record throughout a project, and how it connects to the methods disclosure you will eventually write (see Checking AI Output).
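The four minimum elements listed above map naturally onto a structured log entry. The following is a minimal sketch of one way to record them, assuming a JSON Lines file as the log format. The `AIUsageEntry` class and its field names are hypothetical and are not the handbook's Part IV templates.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AIUsageEntry:
    """One record of AI use, covering the four minimum elements."""
    tool: str          # which tool was used
    purpose: str       # what it was used for
    stage: str         # stage of the research process
    human_review: str  # review or verification applied to the output


def to_json_line(entry: AIUsageEntry) -> str:
    """Serialize an entry as one line of JSON for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)


# Illustrative values only.
entry = AIUsageEntry(
    tool="example institutional chat tool",
    purpose="first-pass summary of interview transcripts",
    stage="analysis",
    human_review="two coders checked every summary against the source",
)
print(to_json_line(entry))
```

Appending one such line each time you use a tool gives you a dated trail to draw on when you write the methods disclosure.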

Try This

This exercise is a self-audit for your current project. It takes about ten minutes and will tell you whether there are any compliance or ethical questions worth looking into before you go further.

Step 1. Write down in one sentence what kind of data your project uses and where it came from. Is it data you collected yourself? Administrative data from an institution? Publicly scraped text? A dataset shared by a collaborator under a DUA?

Step 2. Ask yourself whether the data involves living human beings in any form, including their words, behaviors, or records. If yes, identify which regulatory framework is most relevant to your situation using the discussion above as a guide.

Step 3. Identify which category of AI tool you have been using or plan to use with this data. If you have been using a public free-tier tool, ask whether the data you have been entering would be acceptable to appear in a commercial training dataset. If the answer is no, switch to an institutionally governed tool (UM-GPT, if you are at U-M) or a local model for that work.

Step 4. Think about one output your project will produce that involves AI assistance — a classification, a summary, a generated text. Ask: is there a subgroup in my data for which this output might be systematically less accurate or fair? How would I know if that were the case?

You do not need to have perfect answers to all of these. The goal is to surface the questions that are worth raising with your IRB, your compliance office, or a colleague before they become problems downstream.

References

[1] National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework (AI RMF 1.0). 2023. URL: https://www.nist.gov/itl/ai-risk-management-framework.

[2] Sonja Bjelobaba, Lorna Waddington, Mike Perkins, Tomáš Foltýnek, Sabuj Bhattacharyya, and Debora Weber-Wulff. Research integrity and GenAI: a systematic analysis of ethical challenges across research phases. arXiv preprint, 2024. Accessed 2025-12-08. URL: https://arxiv.org/abs/2412.10134.

[3] Nicholas Carlini and others. Extracting training data from large language models. In USENIX Security Symposium. 2021. URL: https://www.usenix.org/conference/usenixsecurity21/presentation/carlini.

[4] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. ACM Computing Surveys, 2021. URL: https://doi.org/10.1145/3457607.

[5] Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453, 2019. URL: https://doi.org/10.1126/science.aax2342.