Chapter 29: Quick Reference Templates

Important

The templates in this chapter are starting points, not compliance checklists. What counts as adequate documentation varies by institution, journal, and funding agency. Before submitting work that involved AI tools or sensitive data, check the specific requirements from your journal, IRB, and department. When in doubt, document more rather than less.

The following templates are designed to be copied directly into your project documentation. They cover the four most common documentation needs that come up in AI-assisted research: stating how AI tools were used, logging data provenance and preprocessing decisions, summarizing a model for readers, and reviewing your project against a basic ethics checklist before submission.

Data Provenance and Preprocessing Log

1. Dataset Overview

Dataset name:
Version or date acquired:
Source (PI, repository, instrument):
Link or accession number (if public):
IRB or DUA reference (if applicable):

2. Data Structure

Number of subjects or samples:
Variables and features included:
File formats (CSV, JSON, Parquet, etc.):
Original storage location:

3. Preprocessing Steps

For each step, record the date, the step performed, the tool or method used, a short description (including the relevant code or command and any notes on data quality or anomalies), the output produced, and who verified the result.

Date	Step	Tool/Method	Description	Output	Verified By
2025-03-01	Remove duplicate records	Python (pandas)	Dropped 47 exact duplicate rows based on respondent ID	cleaned_survey_v1.csv	J. Smith
2025-03-03	AI-assisted feature suggestions	UM-GPT (no data uploaded)	Prompted for candidate interaction terms based on theory; evaluated manually	feature_notes.md	J. Smith
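If you prefer to keep this log as a file alongside your code, the sketch below shows one way to append entries to a tab-separated log using only the Python standard library. The function name append_log_entry and the constant LOG_COLUMNS are illustrative, not part of any required tooling; the column names are taken from the example table above.

```python
import csv
from pathlib import Path

# Column order mirrors the example log table in this section.
LOG_COLUMNS = ["Date", "Step", "Tool/Method", "Description", "Output", "Verified By"]


def append_log_entry(log_path, entry):
    """Append one preprocessing step to a tab-separated log file.

    `entry` is a dict keyed by the names in LOG_COLUMNS. The header row is
    written first if the file does not exist yet or is empty. A ValueError
    is raised if any column is missing, so incomplete entries are caught
    at logging time rather than at submission time.
    """
    missing = [col for col in LOG_COLUMNS if col not in entry]
    if missing:
        raise ValueError(f"Missing log fields: {missing}")

    path = Path(log_path)
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_COLUMNS, delimiter="\t")
        if needs_header:
            writer.writeheader()
        writer.writerow(entry)
```

Calling append_log_entry("preprocessing_log.tsv", {...}) after each step keeps the log in the same repository as the code that produced the output. The file name here is only an example.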

4. Data Exclusion and Inclusion Criteria

(List rules for exclusions and the rationale behind each.)
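One way to keep exclusion rules and their rationales together is to express each rule as a named check and count how many records it removes. The following is a minimal sketch under stated assumptions: the function apply_exclusions, the constant EXAMPLE_RULES, and the field names age and consent are all hypothetical and should be replaced with your own criteria.

```python
def apply_exclusions(records, rules):
    """Apply named exclusion rules in order and report what each removed.

    `records` is a list of dicts. `rules` is a list of
    (name, rationale, should_exclude) tuples, where should_exclude(record)
    returns True if the record must be dropped.
    Returns (kept_records, report); the report has one row per rule with
    the number of records that rule excluded.
    """
    kept = list(records)
    report = []
    for name, rationale, should_exclude in rules:
        before = len(kept)
        kept = [record for record in kept if not should_exclude(record)]
        report.append(
            {"rule": name, "rationale": rationale, "excluded": before - len(kept)}
        )
    return kept, report


# Hypothetical rules for illustration only; field names are assumptions.
EXAMPLE_RULES = [
    ("under_18", "Protocol covers adults only", lambda r: r["age"] < 18),
    ("no_consent", "Consent not recorded", lambda r: not r["consent"]),
]
```

Because rules run in order, a record that fails several criteria is counted under the first rule that removes it. The report can be pasted into this section of the log as is.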

5. Version Tracking

Raw data filename:
Cleaned data filename:
Analysis-ready data filename:
Git commit tags or data version number:
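File names alone do not show whether a file's contents have changed. A common complement is to record a checksum next to each filename. The sketch below computes a SHA-256 digest with Python's standard hashlib module; the function name file_sha256 and the chunk size are illustrative choices.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use flat for large data files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recording the digest for the raw, cleaned, and analysis-ready files lets a later reader confirm they are working from the same bytes you were.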

6. Reproducibility Notes

(Any assumptions, known issues, missing metadata, or steps that require manual intervention.)

Model Card Summary

1. Model Overview

Model name:
Version:
Architecture (e.g., gradient boosting, LSTM, Transformer, fine-tuned LLM):
Purpose of model (task):

2. Intended Use

Primary intended use cases:
Out-of-scope uses (important for preventing misuse):

3. Training Data

Source datasets:
Size (subjects, samples, tokens):
Data characteristics (e.g., demographic composition, domain, time period):
Preprocessing steps:

4. Evaluation Metrics

Metrics used (AUC, accuracy, F1, MAE, etc.):
Cross-validation strategy:
Baseline comparisons:

5. Performance Summary

Subgroup	Metric	Notes
Overall
Group A
Group B
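To fill in a table like this, the same metric is computed once overall and once per subgroup. The following is a minimal sketch that uses accuracy as the metric and only the Python standard library; the function name accuracy_by_group is hypothetical, and a different metric may suit your task better.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy overall and per subgroup.

    All three arguments are equal-length sequences. Returns a dict mapping
    "Overall" and each group label to an (accuracy, sample_count) tuple.
    Assumes no subgroup is itself labeled "Overall".
    """
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction, group in zip(y_true, y_pred, groups):
        # Each sample counts toward the overall row and its own subgroup row.
        for key in ("Overall", group):
            total[key] += 1
            correct[key] += int(truth == prediction)
    return {key: (correct[key] / total[key], total[key]) for key in total}
```

Reporting the sample count next to each value helps readers judge how much weight a small subgroup's result can bear; that count fits naturally in the Notes column.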

6. Ethical and Fairness Considerations

Potential biases:
Known failure modes:
Mitigation strategies:

7. Limitations

Data limitations:
Generalizability limits:
Known sources of uncertainty:

8. Recommended Monitoring

(What should be monitored if this model is deployed or reused: drift, error patterns, fairness over time.)
