How We Can Use AI in Teaching and Research

Lunch and Learn — Session 2

Kerry Back, Rice University

The AI Agent Landscape

OpenAI Codex

  • Terminal-based coding agent
  • Full local file access
  • Runs code, edits files, uses tools
  • Free with OpenAI account

Google Gemini CLI

  • Terminal-based agent
  • Google ecosystem integration
  • Code execution, file access
  • Free tier available

Anthropic Claude Code

  • Terminal-based universal agent
  • Full local file access + internet
  • Skills, MCP connectors, subagents
  • Also available as Claude Desktop (Code tab)

All three are AI agents with tools — not chatbots. This session focuses on Claude Code and Claude Desktop, but the concepts (skills, connectors, iteration) transfer to any agent platform.

Claude Desktop: Three Modes

Chat (Analysis)

  • Sandboxed Python in the browser
  • No local file access
  • Must upload data manually
  • Good for quick analysis

Cowork

  • Runs in a local VM
  • Files synced to/from VM
  • No internet access (sandbox)
  • Safe for sensitive work

Code

  • Runs on your machine
  • Full local file access
  • Full internet access
  • Same engine as Claude Code CLI

The Code tab in Claude Desktop and the terminal command claude use the same engine. Code mode gives Claude direct access to your machine — no sandbox, no VM, no upload step.

Claude Code: Universal Agent

What makes it universal

  • Full access to your local file system, internet, and installed tools
  • Add instructions via CLAUDE.md — Claude reads this on every startup
  • Add capabilities via skills — text files Claude reads only when needed
  • Add external tools via MCP connectors — email, calendar, databases, browsers
  • Run subagents in parallel for complex tasks
  • One interface for everything

General Tools

Email & Calendar

Gmail MCP Connector

  • Reads both email accounts (Rice institutional + personal)
  • Manages both calendars — creates events, checks conflicts
  • Drafts replies, identifies emails needing responses
  • Claude walks you through the one-time setup

“Are there any events in my emails that should be added to my calendars? If so, add them.”

“Are there any emails that I need to reply to?”

“Draft an email to [person] about [topic] saying [key points].”

Rice’s security policy makes merging email accounts difficult, but it doesn’t restrict API access. Claude can read and manage both accounts from a single conversational interface.

Task Management

One fewer app to manage

  • Ask Claude to create a task skill with categories: personal, teaching, research
  • “What are my open teaching tasks?”
  • “Add a task to review the midterm exam by Friday”
  • “What’s overdue?”
  • Stored as files — Claude reads and updates them via the skill
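
A minimal sketch of what "stored as files" could look like, assuming a hypothetical JSON file with one record per task. The field names (title, category, due, done) and the helper names are illustrative assumptions, not the format Claude's skill necessarily produces.

```python
import json
from datetime import date
from pathlib import Path

CATEGORIES = {"personal", "teaching", "research"}


def load_tasks(path):
    """Read the task list, or return an empty list if the file does not exist yet."""
    path = Path(path)
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []


def add_task(path, title, category, due):
    """Append one task to the file and return the updated list."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    tasks = load_tasks(path)
    tasks.append({"title": title, "category": category,
                  "due": due.isoformat(), "done": False})
    Path(path).write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    return tasks


def open_tasks(tasks, category=None):
    """Tasks not yet done, optionally restricted to one category."""
    return [t for t in tasks
            if not t["done"] and (category is None or t["category"] == category)]


def overdue(tasks, today):
    """Open tasks whose due date is before today."""
    return [t for t in open_tasks(tasks) if date.fromisoformat(t["due"]) < today]
```

"What are my open teaching tasks?" then maps to open_tasks(tasks, "teaching"), and "What's overdue?" to overdue(tasks, date.today()).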

There are many to-do apps, but keeping everything inside Claude Code means one fewer tool to juggle. I think this is how all of personal computing will soon work.

AI for Research

Planning & Model Building

Plan Mode

  • Shift+Tab to force Claude into plan mode
  • Claude also enters plan mode automatically for complex projects
  • In plan mode, Claude researches and architects — editing is disabled
  • Claude asks clarifying questions before execution
  • Plans are saved to ~/.claude/plans/ — persisted on disk

Model Building

  • Describe the model in words — Claude writes the code
  • Specify your preferred language (Python, R, Stata, MATLAB)
  • Iterative refinement: “add fixed effects,” “cluster standard errors by firm”
  • Claude writes reusable scripts — scripts are the artifact, not just the output
  • Build replicable pipelines, not ad-hoc commands
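
To make a request like "add fixed effects" concrete, here is a minimal sketch of the within (demeaning) estimator for one regressor, written in plain Python. It is an illustration of what the request means, not the code Claude would necessarily write; in practice it would more likely call a regression package in your preferred language. The function name and the (firm, x, y) row layout are assumptions.

```python
from collections import defaultdict


def within_slope(rows):
    """rows: iterable of (firm, x, y). Return the firm fixed-effects slope of y on x."""
    groups = defaultdict(list)
    for firm, x, y in rows:
        groups[firm].append((x, y))

    sxy = sxx = 0.0
    for obs in groups.values():
        # Demean x and y within each firm, which removes the firm effect.
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("x has no within-firm variation")
    return sxy / sxx
```

With data built as y = firm effect + 2x, the function recovers a slope of 2 regardless of how different the firm effects are.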

Planning is as important as execution. For substantial projects, invest in the plan — then clear context so implementation has a clean slate. The plan file is Claude’s instructions to itself.

Data Handling & Analysis

Data Acquisition

  • Claude finds, downloads, and parses data from APIs, websites, and files
  • Handles messy Excel structures, scrapes SEC EDGAR, Census, FRED
  • Set user-agent headers to avoid HTTP 403 blocks on government sites
  • Cache raw data locally — re-runs only download new/missing files
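
The last two bullets can be sketched as follows, using only the standard library. This is a sketch under assumptions: the function names, the hashed file naming, and the opener parameter (there so the download step can be swapped out or tested offline) are illustrative choices, and the User-Agent string you pass should follow whatever the site asks for.

```python
import hashlib
import urllib.request
from pathlib import Path


def cache_path(url, cache_dir):
    """Map a URL to a stable local file name (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.raw"


def fetch_cached(url, cache_dir, user_agent, opener=urllib.request.urlopen):
    """Download url once with an explicit User-Agent; later calls reuse the cached file."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_path(url, cache_dir)
    if target.exists():
        return target  # already downloaded: skip the network entirely

    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener(request) as response:
        data = response.read()

    # Write to a temporary name first so an interrupted run leaves no partial file.
    tmp = target.with_suffix(".part")
    tmp.write_bytes(data)
    tmp.replace(target)
    return target
```

Calling fetch_cached in a loop over a list of URLs gives re-runs that only download what is missing.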

Large Datasets

  • Convert CSV → Parquet for dramatic compression (70 GB → 1 GB)
  • Query with DuckDB — serverless SQL, no database setup
  • Schema harmonization across eras (e.g., HMDA changed column names in 2018)
  • Add metadata and data dictionaries to help future sessions
  • Include resume capability in download scripts
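
Schema harmonization can be as simple as a mapping from legacy column names to current ones, plus a unit conversion where the meaning changed. The sketch below assumes hypothetical column names (they are placeholders, not checked against the HMDA documentation) and keeps all values as strings, as they arrive from a CSV reader.

```python
import csv
import io

# Hypothetical legacy columns: old name -> (new name, converter).
LEGACY_COLUMNS = {
    "loan_amount_000s": ("loan_amount", lambda v: str(int(v) * 1000)),  # thousands -> dollars
    "respondent_id": ("lender_id", lambda v: v),
}


def harmonize(rows, legacy=LEGACY_COLUMNS):
    """Yield rows with legacy columns renamed and converted; other columns pass through."""
    for row in rows:
        out = {}
        for col, value in row.items():
            new_col, convert = legacy.get(col, (col, lambda v: v))
            out[new_col] = convert(value)
        yield out


old_era = io.StringIO("respondent_id,loan_amount_000s\n0001,250\n")
harmonized = list(harmonize(csv.DictReader(old_era)))
```

Rows that already use the new names come out unchanged, so files from both eras can go through the same function before being written to Parquet.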

Claude samples data to understand structure, then writes a script to process the full dataset. It never loads everything into the context window. DuckDB + Parquet is the modern stack for research data — fast, portable, and free.
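
The sample-then-script pattern, sketched with the standard library for a CSV file. The function names are illustrative; the point is that only a handful of rows ever need to be read into the conversation, while the full pass streams through the file one row at a time.

```python
import csv
from itertools import islice


def sample_structure(path, n=5):
    """Return the column names and the first n rows without reading the whole file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(islice(reader, n))
        return reader.fieldnames, rows


def count_rows(path):
    """Full pass over the file, streaming, so memory use stays flat."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))
```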

Generating Figures & Tables

From verbal description to publication-ready output

  • “Plot homeownership rates by age over time using Kieran Healy’s best practices” — Claude knows published style guides
  • Generates ggplot2 (R) or matplotlib/seaborn (Python) figures
  • Iterate: “make the legend larger,” “use direct labeling instead of a legend,” “switch to a coefficient plot”
  • Formatted regression tables (stargazer, modelsummary, estout)
  • Inserts figures and tables directly into your LaTeX manuscript

Start rough, review, refine. Reference style authorities by name — Claude knows Kieran Healy’s Data Visualization, Tufte’s principles, and journal-specific formatting requirements.

Writing LaTeX

Drafting & Editing

  • Claude writes LaTeX sections from verbal descriptions or outlines
  • Inserts \includegraphics and table environments referencing your generated outputs
  • Compiles the PDF and reviews the result autonomously
  • Build a personal style guide skill from your published papers — Claude drafts in your voice

Revision Workflows

  • LLM as editor, not writer: “Create inline comments where my argument is poor — do not edit my text”
  • Preserves your voice — prevents convergence to generic AI prose
  • Strategic revision skill (Jukka Sihvonen): upload manuscript + referee reports → DAG-validated revision plan
  • Identifies conflicting referee demands and parallelizable tasks

Writing is thinking, so be deliberate about what you offload. Use the LLM as an editor and collaborator, not a ghostwriter. As a 1979 IBM training slide put it: “A computer can never be held accountable, therefore a computer must never make a management decision.”

VS Code for Research

[Screenshot: VS Code with file explorer, Python editor, LaTeX editor, LaTeX preview, and Claude Code terminal]

VS Code + Remote Server (jgsrc1)

Setup

  • Install the Remote SSH and Claude Code extensions in VS Code
  • SSH into jgsrc1 from within VS Code
  • Install Claude Code on the server: curl -fsSL https://claude.ai/install.sh | bash
  • Run claude in the VS Code terminal

What This Gets You

  • Edit files on the server with AI assistance
  • Run computations on jgsrc1’s hardware
  • Claude Code operates on the server’s file system directly
  • VS Code runs on your laptop; everything else runs remotely
  • Same experience as local — just on a more powerful machine

Best Practices

Plan → Execute → Evaluate

📋 Plan: Shift+Tab for plan mode. Research, ask questions, architect before coding.

⚙️ Execute: Clear context after planning. Implementation starts fresh with the plan file.

🔍 Evaluate: Run the critique skill. Three parallel reviewers catch what you’d miss.

Trust but verify: Know your expected output so you can catch errors — e.g., 30 firms × 4 years = ~120 rows. Resist the “drinking bird” temptation of blindly hitting “yes.” Claude is a remarkable programmer but makes mistakes on edge cases.
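
The row-count check can be made mechanical. A minimal sketch, assuming the data reduces to (firm, year) pairs and that you know the firm and year lists in advance; the function and key names are illustrative.

```python
from collections import Counter
from itertools import product


def check_panel(rows, firms, years):
    """rows: iterable of (firm, year). Compare against the full firms x years grid."""
    counts = Counter(rows)
    expected = set(product(firms, years))
    return {
        "expected_rows": len(expected),
        "actual_rows": sum(counts.values()),
        "missing": sorted(expected - set(counts)),
        "duplicates": sorted(key for key, c in counts.items() if c > 1),
        "unexpected": sorted(set(counts) - expected),
    }


firms = [f"F{i:02d}" for i in range(30)]
years = range(2019, 2023)
rows = list(product(firms, years))
rows.remove(("F00", 2019))   # one observation dropped...
rows.append(("F01", 2020))   # ...and another duplicated
report = check_panel(rows, firms, years)
```

In this example the total is still 120 rows, so the count alone would pass; the missing and duplicates lists are what reveal the problem.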

Working with Subagents

What They Are

  • Claude spawns independent agents to work on subtasks
  • They run in the background — you continue your conversation
  • Claude decides on its own when to use them (for parallelizable tasks)
  • Or you can direct: “Use subagents to research these three topics in parallel”
  • Each subagent has its own context — doesn’t clog the main conversation

Gotchas

  • Subagents do not inherit all parent instructions
  • If you told Claude “stay in this folder,” a subagent may still explore elsewhere
  • You must re-state constraints at the subagent level
  • Explain why it should stop rather than just saying no; subagents are persistent
  • Denying a tool call alone is not enough; the agent will try alternatives

Context Window & Keyboard Shortcuts

Context Window Management

  • Everything consumes the context window — being precise improves performance
  • /compact — summarizes conversation to free space
  • Start new sessions when focus drifts
  • Research → plan → implement workflow: write findings to a file, start fresh, read the file
  • Subagents protect the main window

Key Shortcuts

  • Escape — interrupt; double-escape — roll back
  • Ctrl+O — toggle Claude’s thinking
  • Shift+Tab — force plan mode
  • @filename — reference a file
  • Tab on a permission prompt — add extra instructions
  • Up arrow — edit previous message

Data Privacy

Treat Claude Code’s access like Dropbox

  • Everything in your conversation — prompts, file contents, tool outputs — goes to Anthropic’s API
  • Do not expose: PII, HIPAA data, API keys, passwords, IRB-restricted data
  • If you accidentally paste a secret, delete it and rotate the credential immediately
  • Wall off sensitive directories — don’t launch Claude Code from a folder containing restricted data
  • FERPA: anonymize student submissions before AI grading

Claude Code runs on your machine but sends context to Anthropic’s servers. The same caution you’d use with Dropbox or email applies here. Be deliberate about what enters the conversation.
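
One way to be deliberate is to scan a folder for obvious secrets before launching Claude Code from it. The sketch below is illustrative only: the two patterns and the list of file suffixes are assumptions, they will miss many real secrets, and a clean result does not mean a folder is safe to expose.

```python
import re
from pathlib import Path

# Illustrative patterns only; dedicated secret scanners use far larger rule sets.
PATTERNS = {
    "assignment": re.compile(r"(api[_-]?key|secret|password|token)\s*[:=]\s*\S+",
                             re.IGNORECASE),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, rule_name) for each line that matches a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


def scan_folder(folder, suffixes=(".py", ".env", ".txt", ".R", ".do")):
    """Scan text-like files under folder; return {path: hits} for files with matches."""
    results = {}
    for path in Path(folder).rglob("*"):
        if path.is_file() and (path.suffix in suffixes or path.name == ".env"):
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```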

Sycophancy & the Critique Skill

Reviewer 1: Correctness

  • Factual errors?
  • Logical gaps?
  • Missing information?
  • Claims supported?

Reviewer 2: Clarity

  • Logical structure?
  • Anything confusing or buried?
  • Main message direct enough?
  • Redundancy?

Reviewer 3: Devil’s Advocate

  • Strongest counterarguments?
  • Weakest reasoning?
  • What would a skeptic challenge?
  • Alternative interpretations?

Sycophancy warning: Claude’s feedback on your work tends to be relentlessly positive; even a weak argument gets praised. Push hard: “Be harsh. What would a skeptical reviewer attack?” The critique skill helps by assigning an explicit devil’s-advocate role.

The Autonomous Review Loop

✏️ Create: Claude writes the document, code, or slides

🔄 Render: Compile, execute, or build the output

👁️ Review: Claude inspects the rendered result visually

🔧 Fix & Repeat: Edit and re-render until no issues remain

Don’t review AI’s first draft. Let it critique itself and revise autonomously. Repeat until stable. Then review the final version. Your time is expensive — AI’s time is a few cents per cycle.

This loop applies to everything: LaTeX papers, slide decks, code output, figures. Claude creates, renders, inspects, and fixes without prompting. You review a final draft instead of a first draft.

Teaching with AI

Slide Creation Workflows

Beamer (LaTeX)

  • AI writes the .tex file, compiles to PDF
  • Review: generates PNG of each slide and inspects for overflow, spacing, alignment
  • Edits and repeats autonomously
  • Create a skill with your style preferences

PowerPoint

  • AI generates .pptx via PptxGenJS (Node.js) and edits via direct XML manipulation
  • Review: converts .pptx → PDF via LibreOffice, then PDF → PNGs for visual inspection
  • Good for university or department templates
  • Can read and modify existing decks

Quarto → reveal.js

  • Markdown-based — simpler than LaTeX, more versatile
  • Embed executable code — figures generated at render time
  • Review: uses the browser MCP tool to view rendered HTML slides directly
  • Accepts LaTeX math notation

The autonomous review loop applies to all three formats. The key difference is how Claude sees the output: PNGs for PDF-based formats, the browser tool for HTML. Claude creates → renders → reviews → fixes → repeats until satisfied.
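
The render step for the two PDF-based formats can be sketched as a pair of command lines per deck. This assumes pdflatex, LibreOffice (soffice), and Poppler's pdftoppm are installed and on the PATH; the choice of pdflatex and pdftoppm, the 100 dpi resolution, and the function names are assumptions, not necessarily what Claude runs.

```python
import subprocess
from pathlib import Path


def render_commands(source, out_dir):
    """Build the command lines that turn a .tex or .pptx deck into per-slide PNGs."""
    source, out_dir = Path(source), Path(out_dir)
    pdf = out_dir / (source.stem + ".pdf")
    if source.suffix == ".tex":
        to_pdf = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
                  f"-output-directory={out_dir}", str(source)]
    elif source.suffix == ".pptx":
        to_pdf = ["soffice", "--headless", "--convert-to", "pdf",
                  "--outdir", str(out_dir), str(source)]
    else:
        raise ValueError(f"unsupported deck type: {source.suffix}")
    # pdftoppm writes one numbered PNG per page using the given prefix.
    to_png = ["pdftoppm", "-png", "-r", "100", str(pdf), str(out_dir / "slide")]
    return [to_pdf, to_png]


def render(source, out_dir):
    """Run both steps and return the slide images in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in render_commands(source, out_dir):
        subprocess.run(command, check=True)
    return sorted(Path(out_dir).glob("slide-*.png"))
```

The returned PNG paths are what Claude would open and inspect for overflow, spacing, and alignment before editing the source and rendering again.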

Quarto: Embedded Code & Interactive Slides

Embedded Code Blocks

  • Write Python or R code directly in your .qmd file
  • Figures are generated at render time — always up to date
  • Change a parameter, re-render, new figure appears
  • Students see the figure (or optionally the code too)
  • No separate script files to manage

Interactive Plotly Slides

  • Plotly produces interactive HTML charts — hover, zoom, pan
  • Embed directly in reveal.js slides
  • Students explore data live during lecture
  • Efficient frontier, yield curves, return distributions — all interactive
  • Works in any browser, no install required

Quarto + Plotly turns static lecture slides into interactive data exploration. Claude writes the code blocks, renders the slides, views them in the browser, and iterates. These slides are a Quarto deck that Claude built.

VS Code for Teaching

[Screenshot: VS Code with file explorer, Quarto source, rendered slide preview, and Claude Code terminal]

CLAUDE.md and Skills

CLAUDE.md

  • A text file in your project folder (or global ~/.claude/CLAUDE.md)
  • Contains your preferences, project context, and rules
  • Claude follows these instructions automatically — no need to repeat them
  • Example: “Always use the metropolis Beamer theme,” “Use Python, not R,” “When generating charts, use blue and amber colors”

Skills

  • Text files with reusable instructions — Claude reads them only when needed
  • Invoke with /skill_name or let Claude decide automatically
  • Claude can create skills for you: “Create a skill for building Beamer decks”
  • Examples: beamer-create, critique, grading rubrics, style guides

Think of CLAUDE.md as standing instructions for a research assistant. Skills are specialized playbooks for recurring tasks. Both are just text files — nothing to install.

In-Class Use of AI

Chat Exercises

  • “Chat with Claude/ChatGPT for ten minutes about the efficient frontier. What did you learn that surprised you?”
  • Students engage with concepts at their own pace and level
  • AI adapts explanations to the student’s questions
  • Low stakes, high engagement — works in any discipline
  • Debrief as a class: what did the AI get wrong?

NotebookLM for Research Papers

  • Upload assigned research papers to NotebookLM
  • Audio Overview generates a podcast-style discussion of the paper
  • Students listen before class — arrive with context and questions
  • Query the paper in class: “What is the identification strategy?” “What are the limitations?”
  • Inline citations back to the source text

These approaches work for any course. The chat exercise takes zero preparation — just a topic and ten minutes. NotebookLM Audio Overviews turn dense papers into accessible pre-class listening.

Canvas API

Setup

  • Request an API key once (Claude tells you how to find the link in Canvas)
  • After that: no more SSO login, no more two-factor authentication
  • Claude handles all Canvas interactions via the API
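
For a sense of what an API interaction looks like, the sketch below builds (but does not send) a request for one assignment's submissions, using the Canvas REST endpoint for listing assignment submissions and a bearer token. The host name, IDs, and function name are placeholders.

```python
import urllib.request
from urllib.parse import urlencode


def submissions_request(base_url, course_id, assignment_id, token, per_page=50):
    """Build a GET request for the submissions of one assignment."""
    query = urlencode({"per_page": per_page})
    url = (f"{base_url.rstrip('/')}/api/v1/courses/{course_id}"
           f"/assignments/{assignment_id}/submissions?{query}")
    # The API key travels in the Authorization header, never in the URL.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; a real script would read the token from an environment variable.
request = submissions_request("https://canvas.example.edu/", 101, 7, "t0k")
```

Passing the request to urllib.request.urlopen and parsing the JSON response would complete the call. Keep the token out of the conversation and out of any file Claude reads.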

What Claude Can Do

  • Upload files and assignments
  • Download student submissions
  • Upload gradebook and comments
  • Help with grading — provide a rubric and let Claude assess
  • Ask for a summary of why Claude assigned each grade

FERPA compliance: Anonymize submissions before grading. Claude strips student identity, grades against the rubric, then de-anonymizes to upload scores. Test with a dummy gradebook first.
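
The anonymize, grade, de-anonymize round trip can be sketched as below. This assumes submissions are already loaded as a dict from student ID to text; the alias format and function names are illustrative. It only replaces identifiers, so names or ID numbers that appear inside the submission text still need to be scrubbed separately.

```python
import secrets


def anonymize(submissions):
    """submissions: dict student_id -> text. Return (alias -> text, alias -> student_id)."""
    key = {}
    anonymous = {}
    for student_id, text in submissions.items():
        alias = "anon-" + secrets.token_hex(4)
        while alias in key:  # extremely unlikely collision, but cheap to rule out
            alias = "anon-" + secrets.token_hex(4)
        key[alias] = student_id
        anonymous[alias] = text
    return anonymous, key


def deanonymize(scores, key):
    """scores: dict alias -> score. Map back to student IDs for upload."""
    return {key[alias]: score for alias, score in scores.items()}
```

Only the anonymous dict should enter the grading conversation; the key stays in a local file that Claude is not asked to read.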

The AI Oral Examiner

📄 Upload: Student submits slides (PDF)

🔍 Pre-Analysis: AI identifies key claims, gaps, question areas

🎙️ Voice Exam: 10–12 adaptive questions via voice AI

📊 Grading Council: Three AI models grade independently, then deliberate

Why oral examination?

  • “Did the student do the work?” is now the wrong question — AI writes a report or builds a model in 60 seconds
  • The right question: can they defend it?
  • Can they explain assumptions? Field hard questions? Catch what AI got wrong?
  • Practice mode available 24/7 — students improve through repetition

The Presentation Examiner

How It Works

  • Students receive a magic-link login (no passwords)
  • Upload PDF slides
  • Session 1: Present to AI listener (ElevenLabs voice agent)
  • Session 2: Answer 3 AI-generated questions + follow-ups
  • Claude Sonnet analyzes slides and generates questions
  • GPT-4 grades on three rubrics with detailed feedback

What It Solves

  • Traditional: schedule every student for a live presentation (nightmare)
  • This system: anytime, anywhere presentations
  • Instant, consistent grading with detailed rubric feedback
  • Instructor reviews transcripts and grades later
  • Scales from 20 to 2,000 students without hiring graders
  • Live now — piloted at Rice

Concerns and Validation

Cost

  • API costs: ~$1 per exam at current pricing
  • Development and maintenance: ongoing
  • Faculty review time for borderline grades
  • Total cost of ownership exceeds API cost
  • But scales better than human grading

Fairness

  • Practice mode available 24/7
  • Extended time accommodations built in
  • Text-based alternative available
  • Accent/fluency bias testing planned
  • FERPA review planned

Validation Plan

  • Blind AI vs. faculty grading comparison
  • Pre-registered reliability threshold (Cohen’s kappa)
  • No deployment without pilot evidence
  • Student feedback and learning outcomes
  • Target: 2027–28, pending results
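
For reference, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters assigning categorical grades; the function name is illustrative, and the threshold itself is whatever the pre-registration specifies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of labels")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical label; kappa is undefined
    return (observed - expected) / (1 - expected)


ai_grades = ["A", "A", "B", "B"]
faculty_grades = ["A", "B", "B", "B"]
kappa = cohens_kappa(ai_grades, faculty_grades)
```

Here the graders agree on three of four submissions (p_o = 0.75), chance agreement is 0.5, and kappa is 0.5.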

Live Demos

AI Lab

  • Python + Node.js + Claude Code in individual cloud accounts
  • Each student gets their own environment — nothing to install
  • ai-lab.rice-business.org
  • Log in: test_student / jgsbai

XYZ Corp

  • Simulated enterprise data with Claude + Python
  • Students query, analyze, and visualize in a realistic setting
  • xyzcorp.rice-business.org
  • Log in: test_student / jgsbai

Presentation Examiner

Who Wants to Pilot This?