How We Can Use AI in Teaching and Research

Lunch and Learn — Session 2

Kerry Back, Rice University

The AI Agent Landscape

OpenAI Codex

  • Terminal-based coding agent
  • Full local file access
  • Runs code, edits files, uses tools
  • Free with OpenAI account

Google Gemini CLI

  • Terminal-based agent
  • Google ecosystem integration
  • Code execution, file access
  • Free tier available

Anthropic Claude Code

  • Terminal-based universal agent
  • Full local file access + internet
  • Skills, MCP connectors, subagents
  • Also available as Claude Desktop (Code tab)

All three are AI agents with tools — not chatbots. This session focuses on Claude Code and Claude Desktop, but the concepts (skills, connectors, iteration) transfer to any agent platform.

Claude Desktop: Three Modes

Chat (Analysis)

  • Sandboxed Python in the browser
  • No local file access
  • Must upload data manually
  • Good for quick analysis

Cowork

  • Runs in a local VM
  • Files synced to/from VM
  • No internet access (sandbox)
  • Safe for sensitive work

Code

  • Runs on your machine
  • Full local file access
  • Full internet access
  • Same engine as Claude Code CLI

The Code tab in Claude Desktop and the terminal command claude use the same engine. Code mode gives Claude direct access to your machine — no sandbox, no VM, no upload step.

Claude Code: Universal Agent

What makes it universal

  • Full access to your local file system, internet, and installed tools
  • Add instructions via CLAUDE.md — Claude reads this on every startup
  • Add capabilities via skills — text files Claude reads only when needed
  • Add external tools via MCP connectors — email, calendar, databases, browsers
  • Run subagents in parallel for complex tasks
  • One interface for everything

The Iteration Principle

Don’t review the AI’s first draft. Ask it to critique itself and revise, and repeat until the output stabilizes. Your attention goes to a polished final draft instead of a rough first one.

Vague — Weak

  • “Make it better”
  • “Improve the analysis”
  • “Make it more professional”

Specific — Strong

  • “Lead with the biggest risk to Q4 revenue”
  • “Add year-over-year comparisons for each metric”
  • “Cut the background section and start with the recommendation”

The AI’s time is cheap — a few cents per revision cycle. Your time is expensive. Let the AI iterate on itself before you invest your attention. Specific feedback produces specific improvement; vague feedback produces nothing.

Slide Creation

Slide Creation Workflows

Beamer (LaTeX)

  • AI writes the .tex file
  • Compiles to PDF
  • Generates PNG of each slide
  • Reviews for formatting issues (overflow, spacing)
  • Edits and repeats autonomously
  • Create a skill with your style preferences
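
The loop behind this workflow is simple. A minimal sketch in Python, assuming pdflatex and pdftoppm (from poppler-utils) are installed; the file name is illustrative:

```python
# Sketch of the Beamer review loop: compile the .tex file, then render
# one PNG per slide so the agent can inspect each image for overflow,
# spacing, or alignment issues. Assumes pdflatex and pdftoppm are on PATH.
from pathlib import Path

def review_loop_commands(tex_file: str) -> list[list[str]]:
    """Return the shell commands for one compile -> render cycle."""
    stem = Path(tex_file).stem
    return [
        ["pdflatex", "-interaction=nonstopmode", tex_file],       # .tex -> .pdf
        ["pdftoppm", "-png", "-r", "120", f"{stem}.pdf", stem],   # one PNG per page
    ]

cmds = review_loop_commands("deck.tex")
# In practice the agent runs each command with subprocess.run(cmd, check=True),
# reviews the PNGs, edits deck.tex, and repeats until no issues remain.
```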

PowerPoint

  • AI generates .pptx via Python (python-pptx)
  • Full control over layouts, charts, formatting
  • Works from verbal descriptions
  • Good for institutional templates
  • Can read and modify existing decks

Quarto → reveal.js

  • Markdown-based — simpler than LaTeX, more versatile
  • Produces HTML slides, websites, books
  • Accepts LaTeX math notation
  • Convert to PDF via decktape for annotation
  • AI views slides in browser and self-reviews

Demo: Autonomous Slide Creation

LIVE DEMO

I’ll ask Claude Code to:

  1. Create a Beamer slide deck on a topic
  2. Compile it to PDF
  3. Generate PNG images of each slide
  4. Review each image for formatting issues (overflow, spacing, alignment)
  5. Fix any problems and recompile
  6. Repeat until satisfied

The entire process runs autonomously — I review the final version.

Quarto for Slides

Why Quarto

  • Markdown language — simpler than TeX, similar in spirit
  • Produces reveal.js HTML slides (like this deck)
  • Also produces websites, online books, can run embedded code
  • Claude recommended Quarto over Beamer for visual quality
  • Accepts LaTeX math: $E = mc^2$
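
For reference, a complete Quarto reveal.js source file is just Markdown with a YAML header; the content below is illustrative:

```markdown
---
title: "How We Can Use AI in Teaching and Research"
format: revealjs
---

## The Iteration Principle

- Don't review the first draft
- Ask the AI to critique itself and revise

## Math Works Too

Energy: $E = mc^2$
```

Render it with quarto render slides.qmd to get the HTML deck.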

The Workflow

  • Add an MCP browser connector so Claude can view the rendered slides
  • Claude writes Quarto → renders → views in browser → edits → repeats
  • Hypothesis for student annotation (free accounts, works like Acrobat)
  • Convert to PDF via decktape if you need a file students can download
  • These slides are a Quarto deck that Claude built

CLAUDE.md and Skills

CLAUDE.md

The file Claude always reads on startup

  • A text file in your project folder (or global ~/.claude/CLAUDE.md)
  • Contains your preferences, project context, and rules
  • Claude follows these instructions automatically — no need to repeat them every session
  • Example contents: “Always use Beamer with the metropolis theme,” “My fiscal year starts in February,” “When generating charts, use blue and amber colors”

Think of CLAUDE.md as standing instructions for a junior analyst. You write them once, and Claude follows them in every conversation. You can also describe your skills here so Claude knows when to use them automatically.
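
Assembled from the example preferences above, a minimal CLAUDE.md might look like this (the headings are illustrative; any plain-text structure works):

```markdown
# CLAUDE.md: standing instructions

## Preferences
- Always use Beamer with the metropolis theme
- When generating charts, use blue and amber colors

## Context
- My fiscal year starts in February

## Skills
- For new slide decks, use the beamer-create skill
```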

Skills

What They Are

  • Text files with reusable instructions
  • Claude reads them only when needed — don’t clog the context window
  • Invoke with /skill_name (slash command)
  • Or describe the skill in CLAUDE.md so Claude uses it automatically
  • Claude can create skills for you: “Create a skill for building Beamer decks”

What They Can Do

  • Format text (status reports, summaries)
  • Process data (read files, run calculations, generate charts)
  • Orchestrate workflows (multiple steps, multiple files)
  • Spawn subagents for parallel work
  • Anything you can describe in English

Example: A Beamer Skill

beamer-create skill

Rules:

  • Use the metropolis theme with my custom color scheme
  • No more than 5 seconds to read any slide
  • No more than 3 bullet points per slide
  • Use section divider slides between major topics
  • After creating the deck, compile to PDF
  • Generate PNG of each slide and review for formatting
  • Fix any overflow, spacing, or alignment issues
  • Repeat compile → review → fix until no issues remain

Ask Claude to create this skill: “Create a skill called beamer-create with my Beamer preferences.” Split into beamer-create and beamer-review if you don’t always need the review loop. Every deck you produce will be consistent.

The Critique Skill

Reviewer 1: Correctness

  • Factual errors?
  • Logical gaps?
  • Missing information?
  • Claims supported?

Reviewer 2: Clarity

  • Logical structure?
  • Anything confusing or buried?
  • Main message direct enough?
  • Redundancy?

Reviewer 3: Devil’s Advocate

  • Strongest counterarguments?
  • Weakest reasoning?
  • What would a skeptic challenge?
  • Alternative interpretations?

Invoke with /critique filename. Spawns three subagents in parallel — each reviews from a different angle, then synthesizes findings and applies revisions. Run in a loop: critique → fix → critique again until stable. Works on papers, slide decks, grant proposals, course materials.

Subagents

What They Are

  • Claude spawns independent agents to work on subtasks
  • They run in the background — you continue your conversation
  • Claude decides when to use them spontaneously (parallelizable tasks)
  • Or you can direct: “Use subagents to research these three topics in parallel”

Examples

  • Critique skill: three reviewers run simultaneously
  • Research: search for papers on three topics at once
  • Code: test multiple approaches in parallel
  • File processing: convert 20 documents simultaneously
  • Each subagent has its own context — doesn’t clog the main conversation

MCP Connectors

Email & Calendar

Gmail MCP Connector

  • Reads both email accounts (Rice institutional + personal)
  • Manages both calendars — creates events, checks conflicts
  • Drafts replies, identifies emails needing responses
  • Claude walks you through the one-time setup

“Are there any events in my emails that should be added to my calendars? If so, add them.”

“Are there any emails that I need to reply to?”

“Draft an email to [person] about [topic] saying [key points].”

Rice’s security policy makes merging email accounts difficult, but it doesn’t restrict API access. Claude can read and manage both accounts from a single conversational interface.

Canvas API

Setup

  • Request an API key once (Claude tells you how to find the link in Canvas)
  • After that: no more SSO login, no more two-factor authentication
  • Claude handles all Canvas interactions via the API
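
Behind the scenes, every Canvas call is an ordinary HTTPS request with your API key as a bearer token. A sketch using only the standard library; the base URL is a placeholder for your institution's Canvas domain:

```python
# Sketch: calling the Canvas REST API with a personal access token
# instead of SSO. The base URL and endpoint are illustrative.
import urllib.request

def canvas_request(path: str, token: str,
                   base: str = "https://canvas.instructure.com") -> urllib.request.Request:
    """Build an authenticated Canvas API request (not yet sent)."""
    return urllib.request.Request(
        f"{base}/api/v1/{path}",
        headers={"Authorization": f"Bearer {token}"},
    )

req = canvas_request("courses", token="YOUR_API_KEY")
# Sending it with urllib.request.urlopen(req) returns JSON, e.g. your course list.
```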

What Claude Can Do

  • Upload files and assignments
  • Download student submissions
  • Upload gradebook and comments
  • Help with grading — provide a rubric and let Claude assess
  • Ask for a summary of why Claude assigned each grade

FERPA compliance: Anonymize submissions before grading. Ask for a script to upload a dummy gradebook first, then run the script on the actual gradebook. Student identity is stripped before AI processing.
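
The anonymization step can be a few lines of Python. A sketch, with illustrative field names; the key file never leaves your machine:

```python
# Replace student identities with opaque IDs before any AI processing,
# keeping a local key so grades can be mapped back afterward.
import secrets

def anonymize(submissions: list[dict]) -> tuple[list[dict], dict]:
    """Return (anonymized submissions, anon-id -> student-name key)."""
    key = {}
    anon = []
    for sub in submissions:
        anon_id = f"student-{secrets.token_hex(4)}"
        key[anon_id] = sub["name"]
        anon.append({"id": anon_id, "text": sub["text"]})
    return anon, key

anon, key = anonymize([{"name": "Jane Doe", "text": "My submission"}])
# 'anon' goes to the AI grader; 'key' stays local for the gradebook upload.
```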

Task Management

One fewer app to manage

  • Ask Claude to create a task skill with categories: personal, teaching, research
  • “What are my open teaching tasks?”
  • “Add a task to review the midterm exam by Friday”
  • “What’s overdue?”
  • Stored as files — Claude reads and updates them via the skill

There are many to-do apps, but keeping everything inside Claude Code simplifies things. I suspect this is how all of personal computing will soon work.
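
A sketch of the file-backed store such a skill might manage. The one-line-per-task format is illustrative; Claude would define its own when it writes the skill:

```python
# Each task line: category | due date | description.
from datetime import date

def overdue(lines: list[str], today: date) -> list[str]:
    """Return descriptions of tasks past their due date."""
    out = []
    for line in lines:
        category, due, desc = (part.strip() for part in line.split("|"))
        if date.fromisoformat(due) < today:
            out.append(desc)
    return out

tasks = [
    "teaching | 2025-11-07 | Review the midterm exam",
    "research | 2025-12-01 | Revise referee response",
]
print(overdue(tasks, today=date(2025, 11, 10)))  # -> ['Review the midterm exam']
```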

VS Code

VS Code: One App for Everything

VS Code + Claude Code extension

  • Code: Python, R, LaTeX, Quarto — all in one editor
  • Generate: Figures, tables, charts — AI writes the code and runs it
  • Compose: “Generate a figure showing X and insert it into the LaTeX file at section 3”
  • Orchestrate: Claude can edit multiple files, compile, review output, and iterate
  • Remote: SSH into a server and run Claude Code there via VS Code

Claude can generate a matplotlib figure, save it, insert the \includegraphics command into your .tex file, compile the PDF, and review the result — all from a single instruction. One application for your entire research and teaching workflow.

VS Code + Remote Server (jgsrc1)

Setup

  • Install the Remote SSH and Claude Code extensions in VS Code
  • SSH into jgsrc1 from within VS Code
  • Install Claude Code on the server: curl -fsSL https://claude.ai/install.sh | bash
  • Run claude in the VS Code terminal
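
The Remote SSH extension reads your SSH config, so a short entry makes the server appear in VS Code's host list. A sketch; the hostname and username are placeholders for your own setup:

```
# ~/.ssh/config
Host jgsrc1
    HostName jgsrc1.example.edu
    User your_netid
```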

What This Gets You

  • Edit files on the server with AI assistance
  • Run computations on jgsrc1’s hardware
  • Claude Code operates on the server’s file system directly
  • VS Code runs on your laptop; everything else runs remotely
  • Same experience as local — just on a more powerful machine

AI for Research

Research Workflows

Literature & Documents

  • Upload papers to NotebookLM — query across 50 sources with citations
  • Audio Overview generates a podcast-style discussion of your research
  • Compare year-over-year changes in 10-K disclosures or policy documents
  • AI reads and summarizes papers, identifies methodology gaps
  • RAG pipelines for large document collections

Empirical Analysis

  • AI writes Python/R code for econometric and statistical analysis
  • Generates figures and tables, inserts them into your manuscript
  • Data cleaning, merging, and wrangling from verbal descriptions
  • Replication and robustness checks directed through conversation
  • The critique skill catches weaknesses before reviewers do

The Research Loop in Practice

📄 Read AI summarizes papers, identifies gaps and methods

💻 Analyze AI writes code, generates figures and tables

📝 Write Draft sections, insert results into LaTeX

🔍 Critique Three-reviewer critique skill finds weaknesses

Each cycle takes minutes, not days. The AI remembers context across the session, so you can iterate rapidly. The critique skill is particularly valuable — three parallel reviewers catch issues you’d miss on your own.

Demo: Research Workflow

LIVE DEMO

I’ll show Claude Code:

  1. Reading a dataset and exploring its structure
  2. Running a regression analysis
  3. Generating a formatted results table
  4. Creating a figure (e.g., coefficient plot)
  5. Inserting both the table and figure into a LaTeX paper
  6. Compiling the PDF and reviewing the output
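
Step 5 above is mechanical once the paper contains a known marker. A sketch; the marker, file names, and caption are illustrative:

```python
# Insert a generated figure into a LaTeX paper at a marker comment.
FIGURE = r"""\begin{figure}[htbp]
  \centering
  \includegraphics[width=0.8\textwidth]{coef_plot.png}
  \caption{Coefficient estimates with 95\% confidence intervals.}
\end{figure}"""

def insert_figure(tex: str, marker: str = "% FIGURE-HERE") -> str:
    """Replace the marker comment with the figure environment."""
    return tex.replace(marker, FIGURE)

paper = "Results below.\n% FIGURE-HERE\nMore text."
print(insert_figure(paper))
```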

Rethinking Assessment

The Problem

Current State

  • Most assessments test production
  • Write a report, build a model, create a deck
  • AI does all of this in 60 seconds
  • We are testing a skill that has been commoditized

The Question

  • “Did the student do the work?” is now the wrong question
  • The right question: can they defend it?
  • Can they explain assumptions?
  • Can they field hard questions?
  • Can they catch what AI got wrong?

This applies most directly to analytical and data-intensive courses. In disciplines where producing the work IS the learning, the framework needs adaptation.

The AI Examiner

📄 Upload Student submits slides (PDF)

🔍 Pre-Analysis AI identifies key claims, gaps, question areas

🎙️ Voice Exam 10–12 adaptive questions via voice AI

📊 Grading Council Three AI models grade independently, then deliberate

Multi-model deliberation

  • Three frontier models grade independently, see peers’ scores, adjust with reasoning
  • Faculty review required for borderline grades and all appeals
  • Convergence metrics are auditable
  • Practice mode available 24/7

The Presentation Examiner

How It Works

  • Students receive a magic-link login (no passwords)
  • Upload PDF slides
  • Session 1: Present to AI listener (ElevenLabs voice agent)
  • Session 2: Answer 3 AI-generated questions + follow-ups
  • Claude Sonnet analyzes slides and generates questions
  • GPT-4 grades on three rubrics with detailed feedback

What It Solves

  • Traditional: schedule every student for a live presentation (nightmare)
  • This system: anytime, anywhere presentations
  • Instant, consistent grading with detailed rubric feedback
  • Instructor reviews transcripts and grades later
  • Scales from 20 to 2,000 students without hiring graders
  • Live now — built for the Rice executive education program

Concerns and Validation

Cost

  • API costs: ~$1 per exam at current pricing
  • Development and maintenance: ongoing
  • Faculty review time for borderline grades
  • Total cost of ownership exceeds API cost
  • But scales better than human grading

Fairness

  • Practice mode available 24/7
  • Extended time accommodations built in
  • Text-based alternative available
  • Accent/fluency bias testing: explicit pilot deliverable
  • FERPA review planned

Validation Plan

  • Blind AI vs. faculty grading comparison
  • Pre-registered reliability threshold (Cohen’s kappa)
  • No deployment without pilot evidence
  • Student feedback and learning outcomes
  • Target: 2027–28, pending results

Who Wants to Pilot This?