What Students Can Do Now

Lunch and Learn — Session 1

Kerry Back, Rice University

Chatbot vs. Agent

Chatbot

  • Passes messages back and forth with an AI model
  • Generates text, answers questions
  • No access to external tools or systems
  • What most people have used (ChatGPT, Claude.ai, Gemini)

Agent

  • Also has tools — file system, databases, Python, browser, APIs
  • Decides which tools to use based on context
  • Chains multiple tool calls before responding
  • Different path for every question

The Agent Loop

💬 Plan Understand the task, decide which tools to use

⚙️ Execute Call tools — run code, query data, read files

🔍 Observe Check the results — do they make sense?

🔄 Iterate If not right, adjust and try again

The agent plans, executes, checks its own work, and iterates — sometimes calling 5 or 10 tools before producing a final answer. Everything I show you today uses this loop.

Excel File Generation

Demo: Loan Amortization

“Create a loan amortization table for a $300K mortgage at 6.5% with a 30-year term. Use formulas so I can change the inputs.”

What You Get

  • Complete .xlsx with live formulas
  • Change the rate → the whole table recalculates
  • Monthly payment, interest, principal, running balance
  • Multiple sheets if requested (summary + detail)
  • Conditional formatting, currency formatting
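Those live formulas are easy to spot-check by hand: the workbook's PMT cell is just the standard annuity formula. A quick Python sketch using the numbers from the prompt (pure arithmetic, no Excel required):

```python
# Spot-check the workbook's =PMT(...) cell against the annuity formula.
# Inputs from the prompt: $300K principal, 6.5% annual rate, 30-year term.
principal = 300_000
annual_rate = 0.065
years = 30

r = annual_rate / 12            # monthly rate
n = years * 12                  # number of monthly payments
payment = principal * r / (1 - (1 + r) ** -n)

# First row of the schedule: interest accrues on the full principal.
interest_1 = principal * r
principal_1 = payment - interest_1
balance_1 = principal - principal_1

print(f"Monthly payment: ${payment:,.2f}")      # ≈ $1,896.20
print(f"Month 1 interest: ${interest_1:,.2f}, principal: ${principal_1:,.2f}")
```

If the generated workbook's payment cell doesn't match this number, something is off.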

Then Ask For More

  • “Add a chart of principal vs. interest over time”
  • “Add a sheet comparing 15-year vs. 30-year terms”
  • “Turn this into an interactive web app with sliders”
  • Each request: 10–15 seconds
  • The Excel file has real formulas, not pasted values

LIVE DEMO

I’ll generate this Excel file live, open it, change the rate, and show the formulas recalculating.

Demo: Financial Model from Data

“Read this Excel file of historical financials. Build a pro forma income statement with revenue growing at 8%, margins expanding 50bp per year, and a terminal value at 10x EBITDA.”

LIVE DEMO

I’ll give the agent an Excel file with historical financials and ask it to:

  1. Read and summarize the historical data
  2. Build a 5-year pro forma projection with formulas
  3. Add a sensitivity table varying growth rate and exit multiple
  4. Generate a valuation summary sheet
  5. Produce the complete .xlsx with everything linked

The agent reads the source data, understands the structure, and builds a complete model with cell references linking everything together. This is not a template — it’s custom-built from the data you provide.

Claude for Excel

The Excel Add-In

  • AI sidebar that reads your open workbook
  • Understands the structure: sheets, ranges, formulas, named ranges
  • “What’s driving the variance in column G?”
  • “Add a VLOOKUP to pull prices from the other sheet”
  • “This formula is returning #REF — trace the error”
  • Works with your existing files — no re-upload needed

What It Can Build

  • Complete financial models from verbal descriptions
  • Pivot tables and summary statistics
  • Charts formatted to your specifications
  • Conditional formatting rules
  • Data validation and dropdown menus
  • Formulas, not hardcoded values — always

LIVE DEMO

I’ll open an Excel file with raw data, ask Claude to analyze it, add formulas, create a summary sheet, and build a chart — all from inside Excel.

Charts & Visualizations

Demo: Static Charts

“Read this CSV of quarterly revenue by region. Create a grouped bar chart with Q-over-Q growth labels, a line chart of cumulative revenue, and a heatmap of growth rates by region and quarter.”

LIVE DEMO

I’ll give the agent a CSV file and ask for three different chart types:

  1. Grouped bar chart with growth rate labels on each bar
  2. Line chart of cumulative revenue with annotations
  3. Heatmap of growth rates by region and quarter

The agent writes Python (matplotlib/seaborn), generates the charts, and saves them as PNG files.
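The generated code typically looks something like this: a grouped bar chart with Q-over-Q growth labels in matplotlib (the regions and figures below are invented for illustration, not read from an actual CSV):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data in the shape the demo assumes: quarterly revenue by region.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120, 138, 95, 104],
})

pivot = df.pivot(index="quarter", columns="region", values="revenue")
x = np.arange(len(pivot.index))
width = 0.35

fig, ax = plt.subplots()
for i, region in enumerate(pivot.columns):
    bars = ax.bar(x + i * width, pivot[region], width, label=region)
    # Label each bar after the first with its Q-over-Q growth rate.
    growth = pivot[region].pct_change()
    labels = [""] + [f"{g:+.0%}" for g in growth.dropna()]
    ax.bar_label(bars, labels=labels)

ax.set_xticks(x + width / 2, pivot.index)
ax.set_ylabel("Revenue ($M)")
ax.legend()
fig.savefig("revenue_by_region.png", dpi=150)
```

The cumulative line chart and heatmap follow the same pattern: reshape with pandas, plot with matplotlib or seaborn, save to PNG.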

Demo: Interactive Visualizations

“Build an interactive tool where I adjust the volatility parameter and see the Black-Scholes option price update in real time.”

What Happens

  • One sentence → interactive HTML artifact
  • Sliders, dropdowns, real-time updates
  • Click Publish → shareable URL
  • Students interact on laptops or phones
  • No coding, no hosting, no IT department
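Under the hood, the artifact simply re-evaluates a pricing function whenever a slider moves. The published artifact itself is HTML/JavaScript, but the math it encodes is a few lines; here is the Black-Scholes core sketched in Python:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call on a non-dividend stock."""
    d1 = (log(S / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

# The volatility slider just re-runs the function with a new sigma:
for sigma in (0.15, 0.20, 0.25):
    print(f"sigma = {sigma:.2f} -> call = {bs_call(100, 100, 1, 0.05, sigma):.2f}")
```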

Works for Any Concept

  • Option pricing models
  • Regression visualization (adjust parameters, see fit update)
  • Game theory payoff matrices
  • Supply chain optimization
  • Customer lifetime value calculators
  • Bayesian updating demonstrations

LIVE DEMO

I’ll type the prompt, show the artifact, click publish, and share the URL — about 30 seconds.

Statistical Analysis

Demo: Exploratory Data Analysis

“Read this dataset. Give me summary statistics, check for missing values, show the distributions of key variables, and flag any outliers.”

LIVE DEMO

I’ll give the agent a dataset and ask for EDA:

  1. Summary statistics table (mean, median, std, min, max, missing count)
  2. Histograms of continuous variables
  3. Correlation matrix with heatmap
  4. Outlier detection with box plots
  5. Missing value analysis — which columns, what percentage, any patterns?

The agent writes pandas and matplotlib code, runs it, and presents the results. If something looks off — a variable with 90% missing values, a suspicious outlier — the agent flags it before you ask.
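A minimal sketch of the kind of pandas code behind steps like these, run on synthetic data with deliberately missing values (the column names are invented):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sales": rng.normal(100, 15, 500),
    "price": rng.normal(20, 3, 500),
    "segment": rng.choice(["A", "B", "C"], 500),
})
# Knock out 10% of prices to exercise the missing-value check.
df.loc[df.sample(frac=0.1, random_state=0).index, "price"] = np.nan

# 1. Summary statistics
print(df.describe())

# 2. Missing-value analysis: which columns, what share
missing = df.isna().mean().sort_values(ascending=False)
print(missing)

# 3. Outliers on a continuous variable via the 1.5 x IQR rule
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} outlier rows by the 1.5 x IQR rule")
```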

Demo: Regression Analysis

“Run a multiple regression of sales on advertising spend, price, and seasonality. Show me the results table, residual plots, and check for multicollinearity.”

What the Agent Produces

  • Regression results table (coefficients, standard errors, p-values, R²)
  • Residual plots (residuals vs. fitted, Q-Q plot, scale-location)
  • VIF scores for multicollinearity
  • Interpretation: “A $1 increase in ad spend is associated with…”
  • Formatted for inclusion in a paper or report
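One common way this is implemented, sketched with statsmodels on synthetic data standing in for the sales dataset (variable names follow the prompt; the coefficients are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data with known coefficients, for illustration only.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "ad_spend": rng.normal(50, 10, n),
    "price": rng.normal(20, 2, n),
    "season": rng.integers(0, 2, n),
})
df["sales"] = (5 + 0.8 * df["ad_spend"] - 2.0 * df["price"]
               + 10 * df["season"] + rng.normal(0, 5, n))

model = smf.ols("sales ~ ad_spend + price + season", data=df).fit()
print(model.summary())  # coefficients, standard errors, p-values, R-squared

# VIF per regressor; values near 1 mean little multicollinearity.
X = model.model.exog
for i, name in enumerate(model.model.exog_names):
    if name != "Intercept":
        print(name, round(variance_inflation_factor(X, i), 2))
```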

Then Iterate

  • “Add interaction terms between price and season”
  • “Run a robust regression — the residuals look heteroskedastic”
  • “Compare this model to one with logged variables”
  • “Generate a LaTeX table I can paste into my paper”
  • Each iteration: 15–30 seconds

LIVE DEMO

I’ll run the regression, show the output, check diagnostics, and iterate on the specification — all through conversation.

Demo: Panel Data and Fixed Effects

“This is panel data — firms observed over time. Run a fixed-effects regression of returns on book-to-market, size, and momentum with firm and time fixed effects. Cluster standard errors by firm.”

LIVE DEMO

I’ll show the agent handling:

  1. Panel data structure (identify firm and time dimensions)
  2. Fixed effects specification
  3. Clustered standard errors
  4. Hausman test (fixed vs. random effects)
  5. Results table formatted for publication

The agent uses linearmodels or statsmodels, handles the panel structure, and produces publication-ready output. You direct the specification; the agent handles the implementation.
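The library call is one line, but it helps to see what it is doing. The core of a two-way fixed-effects estimator is the within transformation, sketched here by hand on a toy balanced panel (variable names loosely follow the prompt; the demeaning shortcut is exact only for balanced panels):

```python
import numpy as np
import pandas as pd

# Toy balanced panel: 50 firms observed for 10 years, with an
# invented data-generating process (true slope on bm is 0.5).
rng = np.random.default_rng(2)
firms, years = 50, 10
df = pd.DataFrame({
    "firm": np.repeat(np.arange(firms), years),
    "year": np.tile(np.arange(years), firms),
})
firm_effect = rng.normal(0, 1, firms)[df["firm"]]
year_effect = rng.normal(0, 1, years)[df["year"]]
df["bm"] = rng.normal(0, 1, len(df))
df["ret"] = 0.5 * df["bm"] + firm_effect + year_effect + rng.normal(0, 1, len(df))

def within(s):
    """Two-way demeaning: subtract firm and year means, add back the grand mean."""
    by_firm = df.groupby("firm")[s.name].transform("mean")
    by_year = df.groupby("year")[s.name].transform("mean")
    return s - by_firm - by_year + s.mean()

y, x = within(df["ret"]), within(df["bm"])
beta = (x @ y) / (x @ x)
print(f"fixed-effects slope on bm: {beta:.3f}")  # close to the true 0.5
```

The library adds the rest: multiple regressors, clustered standard errors, the Hausman test, and formatted output.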

Machine Learning

Demo: Classification (Churn Prediction)

“Build a gradient boosting classifier to predict customer churn from this dataset. Show me accuracy, the confusion matrix, and which features matter most.”

What the Agent Does

  • Reads the data, profiles it, handles missing values
  • Splits into train/test sets
  • Trains a gradient-boosted model (XGBoost or LightGBM)
  • Evaluates: accuracy, precision, recall, AUC
  • Generates confusion matrix visualization
  • Plots feature importance (top 10 predictors)
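A sketch of the scikit-learn code behind those steps, on synthetic data standing in for the churn dataset (scikit-learn's GradientBoostingClassifier here; the agent might pick XGBoost or LightGBM instead):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data; real features would be
# tenure, monthly spend, support tickets, and so on.
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)

print("accuracy:", round(acc, 3))
print("confusion matrix:\n", confusion_matrix(y_te, pred))
print("top 3 features:", clf.feature_importances_.argsort()[::-1][:3])
```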

What You Check

  • Is accuracy better than a naive baseline?
  • Do the important features make business sense?
  • Precision vs. recall: what’s the cost of each error type?
  • Does the model generalize (train vs. test performance)?
  • Any data leakage? (Features that wouldn’t be available at prediction time)

LIVE DEMO

I’ll build a churn model live — show the confusion matrix, discuss the precision/recall tradeoff, and interpret feature importance.

Demo: Regression (Revenue Forecasting)

“Predict next quarter’s revenue for each region using this historical data. Use gradient boosting, show me R² and feature importance, and generate a forecast with confidence intervals.”

LIVE DEMO

  1. Train a gradient boosting regressor on historical data
  2. Evaluate: R², MAE, RMSE on held-out test set
  3. Feature importance: what drives revenue differences across regions?
  4. Generate point forecasts + confidence intervals for next quarter
  5. Visualize: actual vs. predicted with error bands

The agent handles feature engineering, model selection, and evaluation. You handle the judgment: does this forecast make sense given what you know about the business?
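One common way to get confidence intervals out of gradient boosting is to fit one model per quantile. This is a sketch on synthetic data (an assumption about the method; the agent may choose a different interval technique):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic revenue history; the point is the interval technique.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, (500, 1))
y = 10 + 3 * X[:, 0] + rng.normal(0, 2, 500)

# One model per quantile yields a band around the median forecast.
models = {a: GradientBoostingRegressor(loss="quantile", alpha=a,
                                       random_state=0).fit(X, y)
          for a in (0.1, 0.5, 0.9)}

x_new = np.array([[5.0]])
lo, mid, hi = (models[a].predict(x_new)[0] for a in (0.1, 0.5, 0.9))
print(f"forecast: {mid:.1f}  (80% interval: {lo:.1f} to {hi:.1f})")
```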

The Five-Step ML Workflow

📊 Prepare Clean data, select features, handle missing values

✂️ Split Train/test split — evaluate on unseen data

🔧 Fit Train the model (gradient boosting, random forest, etc.)

🎯 Predict Generate predictions on the test set

📋 Evaluate Accuracy, precision, recall, R², feature importance

All five steps directed through natural language. The AI writes the Python, runs it, and presents results. You verify: is accuracy better than baseline? Do the features make sense? Does the model generalize? These are judgment calls the AI cannot make for you.

What AI Cannot Do

Silent Analytical Errors

AI in code-execution mode rarely invents facts. Instead it makes silent analytical errors — mistakes that look fine on the surface.

Wrong Filters

  • “What is our average deal size?”
  • SQL included $0 lost deals that should have been excluded
  • Average reported 40% lower than reality
  • The agent doesn’t know “deal size” means closed-won only
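The trap is easy to reproduce on toy data with two pandas lines that differ only in the filter:

```python
import pandas as pd

# Toy CRM extract reproducing the trap: $0 closed-lost rows in scope.
deals = pd.DataFrame({
    "stage": ["closed_won", "closed_won", "closed_won",
              "closed_lost", "closed_lost"],
    "amount": [50_000, 60_000, 70_000, 0, 0],
})

naive = deals["amount"].mean()  # includes the $0 lost deals
won = deals.loc[deals["stage"] == "closed_won", "amount"].mean()
print(f"naive average: ${naive:,.0f}   closed-won only: ${won:,.0f}")
```

Here the naive number comes out 40% low, and nothing in the output signals a problem; only someone who knows what "deal size" means in this business would catch it.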

Date Misinterpretation

  • Two systems: MM/DD/YYYY vs. YYYY-MM-DD
  • November transactions assigned to wrong month
  • Revenue understated by $2.3 million
  • No error messages, no warnings

Double-Counting

  • “Acme Corp” in one system, “ACME Corporation” in another
  • Fuzzy name matching counted customers multiple times
  • Customer count overstated by 23%
  • Entity resolution is one of the hardest data problems

These are illustrative examples. The pattern is real: the agent presents incorrect results with the same formatting and professional tone as correct results. No error messages. No warnings. Business context and institutional knowledge are what the AI lacks.

The Checker’s Toolkit

🔢 Spot-Check Run a simplified version by hand on one or two rows

🔍 Ask for Logic “Walk me through how you computed this number”

💻 Review Code Ask to see the key filters and aggregation steps

⚖️ Sanity-Check Does the answer make sense given what you know?

Calibrated Trust

🟢 Low Stakes

  • Internal notes, brainstorming
  • Personal exploration, first drafts
  • Quick sanity check is enough

🟡 Medium Stakes

  • Team decks, internal reports
  • Summary analyses for your manager
  • Verify every number, check framing

🔴 High Stakes

  • Board decks, regulatory filings
  • Public-facing documents
  • Independent verification of every claim

Match verification effort to consequences. Check proportionally to what happens if it’s wrong.

What About AI Detection?

The Experiment

1️⃣ Generate Asked Claude to write a one-page essay on corporate governance

2️⃣ Detect GPTZero scored it 100% AI

3️⃣ Rewrite Asked Claude, Gemma 4, and Kimi K2.5 to rewrite it

4️⃣ Re-test Every rewrite: still 100% AI

Multiple models, multiple rewrites, same result. GPTZero flagged every version as 100% AI-generated. Detectors look for low perplexity (predictable word choices), low burstiness (uniform sentence length), and formulaic structure (parallel paragraphs, balanced hedging, no personal voice). AI writing is too consistently “correct” — and that consistency is the tell.

But Detection Still Fails

Claude Refused to Help Evade

  • When asked directly to rewrite to avoid detection, Claude refused
  • But this is only one guardrail on one model
  • Other models may not have the same restriction
  • And there are other ways around it …

Tools That Bypass Detection

  • Undetectable AI, BypassGPT, StealthWriter, and others
  • Claim 90%+ success rates against GPTZero, Turnitin, etc.
  • Available to anyone with an internet connection
  • Students can also just submit to GPTZero themselves and revise until it passes

The detector arms race is unwinnable. Even if detection tools improve, students have unlimited retries: generate, test, revise, re-test. The question is not “how do we catch them?” — it’s “how do we assess what actually matters?”

The Takeaway

AI agents can generate Excel workbooks with live formulas, run regressions and panel data analyses, build machine learning models, produce publication-quality charts, create interactive tools, and produce undetectable written work — all from natural language. The skill that matters: knowing what to ask for, spotting what went wrong, and explaining why you trust the result.

Discussion