Lunch and Learn — Session 1
Chatbot
Agent
💬 Plan Understand the task, decide which tools to use
⚡ Execute Call tools — run code, query data, read files
🔍 Observe Check the results — do they make sense?
🔄 Iterate If not right, adjust and try again
The agent plans, executes, checks its own work, and iterates — sometimes calling 5 or 10 tools before producing a final answer. Everything I show you today uses this loop.
“Create a loan amortization table for a $300K mortgage at 6.5% with a 30-year term. Use formulas so I can change the inputs.”
What You Get
Then Ask For More
LIVE DEMO
I’ll generate this Excel file live, open it, change the rate, and show the formulas recalculating.
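The formulas the generated workbook uses are just the standard amortization math. A minimal sketch, using the demo's assumed inputs ($300K, 6.5%, 30 years):

```python
# Standard amortization math the Excel formulas implement (illustrative sketch).
principal = 300_000
annual_rate = 0.065
years = 30

r = annual_rate / 12          # monthly interest rate
n = years * 12                # number of monthly payments

# Fixed monthly payment: P * r / (1 - (1 + r)^-n)
payment = principal * r / (1 - (1 + r) ** -n)   # about $1,896 per month

# First row of the amortization table
interest_1 = principal * r            # interest portion of payment 1
principal_1 = payment - interest_1    # principal portion of payment 1
balance_1 = principal - principal_1   # remaining balance after payment 1
```

Because the workbook stores these as formulas rather than values, changing the rate cell recomputes every row.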
“Read this Excel file of historical financials. Build a pro forma income statement with revenue growing at 8%, margins expanding 50bp per year, and a terminal value at 10x EBITDA.”
LIVE DEMO
I’ll give the agent an Excel file with historical financials and ask it to:
The agent reads the source data, understands the structure, and builds a complete model with cell references linking everything together. This is not a template — it’s custom-built from the data you provide.
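The projection logic itself is simple; what the agent adds is reading your historicals and wiring the cell references. A sketch of the arithmetic with hypothetical inputs (base revenue of $100M and a 20% starting margin are assumptions for illustration, not from the demo file):

```python
import pandas as pd

# Hypothetical base-year inputs; the demo reads these from the Excel file
base_revenue = 100.0      # $M (assumed)
base_margin = 0.20        # assumed starting EBITDA margin
growth, margin_step, exit_multiple, horizon = 0.08, 0.005, 10, 5

rows = []
revenue, margin = base_revenue, base_margin
for year in range(1, horizon + 1):
    revenue *= 1 + growth             # revenue grows 8% per year
    margin += margin_step             # margin expands 50bp per year
    rows.append({"year": year, "revenue": revenue, "ebitda": revenue * margin})

proforma = pd.DataFrame(rows)
terminal_value = exit_multiple * proforma["ebitda"].iloc[-1]   # 10x exit EBITDA
```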
The Excel Add-In
What It Can Build
LIVE DEMO
I’ll open an Excel file with raw data, ask Claude to analyze it, add formulas, create a summary sheet, and build a chart — all from inside Excel.
“Read this CSV of quarterly revenue by region. Create a grouped bar chart with Q-over-Q growth labels, a line chart of cumulative revenue, and a heatmap of growth rates by region and quarter.”
LIVE DEMO
I’ll give the agent a CSV file and ask for three different chart types:
The agent writes Python (matplotlib/seaborn), generates the charts, and saves them as PNG files.
“Build an interactive tool where I adjust the volatility parameter and see the Black-Scholes option price update in real time.”
What Happens
Works for Any Concept
LIVE DEMO
I’ll type the prompt, show the artifact, click publish, and share the URL — about 30 seconds.
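Under the slider, the artifact is just evaluating the Black-Scholes formula on each change. A self-contained sketch of the pricing function for a European call (no dividends):

```python
from math import log, sqrt, exp
from statistics import NormalDist

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call option (no dividends)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

# At-the-money example: S=K=100, r=5%, T=1yr, sigma=20% -> about $10.45.
# The interactive tool re-runs this as the volatility slider moves.
price = bs_call(100, 100, 1.0, 0.05, 0.20)
```

Moving the volatility slider up always raises the price, which is a quick sanity check on the tool.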
“Read this dataset. Give me summary statistics, check for missing values, show the distributions of key variables, and flag any outliers.”
LIVE DEMO
I’ll give the agent a dataset and ask for EDA:
The agent writes pandas and matplotlib code, runs it, and presents the results. If something looks off — a variable with 90% missing values, a suspicious outlier — the agent flags it before you ask.
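The EDA code the agent writes follows a familiar pandas pattern. A sketch on synthetic data (the real demo uses your dataset), including the usual 1.5x-IQR outlier rule:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical dataset standing in for the one used in the demo
df = pd.DataFrame({
    "sales": rng.normal(100, 15, 500),
    "discount": rng.uniform(0, 0.3, 500),
})
df.loc[df.sample(frac=0.1, random_state=0).index, "discount"] = np.nan

summary = df.describe()        # summary statistics per column
missing = df.isna().mean()     # share of missing values per column

# Flag outliers with the 1.5x IQR rule
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)]
```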
“Run a multiple regression of sales on advertising spend, price, and seasonality. Show me the results table, residual plots, and check for multicollinearity.”
What the Agent Produces
Then Iterate
LIVE DEMO
I’ll run the regression, show the output, check diagnostics, and iterate on the specification — all through conversation.
“This is panel data — firms observed over time. Run a fixed-effects regression of returns on book-to-market, size, and momentum with firm and time fixed effects. Cluster standard errors by firm.”
LIVE DEMO
I’ll show the agent handling:
The agent uses linearmodels or statsmodels, handles the panel structure, and produces publication-ready output. You direct the specification; the agent handles the implementation.
“Build a gradient boosting classifier to predict customer churn from this dataset. Show me accuracy, the confusion matrix, and which features matter most.”
What the Agent Does
What You Check
LIVE DEMO
I’ll build a churn model live — show the confusion matrix, discuss the precision/recall tradeoff, and interpret feature importance.
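The scikit-learn code behind that demo is short. A sketch on synthetic data (`make_classification` stands in for the churn dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset used in the demo
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

acc = accuracy_score(y_te, pred)
cm = confusion_matrix(y_te, pred)       # rows: actual, cols: predicted
importance = clf.feature_importances_   # which features matter most
```

The confusion matrix is where the precision/recall conversation starts: off-diagonal cells are the two kinds of mistakes, and which one costs more is a business question, not a modeling one.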
“Predict next quarter’s revenue for each region using this historical data. Use gradient boosting, show me R² and feature importance, and generate a forecast with confidence intervals.”
LIVE DEMO
The agent handles feature engineering, model selection, and evaluation. You handle the judgment: does this forecast make sense given what you know about the business?
📊 Prepare Clean data, select features, handle missing values
✂️ Split Train/test split — evaluate on unseen data
🔧 Fit Train the model (gradient boosting, random forest, etc.)
🎯 Predict Generate predictions on the test set
📋 Evaluate Accuracy, precision, recall, R², feature importance
All five steps directed through natural language. The AI writes the Python, runs it, and presents results. You verify: is accuracy better than baseline? Do the features make sense? Does the model generalize? These are judgment calls the AI cannot make for you.
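Applied to the revenue-forecasting prompt, the five-step loop above might look like this sketch: a point forecast plus an 80% interval from two quantile-loss gradient boosting models, on made-up features (one hedge: these are prediction intervals from quantile regression, a common stand-in for formal confidence intervals):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# Hypothetical features predicting next-quarter revenue
X = rng.uniform(0, 10, (800, 3))
y = 5 + 2 * X[:, 0] + X[:, 1] + rng.normal(0, 2, 800)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)  # split

point = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)  # fit
lo = GradientBoostingRegressor(loss="quantile", alpha=0.1,
                               random_state=0).fit(X_tr, y_tr)     # 10th pct
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9,
                               random_state=0).fit(X_tr, y_tr)     # 90th pct

r2 = point.score(X_te, y_te)                                  # evaluate
coverage = np.mean((lo.predict(X_te) <= y_te)
                   & (y_te <= hi.predict(X_te)))              # interval hit rate
```

The code runs either way; whether an 80% interval is tight enough to act on is one of those judgment calls the model leaves to you.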
AI in code-execution mode rarely invents facts. Instead it makes silent analytical errors — mistakes that look fine on the surface.
Wrong Filters
Date Misinterpretation
Double-Counting
These are illustrative examples, but the pattern is real: the agent presents incorrect results with the same formatting and professional tone as correct ones. No error messages. No warnings. What the AI lacks is business context and institutional knowledge.
🔢 Spot-Check Run a simplified version by hand on one or two rows
🔍 Ask for Logic “Walk me through how you computed this number”
💻 Review Code Ask to see the key filters and aggregation steps
⚖️ Sanity-Check Does the answer make sense given what you know?
🟢 Low Stakes
🟡 Medium Stakes
🔴 High Stakes
Match verification effort to consequences: the more costly a wrong answer would be, the more thoroughly you check.
1️⃣ Generate Asked Claude to write a one-page essay on corporate governance
2️⃣ Detect GPTZero scored it 100% AI
3️⃣ Rewrite Asked Claude, Gemma 4, and Kimi K2.5 to rewrite it
4️⃣ Re-test Every rewrite: still 100% AI
Multiple models, multiple rewrites, same result. GPTZero flagged every version as 100% AI-generated. Detectors look for low perplexity (predictable word choices), low burstiness (uniform sentence length), and formulaic structure (parallel paragraphs, balanced hedging, no personal voice). AI writing is too consistently “correct” — and that consistency is the tell.
Claude Refused to Help Evade
Tools That Bypass Detection
The detector arms race is unwinnable. Even if detection tools improve, students have unlimited retries: generate, test, revise, re-test. The question is not “how do we catch them?” — it’s “how do we assess what actually matters?”
AI agents can generate Excel workbooks with live formulas, run regressions and panel data analyses, build machine learning models, produce publication-quality charts, create interactive tools, and produce undetectable written work — all from natural language. The skill that matters: knowing what to ask for, spotting what went wrong, and explaining why you trust the result.