What Corporate Implementation Looks Like

Lunch and Learn — Session 3

Kerry Back, Rice University

What Your Graduates Will Face

AI Meets Enterprise Data

The Problem

  • “Which customers buy from multiple divisions?”
  • Touches three CRMs, an ERP, and an HR system
  • Different naming conventions, date formats, data models
  • Traditional answer: emails to five people over three weeks
  • Enterprise systems were never designed to talk to each other

The Agent Answer

  • Ask in plain English
  • Agent queries each system sequentially
  • Reconciles results in Python (not a single SQL join)
  • Handles fuzzy matching, date formats, naming differences
  • 30 seconds — but requires verification

The speed is real, but so are the risks. The 30-second answer may contain the same silent errors we discussed in Session 1. The new challenge: how do you verify a 30-second answer?

XYZ Corp Custom Chatbot

A simulated enterprise for teaching

  • $500M B2B industrial supplies distributor with three divisions: Industrial, Energy, Safety
  • 9 enterprise systems: Salesforce (Industrial CRM), Legacy CRM (Energy), HubSpot (Safety), NetSuite (finance), SAP (supply chain), Oracle SCM, Workday (HR), Zendesk (support), QuickBooks
  • 10 DuckDB databases pre-loaded as Parquet files — 41 tables, 26,000+ rows, 3 years of data
  • Web interface: Ask questions in plain English → Claude queries databases, generates charts, creates documents
  • Built for the Rice “From BI to AI” executive education program

Demo: Cross-System Query

LIVE DEMO

I’ll ask the XYZ Corp chatbot:

“Which customers buy from multiple divisions? Show combined revenue and flag name mismatches.”

Watch the agent:

  1. Query Salesforce (Industrial division customers)
  2. Query Legacy CRM (Energy division customers)
  3. Query HubSpot (Safety division customers)
  4. Fuzzy-match customer names across all three systems
  5. Produce a result table with combined revenue

The Result

Demo output

  • 10 customers identified across 2+ divisions
  • 4 buying from all 3 divisions: $56.5M combined revenue (11.4% of total)
  • Only 3 of 10 had matching names across systems — 7 required fuzzy matching

| Customer         | Industrial      | Energy              | Safety         | Combined |
|------------------|-----------------|---------------------|----------------|----------|
| General Electric | GE Industrial   | GE Energy Solutions | GE Safety Div  | $18.4M   |
| ExxonMobil       | ExxonMobil Corp | Exxon Mobil         | ExxonMobil LLC | $14.7M   |
| Dow Chemical     | Dow Inc         | Dow Chemical Co     | Dow Safety     | $12.1M   |
| Chevron          | Chevron Corp    | Chevron USA         | Chevron Safety | $11.3M   |

The name mismatches are the story. “General Electric” appears as three different strings across three systems. Without fuzzy matching, these look like 12 separate customers, not 4.

Behind the Scenes

1️⃣ Query Salesforce: Industrial division customers and revenue

2️⃣ Query Legacy CRM: Energy division customers and revenue

3️⃣ Query HubSpot: Safety division customers and revenue

4️⃣ Fuzzy Merge: Python matches names, reconciles, deduplicates

Not a single SQL join. The agent runs sequential queries against each system, then merges in Python. This is critical because enterprise systems have different schemas, naming conventions, and date formats.
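The fuzzy-merge step can be sketched with the standard library alone. This is a minimal illustration, not the chatbot's actual matching logic: the suffix list and similarity threshold are assumptions, and a production reconciliation would need more rules.

```python
from difflib import SequenceMatcher

# Illustrative list of corporate suffixes to strip before comparing names
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "usa", "div", "solutions"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip common corporate suffixes."""
    tokens = [t for t in name.lower().replace(".", "").split() if t not in SUFFIXES]
    return " ".join(tokens)

def same_customer(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy match: True when normalized names are sufficiently similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# "ExxonMobil Corp" and "Exxon Mobil" normalize to near-identical strings
# and match; unrelated names like "Dow Inc" vs. "General Electric" do not.
```

Without the normalization step, exact string comparison would treat each system's spelling as a distinct customer, which is exactly the failure mode in the demo result.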

From Data to Deliverable

📊 Query: Agent queries multiple databases

🔗 Merge: Reconcile across systems

🧮 Compute: Calculate metrics and trends

📈 Chart: Generate visualizations

📄 Narrate: Assemble a finished report

Unlike a dashboard that answers yesterday’s questions, an agent answers any question you think of right now — constructing new queries for each one. The same pipeline produces a cross-system customer analysis, a quarterly executive summary, or a supply chain risk report.

Demo: Quarterly Executive Summary

LIVE DEMO

I’ll ask the XYZ Corp chatbot to prepare a quarterly executive summary. The agent will:

  • Query 6 systems (CRMs, Workday, Zendesk, finance)
  • Compute KPIs: revenue by division, QoQ growth, headcount efficiency, customer concentration
  • Generate charts and narrative

Then I’ll iterate on the same data:

  • Draft 1: Comprehensive data summary (flat, everything included)
  • Draft 2: Traffic-light format (red/yellow/green, flag >10% off plan)
  • Draft 3: One-page executive brief (lead with risk, end with actions)

Same data, same agent — the prompt is the variable. Three different formats from three different instructions, demonstrating the iteration principle from Session 2 at enterprise scale.

The Agent Loop

📋 System Prompt: Schema descriptions, business rules, data gotchas

🔧 Tool Call: LLM writes SQL and requests execution

Execute: System runs SQL, returns results

Test & Review: LLM checks results and self-corrects

The test-and-review step is what separates agents from dashboards. A dashboard runs a pre-written query — if it’s wrong, it’s wrong forever. An agent reviews its results and self-corrects: “Row count seems low — let me check the WHERE clause.”
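The loop itself is small. Here is a minimal sketch, with the LLM and tools passed in as plain callables; the message format and return shape are simplified stand-ins, not a real SDK interface:

```python
def run_agent(llm, tools, user_message, max_turns=10):
    """Minimal agent loop: call the LLM, execute any requested tool,
    feed the result back, repeat until the LLM returns a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = llm(messages)                       # LLM sees the full history
        if reply["type"] == "final":
            return reply["text"]                    # done: no more tool calls
        result = tools[reply["tool"]](**reply["args"])  # execute requested tool
        messages.append({"role": "assistant",
                         "content": f"call {reply['tool']}({reply['args']})"})
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_turns")
```

The self-correction lives in the loop: because tool results go back into the message history, the LLM can inspect a suspicious row count and issue a revised query on the next turn.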

The System Prompt

Institutional knowledge encoded as text

Available tables:

  • salesforce_opportunities: Industrial division deals
  • legacy_orders: Energy division (dates stored as MM/DD/YYYY text)
  • hubspot_deals: Safety division
  • workday_employees: HR data (headcount, department, hire date)

Business rules:

  • “Average deal size” means closed-won deals only
  • Customer names differ across systems — use fuzzy matching
  • Fiscal year starts February 1
  • legacy_orders dates are text strings, not DATE type

The system prompt is what turns a generic LLM into your organization’s analyst. Without it, the agent makes the same mistakes a new hire would — averaging across all deals instead of closed-won, parsing date strings incorrectly, counting the same customer three times.
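In code, that institutional knowledge is just a string passed with every request. A sketch, with the wording illustrative but the content taken from the rules above:

```python
# Illustrative system prompt for the XYZ Corp agent (exact wording hypothetical)
SYSTEM_PROMPT = """\
You are XYZ Corp's financial analyst.

Available tables:
- salesforce_opportunities: Industrial division deals
- legacy_orders: Energy division (dates stored as MM/DD/YYYY text)
- hubspot_deals: Safety division
- workday_employees: HR data (headcount, department, hire date)

Business rules:
- "Average deal size" means closed-won deals only
- Customer names differ across systems; use fuzzy matching
- Fiscal year starts February 1
- legacy_orders dates are text strings, not DATE type
"""
```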

Enterprise Error Modes

Currency & Unit Errors

  • Revenue in USD in one system, EUR in another
  • Agent sums without converting
  • Reported total is meaningless
  • No error thrown — the numbers just add up wrong
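The failure is easy to reproduce: summing a mixed-currency column runs without error and returns a plausible number. A toy illustration with hypothetical rows and an illustrative (not live) exchange rate:

```python
# Hypothetical revenue rows from two systems with different base currencies
rows = [
    {"amount": 1_000_000, "currency": "USD"},
    {"amount": 2_000_000, "currency": "EUR"},
]

naive_total = sum(r["amount"] for r in rows)   # silently mixes currencies

FX_TO_USD = {"USD": 1.0, "EUR": 1.08}          # illustrative rate, not live data
converted_total = sum(r["amount"] * FX_TO_USD[r["currency"]] for r in rows)

# naive_total understates USD revenue; nothing in the code flags the mismatch
```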

Fiscal vs. Calendar Year

  • Finance system uses fiscal year (Feb–Jan)
  • CRM uses calendar year (Jan–Dec)
  • Agent joins on “2025” — misaligns by one month
  • Q4 revenue attributed to wrong period
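The fix is to map dates into fiscal periods before joining. A minimal sketch, assuming a February 1 fiscal-year start (as at XYZ Corp) and labeling each fiscal year by the calendar year in which it starts, which is one common convention:

```python
from datetime import date

def fiscal_year(d: date, fy_start_month: int = 2) -> int:
    """Fiscal year of a date for a FY that starts in fy_start_month.
    January 2025 falls in FY2024 here, because FY2024 runs Feb 2024 - Jan 2025."""
    return d.year if d.month >= fy_start_month else d.year - 1
```

Joining on `fiscal_year(d)` rather than the raw calendar year prevents the one-month misalignment described above.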

Intercompany Elimination

  • Division A sells to Division B
  • Agent counts the transaction as external revenue
  • Consolidated revenue overstated
  • Standard accounting rule the AI doesn’t know
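Encoded as a rule, the elimination is one filter before summing; the transaction rows below are hypothetical, with only the division names taken from the XYZ Corp setup:

```python
# Hypothetical transaction rows; division names follow the XYZ Corp example
transactions = [
    {"seller": "Industrial", "buyer": "Acme Tooling", "amount": 500_000},
    {"seller": "Industrial", "buyer": "Energy",       "amount": 200_000},  # intercompany
]

DIVISIONS = {"Industrial", "Energy", "Safety"}

# External revenue only: drop sales where the buyer is another XYZ division
external_revenue = sum(
    t["amount"] for t in transactions if t["buyer"] not in DIVISIONS
)
```

Without the filter, the $200K internal sale inflates consolidated revenue, which is exactly the overstatement described above.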

These are uniquely enterprise problems — they arise only when AI queries across multiple systems. The pattern is the same as Session 1: confident, well-formatted, wrong. The system prompt is the defense.

Deployment Architecture

Where Does the LLM Run?

Cloud API

  • Easiest setup, best model quality
  • Data leaves your network
  • Pay per token
  • Zero-retention agreements available
  • Morgan Stanley: GPT-4 to 16,000 advisors with zero data retention for SEC/FINRA compliance

On-Premise

  • Full data control — nothing leaves the building
  • Open-source models (Llama, Mistral)
  • $50–200K GPU infrastructure
  • Quality gap remains but narrowing
  • Developer tooling gap is larger than model gap

Hybrid

  • Route by sensitivity level
  • Public data → cloud API (best quality)
  • Sensitive data → on-premise model
  • Becoming the mainstream enterprise approach
  • Most sophisticated path

Build vs. Buy

Build Internally

  • Maximum customization, full control
  • Requires engineering team
  • In our experience: prototype ~50 lines, production ~3,000 lines
  • The gap: security, logging, error handling, access control, monitoring
  • 6–12 month development cycle for production

Buy or Extend

  • Faster time to value
  • Vendor handles infrastructure
  • Less customization, vendor lock-in risk
  • Options: AI-native startups, incumbent platforms (Salesforce Agentforce), extend existing tools
  • Evaluate with a framework (data integration, user experience, security, cost, extensibility)

Governance

Financial Liability

  • SOX compliance for public companies
  • Audit trails for every AI-generated number
  • Who signs off on AI-produced analysis?
  • Board deck accuracy requirements

Privacy

  • PII combination across systems
  • GDPR right-to-erasure conflicts
  • Data residency requirements
  • Cross-border data transfers

Access Control

  • Can the agent see data the user shouldn’t?
  • Role-based filtering at the query level
  • Prompt injection risks
  • Audit who asked what
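One way to enforce query-level filtering is to wrap whatever SQL the agent generates in an outer query that applies the user's entitlement. This is only a sketch of the idea: it assumes the inner query exposes a `region` column, and a real design would validate the SQL rather than trust string wrapping alone.

```python
def apply_row_filter(sql: str, allowed_regions: list[str]) -> str:
    """Wrap agent-generated SQL so the user's row-level entitlement holds
    regardless of what the inner query selects (illustrative only)."""
    regions = ", ".join(f"'{r}'" for r in allowed_regions)
    return f"SELECT * FROM ({sql}) AS q WHERE region IN ({regions})"
```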

Operational Risk

  • Inconsistent answers to same question
  • Model version changes break workflows
  • Audit logs and rollback procedures
  • Human review gates for high-stakes output

Building AI Systems

Custom Chatbots

Architecture

  • System prompt defines the chatbot’s behavior, domain knowledge, constraints
  • Every user message sent to LLM along with system prompt + conversation history
  • Web app hosted on corporate intranet
  • Single API key → logging, cost tracking, access control

The System Prompt Is Everything

  • Role: “You are XYZ Corp’s financial analyst”
  • Knowledge: schema descriptions, business rules
  • Constraints: “Never reveal raw salary data”
  • Format: “Present financial data in tables with % change columns”
  • The system prompt turns a generic LLM into a specialized tool

A custom chatbot is the simplest AI system to deploy. The XYZ Corp chatbot is essentially: system prompt (schema + rules) + Claude API + a web frontend + DuckDB for data. The intelligence comes from the system prompt and the model’s reasoning.
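A single chatbot turn can be sketched against the shape of the Anthropic Python SDK's `messages.create` call; the model id and prompt text here are illustrative, and the client is passed in rather than constructed:

```python
# Assumed prompt; in practice this holds the full schema + business rules
SYSTEM_PROMPT = "You are XYZ Corp's financial analyst."

def chat_turn(client, history, user_message):
    """One turn: send system prompt + conversation history + new message."""
    history = history + [{"role": "user", "content": user_message}]
    response = client.messages.create(
        model="claude-sonnet-4-5",      # illustrative model id
        max_tokens=1024,
        system=SYSTEM_PROMPT,           # re-sent with every single request
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply, history
```

Note that the system prompt and the full history travel with every request: the model itself is stateless, so the application owns the conversation state.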

Custom Agents

What Makes It an Agent

  • Has tools — databases, Python, APIs, file systems
  • Decides which tools to use based on the question
  • Chains multiple tool calls before responding
  • Different path for every question — not a script

The Claude Agent SDK

  • Define tools as JSON Schema: name, description, input parameters
  • Write a system prompt with domain expertise and strategy
  • The agent loop: send message → LLM requests tool calls → execute tools → feed results back → repeat until done
  • ~200 lines of application logic
  • Complexity lives in the agent’s reasoning, not your code

Example: Portfolio Analyst Agent

Five tools, one agent, hundreds of different analyses

  • get_holdings — retrieve portfolio positions with tax lot detail
  • get_target_allocation — sector weight targets
  • get_analyst_recommendations — Strong Buy stocks by sector
  • run_sql — execute SQL against a live price database
  • run_python — run Python for analytics (correlations, returns, statistics)

“Review the portfolio”

  • Calls get_holdings
  • Calls run_sql for current prices
  • Calls run_python for market values
  • Calls get_target_allocation
  • Compares actual vs. target weights

“Harvest INTC losses and find a replacement”

  • Identifies INTC lots with largest losses
  • Calculates realized loss amount
  • Checks if selling moves Tech below target
  • Gets Strong Buy candidates for Technology
  • Fetches prices for INTC + candidates
  • Computes return correlations
  • 7 tool calls, none predetermined

AI+Code Lab

What It Is

  • Shared Ubuntu VM at ai-lab.rice-business.org
  • Each student gets an isolated workspace
  • Left pane: file browser
  • Right pane: terminal with Claude Code pre-launched
  • Pre-provisioned API keys and skills
  • Built for the Rice executive education program

What Students Build

  • Exercise 1: 50-line data agent querying one database (~30 min)
  • Exercise 2: Extend to query two systems and merge results (~30 min)
  • Exercise 3: Wrap in a web interface (~45 min)
  • Exercise 4: Red-team for security vulnerabilities (~30 min)
  • All built from English instructions — Claude Code writes the code

Students build a working data agent from scratch using Claude Code — from a 50-line prototype to a multi-system agent with a web interface. The same progression that organizations follow: prototype → extend → deploy → secure.

Docker: From Prototype to Production

Without Docker

  • “It works on my machine”
  • Install Python, pip, duckdb, anthropic…
  • Manage conflicting library versions
  • Secrets scattered across config files

With Docker

  • Identical environment everywhere
  • One command to build, one to run
  • Dependencies frozen in the image
  • Secrets injected at runtime, never baked in

A Docker container packages your agent, its dependencies, and its configuration into a single portable unit. The container provides an isolation boundary — agent code executes inside the container, not on the host. In production, companies deploy via orchestrators like Kubernetes or AWS ECS for scaling, restarts, and secret management.
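A minimal Dockerfile for such an agent might look like the sketch below; the file names, base image, and entry point are assumptions, not the course's actual configuration:

```dockerfile
# Illustrative Dockerfile for a Python data agent (file names assumed)
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # deps frozen in the image
COPY . .

# No API key baked into the image -- inject the secret at runtime:
#   docker run -e ANTHROPIC_API_KEY=... agent-image
CMD ["python", "agent.py"]
```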

The Adoption Roadmap

🧪 Sandbox: Prove value with a prototype on safe data

📋 Audit: Add logging, access control, verification protocols

🚀 Deploy: Production rollout with governance in place

Pre-Work (60–90 Days)

  • Catalog enterprise systems and data sources
  • Document schemas, business rules, data gotchas
  • Secure API access and credentials
  • Classify data sensitivity (PII, confidential, public)
  • Identify 2–3 questions nobody can answer easily today

Signs You Should Wait

  • No clear use case that justifies the investment
  • Data quality issues that must be fixed first
  • No executive sponsor with budget authority
  • Regulatory uncertainty in your industry
  • IT infrastructure not ready for API integration

What This Means for Us

This is the world your graduates are entering. The skills they need: directing agents, verifying output, designing governance, and defending conclusions. Whether it’s a student checking a DCF model, a faculty member critiquing a paper draft, or a corporate team deploying an enterprise agent — the core skill is the same.

Session 1

What Students Can Do Now

  • AI agents as analytical tools
  • Silent errors as the real risk
  • The checker’s toolkit

Session 2

How We Can Use AI

  • Claude Code and Desktop
  • Skills, MCP, Canvas
  • Research workflows
  • Assessment reform

Session 3

Corporate Implementation

  • Enterprise data agents
  • Custom chatbots and agents
  • Deployment and governance

Discussion