What Corporate Implementation Looks Like

Lunch and Learn — Session 3

Kerry Back, Rice University

What Your Graduates Will Face

AI Meets Enterprise Data

The Problem

  • “Which customers buy from multiple divisions?”
  • Touches three CRMs, an ERP, and an HR system
  • Different naming conventions, date formats, data models
  • Traditional answer: emails to five people over three weeks
  • Enterprise systems were never designed to talk to each other

The Agent Answer

  • Ask in plain English
  • Agent queries each system sequentially
  • Reconciles results in Python (not a single SQL join)
  • Handles fuzzy matching, date formats, naming differences
  • 30 seconds — but requires verification

The speed is real, but so are the risks. The 30-second answer may contain the same silent errors we discussed in Session 1. The new challenge: how do you verify a 30-second answer?

XYZ Corp Custom Chatbot

A simulated enterprise for teaching

  • $500M B2B industrial supplies distributor with three divisions: Industrial, Energy, Safety
  • 9 enterprise systems: Salesforce (Industrial CRM), Legacy CRM (Energy), HubSpot (Safety), NetSuite (finance), SAP (supply chain), Oracle SCM, Workday (HR), Zendesk (support), QuickBooks
  • 10 DuckDB databases pre-loaded as Parquet files — 41 tables, 26,000+ rows, 3 years of data
  • Web interface: Ask questions in plain English → Claude queries databases, generates charts, creates documents
  • Built for the Rice “From BI to AI” executive education program

Demo: Cross-System Query

LIVE DEMO

I’ll ask the XYZ Corp chatbot:

“Which customers buy from multiple divisions? Show combined revenue and flag name mismatches.”

Watch the agent:

  1. Query Salesforce (Industrial division customers)
  2. Query Legacy CRM (Energy division customers)
  3. Query HubSpot (Safety division customers)
  4. Fuzzy-match customer names across all three systems
  5. Produce a result table with combined revenue

The Result

Demo output

  • 10 customers identified across 2+ divisions
  • 4 buying from all 3 divisions: $56.5M combined revenue (11.4% of total)
  • Only 3 of 10 had matching names across systems — 7 required fuzzy matching

| Customer         | Industrial      | Energy              | Safety         | Combined |
|------------------|-----------------|---------------------|----------------|----------|
| General Electric | GE Industrial   | GE Energy Solutions | GE Safety Div  | $18.4M   |
| ExxonMobil       | ExxonMobil Corp | Exxon Mobil         | ExxonMobil LLC | $14.7M   |
| Dow Chemical     | Dow Inc         | Dow Chemical Co     | Dow Safety     | $12.1M   |
| Chevron          | Chevron Corp    | Chevron USA         | Chevron Safety | $11.3M   |

The name mismatches are the story. “General Electric” appears as three different strings across three systems. Without fuzzy matching, these look like 12 separate customers, not 4.

Behind the Scenes

1️⃣ Query Salesforce: Industrial division customers and revenue

2️⃣ Query Legacy CRM: Energy division customers and revenue

3️⃣ Query HubSpot: Safety division customers and revenue

4️⃣ Fuzzy Merge: Python matches names, reconciles, deduplicates

Not a single SQL join. The agent runs sequential queries against each system, then merges in Python. This is critical because enterprise systems have different schemas, naming conventions, and date formats.
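The fuzzy-merge step can be sketched with the standard library alone. This is a minimal illustration, not the chatbot's actual matching logic: the suffix list and similarity threshold are assumptions, and a production reconciliation would need more rules.

```python
from difflib import SequenceMatcher

# Illustrative list of corporate suffixes to strip before comparing names
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "usa", "div", "solutions"}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and strip common corporate suffixes."""
    tokens = [t for t in name.lower().replace(".", "").split() if t not in SUFFIXES]
    return " ".join(tokens)

def same_customer(a: str, b: str, threshold: float = 0.8) -> bool:
    """Fuzzy match: True when normalized names are sufficiently similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# "ExxonMobil Corp" and "Exxon Mobil" normalize to near-identical strings
# and match; unrelated names like "Dow Inc" vs. "General Electric" do not.
```

Without the normalization step, exact string comparison would treat each system's spelling as a distinct customer, which is exactly the failure mode in the demo result.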

From Data to Deliverable

📊 Query: Agent queries multiple databases

🔗 Merge: Reconcile across systems

🧮 Compute: Calculate metrics and trends

📈 Chart: Generate visualizations

📄 Narrate: Assemble a finished report

Unlike a dashboard that answers yesterday’s questions, an agent answers any question you think of right now — constructing new queries for each one. The same pipeline produces a cross-system customer analysis, a quarterly executive summary, or a supply chain risk report.

Demo: Quarterly Executive Summary

LIVE DEMO

I’ll ask the XYZ Corp chatbot to prepare a quarterly executive summary. The agent will:

  • Query 6 systems (CRMs, Workday, Zendesk, finance)
  • Compute KPIs: revenue by division, QoQ growth, headcount efficiency, customer concentration
  • Generate charts and narrative

Then I’ll iterate on the same data:

  • Draft 1: Comprehensive data summary (flat, everything included)
  • Draft 2: Traffic-light format (red/yellow/green, flag >10% off plan)
  • Draft 3: One-page executive brief (lead with risk, end with actions)

Same data, same agent — the prompt is the variable. Three different formats from three different instructions, demonstrating the iteration principle from Session 2 at enterprise scale.

The Agent Loop

📋 System Prompt: Schema descriptions, business rules, data gotchas

🔧 Tool Call: LLM writes SQL and requests execution

Execute: System runs SQL, returns results

Test & Review: LLM checks results and self-corrects

The test-and-review step is what separates agents from dashboards. A dashboard runs a pre-written query — if it’s wrong, it’s wrong forever. An agent reviews its results and self-corrects: “Row count seems low — let me check the WHERE clause.”
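The loop itself is small. Here is a minimal sketch, with the LLM and tools passed in as plain callables; the message format and return shape are simplified stand-ins, not a real SDK interface:

```python
def run_agent(llm, tools, user_message, max_turns=10):
    """Minimal agent loop: call the LLM, execute any requested tool,
    feed the result back, repeat until the LLM returns a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = llm(messages)                       # LLM sees the full history
        if reply["type"] == "final":
            return reply["text"]                    # done: no more tool calls
        result = tools[reply["tool"]](**reply["args"])  # execute requested tool
        messages.append({"role": "assistant",
                         "content": f"call {reply['tool']}({reply['args']})"})
        messages.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish within max_turns")
```

The self-correction lives in the loop: because tool results go back into the message history, the LLM can inspect a suspicious row count and issue a revised query on the next turn.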

The System Prompt

Institutional knowledge encoded as text

Available tables:

  • salesforce_opportunities: Industrial division deals
  • legacy_orders: Energy division (dates stored as MM/DD/YYYY text)
  • hubspot_deals: Safety division
  • workday_employees: HR data (headcount, department, hire date)

Business rules:

  • “Average deal size” means closed-won deals only
  • Customer names differ across systems — use fuzzy matching
  • Fiscal year starts February 1
  • legacy_orders dates are text strings, not DATE type

The system prompt is what turns a generic LLM into your organization’s analyst. Without it, the agent makes the same mistakes a new hire would — averaging across all deals instead of closed-won, parsing date strings incorrectly, counting the same customer three times.
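In code, that institutional knowledge is just a string passed with every request. A sketch, with the wording illustrative but the content taken from the rules above:

```python
# Illustrative system prompt for the XYZ Corp agent (exact wording hypothetical)
SYSTEM_PROMPT = """\
You are XYZ Corp's financial analyst.

Available tables:
- salesforce_opportunities: Industrial division deals
- legacy_orders: Energy division (dates stored as MM/DD/YYYY text)
- hubspot_deals: Safety division
- workday_employees: HR data (headcount, department, hire date)

Business rules:
- "Average deal size" means closed-won deals only
- Customer names differ across systems; use fuzzy matching
- Fiscal year starts February 1
- legacy_orders dates are text strings, not DATE type
"""
```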

Enterprise Error Modes

Currency & Unit Errors

  • Revenue in USD in one system, EUR in another
  • Agent sums without converting
  • Reported total is meaningless
  • No error thrown — the numbers just add up wrong
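The failure is easy to reproduce: summing a mixed-currency column runs without error and returns a plausible number. A toy illustration with hypothetical rows and an illustrative (not live) exchange rate:

```python
# Hypothetical revenue rows from two systems with different base currencies
rows = [
    {"amount": 1_000_000, "currency": "USD"},
    {"amount": 2_000_000, "currency": "EUR"},
]

naive_total = sum(r["amount"] for r in rows)   # silently mixes currencies

FX_TO_USD = {"USD": 1.0, "EUR": 1.08}          # illustrative rate, not live data
converted_total = sum(r["amount"] * FX_TO_USD[r["currency"]] for r in rows)

# naive_total understates USD revenue; nothing in the code flags the mismatch
```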

Fiscal vs. Calendar Year

  • Finance system uses fiscal year (Feb–Jan)
  • CRM uses calendar year (Jan–Dec)
  • Agent joins on “2025” — misaligns by one month
  • Q4 revenue attributed to wrong period
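The fix is to map dates into fiscal periods before joining. A minimal sketch, assuming a February 1 fiscal-year start (as at XYZ Corp) and labeling each fiscal year by the calendar year in which it starts, which is one common convention:

```python
from datetime import date

def fiscal_year(d: date, fy_start_month: int = 2) -> int:
    """Fiscal year of a date for a FY that starts in fy_start_month.
    January 2025 falls in FY2024 here, because FY2024 runs Feb 2024 - Jan 2025."""
    return d.year if d.month >= fy_start_month else d.year - 1
```

Joining on `fiscal_year(d)` rather than the raw calendar year prevents the one-month misalignment described above.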

Intercompany Elimination

  • Division A sells to Division B
  • Agent counts the transaction as external revenue
  • Consolidated revenue overstated
  • Standard accounting rule the AI doesn’t know
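Encoded as a rule, the elimination is one filter before summing; the transaction rows below are hypothetical, with only the division names taken from the XYZ Corp setup:

```python
# Hypothetical transaction rows; division names follow the XYZ Corp example
transactions = [
    {"seller": "Industrial", "buyer": "Acme Tooling", "amount": 500_000},
    {"seller": "Industrial", "buyer": "Energy",       "amount": 200_000},  # intercompany
]

DIVISIONS = {"Industrial", "Energy", "Safety"}

# External revenue only: drop sales where the buyer is another XYZ division
external_revenue = sum(
    t["amount"] for t in transactions if t["buyer"] not in DIVISIONS
)
```

Without the filter, the $200K internal sale inflates consolidated revenue, which is exactly the overstatement described above.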

These are uniquely enterprise problems — they arise only when AI queries across multiple systems. The pattern is the same as Session 1: confident, well-formatted, wrong. The system prompt is the defense.

Deployment Architecture

Where Does the LLM Run?

Cloud API

  • Easiest setup, best model quality
  • Data leaves your network
  • Pay per token
  • Zero-retention agreements available
  • Morgan Stanley: GPT-4 to 16,000 advisors with zero data retention for SEC/FINRA compliance

On-Premise

  • Full data control — nothing leaves the building
  • Open-source models (Llama, Mistral)
  • $50–200K GPU infrastructure
  • Quality gap remains but narrowing
  • Developer tooling gap is larger than model gap

Hybrid

  • Route by sensitivity level
  • Public data → cloud API (best quality)
  • Sensitive data → on-premise model
  • Becoming the mainstream enterprise approach
  • Most sophisticated path

Build vs. Buy

Build Internally

  • Maximum customization, full control
  • Requires engineering team
  • In our experience: prototype ~50 lines, production ~3,000 lines
  • The gap: security, logging, error handling, access control, monitoring
  • 6–12 month development cycle for production

Buy or Extend

  • Faster time to value
  • Vendor handles infrastructure
  • Less customization, vendor lock-in risk
  • Options: AI-native startups, incumbent platforms (Salesforce Agentforce), extend existing tools
  • Evaluate with a framework (data integration, user experience, security, cost, extensibility)

Governance

Financial Liability

  • SOX compliance for public companies
  • Audit trails for every AI-generated number
  • Who signs off on AI-produced analysis?
  • Board deck accuracy requirements

Privacy

  • PII combination across systems
  • GDPR right-to-erasure conflicts
  • Data residency requirements
  • Cross-border data transfers

Access Control

  • Can the agent see data the user shouldn’t?
  • Role-based filtering at the query level
  • Prompt injection risks
  • Audit who asked what
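One way to enforce query-level filtering is to wrap whatever SQL the agent generates in an outer query that applies the user's entitlement. This is only a sketch of the idea: it assumes the inner query exposes a `region` column, and a real design would validate the SQL rather than trust string wrapping alone.

```python
def apply_row_filter(sql: str, allowed_regions: list[str]) -> str:
    """Wrap agent-generated SQL so the user's row-level entitlement holds
    regardless of what the inner query selects (illustrative only)."""
    regions = ", ".join(f"'{r}'" for r in allowed_regions)
    return f"SELECT * FROM ({sql}) AS q WHERE region IN ({regions})"
```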

Operational Risk

  • Inconsistent answers to same question
  • Model version changes break workflows
  • Audit logs and rollback procedures
  • Human review gates for high-stakes output

Building AI Systems

Custom Chatbots

Architecture

  • System prompt defines the chatbot’s behavior, domain knowledge, constraints
  • Every user message sent to LLM along with system prompt + conversation history
  • Web app hosted on corporate intranet
  • Single API key → logging, cost tracking, access control

The System Prompt Is Everything

  • Role: “You are XYZ Corp’s financial analyst”
  • Knowledge: schema descriptions, business rules
  • Constraints: “Never reveal raw salary data”
  • Format: “Present financial data in tables with % change columns”
  • The system prompt turns a generic LLM into a specialized tool

A custom chatbot is the simplest AI system to deploy. The XYZ Corp chatbot is essentially: system prompt (schema + rules) + Claude API + a web frontend + DuckDB for data. The intelligence comes from the system prompt and the model’s reasoning.
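A single chatbot turn can be sketched against the shape of the Anthropic Python SDK's `messages.create` call; the model id and prompt text here are illustrative, and the client is passed in rather than constructed:

```python
# Assumed prompt; in practice this holds the full schema + business rules
SYSTEM_PROMPT = "You are XYZ Corp's financial analyst."

def chat_turn(client, history, user_message):
    """One turn: send system prompt + conversation history + new message."""
    history = history + [{"role": "user", "content": user_message}]
    response = client.messages.create(
        model="claude-sonnet-4-5",      # illustrative model id
        max_tokens=1024,
        system=SYSTEM_PROMPT,           # re-sent with every single request
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply, history
```

Note that the system prompt and the full history travel with every request: the model itself is stateless, so the application owns the conversation state.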

Custom Agents

What Makes It an Agent

  • Has tools — databases, Python, APIs, file systems
  • Decides which tools to use based on the question
  • Chains multiple tool calls before responding
  • Different path for every question — not a script

The Claude Agent SDK

  • Define tools as JSON Schema: name, description, input parameters
  • Write a system prompt with domain expertise and strategy
  • The agent loop: send message → LLM requests tool calls → execute tools → feed results back → repeat until done
  • ~200 lines of application logic
  • Complexity lives in the agent’s reasoning, not your code

Example: Portfolio Analyst Agent

Five tools, one agent, hundreds of different analyses

  • get_holdings — retrieve portfolio positions with tax lot detail
  • get_target_allocation — sector weight targets
  • get_analyst_recommendations — Strong Buy stocks by sector
  • run_sql — execute SQL against a live price database
  • run_python — run Python for analytics (correlations, returns, statistics)

“Review the portfolio”

  • Calls get_holdings
  • Calls run_sql for current prices
  • Calls run_python for market values
  • Calls get_target_allocation
  • Compares actual vs. target weights

“Harvest INTC losses and find a replacement”

  • Identifies INTC lots with largest losses
  • Calculates realized loss amount
  • Checks if selling moves Tech below target
  • Gets Strong Buy candidates for Technology
  • Fetches prices for INTC + candidates
  • Computes return correlations
  • 7 tool calls, none predetermined

AI+Code Lab

What It Is

  • Shared Ubuntu VM at ai-lab.rice-business.org
  • Each student gets an isolated workspace
  • Left pane: file browser
  • Right pane: terminal with Claude Code pre-launched
  • Pre-provisioned API keys and skills
  • Built for the Rice executive education program

What Students Build

  • Exercise 1: 50-line data agent querying one database (~30 min)
  • Exercise 2: Extend to query two systems and merge results (~30 min)
  • Exercise 3: Wrap in a web interface (~45 min)
  • Exercise 4: Red-team for security vulnerabilities (~30 min)
  • All built from English instructions — Claude Code writes the code

Students build a working data agent from scratch using Claude Code — from a 50-line prototype to a multi-system agent with a web interface. The same progression that organizations follow: prototype → extend → deploy → secure.

Docker: From Prototype to Production

Without Docker

  • “It works on my machine”
  • Install Python, pip, duckdb, anthropic…
  • Manage conflicting library versions
  • Secrets scattered across config files

With Docker

  • Identical environment everywhere
  • One command to build, one to run
  • Dependencies frozen in the image
  • Secrets injected at runtime, never baked in

A Docker container packages your agent, its dependencies, and its configuration into a single portable unit. The container provides an isolation boundary — agent code executes inside the container, not on the host. In production, companies deploy via orchestrators like Kubernetes or AWS ECS for scaling, restarts, and secret management.
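A minimal Dockerfile for such an agent might look like the sketch below; the file names, base image, and entry point are assumptions, not the course's actual configuration:

```dockerfile
# Illustrative Dockerfile for a Python data agent (file names assumed)
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # deps frozen in the image
COPY . .

# No API key baked into the image -- inject the secret at runtime:
#   docker run -e ANTHROPIC_API_KEY=... agent-image
CMD ["python", "agent.py"]
```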

The Adoption Roadmap

🧪 Sandbox: Prove value with a prototype on safe data

📋 Audit: Add logging, access control, verification protocols

🚀 Deploy: Production rollout with governance in place

Pre-Work (60–90 Days)

  • Catalog enterprise systems and data sources
  • Document schemas, business rules, data gotchas
  • Secure API access and credentials
  • Classify data sensitivity (PII, confidential, public)
  • Identify 2–3 questions nobody can answer easily today

Signs You Should Wait

  • No clear use case that justifies the investment
  • Data quality issues that must be fixed first
  • No executive sponsor with budget authority
  • Regulatory uncertainty in your industry
  • IT infrastructure not ready for API integration

What This Means for Us

This is the world your graduates are entering. The skills they need: directing agents, verifying output, designing governance, and defending conclusions. Whether it’s a student checking a DCF model, a faculty member critiquing a paper draft, or a corporate team deploying an enterprise agent — the core skill is the same.

Session 1

What Students Can Do Now

  • AI agents as analytical tools
  • Silent errors as the real risk
  • The checker’s toolkit

Session 2

How We Can Use AI

  • Claude Code and Desktop
  • Skills, MCP, Canvas
  • Research workflows
  • Assessment reform

Session 3

Corporate Implementation

  • Enterprise data agents
  • Custom chatbots and agents
  • Deployment and governance

Discussion