Deployment Guide – Cloud Run & Agent Engine
Complete guide for deploying the GCP Billing Agent web app (frontend + backend on Cloud Run) and the reasoning agents running on Vertex AI Agent Engine.
Architecture Snapshot
┌─────────────────────┐             ┌─────────────────────┐
│   React Frontend    │    HTTPS    │   FastAPI Backend   │
│  Cloud Run (nginx)  ├────────────▶│ Cloud Run (Python)  │
└──────────┬──────────┘             └──────────┬──────────┘
           │                                   │
           │                                   │ Vertex AI API
           │                                   ▼
           │                        ┌─────────────────────┐
           └───────────────────────▶│  Vertex AI Agent    │
                                    │  Engine (agents)    │
                                    └─────────────────────┘
Firestore stores users and query history. Cloud Build handles container builds.
Why Cloud Run?
- Serverless: Scales to zero and back based on traffic
- Managed security: HTTPS, IAM integration, optional domain restrictions
- Fast iteration: Deployment scripts (web/deploy/deploy-web.sh) rebuild both services in minutes
- Cost control: Pay only for requests processed and execution time
Cloud Storage + Cloud CDN remains an alternative for the frontend, but Cloud Run keeps the authentication flow and deployment tooling consistent end to end.
Agent Engine Deployment Overview
Vertex AI Agent Engine hosts the reasoning engines that power chat responses. Deploy agents with the ADK tooling, then let the backend auto-discover them at runtime.
# Example: deploy BigQuery agent with ADK
cd agents/bq_agent
adk deploy agent_engine \
--project "$PROJECT_ID" \
--location "$REGION" \
--display-name "Billing Insights"
# List deployed reasoning engines
gcloud ai reasoning-engines list \
--project="$PROJECT_ID" \
--region="$REGION"
Key points:
- Agents are defined in the agents/ directory (see agents/bq_agent.md for configuration specifics).
- No manual ID wiring is required: the backend scans Agent Engine at startup.
- If IAM restrictions block discovery, fall back to environment variables (BQ_AGENT_REASONING_ENGINE_ID, etc.) using the deployment script.
Overview
This guide covers deploying both the backend (FastAPI) and frontend (React) to Cloud Run services with:
- ✅ Firestore Authentication - Custom JWT-based authentication with domain restrictions
- ✅ HTTPS - Automatic SSL/TLS encryption
- ✅ Auto-scaling - Scales based on traffic
- ✅ Cost-effective - Pay only for what you use
- ✅ Auto-discovery - Agents automatically discovered from Vertex AI Agent Engine
Prerequisites
- GCP Project with billing enabled
- gcloud CLI installed and authenticated
- Required APIs enabled (automated in deployment scripts):
- Cloud Run API
- Cloud Build API
- IAM API
- Container Registry API (or Artifact Registry)
- Vertex AI API
- Firestore API
Quick Start
Automated Deployment (Recommended)
Deploy everything with one command:
make deploy-web-simple PROJECT_ID=your-project-id
Or using the deployment script directly:
cd web/deploy
export PROJECT_ID="your-project-id"
export REGION="us-central1"
./deploy-web.sh
This will:
- Enable required APIs
- Create service accounts with proper IAM roles
- Build and deploy backend to Cloud Run
- Build and deploy frontend to Cloud Run
- Configure Firestore authentication
- Auto-discover agents from Vertex AI Agent Engine
Manual Deployment (advanced / custom builds)
Automation is preferred, but you can deploy individual services when testing Docker changes or experimenting with environment variables.
Backend (FastAPI) manual steps
cd web/backend
# Build image with Cloud Build
gcloud builds submit --tag="gcr.io/$PROJECT_ID/agent-engine-api"
# Deploy to Cloud Run
gcloud run deploy agent-engine-api \
--image "gcr.io/$PROJECT_ID/agent-engine-api" \
--region "$REGION" \
--allow-unauthenticated \
--set-env-vars "GCP_PROJECT_ID=$PROJECT_ID,LOCATION=$REGION"
You normally do not need to set agent IDs—auto-discovery handles it. Optional environment variables like BQ_AGENT_REASONING_ENGINE_ID act only as a fallback.
Frontend (React + nginx) manual steps
cd web/frontend
npm install
npm run build
gcloud builds submit --tag="gcr.io/$PROJECT_ID/agent-engine-ui"
gcloud run deploy agent-engine-ui \
--image "gcr.io/$PROJECT_ID/agent-engine-ui" \
--region "$REGION" \
--allow-unauthenticated \
--set-env-vars \
VITE_API_URL="https://agent-engine-api-xxxxx-uc.a.run.app"
If you need to override documentation links, supply VITE_GITBOOK_URL during deployment or set GITBOOK_URL before running deploy-web.sh.
What Gets Created
- ✅ Service Accounts:
  - agent-engine-api-sa - Backend service account
  - agent-engine-ui-sa - Frontend service account
- ✅ Cloud Run Services:
  - agent-engine-api - FastAPI backend
  - agent-engine-ui - React frontend
- ✅ IAM Permissions:
  - Custom role with minimal required permissions
  - BigQuery, Vertex AI, Firestore access
- ✅ Firestore Collections:
  - users - User accounts
  - query_history - Query history per user
Deployment Details
Step 1: Set Environment Variables
export PROJECT_ID="your-project-id"
export REGION="us-central1"
Step 2: Run Automated Deployment
cd web/deploy
./deploy-web.sh
The script will:
- Enable required APIs
- Create service accounts
- Set up IAM permissions
- Build Docker images (Cloud Build)
- Deploy backend service
- Deploy frontend service
- Configure environment variables
- Set up CORS
Step 3: Access the Application
After deployment, you’ll get URLs like:
- API: https://agent-engine-api-xxxxx-uc.a.run.app
- UI: https://agent-engine-ui-xxxxx-uc.a.run.app
Configuration
Backend Configuration
The backend automatically:
- Discovers agents from Vertex AI Agent Engine (no manual configuration needed)
- Uses Firestore for user authentication and query history
- Reads environment variables for project configuration:
  - BQ_PROJECT or GCP_PROJECT_ID - GCP project ID
  - LOCATION - Region (default: us-central1)
  - JWT_SECRET_KEY - Secret key for JWT tokens (auto-generated)
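As a rough illustration of how these variables could be resolved, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, and the precedence of BQ_PROJECT over GCP_PROJECT_ID is an assumption; the guide only says either variable may carry the project ID.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    project_id: str
    location: str
    jwt_secret_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve project settings from environment variables."""
    # Assumption: BQ_PROJECT takes precedence when both variables are set.
    project_id = env.get("BQ_PROJECT") or env.get("GCP_PROJECT_ID")
    if not project_id:
        raise RuntimeError("Set BQ_PROJECT or GCP_PROJECT_ID")
    return Settings(
        project_id=project_id,
        location=env.get("LOCATION", "us-central1"),  # default stated in the guide
        jwt_secret_key=env.get("JWT_SECRET_KEY"),  # None if the deploy script has not set it
    )
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check locally, for example `load_settings({"GCP_PROJECT_ID": "your-project-id"})`.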
Frontend Configuration
The frontend:
- Connects to backend via API URL (automatically configured during build)
- Uses JWT tokens for authentication
- Supports domain restrictions (currently @asl.apps-eval.com)
Agent Discovery
Agents are automatically discovered from Vertex AI Agent Engine:
- Backend scans for all reasoning engines on startup
- Cache refreshes every 5 minutes
- Fallback to environment variables if API scan fails
- No manual agent configuration needed
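The discovery behaviour described above (scan, 5-minute cache, environment-variable fallback) can be sketched roughly as follows. This is not the backend's actual code: `AgentRegistry`, `FALLBACK_ENV_VARS`, and the injected `scan` callable are hypothetical, and the real scan would call the Vertex AI API rather than a stub.

```python
import os
import time
from typing import Callable, Dict, Mapping, Optional

CACHE_TTL_SECONDS = 5 * 60  # the guide states a 5-minute refresh

# Hypothetical mapping from an agent key to its fallback environment variable.
FALLBACK_ENV_VARS = {"bq": "BQ_AGENT_REASONING_ENGINE_ID"}


class AgentRegistry:
    """Caches the result of an Agent Engine scan and falls back to env vars."""

    def __init__(
        self,
        scan: Callable[[], Dict[str, str]],  # returns {agent name: reasoning engine ID}
        env: Mapping[str, str] = os.environ,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scan = scan
        self._env = env
        self._clock = clock
        self._agents: Dict[str, str] = {}
        self._loaded_at: Optional[float] = None

    def agents(self) -> Dict[str, str]:
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= CACHE_TTL_SECONDS
        if stale:
            try:
                self._agents = self._scan()
            except Exception:
                # Scan failed (for example, IAM blocks listing): use configured IDs.
                self._agents = {
                    key: self._env[var]
                    for key, var in FALLBACK_ENV_VARS.items()
                    if self._env.get(var)
                }
            self._loaded_at = now
        return dict(self._agents)
```

Injecting the scan function and the clock keeps the sketch testable without network access; in a real service the scan would list reasoning engines for the configured project and region.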
Authentication
Sign Up
- Navigate to the UI URL
- Click “Sign Up”
- Enter email (must be from @asl.apps-eval.com)
- Enter password
- Account is created in Firestore
Sign In
- Navigate to the UI URL
- Click “Sign In”
- Enter email and password
- JWT token is stored in browser
- All API requests include the token
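To make the token flow concrete, here is a minimal sketch of issuing and verifying an HS256-signed JWT using only the Python standard library. It is illustrative only: the backend most likely relies on a JWT library, and the function names, the claim set (`sub`, `iat`, `exp`), and the one-hour lifetime are assumptions rather than details taken from the project.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(email: str, secret: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    """Create a signed token for a user (hypothetical claim layout)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": email, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Return the claims if the signature matches and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # The signature is always recomputed as HS256; the header's alg is not trusted.
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

Because verification depends on the same secret that signed the token, rotating JWT_SECRET_KEY between deployments invalidates every token already stored in browsers.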
Domain Restrictions
Currently restricted to @asl.apps-eval.com emails. To change:
- Update REQUIRED_DOMAIN in web/backend/auth.py
- Update REQUIRED_DOMAIN in web/frontend/src/Auth.jsx
- Redeploy services
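For orientation, a domain check of the kind described above could look like the sketch below. The function name and the choice to store the domain without the leading "@" are assumptions; the actual logic in web/backend/auth.py may differ.

```python
REQUIRED_DOMAIN = "asl.apps-eval.com"  # assumption: stored without the leading "@"


def is_allowed_email(email: str, required_domain: str = REQUIRED_DOMAIN) -> bool:
    """Accept only addresses whose domain part equals the required domain."""
    local, sep, domain = email.strip().rpartition("@")
    if not local or sep != "@" or "@" in local:
        return False
    # Compare the whole domain, not a suffix, so look-alike domains are rejected.
    return domain.lower() == required_domain.lower()
```

Comparing the full domain rather than using a suffix match matters: a suffix check would also accept an address such as user@evil-asl.apps-eval.com.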
Post-Deployment Steps
1. Verify Services
# List Cloud Run services
gcloud run services list --region="$REGION" --project="$PROJECT_ID"
# Check service details
gcloud run services describe agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID"
2. Test Access
- Open the UI URL in a browser
- Sign up with an @asl.apps-eval.com email
- Sign in
- Verify agents appear in the dropdown
- Test a query
3. Verify Agent Discovery
Check logs to confirm agents are discovered:
gcloud run services logs read agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID" \
--limit=50 | grep -i "agent"
Look for:
✓ Scanned Agent Engine: Found X reasoning engine(s)
✓ Loaded X agent(s) from Agent Engine
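If you want to check these messages programmatically (for example in a post-deploy script), a small parser along these lines may help. It assumes the log wording matches the two messages quoted above, with X replaced by a number; `discovery_counts` is a hypothetical helper, not part of the project.

```python
import re
from typing import Optional, Tuple

_FOUND = re.compile(r"Found (\d+) reasoning engine")
_LOADED = re.compile(r"Loaded (\d+) agent")


def discovery_counts(log_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (engines_found, agents_loaded) from the first matching lines.

    Either value is None when the corresponding message is absent.
    """
    found = _FOUND.search(log_text)
    loaded = _LOADED.search(log_text)
    return (
        int(found.group(1)) if found else None,
        int(loaded.group(1)) if loaded else None,
    )
```

A result of `(None, None)` suggests the scan never ran or failed before logging, in which case the Troubleshooting section on agent discovery is the next place to look.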
4. Grant Access (Optional)
By default, services are publicly accessible. To restrict:
# Restrict to specific domain
gcloud run services add-iam-policy-binding agent-engine-ui \
--region="$REGION" \
--member="domain:your-domain.com" \
--role="roles/run.invoker" \
--project="$PROJECT_ID"
# Remove public access
gcloud run services remove-iam-policy-binding agent-engine-ui \
--region="$REGION" \
--member="allUsers" \
--role="roles/run.invoker" \
--project="$PROJECT_ID"
Troubleshooting
Services Not Accessible
Check IAM bindings:
gcloud run services get-iam-policy agent-engine-ui \
--region="$REGION" \
--project="$PROJECT_ID"
Check service account:
gcloud run services describe agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID" \
--format="value(spec.template.spec.serviceAccountName)"
Backend Errors
Check backend logs:
gcloud run services logs read agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID" \
--limit=50
Verify service account permissions:
gcloud projects get-iam-policy "$PROJECT_ID" \
--flatten="bindings[].members" \
--filter="bindings.members:agent-engine-api-sa@$PROJECT_ID.iam.gserviceaccount.com"
Agent Discovery Not Working
Check Vertex AI permissions:
The service account needs roles/aiplatform.user or custom role with:
- aiplatform.reasoningEngines.list
- aiplatform.reasoningEngines.get
- aiplatform.reasoningEngines.query
Check logs for errors:
gcloud run services logs read agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID" \
--limit=50 | grep -i "agent\|reasoning"
Verify agents are deployed:
python3 scripts/list_agent_engines.py \
--project="$PROJECT_ID" \
--location="$REGION"
Frontend Can’t Connect to Backend
Check CORS configuration:
- Backend CORS should allow the frontend URL
- Verify CORS_ALLOWED_ORIGINS includes frontend URL
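A quick way to reason about a CORS mismatch is to reproduce the comparison locally. The sketch below assumes CORS_ALLOWED_ORIGINS is a comma-separated list of origins, which the guide does not confirm; both helper names are hypothetical.

```python
from typing import List


def parse_allowed_origins(raw: str) -> List[str]:
    """Split a comma-separated origin list, dropping blanks and trailing slashes."""
    return [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]


def is_origin_allowed(origin: str, allowed: List[str]) -> bool:
    """Exact-match check, with "*" treated as allow-all."""
    return "*" in allowed or origin.rstrip("/") in allowed
```

Browsers send the Origin header without a trailing slash or path, so a configured value such as https://agent-engine-ui-xxxxx-uc.a.run.app/ only matches if the backend normalises it, as this sketch does.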
Verify API URL is embedded:
- The frontend JavaScript should contain the Cloud Run API URL
- Check browser console for errors
- Do a hard refresh (Cmd+Shift+R or Ctrl+Shift+R)
Check environment variables:
gcloud run services describe agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID" \
--format="value(spec.template.spec.containers[0].env)"
Authentication Issues
Check JWT secret key:
- Must be the same across deployments
- Auto-generated and stored in Cloud Run environment variables
Check Firestore permissions:
- Service account needs roles/datastore.user
- Verify Firestore API is enabled
FAQ
Can the frontend live on Agent Engine?
No. Vertex AI Agent Engine only hosts reasoning engines and exposes REST endpoints. Keep the UI on Cloud Run (or Cloud Storage + CDN) and call Agent Engine through the FastAPI backend.
Why split frontend and backend services?
Independent services let you redeploy without downtime, scale each tier separately, and keep resource usage efficient. A combined service is fine for prototypes but not required here.
How is access controlled without IAP?
The backend enforces Firestore sign-in with domain restrictions and JWT tokens. Cloud Run defaults to HTTPS and you can tighten roles/run.invoker to your corporate domain if you need extra control.
Do I need to manage agent IDs manually?
No. Auto-discovery queries Vertex AI Agent Engine on startup. Environment variables like BQ_AGENT_REASONING_ENGINE_ID are optional fallbacks for restricted environments.
How do I point the UI at updated documentation?
Set GITBOOK_URL (or VITE_GITBOOK_URL) before deploying the frontend. The Dockerfile reads that value and bakes links into the bundle.
Metrics Snapshot Pipeline (Cloud Run Job + Firestore)
The /metrics endpoint and UI dashboard now read from Firestore snapshots produced by a scheduled Cloud Run Job. This keeps expensive Git history analysis out of the request path and works inside Cloud Run’s read-only filesystem.
1. Create a GitHub token (one-time)
- In GitHub, visit Settings → Developer settings → Personal access tokens (classic).
- Generate a token with repo (read) scope only – this lets the collector query commit history.
- Store it in Secret Manager (replace the project ID):
# printf avoids the trailing newline that echo would store as part of the token
printf '%s' "ghp_..." | gcloud secrets create github-metrics-token \
--replication-policy=automatic \
--data-file=- \
--project "$PROJECT_ID"
# Allow the collector job to read it (run this after the service account is created in step 2)
gcloud secrets add-iam-policy-binding github-metrics-token \
--member="serviceAccount:metrics-collector-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor" \
--project "$PROJECT_ID"
2. Deploy the Cloud Run Job
Reuse the backend image but override the command to run metrics_job.py:
# 1) Create a dedicated service account (once)
gcloud iam service-accounts create metrics-collector-sa \
--display-name="Metrics Collector" \
--project "$PROJECT_ID"
# 2) Grant Firestore + Secret access
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:metrics-collector-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/datastore.user"
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:metrics-collector-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
# 3) Deploy the job (re-run after backend image updates)
# The image below reuses the backend Cloud Run service image
API_SERVICE="agent-engine-api"
gcloud run jobs deploy metrics-collector \
--image="gcr.io/$PROJECT_ID/$API_SERVICE:latest" \
--region="$REGION" \
--service-account="metrics-collector-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--command=python \
--args=metrics_job.py \
--set-env-vars="GCP_PROJECT_ID=$PROJECT_ID,LOCATION=$REGION" \
--set-secrets="GITHUB_TOKEN=github-metrics-token:latest" \
--project "$PROJECT_ID"
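# (Optional) customise analysis windows
# The value contains commas, so use an alternate delimiter (see `gcloud topic escaping`), e.g.
# --set-env-vars="^@^GCP_PROJECT_ID=$PROJECT_ID@LOCATION=$REGION@METRICS_WINDOWS=7,30,90" to limit which snapshots are collected.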
# Run once to seed Firestore
gcloud run jobs execute metrics-collector \
--region="$REGION" \
--project "$PROJECT_ID"
The final command runs the job once so the dashboard has initial data. Alternatively, pass --execute-now to gcloud run jobs deploy and skip the separate execute step.
3. Schedule recurring snapshots
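Use Cloud Scheduler to execute the job every two hours. The service account named in --oauth-service-account-email must be allowed to run the job (for example through roles/run.invoker), which the grants in step 2 do not include: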
gcloud scheduler jobs create http metrics-collector-schedule \
--schedule="0 */2 * * *" \
--uri="https://$REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/$PROJECT_ID/jobs/metrics-collector:run" \
--http-method=POST \
--oauth-service-account-email="metrics-collector-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--oauth-token-scope="https://www.googleapis.com/auth/cloud-platform" \
--project "$PROJECT_ID"
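The --uri value above follows a fixed pattern, so it can be generated when the project, region, or job name changes. The helper below is a hypothetical convenience that reproduces the exact URL shape used in the command; it makes no API calls.

```python
def job_run_uri(project_id: str, region: str, job_name: str) -> str:
    """Build the Cloud Run Jobs ":run" endpoint in the form used by the scheduler command."""
    return (
        f"https://{region}-run.googleapis.com/apis/run.googleapis.com/v1/"
        f"namespaces/{project_id}/jobs/{job_name}:run"
    )
```

For example, `job_run_uri("your-project-id", "us-central1", "metrics-collector")` yields the URI to pass to `--uri`. The schedule `0 */2 * * *` fires at minute 0 of every even hour (00:00, 02:00, and so on) in the scheduler job's time zone.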
4. Hook up the backend refresh endpoint
deploy-web.sh automatically injects METRICS_JOB_NAME if a job named metrics-collector exists in the deployment region. If you use a different job name, set the variable manually when deploying:
export METRICS_JOB_NAME="projects/$PROJECT_ID/locations/$REGION/jobs/custom-metrics-job"
make deploy-web-simple PROJECT_ID=$PROJECT_ID REGION=$REGION
The /metrics/refresh endpoint triggers the job on demand; the frontend now calls it before polling Firestore for the newest snapshot.
Cleanup
To remove all deployed resources:
# Delete Cloud Run services
gcloud run services delete agent-engine-api \
--region="$REGION" \
--project="$PROJECT_ID"
gcloud run services delete agent-engine-ui \
--region="$REGION" \
--project="$PROJECT_ID"
# Delete service accounts (optional)
gcloud iam service-accounts delete "agent-engine-api-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--project="$PROJECT_ID"
gcloud iam service-accounts delete "agent-engine-ui-sa@$PROJECT_ID.iam.gserviceaccount.com" \
--project="$PROJECT_ID"
Next Steps
- Monitor: Set up Cloud Monitoring and Logging
- Scale: Adjust min/max instances based on traffic
- Customize: Update domain restrictions for your organization
- Deploy agents: Deploy your agents to Vertex AI Agent Engine
References
- architecture.md - Complete system architecture
- AUTOMATED_DEPLOYMENT.md - Automated deployment guide
- AUTHENTICATION_SETUP.md - Authentication details