Cloud Provisioning

This guide covers provisioning cloud infrastructure for ubTrace on AWS using OpenTofu (or Terraform). It is intended for platform engineers who need to set up a production-grade environment with managed databases, caches, search, and compute resources.

Note

This guide covers infrastructure provisioning only. For deploying the ubTrace application itself, see Installation (Docker Compose) or use the Helm chart for Kubernetes deployments.

Overview

The infra/terraform/ directory contains modular OpenTofu/Terraform configurations that provision all cloud resources ubTrace needs on AWS:

| Module | Resources | Purpose |
| --- | --- | --- |
| security | KMS key, CloudTrail, IAM roles, SSM secrets | Encryption, audit logging, identity |
| network | VPC, subnets, NAT, security groups | Network foundation (3-tier: public/private/data) |
| database | 2× RDS PostgreSQL 16 | App database + Keycloak database |
| cache | ElastiCache Redis 7 | Session cache with TLS |
| search | OpenSearch (Elasticsearch-compatible) | Full-text search and analytics |
| compute | EKS cluster or EC2 instance | Container orchestration or VM |
| storage | EFS (shared) or EBS (single-node) | Persistent storage for the data pipeline |
| loadbalancer | ALB, ACM certificate | HTTPS load balancing with path routing |

Two deployment modes are supported:

  • EKS mode — managed Kubernetes cluster for enterprise deployments. Use the ubTrace Helm chart to deploy the application workloads.

  • EC2 mode — single VM with Docker Compose for simpler or air-gapped environments. The offline bundle can be loaded directly onto the instance.

Prerequisites

  • OpenTofu >= 1.5.7 (or Terraform >= 1.5.7)

  • AWS CLI configured with credentials that have permission to create VPCs, RDS instances, ElastiCache clusters, EKS clusters, and IAM roles

  • An S3 bucket and DynamoDB table for remote state (see Bootstrap State Backend)

Quick Start

cd infra/terraform

# 1. Bootstrap the remote state backend (one-time)
./scripts/bootstrap-state.sh my-ubtrace-state eu-central-1

# 2. Initialize
tofu init \
  -backend-config="bucket=my-ubtrace-state" \
  -backend-config="key=staging/terraform.tfstate" \
  -backend-config="region=eu-central-1" \
  -backend-config="dynamodb_table=ubtrace-terraform-locks" \
  -backend-config="encrypt=true"

# 3. Review changes
tofu plan -var-file=environments/staging.tfvars

# 4. Apply
tofu apply -var-file=environments/staging.tfvars

Important

Always review the plan output before applying. Infrastructure changes can be destructive — especially for stateful resources like databases and search indices.

Bootstrap State Backend

Before the first tofu init, create the S3 bucket and DynamoDB table that store Terraform state and provide locking:

./scripts/bootstrap-state.sh [bucket-name] [region] [dynamodb-table]

Defaults: bucket ubtrace-terraform-state, region eu-central-1, table ubtrace-terraform-locks.

The script creates:

  • An S3 bucket with versioning, KMS encryption, and public access blocked

  • A DynamoDB table (PAY_PER_REQUEST) for state locking
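Because the Quick Start supplies the bucket, key, region, and lock table via -backend-config flags at init time, the module itself only needs to declare the backend type. A sketch of the conventional partial-configuration block (the file name backend.tf is an assumption):

```hcl
# backend.tf — partial backend configuration.
# Bucket, key, region, and DynamoDB lock table are supplied at
# `tofu init` time via -backend-config flags (see Quick Start),
# so the same module can target any environment's state.
terraform {
  backend "s3" {}
}
```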

Deployment Modes

EKS (Kubernetes)

Set deployment_mode = "eks" in your tfvars file. This provisions:

  • A managed EKS cluster with a node group

  • EBS CSI driver and VPC CNI addons

  • EFS file system with access points for the pipeline folders

  • Kubernetes Secrets encrypted with a customer-managed KMS key
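Pulling the documented variables together, a minimal tfvars sketch for EKS mode might look like the following. All values are illustrative; eks_node_count and eks_version are taken from the Day-2 Operations examples later in this guide.

```hcl
# environments/myenv.tfvars — EKS mode (illustrative values)
project_name    = "ubtrace"
environment     = "staging"
aws_region      = "eu-central-1"
deployment_mode = "eks"

# Day-2 scaling knobs (see "Scale EKS Nodes" / "Upgrade Kubernetes Version")
eks_node_count = 3
eks_version    = "1.31"
```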

After tofu apply, configure kubectl and deploy the Helm chart:

# Configure kubectl
$(tofu output -raw kubeconfig_command)

# Deploy with Helm (example)
helm install ubtrace deploy/helm/ubtrace \
  --values deploy/helm/ubtrace/values-eks.yaml \
  --set postgresql.external.host=$(tofu output -raw app_db_endpoint | cut -d: -f1) \
  --set postgresql.external.password=$(aws ssm get-parameter --name "/ubtrace/${ENV}/db/app/password" --with-decryption --query Parameter.Value --output text) \
  --set redis.external.host=$(tofu output -raw redis_endpoint) \
  --set elasticsearch.external.host=$(tofu output -raw opensearch_endpoint)

EC2 (Docker Compose)

Set deployment_mode = "ec2" in your tfvars file. This provisions:

  • An EC2 instance (Ubuntu 24.04) with an IAM instance profile

  • An ALB with path-based routing to API, frontend, and Keycloak

  • EBS volumes for the pipeline folders

  • Target group attachments for the EC2 instance
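For comparison, a minimal tfvars sketch for EC2 mode, using only variables from the Configuration Reference (values illustrative):

```hcl
# environments/myenv.tfvars — EC2 mode (illustrative values)
project_name    = "ubtrace"
environment     = "staging"
aws_region      = "eu-central-1"
deployment_mode = "ec2"

# SSM Session Manager is the default access path; a key pair is optional.
ssh_key_name = ""

# Used for the ALB listener and ACM certificate.
domain_name = "trace.example.com"
```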

After tofu apply, connect to the instance and deploy:

# Connect via SSM (no SSH required)
aws ssm start-session --target $(tofu output -raw ec2_instance_id)

# On the instance: load the offline bundle and start services
tar xzf ubtrace-offline-bundle-*.tar.gz
cd ubtrace-offline-bundle-*
./offline-load.sh ubtrace-images-*.tar.gz
make init
# Edit .env with the RDS/Redis/OpenSearch endpoints from Terraform outputs
make up
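The .env edit wires the containers to the managed AWS services. The exact keys are defined by the Installation (Docker Compose) guide; the names below are illustrative placeholders, not the authoritative schema:

```ini
# Illustrative .env fragment — substitute the real keys from the
# Installation guide; endpoint values come from `tofu output` and
# secrets from SSM Parameter Store.
DB_HOST=<app_db_endpoint, host part only>
DB_PASSWORD=<from SSM /ubtrace/<env>/db/app/password>
REDIS_HOST=<redis_endpoint>
ELASTICSEARCH_HOST=<opensearch_endpoint>
```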

Environment Sizing

Three pre-configured environments are provided:

| Environment | Database | Redis | Search | HA | Est. Cost |
| --- | --- | --- | --- | --- | --- |
| dev.tfvars | db.t3.micro | cache.t3.micro | t3.small.search | No | ~$100/mo |
| staging.tfvars | db.t3.medium | cache.t3.small | t3.medium.search | No | ~$300/mo |
| production.tfvars | db.r6g.large | cache.r6g.large | r6g.large.search | Yes (Multi-AZ) | ~$1,500/mo |

Create a custom tfvars file for your environment:

cp environments/staging.tfvars environments/myenv.tfvars
# Edit to match your requirements
tofu plan -var-file=environments/myenv.tfvars

Configuration Reference

General

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| project_name | string | ubtrace | Prefix for all resource names |
| environment | string | (required) | Environment name: dev, staging, production |
| aws_region | string | eu-central-1 | AWS region for all resources |
| deployment_mode | string | eks | eks (Kubernetes) or ec2 (Docker Compose) |

Networking

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| vpc_cidr | string | 10.0.0.0/16 | CIDR block for the VPC |
| availability_zones | list | ["eu-central-1a", "eu-central-1b"] | Availability zones (minimum 2) |
| enable_nat_gateway | bool | true | NAT gateway for private subnets; set to false for air-gapped environments |
| domain_name | string | "" | Domain for ALB and ACM certificate |
| certificate_arn | string | "" | Existing ACM certificate ARN (skips creation) |

Security & Compliance

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| enable_cloudtrail | bool | true | CloudTrail API audit logging |
| enable_vpc_flow_logs | bool | true | VPC network flow logs |
| ssh_key_name | string | "" | EC2 key pair name (EC2 mode only) |

License

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| license_key | string | "" | ubTrace license key (stored in SSM as SecureString) |
| license_product_id | string | "" | Cryptolens product ID |
| license_access_token | string | "" | Cryptolens access token (stored in SSM as SecureString) |

Note

License variables are optional. When provided, they are stored as SSM Parameter Store entries under /ubtrace/<environment>/license/. For offline licensing with .skm files, copy the file to the instance and set UBTRACE_LICENSE_FILE in the application environment instead.
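A sketch of how the license variables fit into a tfvars file (values are placeholders). Since tfvars files are often committed to version control, consider passing the secret values at plan/apply time instead:

```hcl
# License variables (placeholder values).
# Prefer supplying secrets via TF_VAR_license_key / TF_VAR_license_access_token
# environment variables or -var flags rather than committing them to tfvars.
license_key          = "UBTR-XXXX-XXXX-XXXX"
license_product_id   = "12345"
license_access_token = "WyI..."
```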

Security

The infrastructure is provisioned with TISAX and ISO 27001 compliance in mind:

  • Encryption at rest — All data stores (RDS, ElastiCache, OpenSearch, EFS, EBS, S3) are encrypted with a customer-managed KMS key

  • Encryption in transit — TLS enforced on Redis, OpenSearch, and the ALB

  • Audit logging — CloudTrail records all API calls; VPC flow logs capture network traffic; ALB access logs record HTTP requests

  • Least privilege — IAM roles follow the principle of least privilege; EC2 instances use SSM instead of SSH (no open ports)

  • Secret management — Database passwords, Redis auth tokens, and license credentials are stored in SSM Parameter Store (KMS-encrypted)

  • Network isolation — Three-tier subnet design: public (ALB only), private (compute), data (databases). Security groups restrict traffic between tiers

  • EKS hardening — Kubernetes Secrets encrypted with customer KMS key; cluster manages its own security group; API audit logging enabled

  • Data protection — Stateful resources (RDS, OpenSearch, EFS) have prevent_destroy lifecycle guards to prevent accidental deletion

Outputs

After tofu apply, retrieve connection details for application configuration:

# Database endpoints
tofu output app_db_endpoint
tofu output keycloak_db_endpoint

# Cache
tofu output redis_endpoint
tofu output redis_port

# Search
tofu output opensearch_endpoint

# Compute (EKS)
tofu output kubeconfig_command
tofu output eks_cluster_name

# Compute (EC2)
tofu output ec2_instance_id
tofu output ec2_private_ip

# Load balancer
tofu output alb_dns_name

# Sensitive values (retrieve from SSM Parameter Store)
aws ssm get-parameter --name "/ubtrace/<env>/db/app/password" --with-decryption --query Parameter.Value --output text
aws ssm get-parameter --name "/ubtrace/<env>/db/keycloak/password" --with-decryption --query Parameter.Value --output text
aws ssm get-parameter --name "/ubtrace/<env>/redis/auth_token" --with-decryption --query Parameter.Value --output text

# License SSM parameter ARNs
tofu output license_ssm_arns

Day-2 Operations

Rotate Database Passwords

tofu taint 'module.security.random_password.db_app'
tofu taint 'module.security.random_password.db_keycloak'
tofu apply -var-file=environments/production.tfvars

Warning

After rotating passwords, restart the ubTrace application to pick up the new credentials from SSM Parameter Store.

Scale EKS Nodes

tofu apply -var-file=environments/production.tfvars -var="eks_node_count=5"

Upgrade Kubernetes Version

tofu apply -var-file=environments/production.tfvars -var="eks_version=1.32"

Important

Upgrade one minor version at a time. Review the EKS version calendar and test in staging first.

Troubleshooting

tofu init fails with “bucket does not exist”

Run ./scripts/bootstrap-state.sh to create the state backend first.

tofu plan shows replacement of a database

This typically means an engine version or parameter change that requires recreation. The prevent_destroy lifecycle guard will block the apply. Review the change carefully, take a manual snapshot, then remove the guard temporarily if the replacement is intentional.
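The guard in question is a standard Terraform lifecycle block. A sketch of what it looks like on a stateful resource (the resource address is an assumption, not the module's actual name):

```hcl
resource "aws_db_instance" "app" {
  # ... engine, instance class, storage, etc.

  lifecycle {
    # Blocks `tofu apply` from destroying or replacing this instance.
    # prevent_destroy must be a literal, so an intentional replacement
    # requires editing this line to false (after taking a manual
    # snapshot), applying, and then restoring the guard.
    prevent_destroy = true
  }
}
```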

EKS nodes fail to join the cluster

Check that the node IAM role has the required policies (AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryReadOnly). These are provisioned automatically by the compute module.

OpenSearch returns 403 errors

The OpenSearch domain is VPC-internal with a permissive resource policy for HTTP operations. Ensure the application is running in the private subnets and the app security group allows outbound traffic on port 443.

ALB returns 503 for all requests (EC2 mode)

Check that the target group health checks are passing. The API health check endpoint is /api/v1/health/liveness. Verify the application is running and listening on the expected ports (API: 3000, Frontend: 3000, Keycloak: 8080).