Platform Engineering
Design and maintain internal developer platforms (IDP) including golden paths, self-service infrastructure, developer portals, and platform reliability. Use when building developer platforms, creating golden paths, implementing self-service infrastructure,...
Workflow
1. Platform Architecture
INTERNAL DEVELOPER PLATFORM (IDP)
═══════════════════════════════════════
PLATFORM LAYERS:
═══════════════════════════════════════
Layer 1: Developer Portal (Backstage)
→ Service catalog (all applications registered)
→ Template library (scaffolding for new services)
→ Documentation hub
→ API references
→ Support requests
Layer 2: Golden Paths (Opinionated Workflows)
→ Create new service (1-click from template)
→ Deploy to environment (auto CI/CD + infra)
→ Add database (provision + connect)
→ Add message queue (provision + connect)
→ Add monitoring (auto-instrument + dashboards)
Layer 2b: Supported Paths (Custom Workflows)
→ Custom CI/CD pipelines
→ Custom infrastructure (Terraform)
→ Non-standard deployments
Layer 3: Shared Infrastructure (Managed by Platform Team)
→ Kubernetes clusters (EKS/GKE)
→ Databases (managed PostgreSQL, Redis)
→ Message queues (SQS, Kafka)
→ CI/CD runners
→ Monitoring stack (Prometheus, Grafana)
→ Secret management (Vault)
→ Service mesh (Istio)
Layer 4: Cloud Provider (AWS/Azure/GCP)
→ Networking (VPC, subnets, load balancers)
→ Compute (EC2, ECS, Lambda, EKS)
→ Storage (S3, EBS, RDS)
→ Security (IAM, KMS, security groups)
DEVELOPER PORTAL (Backstage):
═══════════════════════════════════════
Features:
→ Service catalog: All applications with metadata
→ Software templates: Scaffolding for new services
→ Plugin ecosystem: Custom plugins for organization needs
→ Documentation: Embedded docs, runbooks, ADRs
→ API catalog: Internal API discovery
→ Support: Request infrastructure, report issues
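The service catalog idea above can be illustrated with a small in-memory model. This is only a sketch: `CatalogEntry`, `ServiceCatalog`, and every field name are hypothetical and are not Backstage's API or entity schema; a real portal would store this metadata in its own catalog format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """One registered service. Field names are illustrative assumptions."""
    name: str
    owner_team: str
    kind: str              # e.g. "api", "worker", "web", "cli"
    docs_url: str = ""     # empty string means no documentation linked yet


@dataclass
class ServiceCatalog:
    """In-memory stand-in for the portal's service catalog."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        """Add a service, rejecting duplicate names."""
        if entry.name in self.entries:
            raise ValueError(f"service already registered: {entry.name}")
        self.entries[entry.name] = entry

    def owned_by(self, team: str) -> list[str]:
        """Names of all services owned by a team, sorted."""
        return sorted(e.name for e in self.entries.values() if e.owner_team == team)

    def missing_docs(self) -> list[str]:
        """Names of services with no documentation link, sorted."""
        return sorted(e.name for e in self.entries.values() if not e.docs_url)
```

Queries such as `owned_by` and `missing_docs` show why registering every application with metadata matters: ownership and documentation gaps become simple lookups.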
2. Golden Paths & Templates
GOLDEN PATH: Create New Microservice
═══════════════════════════════════════
Step 1: Developer selects template in Backstage
→ Options: API service, Worker service, Web app, CLI tool
→ Language: Python, Node.js, Go, Java
Step 2: Fill in metadata
→ Service name, owner team, description
→ Environment (dev/staging/prod)
→ Resource requirements (CPU, memory)
→ Dependencies (database, cache, queue)
Step 3: Platform provisions automatically
→ Creates Git repository (from template)
→ Provisions Kubernetes namespace
→ Sets up CI/CD pipeline (GitHub Actions)
→ Creates Prometheus alerts
→ Provisions database (if selected)
→ Creates Grafana dashboard
→ Sets up ArgoCD application
→ Generates documentation
Step 4: Developer starts coding
→ Clone repository
→ Run locally (Docker Compose)
→ Push to branch → CI runs tests
→ Merge to main → Deploy to dev
→ Promote to staging → QA
→ Promote to production → Live
Time to first deployment: < 15 minutes (from zero)
Previous time: 2-3 days (manual setup)
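Steps 2 and 3 of the golden path can be sketched as a function that turns the submitted metadata into an ordered provisioning plan. `ServiceRequest`, `provisioning_plan`, and the step identifiers are hypothetical names chosen for illustration; the step order follows the list above, and only the database step is conditional, as in the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """Metadata collected in Step 2. Field names are assumptions."""
    name: str
    owner_team: str
    template: str                                # "api", "worker", "web", or "cli"
    language: str                                # "python", "nodejs", "go", or "java"
    dependencies: frozenset[str] = frozenset()   # e.g. frozenset({"database"})


def provisioning_plan(request: ServiceRequest) -> list[str]:
    """Return the Step 3 actions in the order the golden path lists them."""
    steps = [
        "create_git_repository",
        "provision_kubernetes_namespace",
        "set_up_ci_cd_pipeline",
        "create_prometheus_alerts",
    ]
    # The database is the only step marked "if selected".
    if "database" in request.dependencies:
        steps.append("provision_database")
    steps += [
        "create_grafana_dashboard",
        "set_up_argocd_application",
        "generate_documentation",
    ]
    return steps
```

For example, `provisioning_plan(ServiceRequest("orders", "commerce", "api", "python", frozenset({"database"})))` yields eight steps, with `provision_database` placed between the Prometheus alerts and the Grafana dashboard.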
TEMPLATE LIBRARY:
═══════════════════════════════════════
Template          Tech Stack              Includes                  Usage (services)
────────────────────────────────────────────────────────────────────────────
Python API        FastAPI + SQLAlchemy    CI/CD, K8s, DB, Monitor   45
Node.js API       Express + Prisma        CI/CD, K8s, DB, Monitor   30
Go Service        Gin + gRPC              CI/CD, K8s, Monitor       20
Worker Service    Celery/RQ + Redis       CI/CD, K8s, Queue         15
Web App           Next.js + Prisma        CI/CD, Vercel, DB         25
CLI Tool          Click/Go                CI/CD, Release            5
3. Self-Service Infrastructure
SELF-SERVICE CATALOG
═══════════════════════════════════════
Available Resources:
═══════════════════════════════════════
Resource                Provisioning Time   Approval      Cost Visibility
──────────────────────────────────────────────────────────────────────
Database (PostgreSQL)   <5 minutes          Auto (dev)    $150/mo base
Redis Cache             <2 minutes          Auto (dev)    $50/mo base
Message Queue (SQS)     <1 minute           Auto          Pay-per-use
K8s Namespace           <1 minute           Auto          Shared cost
Cloud Storage (S3)      <1 minute           Auto          Pay-per-use
Load Balancer           <2 minutes          Auto (dev)    $25/mo
DNS Record              <1 minute           Auto          $0
SSL Certificate         <5 minutes          Auto          $0
Secret (Vault)          <1 minute           Auto          $0
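The cost column above lends itself to a simple base-cost estimate shown to developers at request time. The sketch below is an assumption about how that could work, not the platform's actual pricing logic: the dictionary keys and `estimate_base_cost` are hypothetical, and the prices are copied from the table. Pay-per-use and shared-cost resources are reported separately because the table gives no fixed figure for them.

```python
from __future__ import annotations

# Base monthly prices in USD, copied from the catalog table.
# None marks resources listed as pay-per-use or shared cost.
BASE_MONTHLY_USD: dict[str, int | None] = {
    "postgresql": 150,
    "redis": 50,
    "sqs": None,
    "k8s_namespace": None,
    "s3": None,
    "load_balancer": 25,
    "dns_record": 0,
    "ssl_certificate": 0,
    "vault_secret": 0,
}


def estimate_base_cost(resources: list[str]) -> tuple[int, list[str]]:
    """Return (fixed monthly USD, resources with no fixed price)."""
    fixed_total = 0
    unpriced: list[str] = []
    for resource in resources:
        if resource not in BASE_MONTHLY_USD:
            raise KeyError(f"not in self-service catalog: {resource}")
        price = BASE_MONTHLY_USD[resource]
        if price is None:
            unpriced.append(resource)   # usage-based or shared; no base figure
        else:
            fixed_total += price
    return fixed_total, unpriced
```

A service requesting PostgreSQL, Redis, a load balancer, and an SQS queue would show a base of $150 + $50 + $25 = $225 per month, with SQS flagged as usage-based.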
APPROVAL FLOW:
═══════════════════════════════════════
Dev environment: Auto-approve (no gatekeeping)
Staging: Auto-approve (with cost notification)
Production: Team lead approval + cost review
Guardrails:
→ Max instance size per environment
→ Allowed regions only
→ Required tags enforced
→ Budget limits per team
→ Auto-shutdown for dev resources
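The approval flow and guardrails above could be enforced with checks along these lines. This is a sketch under stated assumptions: the allowed regions, required tag names, and per-environment vCPU limits are invented placeholder values, and `approval_for` and `guardrail_violations` are hypothetical names. Auto-shutdown is a scheduled action rather than a request-time check, so it is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

# Placeholder policy values; a real platform would load these from config.
ALLOWED_REGIONS = {"us-east-1", "eu-west-1"}
REQUIRED_TAGS = {"team", "service", "environment"}
MAX_VCPUS = {"dev": 2, "staging": 4, "prod": 16}


class Approval(Enum):
    AUTO = "auto-approve"
    AUTO_WITH_COST_NOTICE = "auto-approve with cost notification"
    TEAM_LEAD = "team lead approval + cost review"


@dataclass
class ResourceRequest:
    environment: str          # "dev", "staging", or "prod"
    region: str
    vcpus: int
    monthly_cost_usd: float
    tags: dict[str, str]


def approval_for(environment: str) -> Approval:
    """Map an environment to the approval level from the flow above."""
    levels = {
        "dev": Approval.AUTO,
        "staging": Approval.AUTO_WITH_COST_NOTICE,
        "prod": Approval.TEAM_LEAD,
    }
    if environment not in levels:
        raise ValueError(f"unknown environment: {environment}")
    return levels[environment]


def guardrail_violations(req: ResourceRequest, team_budget_remaining_usd: float) -> list[str]:
    """Return one message per violated guardrail; an empty list means the request passes."""
    violations: list[str] = []
    if req.environment not in MAX_VCPUS:
        return [f"unknown environment: {req.environment}"]
    if req.vcpus > MAX_VCPUS[req.environment]:
        violations.append(f"instance too large for {req.environment}")
    if req.region not in ALLOWED_REGIONS:
        violations.append(f"region not allowed: {req.region}")
    missing = sorted(REQUIRED_TAGS - req.tags.keys())
    if missing:
        violations.append("missing required tags: " + ", ".join(missing))
    if req.monthly_cost_usd > team_budget_remaining_usd:
        violations.append("exceeds remaining team budget")
    return violations
```

Guardrails run before the approval step in this sketch, so even an auto-approved dev request is rejected if it breaks policy.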
4. Platform Reliability & SLIs
PLATFORM RELIABILITY METRICS
═══════════════════════════════════════
Platform SLIs:
═══════════════════════════════════════
SLI                       Target      Current     Measurement
──────────────────────────────────────────────────────────────
Self-service success      99%         97%         Provisioning success rate
Provisioning latency      <5 min      3 min       P95 time to provision
CI/CD pipeline success    98%         96%         Pipeline pass rate
Platform API uptime       99.9%       99.95%      API availability
Template freshness        <30 days    15 days     Days since last update
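Comparing current values against targets is mechanical once each indicator records whether higher or lower is better. The sketch below encodes the five rows of the table; the `Sli` class and helper names are hypothetical, and the 30-day window used for the downtime budget is an assumption, since the source does not state a measurement window.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sli:
    name: str
    target: float
    current: float
    higher_is_better: bool

    def meets_target(self) -> bool:
        if self.higher_is_better:
            return self.current >= self.target
        # Latency and freshness targets are written as "<N" in the table.
        return self.current < self.target


# Values copied from the table above.
SLIS = [
    Sli("Self-service success (%)", 99.0, 97.0, True),
    Sli("Provisioning latency, P95 (min)", 5.0, 3.0, False),
    Sli("CI/CD pipeline success (%)", 98.0, 96.0, True),
    Sli("Platform API uptime (%)", 99.9, 99.95, True),
    Sli("Template freshness (days)", 30.0, 15.0, False),
]


def missed_targets(slis: list[Sli]) -> list[str]:
    """Names of indicators currently below their target."""
    return [s.name for s in slis if not s.meets_target()]


def allowed_downtime_minutes(uptime_target_pct: float, window_days: int = 30) -> float:
    """Downtime budget implied by an uptime target over the window."""
    return (1 - uptime_target_pct / 100) * window_days * 24 * 60
```

With the figures in the table, `missed_targets(SLIS)` flags self-service success and CI/CD pipeline success, the two rows where the current value trails the target. A 99.9% uptime target over an assumed 30-day window allows roughly 43 minutes of downtime.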
ADOPTION METRICS:
═══════════════════════════════════════
Golden path adoption: 68% of new services
Supported path: 25% of new services
Golden path (legacy): 7% (migrated services)
Template usage: 140 services created from templates
Portal active users: 85% of engineering
DEVELOPER SATISFACTION:
═══════════════════════════════════════
Survey (Q4 2024, n=120):
→ Overall satisfaction: 4.1/5.0
→ Time-to-deploy: 4.5/5.0 (improved from 2.8)
→ Documentation quality: 3.8/5.0
→ Platform reliability: 4.2/5.0
→ Support responsiveness: 4.0/5.0
Top requested features:
1. More language templates (Rust, Ruby)
2. One-click staging → prod promotion
3. Cost visibility per service
4. Environment cloning
5. Database migration tooling
5. Platform Roadmap
PLATFORM ROADMAP — 2025
═══════════════════════════════════════
Q1 2025: Developer Experience
═══════════════════════════════════════
→ Rust and Ruby templates
→ Local development environment (Dev Containers)
→ Improved documentation (searchable)
→ Cost visibility per service (dashboard)
→ Environment cloning feature
Q2 2025: Automation
═══════════════════════════════════════
→ One-click staging → production promotion
→ Automated canary deployments
→ Database migration framework
→ Automated security scanning in golden path
→ Service dependency graph
Q3 2025: Scale
═══════════════════════════════════════
→ Multi-cluster support
→ Multi-region deployment
→ Advanced autoscaling
→ Disaster recovery automation
→ Cost optimization recommendations
Q4 2025: Intelligence
═══════════════════════════════════════
→ AI-assisted template selection
→ Anomaly detection for services
→ Predictive scaling
→ Automated performance tuning
→ Developer productivity analytics
Edge Cases
- Large teams: 1,000+ developers need scalable platform
- Legacy migration: Golden path adoption for existing services
- Multi-cloud: Platform abstraction across cloud providers
- Security: Platform security review, vulnerability management
- Cost management: Platform cost allocation to teams
Integration Points
- IDP tools: Backstage, Humanitec, Port
- Cloud: AWS, Azure, GCP
- Kubernetes: EKS, GKE, AKS
- CI/CD: GitHub Actions, GitLab CI, Jenkins
- IaC: Terraform, Pulumi, Crossplane
- Monitoring: Prometheus, Grafana, Datadog
Output
Platform Status
PLATFORM STATUS — Q4 2024
═══════════════════════════════════════
Services on platform: 180 (68% golden path)
Template usage: 140 services created
Portal active users: 85% of engineering
Platform reliability: 99.95% uptime
Developer satisfaction: 4.1/5.0
Q1 roadmap: Rust/Ruby templates, cost visibility, env cloning