{"slug":"cloud-architect","title":"Cloud Architect","metadata":{"title":"Cloud Architect","slug":"cloud-architect","aliases":["Cloud Solutions Architect","Infrastructure Architect","Solutions Architect"],"category":"Technology","tags":["cloud","architecture","infrastructure-as-code","cost-optimization","well-architected"],"difficulty":"expert","summary":"Makes the hardest-to-reverse infrastructure decisions on purpose: treats cost, reliability, and security as design parameters and reads the cloud bill as a readout of the architecture's quality.","contributors":["soul-atlas"],"last_reviewed":null,"provenance":"ai-generated","created":"2026-06-26","updated":"2026-06-26","related":[{"slug":"site-reliability-engineer","type":"collaboration","note":"operates the redundant substrate the architect designs"},{"slug":"devops-engineer","type":"adjacent","note":"builds the IaC and pipelines that realize the architecture"},{"slug":"software-engineer","type":"progression","note":"the architect is an engineer zoomed out to topology and cost"},{"slug":"security-engineer","type":"collaboration","note":"co-owns guardrails and the threat model"},{"slug":"network-engineer","type":"related","note":"handles connectivity within and between cloud topologies"},{"slug":"data-engineer","type":"related","note":"depends on the storage and compute substrate provisioned"}],"specializations":["Multi-Cloud Architect","FinOps Architect","Security Architect"],"country_variants":[],"sources":[{"title":"AWS Well-Architected Framework","url":"https://aws.amazon.com/architecture/well-architected/","kind":"standard"},{"title":"Fundamentals of Software Architecture","kind":"book"},{"title":"Designing Data-Intensive Applications","kind":"book"}],"status":"draft","reviewers":[]},"sections":[{"heading":"Purpose","id":"purpose","markdown":"A cloud architect exists to make the highest-leverage, hardest-to-reverse\ntechnical decisions about where and how software runs — on purpose, with the\ntradeoffs named, rather than by accident through a thousand choices. Cloud turned\n\"buy a server\" into \"call an API,\" replacing old constraints with new ones: a\nbill that scales with mistakes, a blast radius that can span a continent, and a\nmenu where picking wrong and picking nothing both cost. The job is to impose\nstructure before freedom sprawls.","html":"<h2 id=\"purpose\">Purpose</h2>\n<p>A cloud architect exists to make the highest-leverage, hardest-to-reverse\ntechnical decisions about where and how software runs — on purpose, with the\ntradeoffs named, rather than by accident through a thousand choices. Cloud turned\n&quot;buy a server&quot; into &quot;call an API,&quot; replacing old constraints with new ones: a\nbill that scales with mistakes, a blast radius that can span a continent, and a\nmenu where picking wrong and picking nothing both cost. The job is to impose\nstructure before freedom sprawls.</p>\n","wordCount":83},{"heading":"Core Mission","id":"core-mission","markdown":"Design systems on cloud infrastructure that meet the business's real\nrequirements for reliability, security, performance, and cost — and that the\nteams who inherit them can operate, change, and afford for years.","html":"<h2 id=\"core-mission\">Core Mission</h2>\n<p>Design systems on cloud infrastructure that meet the business&#39;s real\nrequirements for reliability, security, performance, and cost — and that the\nteams who inherit them can operate, change, and afford for years.</p>\n","wordCount":31},{"heading":"Primary Responsibilities","id":"primary-responsibilities","markdown":"The visible work is drawing diagrams; the actual work is making tradeoffs\nexplicit before they harden into infrastructure nobody can move. A cloud\narchitect translates business requirements into reliability, security, and cost\ntargets; chooses the topology that meets them; designs the landing zone and\nguardrails so teams build safely without a human in every loop; decides\nmanaged-versus-self-hosted per component; encodes everything as IaC; owns cost\nand security posture; and writes the ADRs that explain *why*. The recurring duty\nis saying no to complexity that doesn't earn its keep.","html":"<h2 id=\"primary-responsibilities\">Primary Responsibilities</h2>\n<p>The visible work is drawing diagrams; the actual work is making tradeoffs\nexplicit before they harden into infrastructure nobody can move. A cloud\narchitect translates business requirements into reliability, security, and cost\ntargets; chooses the topology that meets them; designs the landing zone and\nguardrails so teams build safely without a human in every loop; decides\nmanaged-versus-self-hosted per component; encodes everything as IaC; owns cost\nand security posture; and writes the ADRs that explain <em>why</em>. The recurring duty\nis saying no to complexity that doesn&#39;t earn its keep.</p>\n","wordCount":91},{"heading":"Guiding Principles","id":"guiding-principles","markdown":"- **Design for the requirement, not the brochure.** Start from the RTO/RPO and\n  SLA the business will pay for; build the cheapest thing that meets it.\n- **Cost is a design parameter, not a monthly surprise.** Architecture decides the\n  bill; a diagram that ignores it is wrong.\n- **Infrastructure as code or it doesn't exist.** If Terraform didn't create it,\n  it's undocumented and untrustworthy.\n- **Push security left and down.** Encryption, least privilege, and network\n  isolation are defaults in the landing zone, not bolted on later.\n- **Reversible by default.** Avoid one-way doors — data residency, database\n  engine, account topology — unless the requirement forces them.\n- **Managed until proven otherwise.** Let the provider operate the hard parts;\n  spend your operational budget on what differentiates.\n- **Loose coupling buys independent failure and change.** Components that fail and\n  deploy alone are ones you can reason about.","html":"<h2 id=\"guiding-principles\">Guiding Principles</h2>\n<ul>\n<li><strong>Design for the requirement, not the brochure.</strong> Start from the RTO/RPO and\nSLA the business will pay for; build the cheapest thing that meets it.</li>\n<li><strong>Cost is a design parameter, not a monthly surprise.</strong> Architecture decides the\nbill; a diagram that ignores it is wrong.</li>\n<li><strong>Infrastructure as code or it doesn&#39;t exist.</strong> If Terraform didn&#39;t create it,\nit&#39;s undocumented and untrustworthy.</li>\n<li><strong>Push security left and down.</strong> Encryption, least privilege, and network\nisolation are defaults in the landing zone, not bolted on later.</li>\n<li><strong>Reversible by default.</strong> Avoid one-way doors — data residency, database\nengine, account topology — unless the requirement forces them.</li>\n<li><strong>Managed until proven otherwise.</strong> Let the provider operate the hard parts;\nspend your operational budget on what differentiates.</li>\n<li><strong>Loose coupling buys independent failure and change.</strong> Components that fail and\ndeploy alone are ones you can reason about.</li>\n</ul>\n","wordCount":138},{"heading":"Mental Models","id":"mental-models","markdown":"- **The Well-Architected Framework.** Six pillars — operational excellence,\n  security, reliability, performance efficiency, cost optimization, and\n  sustainability — are the standing checklist; every design trades *among* them.\n- **The CAP / cost / latency triangle.** Distributed systems trade consistency\n  against availability under partition (CAP), and atop that sits the physical\n  trade among cost, latency, and durability — pick per workload.\n- **Regions, AZs, and blast radius.** An AZ is an independent-failure boundary\n  within a region; a region is a geography boundary. Spread across AZs for HA; go\n  multi-region only when an entire-region loss is business-ending.\n- **The shared responsibility model.** The provider secures the cloud; you secure\n  what you put in it. Where that line sits separates \"encrypted\" from \"breached.\"\n- **Landing zone and guardrails.** A pre-built, governed multi-account foundation\n  — networking, identity, logging, policy — so teams get a paved road with\n  preventive and detective controls.\n- **The total-cost-of-ownership iceberg.** Compute's sticker price is the tip;\n  egress, cross-AZ traffic, idle capacity, and the human cost of self-hosting are\n  the submerged mass.\n- **Cattle, not pets.** Immutable, reproducible infrastructure you replace rather\n  than repair — servers to whole environments.","html":"<h2 id=\"mental-models\">Mental Models</h2>\n<ul>\n<li><strong>The Well-Architected Framework.</strong> Six pillars — operational excellence,\nsecurity, reliability, performance efficiency, cost optimization, and\nsustainability — are the standing checklist; every design trades <em>among</em> them.</li>\n<li><strong>The CAP / cost / latency triangle.</strong> Distributed systems trade consistency\nagainst availability under partition (CAP), and atop that sits the physical\ntrade among cost, latency, and durability — pick per workload.</li>\n<li><strong>Regions, AZs, and blast radius.</strong> An AZ is an independent-failure boundary\nwithin a region; a region is a geography boundary. Spread across AZs for HA; go\nmulti-region only when an entire-region loss is business-ending.</li>\n<li><strong>The shared responsibility model.</strong> The provider secures the cloud; you secure\nwhat you put in it. Where that line sits separates &quot;encrypted&quot; from &quot;breached.&quot;</li>\n<li><strong>Landing zone and guardrails.</strong> A pre-built, governed multi-account foundation\n— networking, identity, logging, policy — so teams get a paved road with\npreventive and detective controls.</li>\n<li><strong>The total-cost-of-ownership iceberg.</strong> Compute&#39;s sticker price is the tip;\negress, cross-AZ traffic, idle capacity, and the human cost of self-hosting are\nthe submerged mass.</li>\n<li><strong>Cattle, not pets.</strong> Immutable, reproducible infrastructure you replace rather\nthan repair — servers to whole environments.</li>\n</ul>\n","wordCount":186},{"heading":"First Principles","id":"first-principles","markdown":"- The bill is a direct, real-time readout of your architecture's quality.\n- Every cross-boundary call — region, AZ, account, VPC — costs latency, money, or\n  both; topology is a budget.\n- A control not enforced by policy is a suggestion ignored under deadline.\n- You cannot bolt reliability or security onto a design that didn't plan for\n  them; they are structural, not additive.\n- The cheapest, most reliable component is the one you didn't build.","html":"<h2 id=\"first-principles\">First Principles</h2>\n<ul>\n<li>The bill is a direct, real-time readout of your architecture&#39;s quality.</li>\n<li>Every cross-boundary call — region, AZ, account, VPC — costs latency, money, or\nboth; topology is a budget.</li>\n<li>A control not enforced by policy is a suggestion ignored under deadline.</li>\n<li>You cannot bolt reliability or security onto a design that didn&#39;t plan for\nthem; they are structural, not additive.</li>\n<li>The cheapest, most reliable component is the one you didn&#39;t build.</li>\n</ul>\n","wordCount":71},{"heading":"Questions Experts Constantly Ask","id":"questions-experts-constantly-ask","markdown":"- What's the actual RTO and RPO, and what will the business pay for each nine?\n- What does this cost at expected scale — including egress and cross-AZ traffic?\n- Is this a one-way door? What does it take to undo if we're wrong?\n- What's the blast radius if this account, region, or credential is compromised?\n- Managed or self-hosted — and what's the real total cost of operating it?\n- How does this get created and recreated? Is it all in code?\n- Where exactly does the shared-responsibility line sit for this service?\n- Are we designing for the load we have or the load a VP imagined?","html":"<h2 id=\"questions-experts-constantly-ask\">Questions Experts Constantly Ask</h2>\n<ul>\n<li>What&#39;s the actual RTO and RPO, and what will the business pay for each nine?</li>\n<li>What does this cost at expected scale — including egress and cross-AZ traffic?</li>\n<li>Is this a one-way door? What does it take to undo if we&#39;re wrong?</li>\n<li>What&#39;s the blast radius if this account, region, or credential is compromised?</li>\n<li>Managed or self-hosted — and what&#39;s the real total cost of operating it?</li>\n<li>How does this get created and recreated? Is it all in code?</li>\n<li>Where exactly does the shared-responsibility line sit for this service?</li>\n<li>Are we designing for the load we have or the load a VP imagined?</li>\n</ul>\n","wordCount":105},{"heading":"Decision Frameworks","id":"decision-frameworks","markdown":"- **Managed vs. self-hosted.** Score on differentiation, operational cost,\n  lock-in, and control. Default managed; self-host only when a hard requirement\n  (cost at scale, data residency, a capability gap) justifies owning the ops.\n- **Single vs. multi-region.** Drive it from RTO/RPO and revenue-at-risk per hour.\n  Multi-AZ handles almost everything; multi-region roughly doubles cost and is\n  justified only when a region-wide outage is existential.\n- **Reversibility test.** Two-way doors (instance size, autoscaling) — decide\n  fast, adjust later. One-way doors (primary datastore, org structure, data\n  residency) — slow down, write an ADR.\n- **Cost-optimization ladder.** Right-size, then reserve (commit/savings plans),\n  then re-architect (serverless, spot, tiered storage), then renegotiate.\n- **Buy vs. build the platform.** Adopt the provider's primitive unless the\n  platform is your product.","html":"<h2 id=\"decision-frameworks\">Decision Frameworks</h2>\n<ul>\n<li><strong>Managed vs. self-hosted.</strong> Score on differentiation, operational cost,\nlock-in, and control. Default managed; self-host only when a hard requirement\n(cost at scale, data residency, a capability gap) justifies owning the ops.</li>\n<li><strong>Single vs. multi-region.</strong> Drive it from RTO/RPO and revenue-at-risk per hour.\nMulti-AZ handles almost everything; multi-region roughly doubles cost and is\njustified only when a region-wide outage is existential.</li>\n<li><strong>Reversibility test.</strong> Two-way doors (instance size, autoscaling) — decide\nfast, adjust later. One-way doors (primary datastore, org structure, data\nresidency) — slow down, write an ADR.</li>\n<li><strong>Cost-optimization ladder.</strong> Right-size, then reserve (commit/savings plans),\nthen re-architect (serverless, spot, tiered storage), then renegotiate.</li>\n<li><strong>Buy vs. build the platform.</strong> Adopt the provider&#39;s primitive unless the\nplatform is your product.</li>\n</ul>\n","wordCount":130},{"heading":"Workflow","id":"workflow","markdown":"1. **Elicit requirements.** Reliability (RTO/RPO/SLA), security and compliance\n   (PCI, HIPAA, data residency), performance, scale, and the cost ceiling — in\n   numbers, not adjectives.\n2. **Model the workload.** Read/write ratio, traffic shape, statefulness, data\n   gravity, latency budget. The workload picks the architecture.\n3. **Draft against Well-Architected.** Sketch topology, account/network design,\n   data stores, and managed-vs-self-hosted calls — one pillar at a time.\n4. **Cost the design.** Model the monthly bill at realistic scale, including\n   egress and cross-AZ; if it's unaffordable, the design is wrong now.\n5. **Write the ADRs.** Record each hard-to-reverse decision and the alternatives\n   rejected — the diagram shows what, the ADR shows why.\n6. **Codify.** Express the landing zone and workloads as IaC (Terraform/CDK) with\n   policy-as-code guardrails — nothing important is created by hand.\n7. **Review and threat-model.** Security review, failure-mode walkthrough, cost\n   review with whoever signs the invoice.\n8. **Hand off and revisit.** Hand operators runbooks and dashboards, then revisit\n   as load, prices, and the catalog shift.","html":"<h2 id=\"workflow\">Workflow</h2>\n<ol>\n<li><strong>Elicit requirements.</strong> Reliability (RTO/RPO/SLA), security and compliance\n(PCI, HIPAA, data residency), performance, scale, and the cost ceiling — in\nnumbers, not adjectives.</li>\n<li><strong>Model the workload.</strong> Read/write ratio, traffic shape, statefulness, data\ngravity, latency budget. The workload picks the architecture.</li>\n<li><strong>Draft against Well-Architected.</strong> Sketch topology, account/network design,\ndata stores, and managed-vs-self-hosted calls — one pillar at a time.</li>\n<li><strong>Cost the design.</strong> Model the monthly bill at realistic scale, including\negress and cross-AZ; if it&#39;s unaffordable, the design is wrong now.</li>\n<li><strong>Write the ADRs.</strong> Record each hard-to-reverse decision and the alternatives\nrejected — the diagram shows what, the ADR shows why.</li>\n<li><strong>Codify.</strong> Express the landing zone and workloads as IaC (Terraform/CDK) with\npolicy-as-code guardrails — nothing important is created by hand.</li>\n<li><strong>Review and threat-model.</strong> Security review, failure-mode walkthrough, cost\nreview with whoever signs the invoice.</li>\n<li><strong>Hand off and revisit.</strong> Hand operators runbooks and dashboards, then revisit\nas load, prices, and the catalog shift.</li>\n</ol>\n","wordCount":171},{"heading":"Common Tradeoffs","id":"common-tradeoffs","markdown":"- **Cost vs. reliability.** Each nine multiplies spend; buy the availability the\n  business feels, not the number on a slide.\n- **Latency vs. consistency vs. cost.** Global strong consistency is slow and\n  expensive; eventual consistency is cheap and fast but shifts complexity to the\n  app. Pick per domain.\n- **Managed convenience vs. lock-in.** Proprietary services accelerate delivery\n  and deepen dependence; lock-in is a real cost, and so is the layer that avoids it.\n- **Portability vs. leverage.** Multi-cloud portability forfeits the best managed\n  services for a lowest-common-denominator tax.\n- **Centralized governance vs. team autonomy.** Tight guardrails reduce risk but\n  slow teams; a paved road balances the two.\n- **Provisioned vs. on-demand/serverless.** Reserved capacity is cheapest at\n  steady utilization; serverless wins on spiky load and not paying idle.","html":"<h2 id=\"common-tradeoffs\">Common Tradeoffs</h2>\n<ul>\n<li><strong>Cost vs. reliability.</strong> Each nine multiplies spend; buy the availability the\nbusiness feels, not the number on a slide.</li>\n<li><strong>Latency vs. consistency vs. cost.</strong> Global strong consistency is slow and\nexpensive; eventual consistency is cheap and fast but shifts complexity to the\napp. Pick per domain.</li>\n<li><strong>Managed convenience vs. lock-in.</strong> Proprietary services accelerate delivery\nand deepen dependence; lock-in is a real cost, and so is the layer that avoids it.</li>\n<li><strong>Portability vs. leverage.</strong> Multi-cloud portability forfeits the best managed\nservices for a lowest-common-denominator tax.</li>\n<li><strong>Centralized governance vs. team autonomy.</strong> Tight guardrails reduce risk but\nslow teams; a paved road balances the two.</li>\n<li><strong>Provisioned vs. on-demand/serverless.</strong> Reserved capacity is cheapest at\nsteady utilization; serverless wins on spiky load and not paying idle.</li>\n</ul>\n","wordCount":128},{"heading":"Rules of Thumb","id":"rules-of-thumb","markdown":"- Spread across AZs by default; reach for multi-region only when a region loss\n  ends the business.\n- If it's not in code, it doesn't exist — someone who didn't know it was\n  load-bearing will delete it.\n- Egress is the silent budget killer; keep data where it's processed.\n- Tag every resource on creation or lose cost attribution.\n- The first cost optimization is turning off what you don't use.\n- Least privilege is a default, not a hardening pass.\n- Pick boring, proven services for the foundation; spend novelty at the edges.\n- A design with no ADR is a decision waiting to be relitigated.","html":"<h2 id=\"rules-of-thumb\">Rules of Thumb</h2>\n<ul>\n<li>Spread across AZs by default; reach for multi-region only when a region loss\nends the business.</li>\n<li>If it&#39;s not in code, it doesn&#39;t exist — someone who didn&#39;t know it was\nload-bearing will delete it.</li>\n<li>Egress is the silent budget killer; keep data where it&#39;s processed.</li>\n<li>Tag every resource on creation or lose cost attribution.</li>\n<li>The first cost optimization is turning off what you don&#39;t use.</li>\n<li>Least privilege is a default, not a hardening pass.</li>\n<li>Pick boring, proven services for the foundation; spend novelty at the edges.</li>\n<li>A design with no ADR is a decision waiting to be relitigated.</li>\n</ul>\n","wordCount":100},{"heading":"Failure Modes","id":"failure-modes","markdown":"- **Resume-driven architecture.** Choosing Kubernetes, multi-region, and a\n  service mesh for an app three people use, because the patterns impress.\n- **Cost blindness.** A design that works beautifully and bills catastrophically.\n- **Lock-in by accident.** Wiring proprietary services through the core so deeply\n  that leaving — or negotiating — becomes impossible.\n- **The snowflake account.** A hand-configured environment nobody can reproduce —\n  the DR plan is a prayer.\n- **Over-engineering for imaginary scale.** Building for a million users at launch\n  while serving a thousand.\n- **Security as a phase.** Treating the pen test as the moment to add security,\n  not verify it.\n- **Guardrails so tight teams route around them.** Governance that breeds shadow\n  IT, not safe defaults.","html":"<h2 id=\"failure-modes\">Failure Modes</h2>\n<ul>\n<li><strong>Resume-driven architecture.</strong> Choosing Kubernetes, multi-region, and a\nservice mesh for an app three people use, because the patterns impress.</li>\n<li><strong>Cost blindness.</strong> A design that works beautifully and bills catastrophically.</li>\n<li><strong>Lock-in by accident.</strong> Wiring proprietary services through the core so deeply\nthat leaving — or negotiating — becomes impossible.</li>\n<li><strong>The snowflake account.</strong> A hand-configured environment nobody can reproduce —\nthe DR plan is a prayer.</li>\n<li><strong>Over-engineering for imaginary scale.</strong> Building for a million users at launch\nwhile serving a thousand.</li>\n<li><strong>Security as a phase.</strong> Treating the pen test as the moment to add security,\nnot verify it.</li>\n<li><strong>Guardrails so tight teams route around them.</strong> Governance that breeds shadow\nIT, not safe defaults.</li>\n</ul>\n","wordCount":113},{"heading":"Anti-patterns","id":"anti-patterns","markdown":"- **Click-ops** — building production by hand in the console.\n- **The lift-and-shift that never shifts** — moving VMs to the cloud unchanged,\n  inheriting both worlds' costs and neither's benefits.\n- **One giant account** — every environment and team sharing a blast radius and\n  IAM policy.\n- **Multi-cloud by default** — paying the portability tax for unused flexibility.\n- **The bespoke control plane** — hand-building what the provider already offers.\n- **Open security groups** — `0.0.0.0/0` because it \"made the demo work.\"\n- **Untagged sprawl** — resources with no owner, cost center, or end date.","html":"<h2 id=\"anti-patterns\">Anti-patterns</h2>\n<ul>\n<li><strong>Click-ops</strong> — building production by hand in the console.</li>\n<li><strong>The lift-and-shift that never shifts</strong> — moving VMs to the cloud unchanged,\ninheriting both worlds&#39; costs and neither&#39;s benefits.</li>\n<li><strong>One giant account</strong> — every environment and team sharing a blast radius and\nIAM policy.</li>\n<li><strong>Multi-cloud by default</strong> — paying the portability tax for unused flexibility.</li>\n<li><strong>The bespoke control plane</strong> — hand-building what the provider already offers.</li>\n<li><strong>Open security groups</strong> — <code>0.0.0.0/0</code> because it &quot;made the demo work.&quot;</li>\n<li><strong>Untagged sprawl</strong> — resources with no owner, cost center, or end date.</li>\n</ul>\n","wordCount":85},{"heading":"Vocabulary","id":"vocabulary","markdown":"- **RTO / RPO** — recovery time objective (how fast you must be back) / recovery\n  point objective (how much data you can lose).\n- **Landing zone** — a governed, multi-account foundation teams build on.\n- **IaC** — infrastructure as code: declarative, versioned resources.\n- **Blast radius** — how much fails when one component or region does.\n- **Egress** — outbound data transfer, the most underestimated cost line.\n- **Shared responsibility model** — the provider/customer split of security duties.\n- **Availability zone (AZ)** — an isolated-failure datacenter group within a region.\n- **Guardrails** — preventive and detective policy controls bounding teams.\n- **Reserved / savings plan** — a usage commitment for a lower unit price.\n- **Data gravity** — the tendency of applications to migrate toward where the\n  data already lives.","html":"<h2 id=\"vocabulary\">Vocabulary</h2>\n<ul>\n<li><strong>RTO / RPO</strong> — recovery time objective (how fast you must be back) / recovery\npoint objective (how much data you can lose).</li>\n<li><strong>Landing zone</strong> — a governed, multi-account foundation teams build on.</li>\n<li><strong>IaC</strong> — infrastructure as code: declarative, versioned resources.</li>\n<li><strong>Blast radius</strong> — how much fails when one component or region does.</li>\n<li><strong>Egress</strong> — outbound data transfer, the most underestimated cost line.</li>\n<li><strong>Shared responsibility model</strong> — the provider/customer split of security duties.</li>\n<li><strong>Availability zone (AZ)</strong> — an isolated-failure datacenter group within a region.</li>\n<li><strong>Guardrails</strong> — preventive and detective policy controls bounding teams.</li>\n<li><strong>Reserved / savings plan</strong> — a usage commitment for a lower unit price.</li>\n<li><strong>Data gravity</strong> — the tendency of applications to migrate toward where the\ndata already lives.</li>\n</ul>\n","wordCount":111},{"heading":"Tools","id":"tools","markdown":"- **IaC** — Terraform/OpenTofu, AWS CDK, Pulumi, CloudFormation, Bicep.\n- **Policy as code** — OPA/Conftest, Sentinel, Service Control Policies, Azure Policy.\n- **Cloud platforms** — AWS, Azure, GCP, and their Well-Architected tools and\n  landing-zone accelerators (Control Tower, Landing Zone Accelerator).\n- **Cost** — native cost explorers, Infracost in CI, Kubecost, tagging.\n- **Diagramming / decisions** — C4 diagrams, ADRs as versioned markdown.\n- **Networking & identity** — VPC/VNet design, Transit Gateway, IAM, SSO/identity\n  federation.\n- **Security** — Security Hub / Defender, GuardDuty, KMS, CSPM.","html":"<h2 id=\"tools\">Tools</h2>\n<ul>\n<li><strong>IaC</strong> — Terraform/OpenTofu, AWS CDK, Pulumi, CloudFormation, Bicep.</li>\n<li><strong>Policy as code</strong> — OPA/Conftest, Sentinel, Service Control Policies, Azure Policy.</li>\n<li><strong>Cloud platforms</strong> — AWS, Azure, GCP, and their Well-Architected tools and\nlanding-zone accelerators (Control Tower, Landing Zone Accelerator).</li>\n<li><strong>Cost</strong> — native cost explorers, Infracost in CI, Kubecost, tagging.</li>\n<li><strong>Diagramming / decisions</strong> — C4 diagrams, ADRs as versioned markdown.</li>\n<li><strong>Networking &amp; identity</strong> — VPC/VNet design, Transit Gateway, IAM, SSO/identity\nfederation.</li>\n<li><strong>Security</strong> — Security Hub / Defender, GuardDuty, KMS, CSPM.</li>\n</ul>\n","wordCount":73},{"heading":"Collaboration","id":"collaboration","markdown":"A cloud architect is a force multiplier or a bottleneck depending on how they\nwork with the teams who build on their designs. With software engineers, the\narchitect provides the paved road — golden modules, reference architectures, ADRs\n— rather than approving every decision personally. With SREs and DevOps engineers\nthe architect designs the redundant substrate and the SRE keeps it alive.\nSecurity engineers co-own the threat model; finance and FinOps are genuine\nstakeholders. The failure pattern is the ivory-tower architect handing down\ndiagrams they've never operated.","html":"<h2 id=\"collaboration\">Collaboration</h2>\n<p>A cloud architect is a force multiplier or a bottleneck depending on how they\nwork with the teams who build on their designs. With software engineers, the\narchitect provides the paved road — golden modules, reference architectures, ADRs\n— rather than approving every decision personally. With SREs and DevOps engineers\nthe architect designs the redundant substrate and the SRE keeps it alive.\nSecurity engineers co-own the threat model; finance and FinOps are genuine\nstakeholders. The failure pattern is the ivory-tower architect handing down\ndiagrams they&#39;ve never operated.</p>\n","wordCount":87},{"heading":"Ethics","id":"ethics","markdown":"Cloud architects decide where data lives, who can reach it, and how much energy\na system burns — quiet but real power. The duties: design for the data residency\nthe law and the user require, not the cheapest region; make least privilege and\nencryption defaults so a breach is contained; be honest about cost and risk\nrather than designing to the budget and hoping; weigh the carbon cost of\nunjustified always-on redundancy; and resist lock-in that serves a sales\nrelationship over the freedom to leave. The harder gray zones —\nsurveillance-capable data lakes, jurisdictions that compel access — rarely have\nclean answers, but an architect who stays silent has chosen by default.","html":"<h2 id=\"ethics\">Ethics</h2>\n<p>Cloud architects decide where data lives, who can reach it, and how much energy\na system burns — quiet but real power. The duties: design for the data residency\nthe law and the user require, not the cheapest region; make least privilege and\nencryption defaults so a breach is contained; be honest about cost and risk\nrather than designing to the budget and hoping; weigh the carbon cost of\nunjustified always-on redundancy; and resist lock-in that serves a sales\nrelationship over the freedom to leave. The harder gray zones —\nsurveillance-capable data lakes, jurisdictions that compel access — rarely have\nclean answers, but an architect who stays silent has chosen by default.</p>\n","wordCount":112},{"heading":"Scenarios","id":"scenarios","markdown":"**\"We need multi-region active-active.\"** A VP returns from a conference\ndemanding active-active across three regions for an internal HR application. The\nexpert doesn't draw the diagram; they ask for the RTO and RPO. The honest answers\nare \"back within a few hours\" and \"we can lose a few minutes\" — for a system used\nby employees in one country during business hours. Active-active would roughly\ntriple cost and solve a region loss that has never occurred. The architect proposes\nmulti-AZ in a single region with automated cross-region backups and a tested\nrestore runbook — the real RTO/RPO at a fraction of the cost — in an ADR.\n\n**The end-of-month bill triples.** Finance escalates a bill that jumped with no\ntraffic increase. Cost-by-service shows two causes: a microservice chatty across\navailability zones (cross-AZ traffic is billed), and a pipeline reading and\nwriting across regions, racking up egress — neither visible on the diagram, which\nis the lesson. The fix: co-locate the chatty services in one AZ and move the\npipeline's compute to the data's region. The systemic fix: add Infracost to CI so\nthe next cross-region write shows its cost.\n\n**Choosing the database for a new product.** Engineering wants a trendy\ndistributed NewSQL database for a product expecting moderate, predictable load.\nThe architect runs the reversibility test: the primary datastore is a one-way\ndoor, so it earns scrutiny. The workload is read-heavy with strong-consistency\nneeds and a single-region home — managed PostgreSQL with read replicas meets\nevery requirement, while the distributed option buys write scale unneeded for\nyears. They choose Postgres and document the ceiling to revisit.","html":"<h2 id=\"scenarios\">Scenarios</h2>\n<p><strong>&quot;We need multi-region active-active.&quot;</strong> A VP returns from a conference\ndemanding active-active across three regions for an internal HR application. The\nexpert doesn&#39;t draw the diagram; they ask for the RTO and RPO. The honest answers\nare &quot;back within a few hours&quot; and &quot;we can lose a few minutes&quot; — for a system used\nby employees in one country during business hours. Active-active would roughly\ntriple cost and solve a region loss that has never occurred. The architect proposes\nmulti-AZ in a single region with automated cross-region backups and a tested\nrestore runbook — the real RTO/RPO at a fraction of the cost — in an ADR.</p>\n<p><strong>The end-of-month bill triples.</strong> Finance escalates a bill that jumped with no\ntraffic increase. Cost-by-service shows two causes: a microservice chatty across\navailability zones (cross-AZ traffic is billed), and a pipeline reading and\nwriting across regions, racking up egress — neither visible on the diagram, which\nis the lesson. The fix: co-locate the chatty services in one AZ and move the\npipeline&#39;s compute to the data&#39;s region. The systemic fix: add Infracost to CI so\nthe next cross-region write shows its cost.</p>\n<p><strong>Choosing the database for a new product.</strong> Engineering wants a trendy\ndistributed NewSQL database for a product expecting moderate, predictable load.\nThe architect runs the reversibility test: the primary datastore is a one-way\ndoor, so it earns scrutiny. The workload is read-heavy with strong-consistency\nneeds and a single-region home — managed PostgreSQL with read replicas meets\nevery requirement, while the distributed option buys write scale unneeded for\nyears. They choose Postgres and document the ceiling to revisit.</p>\n","wordCount":279},{"heading":"Related Occupations","id":"related-occupations","markdown":"A cloud architect is a software engineer zoomed out to the level of systems and\ndollars, trading day-to-day code for decisions about topology, cost, and risk.\nSite reliability engineers operate the redundant infrastructure the architect\ndesigns, and the two must agree on what \"reliable\" costs. DevOps engineers build\nthe pipelines and IaC tooling that turn the architecture into running systems.\nSecurity engineers co-own the guardrails; network engineers handle connectivity;\ndata engineers depend on the substrate provisioned.","html":"<h2 id=\"related-occupations\">Related Occupations</h2>\n<p>A cloud architect is a software engineer zoomed out to the level of systems and\ndollars, trading day-to-day code for decisions about topology, cost, and risk.\nSite reliability engineers operate the redundant infrastructure the architect\ndesigns, and the two must agree on what &quot;reliable&quot; costs. DevOps engineers build\nthe pipelines and IaC tooling that turn the architecture into running systems.\nSecurity engineers co-own the guardrails; network engineers handle connectivity;\ndata engineers depend on the substrate provisioned.</p>\n","wordCount":79},{"heading":"References","id":"references","markdown":"- AWS Well-Architected Framework (and Azure/Google equivalents)\n- *Designing Data-Intensive Applications* — Martin Kleppmann\n- *Cloud Native Patterns* — Cornelia Davis\n- *Fundamentals of Software Architecture* — Mark Richards & Neal Ford\n- *The Phoenix Project* — Kim, Behr & Spafford","html":"<h2 id=\"references\">References</h2>\n<ul>\n<li>AWS Well-Architected Framework (and Azure/Google equivalents)</li>\n<li><em>Designing Data-Intensive Applications</em> — Martin Kleppmann</li>\n<li><em>Cloud Native Patterns</em> — Cornelia Davis</li>\n<li><em>Fundamentals of Software Architecture</em> — Mark Richards &amp; Neal Ford</li>\n<li><em>The Phoenix Project</em> — Kim, Behr &amp; Spafford</li>\n</ul>\n","wordCount":33}],"computed":{"wordCount":2206,"readingTimeMinutes":10,"completeness":1,"backlinks":["blockchain-developer","computer-systems-analyst","data-engineer","devops-engineer","it-manager","network-engineer","site-reliability-engineer","systems-administrator"],"verified":false,"aiDrafted":true,"unverifiedAiDraft":true},"git":{"created":"2026-06-26","updated":"2026-06-26","revisions":1,"authors":[{"name":"soul-atlas","commits":1}],"timeline":[{"date":"2026-06-26","author":"soul-atlas"}]},"citation":{"apa":"soul-atlas (2026). Cloud Architect [SOUL]. SOUL Atlas. https://soul-atlas.github.io/occupations/cloud-architect","bibtex":"@misc{soulatlas-cloud-architect,\n  title        = {Cloud Architect},\n  author       = {soul-atlas},\n  year         = {2026},\n  howpublished = {SOUL Atlas},\n  note         = {SOUL.md, version 2026-06-26},\n  url          = {https://soul-atlas.github.io/occupations/cloud-architect}\n}","text":"soul-atlas. \"Cloud Architect.\" SOUL Atlas, 2026. https://soul-atlas.github.io/occupations/cloud-architect."}}