Navigating the Challenges of Policy as Code in Azure

TL;DR

This post is about my experience with Policy as Code in Azure, which is a way of using policies to create and enforce rules for resources. I discuss some of the challenges and scenarios that arise after policies are deployed, covering deny, audit, and deploy-if-not-exists policies. I point out some of the pitfalls of working with them, such as exemptions, permissions, remediation tasks, and error messages. I also raise some critical questions about post-deployment management: who is authorized, responsible, and accountable for creating exemptions, initiating remediation, and monitoring policies?

Intro

Over the past year, I’ve been deeply immersed in Azure policies, transitioning from Infrastructure as Code (IaC), specifically Terraform, to Policy as Code. It’s been a challenging journey, but the landscape has significantly improved. In this post, I aim to share my insights and reflections on working with Policy as Code.

Why Policy as Code?

You might ask, “Isn’t IaC sufficient?” Indeed, IaC is fantastic for its intended use cases. However, Azure policies offer broader reach and more robust enforcement mechanisms. For instance, while you can set resource tags using Terraform and require everyone to use them, enforcing this requirement can be tricky. Enter Policy as Code: you can assign a policy at a higher scope, such as a management group or subscription, and have it enforced on everything below that scope, regardless of which tool was used to deploy the resource.
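To make the tag example concrete, here is a minimal sketch of what such a policy can look like, written as a Python dictionary that serializes to the JSON Azure Policy expects. It follows the shape of the well-known "require a tag" pattern; the function name and the default parameter name are my own choices for illustration, not something prescribed by Azure.

```python
import json


def require_tag_policy(tag_name_parameter: str = "tagName") -> dict:
    """Build a custom policy definition that denies resources missing a tag.

    The function and parameter names are illustrative assumptions.
    """
    return {
        "properties": {
            "displayName": "Require a tag on resources",
            "policyType": "Custom",
            # "Indexed" limits evaluation to resource types that support tags.
            "mode": "Indexed",
            "parameters": {
                tag_name_parameter: {
                    "type": "String",
                    "metadata": {"displayName": "Tag name"},
                }
            },
            "policyRule": {
                # The condition is true when the tag does not exist.
                "if": {
                    "field": f"[concat('tags[', parameters('{tag_name_parameter}'), ']')]",
                    "exists": "false",
                },
                # Deny blocks the create or update request.
                "then": {"effect": "deny"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(require_tag_policy(), indent=2))
```

The resulting JSON can be kept in version control next to the rest of your code, which is really the whole point of Policy as Code.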

Moreover, it appears that Microsoft is steering in this direction. Both Azure Monitor Baseline Alerts (AMBA) and Azure Landing Zones make extensive use of policies to create resources and enforce various guardrails. In Azure Landing Zones, the set of assigned policies is a key differentiator between the various archetypes.

The Post-Deployment Conundrum

There’s ample documentation on working with policies up to the deployment step. I have linked two of the main ones here. However, what happens post-deployment? That’s the question I want to explore and highlight.

Let’s consider a few scenarios in an enterprise-scale environment using Azure Landing Zones from the Cloud Adoption Framework (CAF). When you follow Microsoft’s recommendations, a lot of different policies are deployed, and this introduces new questions that we need to discuss and find good ways to solve.

Deny Policy: Imagine you’re a developer wanting to develop your application. You have a landing zone and start deploying your resources, only to be halted by a deny policy. This interruption can bring development to a standstill. The solution? An exemption from the policy. But this raises two critical questions:

  • Who is authorized to create the exemption?
  • Who assumes the risk if there’s a security issue?
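One way to force answers to these two questions is to refuse to create an exemption unless both are filled in. The sketch below builds an exemption request body and rejects it when the approver or risk owner is missing. The property names policyAssignmentId, exemptionCategory, expiresOn, description, and metadata are part of the policy exemption resource; the metadata keys approvedBy and riskOwner, the function name, and the 90-day default are assumptions I made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# The two exemption categories Azure Policy supports.
ALLOWED_CATEGORIES = {"Waiver", "Mitigated"}


def build_exemption(policy_assignment_id, category, approved_by, risk_owner,
                    justification, valid_days=90, now=None):
    """Return an exemption body, or raise if ownership is not stated."""
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"category must be one of {sorted(ALLOWED_CATEGORIES)}")
    if not approved_by or not risk_owner:
        raise ValueError("an exemption needs both an approver and a risk owner")

    now = now or datetime.now(timezone.utc)
    # Always set an expiry so the exemption has to be reviewed again.
    expires_on = now + timedelta(days=valid_days)

    return {
        "properties": {
            "policyAssignmentId": policy_assignment_id,
            "exemptionCategory": category,
            "expiresOn": expires_on.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "description": justification,
            # Hypothetical metadata keys; pick whatever fits your organization.
            "metadata": {"approvedBy": approved_by, "riskOwner": risk_owner},
        }
    }
```

This does not decide who the approver and risk owner should be, but it makes it impossible to skip the question.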

Audit Policy: This scenario can be a mixed bag for developers. They have the freedom to do as they please, with some policies offering advice. But who is responsible for adhering to these guidelines? Who gets notified when some audit policies are non-compliant, and what value do they bring if no one takes responsibility to enforce them or investigate their non-compliance?
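As a thought experiment, audit results only become useful once each finding lands with someone. The sketch below groups non-compliant records by owner and puts everything without an owner in a separate bucket. The record shape is a simplification I made up: complianceState and resourceId mirror fields in policy state records, while the owner field is hypothetical and would have to come from somewhere like a tag or a CMDB.

```python
from collections import defaultdict


def group_non_compliant_by_owner(states, fallback_owner="unassigned"):
    """Map each owner to the resource IDs they need to look at.

    `states` is a list of dicts; the "owner" key is a hypothetical field.
    """
    by_owner = defaultdict(list)
    for state in states:
        # Skip anything that is compliant or exempt.
        if state.get("complianceState") != "NonCompliant":
            continue
        # Findings without an owner are collected under the fallback name.
        owner = state.get("owner") or fallback_owner
        by_owner[owner].append(state["resourceId"])
    return dict(by_owner)
```

The size of the fallback bucket is a rough measure of how much audit output nobody is responsible for.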

Deploy If Not Exists (DINE): This policy can be quite a challenge. Often, it simply doesn’t work. One potential pitfall is that the managed identity on the policy assignment needs sufficient permissions to deploy the resources. Even with adequate permissions, it often fails, necessitating manual remediation tasks. Unfortunately, these tasks don’t always work either, leaving you to figure out why and fix the problem by hand. The error messages are often unhelpful, and the feedback loop is slow: a remediation task can easily take 20 minutes before the result is ready, and triggering a policy scan to update the compliance status is also time-consuming.
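The permission pitfall is one of the few DINE problems you can check up front. A DINE definition lists the roles it needs under then.details.roleDefinitionIds, so you can compare that list with the roles the managed identity actually holds. Below is a minimal sketch of that comparison; the function names are my own, the input for assigned roles is assumed to be a list of role definition IDs you have fetched yourself, and the check ignores the scope of the role assignment, which also matters in practice.

```python
def _role_guid(role_definition_id):
    """Extract the trailing GUID from a role definition ID, lower-cased."""
    return role_definition_id.rstrip("/").rsplit("/", 1)[-1].lower()


def required_role_ids(policy_definition):
    """Return the role GUIDs a DINE policy definition asks for."""
    then = policy_definition["properties"]["policyRule"]["then"]
    details = then.get("details", {})
    return {_role_guid(rid) for rid in details.get("roleDefinitionIds", [])}


def missing_roles(policy_definition, assigned_role_ids):
    """Return the required role GUIDs the managed identity does not hold."""
    assigned = {_role_guid(rid) for rid in assigned_role_ids}
    return required_role_ids(policy_definition) - assigned


# Example input: a trimmed-down DINE definition that needs Contributor.
CONTRIBUTOR = "b24988ac-6180-42a0-ab88-20f7382dd24c"
example_definition = {
    "properties": {
        "policyRule": {
            "then": {
                "effect": "deployIfNotExists",
                "details": {
                    "roleDefinitionIds": [
                        f"/providers/Microsoft.Authorization/roleDefinitions/{CONTRIBUTOR}"
                    ]
                },
            }
        }
    }
}
```

An empty result only tells you the roles match on paper. It does not explain a failed remediation task, but it removes one common cause before you start the 20-minute wait.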

Navigating Post-Deployment Challenges

What happens when policies fail to work as expected? Who initiates the remediation process? Who monitors the policies and assumes responsibility for the outcomes? These are critical questions that need addressing, and managing them effectively is crucial.

When policies fail, it’s often unclear who should start the remediation. Should it be the developers who are directly affected, or the architects who designed the policies? This ambiguity can lead to delays and inefficiencies.

Monitoring policies is another challenge. It’s not just about tracking their status, but also understanding their impact on the system. Who should be responsible for this? The responsibility could lie with the team that deployed the policy, or perhaps a dedicated monitoring team.

These questions highlight the need for clear roles and responsibilities in managing Azure policies. However, finding the right balance is not straightforward. I have pondered these issues extensively but have yet to find a definitive solution.

I welcome insights and discussions on this topic. If you have any advice, tools, or ideas on how to address these challenges post-deployment, please share them.

Questions

To close, here are some questions about Azure Policy that I have been thinking about but have not yet found satisfactory answers to. I hope to spark some discussion and get some insights from others who have experience with Policy as Code.

  • Who is accountable for the compliance status of a resource?
  • Who is responsible for ensuring that DINE policies are functioning properly?
  • Who is authorized to grant exemptions to resources from policies?
  • Who is liable for the potential risks of each exemption?
  • Is there a single team that manages the entire platform, or do resource owners have different roles and responsibilities for their scopes?
  • Who is in charge of initiating remediation tasks when policies fail?
  • Who is capable of troubleshooting and resolving issues when remediation tasks don’t work?
