Security Benchmarking Authorization Policy Engines: Rego, Cedar, OpenFGA & Teleport ACD

Intro

Back in 2024, Amazon Web Services (AWS) engaged Trail of Bits (ToB) to perform a comparative assessment of several authorization and access management policy languages. If you're unfamiliar with the concept of a policy engine, it is essentially a dedicated component that offloads authorization decisions from an application. Instead of implementing an authorization framework from scratch, developers define policies, written in a specific language, that dictate which people or machines can perform which actions on which resources.

The goals of the ToB research were to identify broadly applicable threats to policy languages, to identify language features that partially or fully mitigate those threats, and to provide security recommendations to improve the general design of policy languages. In practice, this was an in-depth threat modeling exercise involving the Cedar, Rego, and OpenFGA policy languages. The final deliverable of their research can be downloaded from ToB's GitHub repository.

Although that analysis covered many security engineering aspects and served as the foundation for our work, it resulted only in a static threat model. Given that we are evaluating custom programming languages and their execution environments, we thought it would be interesting to explore developing a dynamic evaluation framework.

Thanks to the support of Teleport, we developed an automated framework for evaluating the security of such authorization engines. Our Security Policy Evaluation Framework (SPEF) is a testing and benchmarking system designed to evaluate the robustness and correctness of various authorization policy engines. It provides a consistent, automated environment for executing policy test cases across multiple languages. This framework is primarily intended for researchers, security engineers, and policy developers who want to benchmark how different policy engines behave under predefined test conditions.

Before introducing SPEF, let's briefly review the policy frameworks currently integrated into our evaluation framework:

Rego is a high-level, declarative query language created by Styra. It extends Datalog and is used to write policies for decision-making in applications. It is primarily used with the Open Policy Agent (OPA) and was designed to define access control, resource validation, and data filtering policies.

How Rego Works:

  • Write policies in .rego files.
  • Load them into OPA.
  • Query OPA via an HTTP API or embedded SDK with an input (JSON).
  • OPA evaluates the policy and returns a decision (e.g., allow = true); see the example below.
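
For illustration, here is a minimal Python sketch of the query step, assuming a local OPA server started with opa run --server (default port 8181) and a policy loaded under a package named example.authz that defines an allow rule; the package name and input fields are illustrative and not part of SPEF.

import requests

# OPA's REST data API: POST /v1/data/<policy path> with {"input": ...}.
# The example.authz package and the input fields below are illustrative.
OPA_URL = "http://localhost:8181/v1/data/example/authz/allow"

input_document = {
    "input": {
        "user": "alice",
        "action": "read",
        "resource": "document-42",
    }
}

response = requests.post(OPA_URL, json=input_document, timeout=5)
response.raise_for_status()

# OPA answers with {"result": <decision>}; an undefined decision has no "result" key.
decision = response.json().get("result", False)
print(f"allow = {decision}")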

Cedar is a declarative, open-source authorization language and engine developed by AWS. It was designed for defining and evaluating fine-grained role-based access control (RBAC) and attribute-based access control (ABAC) policies in a safe, fast, and auditable way.

How Cedar Works:

  • Define schema (entities, actions, resources, attributes).
  • Write policies using the Cedar language.
  • Evaluate policies using the Cedar engine with a JSON input request.
  • Get a decision (e.g., allow or deny); see the example below.
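
For illustration, here is a hedged Python sketch that shells out to the Cedar command-line evaluator (cedar-policy-cli). The flag names follow common examples from Cedar's documentation but should be verified against the installed version; the policy file, entities file, and the User/Action/Photo entity identifiers are illustrative assumptions.

import subprocess

# Assumed inputs: policies.cedar and entities.json exist locally, and the
# entity identifiers below are made up for the example.
cmd = [
    "cedar", "authorize",
    "--policies", "policies.cedar",
    "--entities", "entities.json",
    "--principal", 'User::"alice"',
    "--action", 'Action::"view"',
    "--resource", 'Photo::"vacation.jpg"',
]

result = subprocess.run(cmd, capture_output=True, text=True)

# The CLI prints the decision (e.g., ALLOW or DENY) on stdout.
print(result.stdout.strip() or result.stderr.strip())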

OpenFGA is an open-source, fine-grained authorization system built for scalable relationship-based access control (ReBAC), inspired by Google's Zanzibar system. It includes a policy engine based on relationships (who is related to what, and how) and was designed for high-scale, low-latency access control. It was initially created by Auth0/Okta and is maintained by the OpenFGA project (part of the CNCF sandbox as of 2023).

How OpenFGA Works:

  • Define your model (types, relations, permissions).
  • Store relationships (e.g., user X is a writer on document Y).
  • Query the API: "Can user X perform action Y on object Z?".
  • OpenFGA returns true or false based on the relationship graph; see the example below.
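
For illustration, a minimal Python sketch of the check step over OpenFGA's HTTP API, assuming a local server (default HTTP port 8080) with an existing store and authorization model; the store ID and tuple values are placeholder assumptions.

import requests

STORE_ID = "01HEXAMPLESTOREIDXXXXXXXXX"  # placeholder store ID
CHECK_URL = f"http://localhost:8080/stores/{STORE_ID}/check"

# "Can user:alice act as writer on document:roadmap?"
check_request = {
    "tuple_key": {
        "user": "user:alice",
        "relation": "writer",
        "object": "document:roadmap",
    }
}

response = requests.post(CHECK_URL, json=check_request, timeout=5)
response.raise_for_status()

# OpenFGA resolves the relationship graph and answers {"allowed": true/false}.
print(response.json().get("allowed", False))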

Teleport ACD (Access Control Decision) is a component of the Teleport Platform, which is created and maintained by Teleport and is available both as an open-source project and as an Enterprise option. It moves Teleport's access-control decisions from the agent to a centralized service exposed as a gRPC API.

How Teleport ACD Works:

  • Teleport agents establish trust with an internal Auth service.
  • Agents then verify user-provided certificates, which encode a user's roles.
  • The agents then query the Teleport Access Control Decision API for access-control decisions (e.g., allow/deny) using Teleport's Predicate language. Access-control decisions are made by querying role specifications stored as YAML files, which define resource permissions and implement a default-deny strategy; a toy illustration of this default-deny evaluation follows.
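
To make the default-deny behavior concrete, here is a toy Python model. It is not Teleport's actual API or implementation; the role fields and names are invented purely for illustration.

from dataclasses import dataclass, field

@dataclass
class Role:
    # Invented fields loosely mirroring allow/deny rules in a role specification.
    name: str
    allow_logins: set = field(default_factory=set)
    deny_logins: set = field(default_factory=set)

def evaluate_ssh_access(roles, login):
    # Deny rules take precedence over allow rules.
    if any(login in role.deny_logins for role in roles):
        return False
    # Default deny: grant access only if at least one role explicitly allows it.
    return any(login in role.allow_logins for role in roles)

roles = [Role(name="dev", allow_logins={"mohamed"}, deny_logins={"root"})]
print(evaluate_ssh_access(roles, "mohamed"))  # True (explicit allow)
print(evaluate_ssh_access(roles, "root"))     # False (explicit deny)
print(evaluate_ssh_access(roles, "guest"))    # False (default deny)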

Key Features

Rego
  • Declarative logic: You define what is allowed or denied, but not how to enforce it
  • Allows for complexity: Queries can use, and policies can generate, structured data as output (i.e., not just true/false)
  • Open source: Go-based

Cedar
  • Declarative logic
  • Analyzable: Tools can verify that the policies are what you expect (i.e., auditable)
  • Support for session and entity attributes: Can incorporate attributes of the principal, the resource, and a wide range of contextual information
  • Safe by design: Type-safe, memory safe, and data-race safe
  • Fast: Policies can be indexed for fast retrieval, allowing real-time evaluation with bounded latency
  • Open source: Rust-based

OpenFGA
  • Declarative logic
  • Authorization modeling: You define a type system and relationships, not rules
  • Scalable and performant: Optimized for large graphs of users and resources
  • Audit-friendly: Clear mapping of who has access and why

Teleport ACD
  • Relationship-based access control (ReBAC): Policies depend on user-object relationships (e.g., group membership, ownership)
  • Decoupling: A clear separation between the Teleport platform's Policy Decision Point and Policy Enforcement Point
  • Open source: Go-based

Security Policy Evaluation Framework (SPEF) Architecture

The framework follows a modular design that keeps orchestration, execution, evaluation, and result processing as separate steps. At a high level, it starts by loading the available test cases and filtering them based on the arguments provided, such as specific test case IDs, ID ranges, or an explicit list.

Each selected test is executed inside a dedicated Docker container that runs the corresponding policy engine. For every test case and policy language, the evaluation takes place in an isolated environment specific to that engine. This approach not only avoids conflicts with the host environment but also makes it easier to support additional engines down the line, since they operate independently from one another.

Under the hood, internal modules handle the different stages of the process: some manage the orchestration of tests, while others take care of interacting with policy engines, running evaluations, and collecting outputs. Each engine interaction is abstracted, so whether it communicates via HTTP, system shell calls, or custom logic, the test execution remains consistent.

Each test case in the framework is designed to be evaluated independently for every supported policy language. When a test is defined, the expected outcome is defined as well; this is known as the expected_result. The framework uses this information to compare the actual result produced by the policy engine with what was expected. Based on this comparison, the test is marked with one of several possible outcomes: PASS, FAIL, TIMEOUT, ERROR, or NOT APPLICABLE. This approach allows consistent validation of whether a query or assertion should succeed, trigger an error, or time out.
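
The comparison step can be pictured with a simplified sketch; this is not SPEF's actual implementation, and the status strings and parameter names are assumptions made for illustration.

def classify(expected_status, actual_status, condition_ok=True, applicable=True):
    """Map an engine run onto PASS / FAIL / TIMEOUT / ERROR / NOT APPLICABLE."""
    if not applicable:
        return "NOT APPLICABLE"
    if actual_status == "timeout":
        return "TIMEOUT"
    if actual_status == "error" and expected_status != "error":
        return "ERROR"
    # A test passes only if the status matches the expectation and any extra
    # condition (e.g., an evaluation-time bound) also holds.
    return "PASS" if actual_status == expected_status and condition_ok else "FAIL"

print(classify("success", "success"))                      # PASS
print(classify("success", "success", condition_ok=False))  # FAIL
print(classify("success", "timeout"))                      # TIMEOUT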

Defining Test Cases

Each test case in the framework is defined through a manifest file that describes both the scenario to evaluate and the specific inputs required by each policy engine. The manifest includes metadata like the test ID, a short scenario title, and a description of what the test aims to verify. For each supported engine, a corresponding block specifies the necessary elements for execution:

  • OpenFGA test cases require three main components to run a test: an authorization model (which defines types and relations), a query (including user, relation, and object), and a set of tuples that represent the current state of the authorization data. These elements work together to simulate a real-world decision scenario and allow the engine to check if access should be granted or denied based on the model's logic.

  • Cedar test cases need a set of entities to define the actors, resources, and relationships involved in a scenario. Each test includes a policy file written in Cedar's policy language, and a query that specifies the principal, the action, and the accessed resource.

  • Rego test cases are evaluated based on a combination of a policy file (written in the Rego language) and a corresponding query input. The engine then computes whether the query is allowed, denied, or leads to an error.

  • Teleport ACD (Access Control Decision) tests handle evaluations using a bootstrap configuration file that contains definitions of roles, users, and permissions in a format specific to Teleport. The framework uses this file to spin up a test instance of the engine and simulate an access request based on the defined setup. Each test is paired with an expected_result, which not only defines whether the outcome should be a success or an error, but may also include a logical assertion on the output itself.
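
As a rough illustration of how such a manifest could be consumed, the following Python sketch loads a manifest (like the one shown below) with PyYAML and dispatches each engine block; the file path and handler are hypothetical, and SPEF's real orchestration differs.

import yaml

def run_engine(engine, config):
    # In the real framework, each engine block is handed to a dedicated,
    # containerized runner; here we only print what would be executed.
    if config.get("applicable") is False:
        print(f"[{engine}] marked NOT APPLICABLE")
        return
    print(f"[{engine}] would run with: {sorted(config.keys())}")

with open("testcases/testcase-16/manifest.yaml") as fh:  # hypothetical path
    manifest = yaml.safe_load(fh)

print(manifest["id"], "-", manifest["scenario"])
for engine in ("rego", "cedar", "openfga", "teleportacd"):
    for config in manifest.get(engine) or []:
        run_engine(engine, config)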

To better understand how the framework works in practice, let's look at a concrete example that focuses on a specific security and performance scenario. In this case, we want to assess how each policy engine handles the use of regular expression built-ins, and whether they introduce any unexpected behavior or performance degradation during the test evaluation.

The test case is defined as follows:

id: testcase-16
scenario: User Induces a Denial of Service on the Policy Engine
description: Regular Expression Built-ins Do Not Introduce Security and Performance Risks
rego:
  - query: query.json
    policy: policy.rego
    reference: https://www.openpolicyagent.org/docs/latest/policy-performance/#in-memory-store-read-optimization
    expected_result:
      - status: success
      - condition: evaluation_time <= 0.05 # expecting a reasonable eval time

openfga:
  - authorization_model: authorization-model.json
    tuples: tuples.json
    query: query.json
    expected_result:
      - status: success
      - condition: evaluation_time <= 0.05 # expecting a reasonable eval time

cedar:
  - reference: https://cedarland.blog/design/why-no-regex/content.html <<Regular expressions and string formatting operators were intentionally omitted from the language because they work against these safety goals. This blog post describes why they can be dangerous and alternative approaches to writing policies without them.>>
    applicable: false

teleportacd:
  - type: evaluate-ssh-access
    login: mohamed
    username: mohamed
    server-id: f48b739e-7b6c-47e5-997e-f52e43273fae
    bootstrap: bootstrap_dangerous_regex.yaml
    expected_result:
      - status: success
      - condition: evaluation_time <= 0.05 # expecting a reasonable eval time
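
The expected_result conditions, such as evaluation_time <= 0.05, can be checked by comparing measured values against the stated bound. Below is a minimal sketch of such a check; the parsing is deliberately simplistic and is not SPEF's actual condition evaluator.

import operator

# Two-character operators are listed first so they match before "<" and ">".
OPERATORS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq, "<": operator.lt, ">": operator.gt}

def check_condition(condition, measurements):
    for symbol, compare in OPERATORS.items():
        if symbol in condition:
            lhs, rhs = (part.strip() for part in condition.split(symbol, 1))
            return compare(measurements[lhs], float(rhs))
    raise ValueError(f"Unsupported condition: {condition}")

print(check_condition("evaluation_time <= 0.05", {"evaluation_time": 0.012}))  # True
print(check_condition("evaluation_time <= 0.05", {"evaluation_time": 0.300}))  # False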

This test case can be executed on its own by using the --only flag and passing the ID:

$ python main.py --only 16

This will run the evaluation across all policy engines that support this scenario and output the final result to an HTML report:

[Figure: SPEF evaluation report (HTML) generated for this test case]

Using SPEF

To get started:

$ git clone https://github.com/gravitational/policy-languages-framework.git
$ cd policy-languages-framework
$ pip3 install -r requirements.txt

Note: Docker must be running on the system, as it is required to execute policy evaluations within isolated containerized environments.

Then, start the framework using:

$ python3 main.py

The following arguments can be used to specify the test cases that will be executed:

  • --start: the test case ID number to start from
  • --max: the last test case ID to include in the run
  • --only: a comma-separated list of test case IDs to run (e.g., --only 03,07,12)
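
As a sketch of how these selectors could translate into a test-case filter (the real main.py may implement this differently), consider:

def select_testcases(all_ids, start=None, maximum=None, only=None):
    # --only wins: run exactly the comma-separated IDs provided.
    if only:
        wanted = {int(x) for x in only.split(",")}
        return [tc for tc in all_ids if tc in wanted]
    # Otherwise keep IDs within the optional --start / --max bounds.
    return [
        tc for tc in all_ids
        if (start is None or tc >= start) and (maximum is None or tc <= maximum)
    ]

all_ids = list(range(1, 28))                            # testcase-01 .. testcase-27
print(select_testcases(all_ids, only="03,07,12"))       # [3, 7, 12]
print(select_testcases(all_ids, start=10, maximum=12))  # [10, 11, 12]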

After execution, results are collected and summarized into an HTML report, with one row per test case and one column per policy engine. The report includes status labels such as PASS, FAIL, ERROR, TIMEOUT, and NOT APPLICABLE, based on the evaluation output and expected conditions.

Below is the result matrix showing how each engine performed when processing all currently implemented test cases:

testcase-01: Policy Engine Must Enforce Deny Rules Even When Runtime Errors Occur
  Rego: FAIL | Cedar: FAIL* | OpenFGA: PASS | Teleport ACD: PASS

testcase-02: Arithmetic Overflow/Underflow and Missing Entities Cause Validation Errors
  Rego: PASS | Cedar: FAIL | OpenFGA: N/A | Teleport ACD: PASS

testcase-03: Handling Undefined Values in Deny/Allow Rules Without Impacting Policy Decisions
  Rego: FAIL | Cedar: PASS | OpenFGA: PASS | Teleport ACD: N/A

testcase-04: Negations on Undefined Values Does Not Cause Expected Denials
  Rego: FAIL | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-05: Policy Must Produce Explicit Forbid/Allow
  Rego: FAIL | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS*

testcase-06: Built-in Functions Do Not Introduce Side-Effects or Non-Deterministic Behavior
  Rego: FAIL | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS*

testcase-07: Errors in Double Negation Detected By The Engine
  Rego: PASS | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-08: Trace/Print Constructors Allows To Log Values/Decisions
  Rego: PASS* | Cedar: FAIL* | OpenFGA: FAIL* | Teleport ACD: FAIL*

testcase-09: Policy Execution Engine is Resilient Against Denial-of-Service via Recursive or Expanding Inputs
  Rego: FAIL | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-10: Set Operations Scale Predictably with Entity Store Size (1000 sets)
  Rego: FAIL | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-11: Set Operations Scale Predictably with Entity Store Size (10.000 sets)
  Rego: PASS | Cedar: PASS | OpenFGA: FAIL | Teleport ACD: PASS

testcase-12: Set Operations Scale Predictably with Entity Store Size (100.000 sets)
  Rego: PASS | Cedar: PASS | OpenFGA: FAIL | Teleport ACD: PASS

testcase-13: Network Built-in Functions Produce Predictable and Consistent Runtime Results
  Rego: FAIL | Cedar: N/A | OpenFGA: N/A | Teleport ACD: N/A

testcase-14: Policy Engine Write Performance Scales with Entity Store Size (1)
  Rego: N/A | Cedar: N/A | OpenFGA: PASS | Teleport ACD: N/A

testcase-15: Policy Engine Write Performance Scales with Entity Store Size (2)
  Rego: N/A | Cedar: N/A | OpenFGA: PASS | Teleport ACD: N/A

testcase-16: Regular Expression Built-ins Do Not Introduce Security and Performance Risks
  Rego: FAIL | Cedar: N/A | OpenFGA: FAIL | Teleport ACD: FAIL

testcase-17: Policy Execution Engine is Resilient Against Application-Induced Denial of Service
  Rego: FAIL | Cedar: PASS | OpenFGA: FAIL | Teleport ACD: FAIL

testcase-18: Policy Execution Must Prevent Built-in Functions from Interacting with Host System
  Rego: FAIL* | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS*

testcase-19: Policy Enforcement Must Correctly Handle Alternative Encodings to Prevent Unauthorized Internal Access
  Rego: PASS* | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-20: No Unsafe Memory Operations in Dependencies
  Rego: FAIL* | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS*

testcase-21: Use of Memory Unsafe Operations
  Rego: FAIL* | Cedar: FAIL* | OpenFGA: FAIL* | Teleport ACD: FAIL*

testcase-22: Built-in Functions Do Not Introduce Uncontrolled Side Effects
  Rego: FAIL | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS

testcase-23: Policies Cannot Access or Exfiltrate Cross-Application Context Data
  Rego: FAIL* | Cedar: PASS* | OpenFGA: PASS* | Teleport ACD: PASS*

testcase-24: Unauthorized Policy Overrides Do Not Alter Access Control Decisions
  Rego: FAIL | Cedar: PASS | OpenFGA: N/A | Teleport ACD: PASS

testcase-25: Policy Engine Must Correctly Handle Input Manipulation to Prevent Unauthorized Access via Encoding Abuse
  Rego: PASS | Cedar: PASS | OpenFGA: PASS | Teleport ACD: PASS

testcase-26: Unauthorized Changes to Dependency Modules Do Not Impact Policy Behavior
  Rego: FAIL | Cedar: N/A | OpenFGA: PASS* | Teleport ACD: N/A

testcase-27: Internal Threats Cannot Introduce Malicious Changes to Built-ins
  Rego: FAIL* | Cedar: N/A | OpenFGA: N/A | Teleport ACD: N/A

Note: * Predicted result

Results

Our early results show that Rego is expressive but error-prone, failing several tests due to runtime exceptions, non-determinism, and extensibility risks. Cedar is safe and deterministic, with strong validation and isolation, but it is less flexible for writing complex rules outside of typical access control. OpenFGA is simple and scalable for relationship-based models, but not suited for complex logic or validation-heavy use cases. Teleport ACD performs reliably and enforces access decisions consistently, though it lacks fine-grained policy semantics and defers logic to external systems.

Wrapping up

Starting from prior state-of-the-art work, we created a dynamic evaluation framework for testing different policy engines (Cedar, Rego, OpenFGA, Teleport ACD) using fully configurable queries and policies. Each purpose-built container allows verifying expected results or discovering issues for a given query/policy pair in each of the policy frameworks.

We understand that real-world usage of our Security Policy Evaluation Framework may vary, so we want to clarify its intended purpose: creating a benchmark for each of the policy frameworks so that their evolution can be monitored. Once a baseline is established for each, subsequent versions of the policy frameworks can be tested to observe their progression and verify their consistency. While some may want to use it as a comparison tool between the policy frameworks, we would not encourage this; we have not taken steps to optimize the policy engines or their containers, so such comparisons could be misleading with respect to real-world usage.

This project was a collaboration between Teleport and Doyensec. The framework was created by the Doyensec team with inspiration and funding from Teleport.
