In this day and age, establishing cryptographic trust and encryption between internal services is a must. Without it, attackers who gain access to your internal networks can easily impersonate services and intercept exchanged data. Over time, the potential impact of a compromise has only grown, as machines are trusted with increasingly sensitive data and increasingly critical tasks. It’s become clear that the legacy technique of merely guarding your networks with a firewall isn’t enough.
TLS is the industry-standard protocol for establishing trust and encrypting communications between two parties. Typically, the server or service presents an X.509 certificate during the TLS handshake. This certificate encodes key information about the identity of the server and is signed by a certificate authority (CA). The client verifies this certificate’s signature against a known list of trusted certificate authorities, and then checks that the identity encoded within the certificate matches what it expects.
We can take this further using Mutual TLS (mTLS), where the client also presents a certificate. This allows trust to be established in both directions, with the server able to use the client’s identity in authorization decisions. Unlike bearer tokens, which are more commonly used for client authentication, the asymmetric cryptography that underpins X.509 certificates and TLS means the client’s identity can be determined without revealing any secrets.
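As a rough illustration of the verification a client performs, here’s how you might check a certificate by hand with openssl. The file names here are placeholders for this sketch, not artefacts produced later in this post:
# Verify the certificate chains to a CA we trust, then inspect the identity it encodes.
$ openssl verify -CAfile trusted-ca.pem peer-cert.pem
$ openssl x509 -in peer-cert.pem -noout -subject -ext subjectAltName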
This mechanism relies on both parties trusting the certificate authority that has issued the other party’s certificate. Operating systems ship with a default set of trusted certificate authorities. Whilst you could purchase certificates from one of the entities that operate these authorities, the cost of doing so for thousands of internal services would be significant. Additionally, you’re placing a great deal of trust in these certificate authorities to remain uncompromised.
Instead, it makes sense to establish an internal CA for your infrastructure. All workloads within your infrastructure will then be configured to trust certificates issued by this internal CA. When creating and maintaining an internal CA, you have a number of choices. You must decide how the identity of workloads will be encoded into certificates, and how the CA will determine if the workload requesting a certificate is legitimate.
It’s important to recognise that the foundation of an authentication and authorization system based on mTLS is the security of the CA. CAs can be compromised in a number of ways, but two are most common: either the attacker exfiltrates the private key and uses it to sign their own certificates, or they find a way to trick the CA into issuing certificates. In both cases, the attacker now has the ability to impersonate workloads within your infrastructure. This makes configuring your internal CA correctly of the utmost importance.
Teleport Machine & Workload Identity is a flexible and secure solution for issuing short-lived certificates for workloads. It significantly reduces the effort involved in establishing and maintaining a certificate authority, and brings a whole host of other benefits, such as detailed audit logging and precise control of what certificates can be issued using the RBAC engine.
In this blog post, we’ll use Teleport Workload Identity to configure mTLS between an NGINX server and a client. We’ll see how Teleport allows you to strictly control and audit the issuance of X.509 certificates.
How it works
Teleport Workload Identity issues certificates using an internal CA managed by a Teleport cluster. The issuance of these certificates is gated by Teleport’s RBAC engine, with an identity’s access defined by the roles it possesses.
Within Teleport, a machine’s identity is represented as a Bot. Like a human user, a Bot possesses roles which control what actions it can take.
Teleport provides an agent, tbot, which is installed close to the workloads that require certificates. The agent authenticates with the Teleport cluster as a Bot, requests the signing of certificates, and then writes these to a configured destination. The workloads that require certificates can then read them from that destination.
Teleport issues short-lived certificates, which limits the amount of time that an exfiltrated certificate would be useful to a bad actor. The tbot agent automatically handles renewing these certificates before they expire.
The tbot agent authenticates to the Teleport cluster as a Bot in a process known as joining. The exact nature of this process depends on the platform on which tbot is installed. Typically, some form of platform identity available to tbot is verified by the Teleport cluster. On AWS this might be the IAM role associated with an EC2 instance; on-prem it might be a physical TPM installed in the machine. Rules can be configured that restrict joining to specific VMs, containers and CI runs. This style of authentication avoids the existence of a long-lived secret such as an API token, which is prone to exfiltration, and allows for fine-grained access control.
SPIFFE
The X.509 certificates issued by Teleport Workload Identity are compatible with a modern open-source standard for workload identities, the Secure Production Identity Framework For Everyone (SPIFFE). This allows them to work with a number of off-the-shelf tools that are also SPIFFE compatible.
Due to this, there’s some additional SPIFFE-specific terminology to be aware of:
- Secure Verifiable Identity Document (SVID): This refers to a standardised way of encoding a workload identity into a document which can be verified by another party. Multiple forms of SVID are defined, but the most common is an X.509 certificate. Since Teleport issues X.509 certificates which are compatible with SPIFFE, the certificates that we will issue to the client and server can be referred to as SVIDs.
- Trust Bundle: This refers to the data needed by a workload to verify an SVID. In our case of an X.509 SVID, this is the CA’s certificate.
You can learn more about SPIFFE at https://spiffe.io/docs/latest/spiffe-about/spiffe-concepts/
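To make this concrete, an X.509 SVID carries the workload’s SPIFFE ID as a URI Subject Alternative Name, alongside any DNS names. Inspecting one with openssl looks roughly like this (illustrative output, using the values we’ll configure later in this post):
$ openssl x509 -in svid.pem -noout -ext subjectAltName
X509v3 Subject Alternative Name:
    URI:spiffe://example.teleport.sh/svc/server, DNS:server.internal.example.com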
Putting this into practice
In our example, we will have two virtual machines. Whilst we’re using VMs for this example, these could just as easily be Pods in a Kubernetes Cluster.
Both VMs will run Ubuntu 24.04, but will be hosted on different cloud platforms. This will demonstrate how Teleport Workload Identity can be a useful tool in establishing trust in heterogeneous environments.
Our Teleport cluster will be hosted on Teleport Cloud, and will be referred to as example.teleport.sh.
The first VM will act as our server:
- Resolvable by DNS as server.internal.example.com
- Hosted on Google Cloud Platform (GCE)
- It will be assigned a GCP service account called server
- We will install tbot and NGINX
The second VM will act as our client:
- Resolvable by DNS as client.internal.example.com
- Hosted on Amazon Web Services (EC2)
- It will be configured to assume an IAM role called client
- We will install tbot
Configuring Teleport Workload Identity
Before we configure the software on the individual VMs, we first need to configure Teleport Workload Identity. We configure Teleport using resources specified in YAML. For each VM, we’ll need to create three resources: a Bot, a Role and a Join Token.
Before proceeding, make sure you’ve logged into the Teleport cluster with tsh and have installed tctl.
First, we’ll create the roles. These will grant the ability to request X.509 SVIDs with specific properties. In our case, we’ll create a role for the server that allows issuing a certificate for server.internal.example.com, and a role for the client that allows issuing a certificate for client.internal.example.com. Later, we’ll assign these roles to the Bot identities.
Create server-role.yaml:
kind: role
version: v6
metadata:
  name: server-workload-id
spec:
  allow:
    spiffe:
      - path: "/svc/server"
        dns_sans: ["server.internal.example.com"]
Create client-role.yaml:
kind: role
version: v6
metadata:
  name: client-workload-id
spec:
  allow:
    spiffe:
      - path: "/svc/client"
        dns_sans: ["client.internal.example.com"]
Next, we’ll create the join tokens. These control how the Bots will authenticate to Teleport.
For the server, we’ll use the gcp join method, which allows the VM to authenticate as the server-workload-id bot using its GCP service account. Create server-join-token.yaml:
kind: token
version: v2
metadata:
  name: server-workload-id
spec:
  roles: [Bot]
  bot_name: server-workload-id
  join_method: gcp
  gcp:
    allow:
      - project_ids:
          - my-project-123456
        service_accounts:
          - server@my-project-123456.iam.gserviceaccount.com
For the client, we’ll create a join token using the iam method, which allows the VM to authenticate as the client-workload-id bot using its assumed AWS IAM role. Create client-join-token.yaml:
kind: token
version: v2
metadata:
  name: client-workload-id
spec:
  roles: [Bot]
  bot_name: client-workload-id
  join_method: iam
  allow:
    - aws_account: "12345678"
      aws_arn: "arn:aws:sts::12345678:assumed-role/client/i-*"
With all the resource files defined, we now need to apply them to our Teleport cluster using tctl:
$ tctl create -f ./*.yaml
Finally, we can create the Bots themselves. These will represent the identity of the machines within Teleport. We’ll want to specify the role and join token for each:
$ tctl bots add server-workload-id --roles server-workload-id --token server-workload-id
$ tctl bots add client-workload-id --roles client-workload-id --token client-workload-id
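If you’d like to double-check what was created, you can list the Bots and inspect the roles. This step is optional, but a useful sanity check:
$ tctl bots ls
$ tctl get role/server-workload-id
$ tctl get role/client-workload-id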
Configuring server.internal.example.com
We’ll start by installing and configuring tbot. Teleport provides a simple shell script for this that will automatically add the appropriate repositories and then install the package. You can also perform these steps manually.
$ curl https://goteleport.com/static/install.sh | bash -s 16.4.3 oss
Next, we’ll create a new user that our tbot service will run as, create some directories, and assign this user ownership of them. We’ll create two directories: /opt/workload-id for storing the certificates and trust bundle output by tbot, and /var/lib/teleport/bot to persist internal data that tbot requires:
$ mkdir -p /opt/workload-id /var/lib/teleport/bot
$ useradd -r -s /bin/false teleport
$ chown -R teleport:teleport /opt/workload-id /var/lib/teleport/bot
Now we need to configure tbot. For this, a YAML file is used. We’ll need to tell it how to authenticate to the Teleport cluster and where to write the certificates that NGINX will use. Since we’re on GCP, we’ll specify the join method as gcp, along with the name of the token we created earlier.
$ tbot configure spiffe-x509-svid \
   --storage memory:/ \
   --proxy-server example.teleport.sh:443 \
   --destination /opt/workload-id \
   --join-method gcp \
   --token server-workload-id \
   --svid-path=/svc/server \
   --dns-san server.internal.example.com > /etc/tbot.yaml
To keep tbot running in the background, continually renewing our short-lived X.509 certificate, we’ll create a systemd service:
$ tbot install systemd \
--write \
--config /etc/tbot.yaml \
--user teleport \
--group teleport \
--anonymous-telemetry
$ systemctl daemon-reload && systemctl start tbot
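If you’d also like tbot to start automatically after a reboot, you can enable the unit it created:
$ systemctl enable tbot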
Now that we’ve started tbot, we can check that it’s working correctly by looking at the logs. We should also check that the svid.pem, svid_bundle.pem and svid_key.pem artefacts have been output by tbot:
$ journalctl -u tbot
$ ls -lah /opt/workload-id/
total 20K
drwxr-x---+ 2 teleport teleport 4.0K Jun 19 09:54 .
drwxr-xr-x 3 root root 4.0K Jun 19 09:54 ..
-rw-rw----+ 1 teleport teleport 1.5K Jun 19 10:03 svid.pem
-rw-rw----+ 1 teleport teleport 1.4K Jun 19 10:03 svid_bundle.pem
-rw-rw----+ 1 teleport teleport 1.7K Jun 19 10:03 svid_key.pem
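If you want to confirm exactly what was issued, you can inspect the SVID with openssl; you should see the DNS SAN and SPIFFE ID we allowed in the role, along with a short validity period:
$ openssl x509 -in /opt/workload-id/svid.pem -noout -subject -ext subjectAltName -dates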
Configuring NGINX
Next, we can install NGINX. Fortunately, this is a package available in the standard repositories of Ubuntu:
$ apt install nginx
Now we’ll need to configure NGINX to do a few things:
- Listen for TLS connections, using the X.509 SVID output by tbot as the server’s certificate.
- Accept TLS certificates from clients and verify these using the certificate authority in the trust bundle output by tbot.
- Serve a simple page that shows the identity of the client that has connected, or return a 403 to clients without a certificate.
Create /etc/nginx/sites-available/default:
server {
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;

    server_name server.internal.example.com;

    ssl_certificate /opt/workload-id/svid.pem;
    ssl_certificate_key /opt/workload-id/svid_key.pem;
    ssl_client_certificate /opt/workload-id/svid_bundle.pem;
    ssl_verify_client optional;

    location / {
        if ($ssl_client_verify != "SUCCESS") {
            return 403 "403: You have not presented a client cert\n";
        }
        return 200 "Hello, your identity is: $ssl_client_s_dn\n";
    }
}
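Before restarting, it’s worth asking NGINX to validate the new configuration:
$ nginx -t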
Restart the NGINX service so the new configuration is loaded:
$ systemctl restart nginx
Before we move on to configuring our client VM, we can test that our NGINX configuration works as we expect by using the server’s own certificate. The certificates issued by Teleport Workload Identity are not specifically client or server certificates, and this is useful in cases where a service that exposes an endpoint also needs to make calls to another service. Let’s use curl to test our endpoint:
$ curl --cert /opt/workload-id/svid.pem --key /opt/workload-id/svid_key.pem --cacert /opt/workload-id/svid_bundle.pem https://server.internal.example.com
Hello, your identity is: CN=server.internal.example.com
Configuring client.internal.example.com
On the client VM, we’ll only need to install tbot. We’ll follow similar steps to those we used on the server, with a few minor changes.
$ curl https://goteleport.com/static/install.sh | bash -s 16.4.3 oss
$ mkdir -p /opt/workload-id /var/lib/teleport/bot
$ useradd -r -s /bin/false teleport
$ chown -R teleport:teleport /opt/workload-id /var/lib/teleport/bot
This time, we’ll configure tbot.yaml to use the iam join method with the client-workload-id token, as our client VM is hosted on AWS EC2:
$ tbot configure spiffe-x509-svid \
   --storage memory:/ \
   --proxy-server example.teleport.sh:443 \
   --destination /opt/workload-id \
   --join-method iam \
   --token client-workload-id \
   --svid-path=/svc/client \
   --dns-san client.internal.example.com > /etc/tbot.yaml
As we did before, we’ll create a systemd service:
$ tbot install systemd \
--write \
--config /etc/tbot.yaml \
--user teleport \
--group teleport \
--anonymous-telemetry
$ systemctl daemon-reload && systemctl start tbot
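As on the server, you can check the tbot logs and confirm that the certificates have been written to the destination:
$ journalctl -u tbot
$ ls -lah /opt/workload-id/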
Testing our configuration
With everything configured, we can now check that it works.
On our client VM, we’ll use curl to make an HTTPS request to NGINX. We’ll need to provide the certificate, the key, and the bundle that contains the CA:
$ curl --cert /opt/workload-id/svid.pem --key /opt/workload-id/svid_key.pem --cacert /opt/workload-id/svid_bundle.pem https://server.internal.example.com
Hello, your identity is: CN=client.internal.example.com
Great - we can see that the client has been able to connect, and the server has correctly identified that the connection is coming from client.internal.example.com. Currently, our server is configured to allow any connection from a client that presents a certificate signed by our Workload Identity CA, but you could further tune the configuration to limit access to a specific identity.
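For example, here’s a sketch (not part of the configuration above) of how the location block could be tightened so that only our client VM’s identity is accepted, using the subject DN that NGINX exposes:
location / {
    if ($ssl_client_verify != "SUCCESS") {
        return 403 "403: You have not presented a client cert\n";
    }
    # Hypothetical tightening: only accept the client VM's certificate.
    if ($ssl_client_s_dn != "CN=client.internal.example.com") {
        return 403 "403: Your identity is not permitted\n";
    }
    return 200 "Hello, your identity is: $ssl_client_s_dn\n";
}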
Let’s see what happens when our client doesn’t present a certificate:
$ curl --cacert /opt/workload-id/svid_bundle.pem https://server.internal.example.com
403: You have not presented a client cert
We can see that our 403 message is returned - the server has rejected the unauthenticated client!
Exploring the Audit Log
The Teleport cluster has an audit log to provide insight into actions taken by users and bots within the cluster. Let’s take a look at what events have been emitted. Log into the Teleport Cluster and browse to the “Audit Log” under “Access Management”.
The first type of audit event that you’ll see is “Bot Join”. This is emitted each time tbot initially joins or renews its certificates, and exposes information about the identity it used to join, such as the name and location of the VM that tbot was running on:
# Bot [server-workload-id] joined the cluster using the [gcp] join method
{
  "addr.remote": "34.122.104.232",
  "attributes": {
    "email": "server@my-project-123456.iam.gserviceaccount.com",
    "google": {
      "compute_engine": {
        "instance_id": "1601951602541797202",
        "instance_name": "server",
        "project_id": "my-project-123456",
        "zone": "us-central1-a"
      }
    }
  },
  "bot_name": "server-workload-id",
  "cluster_name": "example.teleport.sh",
  "code": "TJ001I",
  "ei": 0,
  "event": "bot.join",
  "method": "gcp",
  "success": true,
  "time": "2024-06-19T10:23:50.478Z",
  "token_name": "server-workload-id",
  "uid": "b6e5cc9b-0779-4280-bc88-a6e9bef8e41d"
}
You’ll also receive a “SPIFFE SVID Issued” audit event each time the Bot requests the issuance of an X.509 SVID. This includes information about the user or bot that requested the certificate, and the contents of the certificate itself:
# User [bot-server-workload-id] issued SPIFFE SVID [spiffe://example.teleport.sh/svc/server]
{
  "addr.remote": "34.122.104.232:53102",
  "cluster_name": "example.teleport.sh",
  "code": "TSPIFFE000I",
  "dns_sans": [
    "server.internal.example.com"
  ],
  "ei": 0,
  "event": "spiffe.svid.issued",
  "hint": "",
  "impersonator": "bot-server-workload-id",
  "serial_number": "ca:e0:a3:70:39:07:3c:64:63:aa:ce:b2:ff:33:c6:e1",
  "spiffe_id": "spiffe://example.teleport.sh/svc/server",
  "svid_type": "x509",
  "time": "2024-06-19T10:23:51.558Z",
  "uid": "b8bd786c-3810-42ea-919b-337dc40badb5",
  "user": "bot-server-workload-id",
  "user_kind": 2
}
These audit events are a key part of securing your infrastructure. They can be shipped by Teleport to your chosen SIEM or log management platform, where they can be monitored for unusual activity. Any unusual activity can then be investigated further using the audit log, and if necessary, you can take action to lock out a bad actor.
Conclusion
In this post we’ve explored the fundamentals of workload identity, seen how Teleport Workload Identity can be used to issue short-lived cryptographic identities to your workloads without the use of long-lived secrets, and explored the resulting audit trail.