Client certificate authentication for Azure Service Fabric cluster API endpoint

In two previous posts I explained how to set up SSL for a local Azure Service Fabric cluster and how to configure this for a cluster running on Azure. In this post I describe how to set up client certificate authentication for the same API endpoint. Client certificate authentication means that a client can only access the API if it presents a certificate intended for client authentication (enhanced key usage OID 1.3.6.1.5.5.7.3.2).

At the time of writing, I could not find any built-in support for client certificate authentication in Service Fabric. So although everything described below actually works, it won't win any beauty contests :)

Let's begin: first of all, client certificate authentication won't work without server authentication. So before you continue, make sure the Service Fabric API endpoint is protected by a server authentication certificate (check this post for details).

Before we start

On Windows, a server authentication certificate is bound to one or more specific TCP ports. When a client (a browser, for example) connects to such a port over HTTPS, the server responds with the configured certificate (among other things). This is of course a gross oversimplification, but it serves our purpose.

You can check which certificates are bound to which ports using the netsh command: netsh http show sslcert.

In the screenshot you can see that on my local machine, the certificate with thumbprint 6ffb99586b7580f67e8e6bb65a19067c62fb872b is bound to ports 44389 and 44399. If you look more closely at the output, you see that there is a property Negotiate Client Certificate for each port binding. If we can set this property to Enabled for the right port binding, we're done.
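Reading this output by eye gets tedious once there are more than a few bindings. Below is a minimal sketch that turns the text into objects, so the Negotiate Client Certificate state can be checked per binding. ConvertFrom-SslCertOutput is a hypothetical helper name, not a built-in cmdlet, and the sketch assumes the English property names that netsh prints on my machine; on a localized Windows installation the labels will differ.

```powershell
# Sketch: parse 'netsh http show sslcert' output into one object per binding.
# ConvertFrom-SslCertOutput is a hypothetical helper, not part of PowerShell or netsh.
# ASSUMPTION: English property labels ('IP:port', 'Certificate Hash', ...).
function ConvertFrom-SslCertOutput {
    param([string[]]$Lines)

    $current = $null
    foreach ($line in $Lines) {
        # Property lines have the shape '<name>   : <value>'.
        if ($line -match '^\s*(?<name>.+?)\s+:\s(?<value>.*)$') {
            $name  = $Matches['name']
            $value = $Matches['value'].Trim()

            if ($name -eq 'IP:port') {
                # A new binding starts: emit the previous one, if any.
                if ($null -ne $current) { [pscustomobject]$current }
                $current = [ordered]@{
                    IpPort              = $value
                    CertHash            = $null
                    NegotiateClientCert = $null
                }
            }
            elseif ($null -ne $current -and $name -eq 'Certificate Hash') {
                $current.CertHash = $value
            }
            elseif ($null -ne $current -and $name -eq 'Negotiate Client Certificate') {
                $current.NegotiateClientCert = $value
            }
        }
    }
    # Emit the last binding.
    if ($null -ne $current) { [pscustomobject]$current }
}

# Usage on a Windows machine:
#   ConvertFrom-SslCertOutput -Lines (netsh http show sslcert) | Format-Table
```

Filtering the result on NegotiateClientCert then shows at a glance which bindings still need to be changed.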

If we take the binding for port 44399 as an example, the following two statements enable client certificate negotiation for it (line breaks are just for readability, each statement should be on a single line):

netsh http delete sslcert ipport=0.0.0.0:44399
netsh http add sslcert ipport=0.0.0.0:44399 `
                       certhash=6ffb99586b7580f67e8e6bb65a19067c62fb872b `
                       appid="{214124cd-d05b-4309-9af9-9caa44b2b74a}" `
                       clientcertnegotiation=enable

If we take a look at the output now it looks like this (showing just port 44399):

That was easy! The real problem is: how to do the same on the virtual machines in an Azure Service Fabric cluster?

Setup entry points

To make this work, we can use a feature of Service Fabric called a setup entry point. Besides the long-running process that each microservice actually is, you can have special setup tasks that run each time a service is started on a cluster node. We will use a setup entry point to enable client certificate negotiation. In my configuration (ServiceManifest.xml) this looks as follows:

<CodePackage Name="Code" Version="1.0.0">  
  <SetupEntryPoint>
    <ExeHost>
      <Program>EnableClientCertAuth.bat</Program>
      <Arguments>0.0.0.0:8677</Arguments>
      <WorkingFolder>CodePackage</WorkingFolder>
    </ExeHost>
  </SetupEntryPoint>
  <EntryPoint>
    <ExeHost>
      <Program>MyServices.SF.Api.exe</Program>
    </ExeHost>
  </EntryPoint>
</CodePackage>  

Besides the regular EntryPoint we now also have a SetupEntryPoint. It has a batch file as the program and we pass the ipport as argument. In my case this is 0.0.0.0:8677.

Batch file and PowerShell script

The batch file EnableClientCertAuth.bat should be located at the project root. It's very simple as it just calls a PowerShell script to do the real work:

powershell.exe -ExecutionPolicy Bypass -Command ".\EnableClientCertAuth.ps1 -IpPort %1"

The PowerShell script should also be located at the project root, and both files must be copied to the build directory. In Visual Studio, you can arrange this by setting the Copy to Output Directory property of both files in Solution Explorer.

First I'll show the PowerShell script itself, then an explanation of what happens.

param([String]$IpPort)

# Find the 'IP:port' line for this binding, plus the line right below it.
$match = (netsh http show sslcert |
          Select-String -Pattern $IpPort -Context 0,1 -SimpleMatch |
          Select-Object -First 1)
if ($null -eq $match) {
  Write-Warning "IpPort $IpPort not found in output of 'netsh http show sslcert'"
  exit
}

# The line below holds 'Certificate Hash : <hash>'; keep the part after the colon.
$certHash = ($match.Context.PostContext[0] -split ':\s*', 2)[1].Trim()

Write-Output "Deleting SSL cert $certHash on port $IpPort"
netsh http delete sslcert ipport=$IpPort

Write-Output "Adding SSL cert $certHash on port $IpPort with clientcertnegotiation=enable"
netsh http add sslcert ipport=$IpPort `
                       certhash=$certHash `
                       appid="{11223344-5566-7788-9900-aabbccddeeff}" `
                       clientcertnegotiation=enable

The script has four steps:

  1. Use Select-String to find the output lines that match the specified ipport. We are looking for the certificate hash which always appears one line below the ipport.
  2. Get the certificate hash from the result by splitting on : if we actually found a result.
  3. Delete the binding for the specified ipport.
  4. Add the binding back but now with client certificate negotiation enabled. We need the certificate hash here so that is why we did all the parsing.
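One thing to watch in steps 1 and 2: a plain substring search for 0.0.0.0:443 also matches a binding for 0.0.0.0:4430 or 10.0.0.0:443. The sketch below is one way to make the lookup exact by matching the whole value of the IP:port line. Get-BoundCertHash is a hypothetical helper name, and taking the netsh output as a parameter is my own choice so the parsing can be tried out without touching any real binding.

```powershell
# Sketch: exact-match lookup of the certificate hash for one ipport.
# Get-BoundCertHash is a hypothetical helper, not part of PowerShell or netsh.
function Get-BoundCertHash {
    param(
        [string[]]$NetshOutput,   # lines as returned by 'netsh http show sslcert'
        [string]$IpPort           # e.g. '0.0.0.0:8677'
    )

    # Anchor the ipport between the ': ' separator and the end of the line,
    # so '0.0.0.0:443' does not match '0.0.0.0:4430' or '10.0.0.0:443'.
    $pattern = ':\s' + [regex]::Escape($IpPort) + '\s*$'

    $match = $NetshOutput |
             Select-String -Pattern $pattern -Context 0,1 |
             Select-Object -First 1
    if ($null -eq $match) { return $null }

    # The certificate hash is expected on the line right below the match.
    $below = $match.Context.PostContext
    if (-not $below) { return $null }

    $parts = $below[0] -split ':\s*', 2
    if ($parts.Count -lt 2) { return $null }
    $parts[1].Trim()
}

# Usage on a Windows machine:
#   Get-BoundCertHash -NetshOutput (netsh http show sslcert) -IpPort '0.0.0.0:8677'
```

The function returns $null when the binding is not found, so the caller can decide whether that is a warning or a hard failure.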

Privileged entry point

We now have a batch file, a PowerShell script and a setup entry point that runs the batch file. The only thing we haven't covered yet is that for binding a certificate to a port you need administrator privileges. So the batch file should run under a privileged account.

This is all very well described in the Service Fabric documentation so I'll just repeat here for the sake of completeness. First you add a principal to your application manifest:

<Principals>  
  <Users>
    <User Name="ApiSetupAdminUser" AccountType="LocalSystem" />
  </Users>
</Principals>  

And next you specify an additional policy in your ServiceManifestImport:

<Policies>  
  <RunAsPolicy CodePackageRef="Code" UserRef="ApiSetupAdminUser"
               EntryPointType="Setup" />
</Policies>  

Conclusion

If we deploy the updated Service Fabric application to our cluster (or locally), it will run the batch file on every node before our actual service starts. The certificate port binding will be removed and re-added with client certificate negotiation enabled.

To be honest, I'm not perfectly happy with the approach above for two reasons:

  1. I'm depending on string parsing to retrieve the information I need. I don't expect the output format of netsh http show sslcert to change much, but it doesn't feel like a very stable solution.
  2. I'm actually hard-coding the ipport in my ServiceManifest.xml and I can't easily change this between environments.
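A possible way to soften the second point is sketched below. It rests on an assumption I have not verified for setup entry points: that the Service Fabric runtime exposes the port of a declared endpoint in an environment variable named Fabric_Endpoint_<EndpointName>. Resolve-IpPort is a hypothetical helper name, and the endpoint name used in the usage line is only an example. If the variable is missing, the function falls back to the value passed in from ServiceManifest.xml, so the behavior stays the same as before.

```powershell
# Sketch: derive the ipport from the environment instead of hard-coding it.
# Resolve-IpPort is a hypothetical helper.
# ASSUMPTION (unverified for setup entry points): Service Fabric sets
# Fabric_Endpoint_<EndpointName> to the port of the endpoint with that name.
function Resolve-IpPort {
    param(
        [string]$EndpointName,   # endpoint name from ServiceManifest.xml
        [string]$Fallback        # value passed via <Arguments>, e.g. '0.0.0.0:8677'
    )

    $port = [Environment]::GetEnvironmentVariable("Fabric_Endpoint_$EndpointName")

    # Only trust the variable if it contains a plain port number.
    if ($port -match '^\d+$') {
        return "0.0.0.0:$port"
    }
    return $Fallback
}

# Usage inside EnableClientCertAuth.ps1 (endpoint name is an example):
#   $IpPort = Resolve-IpPort -EndpointName 'WebEndpointHttps' -Fallback $IpPort
```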

Unfortunately, it's the only way I can think of to make this work. I'd rather have declarative support in the Service Fabric endpoint configuration instead. Something like this:

<Endpoint Protocol="https" Name="WebEndpointHttps" Type="Input"  
          Port="8677" EnableClientCertificateNegotiation="True" />

There is actually a UserVoice request for supporting client certificate authentication so if you think this is important, please vote for it.

Important note

You may think that we have now protected our API because a client must first present a valid client certificate. This is actually literally true: any client with a valid client certificate can access our API. We have only implemented the authentication part: a client must tell us who it is before entering.

You still need to authorize clients somehow, that is, determine what an authenticated client is actually allowed to do. You can, for example, maintain a list of valid client certificates and deny access to any other certificate. Or you can map client certificates to users in a database.
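To illustrate the first option, here is a minimal sketch of an allow-list check on certificate thumbprints. The real check would live in the request pipeline of the API itself; I use PowerShell here only to stay with the language of the script above. Test-ClientCertAllowed is a hypothetical name, and the sketch assumes SHA-1 thumbprints (40 hex characters) and that the certificate has already been validated by the TLS handshake.

```powershell
# Sketch: allow-list check on client certificate thumbprints.
# Test-ClientCertAllowed is a hypothetical helper.
# ASSUMPTION: thumbprints are SHA-1 hashes written as 40 hex characters.
function Test-ClientCertAllowed {
    param(
        [string]$Thumbprint,
        [string[]]$AllowedThumbprints
    )

    # Thumbprints copied from certificate dialogs often contain spaces or
    # differ in case, so strip everything that is not a hex digit first.
    $normalize = { param($t) ($t -replace '[^0-9A-Fa-f]', '').ToUpperInvariant() }

    $candidate = & $normalize $Thumbprint
    if ($candidate.Length -ne 40) { return $false }

    foreach ($allowed in $AllowedThumbprints) {
        if ((& $normalize $allowed) -eq $candidate) { return $true }
    }
    return $false
}

# Usage:
#   Test-ClientCertAllowed -Thumbprint $cert.Thumbprint -AllowedThumbprints $list
```

Anything not on the list is denied by default, which is the safer failure mode for this kind of check.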