
Sometimes the primary or secondary cluster certificate expires before you can rotate it to a new certificate, which can leave the cluster inaccessible or unreachable. In that case you can follow these steps to recover a standalone Service Fabric cluster. If you are looking to rotate a certificate that is near expiry but still valid, refer to the previous article: Certificate rotation Azure Service Fabric Standalone cluster – Microsoft Tech Community


This article assumes the cluster is secured using the thumbprint approach. In general, the common name approach is recommended because it makes certificate management easier. For more information about certificates on a standalone cluster, refer to Secure a cluster on Windows by using certificates – Azure Service Fabric | Microsoft Docs


 


Recover an Azure Service Fabric standalone cluster that is inaccessible or unreachable due to expired cluster certificates:


 



1. Create or obtain the new certificate.

2. Deploy the new certificate to all nodes manually by following https://docs.microsoft.com/en-us/powershell/module/pkiclient/import-pfxcertificate?view=win10-ps
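As a rough sketch of step 2, the import could be wrapped in a small helper. The function name Import-ClusterCertificate and the PFX path in the usage comment are hypothetical. Import-PfxCertificate is the cmdlet described on the linked page, and the sketch assumes the certificate belongs in the LocalMachine\My store.

```powershell
# Hypothetical helper: imports a PFX (including its private key) into LocalMachine\My.
# Assumption: the cluster certificate is expected in the local machine "My" store.
function Import-ClusterCertificate {
    param(
        [Parameter(Mandatory)] [string] $PfxPath,
        [Parameter(Mandatory)] [securestring] $Password
    )
    Import-PfxCertificate -FilePath $PfxPath `
                          -CertStoreLocation 'Cert:\LocalMachine\My' `
                          -Password $Password
}

# Example usage (run elevated on each node; the path is a placeholder):
# $pw = Read-Host -Prompt 'PFX password' -AsSecureString
# Import-ClusterCertificate -PfxPath 'C:\Temp\NewClusterCert.pfx' -Password $pw
```

Prompting for the password keeps it out of the command history and out of any script file.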


3. RDP into each VM and make sure the certificate is present and that its private key is ACL'd to NETWORK SERVICE:

    a) Run certlm.msc.

    b) Find the new certificate.

    c) Right-click the certificate, choose All Tasks > Manage Private Keys, and ensure NETWORK SERVICE has full control.
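Before opening certlm.msc, a quick scripted check can confirm that the certificate is in the store, has a private key, and has not expired. This is a minimal sketch: the function name Get-ClusterCertificateStatus is hypothetical, the store path is assumed to be LocalMachine\My, and it does not inspect the private key ACL, which still has to be verified as described in the sub-steps.

```powershell
# Hypothetical helper: reports basic facts about a certificate found by thumbprint.
# Assumption: the certificate lives in Cert:\LocalMachine\My (Windows only).
function Get-ClusterCertificateStatus {
    param(
        [Parameter(Mandatory)] [string] $Thumbprint,
        [string] $StorePath = 'Cert:\LocalMachine\My'
    )
    # Strip spaces and invisible characters that often sneak in when a
    # thumbprint is copied from the certificate dialog.
    $clean = ($Thumbprint -replace '[^0-9A-Fa-f]', '').ToUpperInvariant()

    $cert = Get-ChildItem -Path $StorePath | Where-Object { $_.Thumbprint -eq $clean }
    if (-not $cert) { return $null }   # certificate is not installed in this store

    [pscustomobject]@{
        Subject       = $cert.Subject
        Thumbprint    = $cert.Thumbprint
        NotAfter      = $cert.NotAfter
        HasPrivateKey = $cert.HasPrivateKey
        IsExpired     = $cert.NotAfter -lt (Get-Date)
    }
}

# Example usage (placeholder thumbprint):
# Get-ClusterCertificateStatus -Thumbprint '<new thumbprint>'
```

A result of $null means the certificate was not found on that node; HasPrivateKey should be True and IsExpired should be False for the new certificate.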


 


4. Stop and disable the "Microsoft Service Fabric Host Service" service (FabricHostSvc) from a PowerShell session with administrative rights.

    Set-Service -ServiceName FabricHostSvc -StartupType Disabled

    net stop FabricHostSvc


5. Locate ClusterManifest.current.xml in the cluster root folder, for example "C:\ProgramData\SF\Fabric\ClusterManifest.current.xml" (the exact location depends on the data path used for the deployment), and copy it to a working location such as C:\Temp\clusterManifest.xml.


   


6. Remove the read-only attribute from C:\Temp\clusterManifest.xml, then edit the file and update it with the new thumbprint.

    a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.


  


7. Locate InfrastructureManifest.xml under the .\Fabric\Fabric.Data\InfrastructureManifest.xml path of the node folder. In this example the data root is C:\ProgramData, so the file is C:\ProgramData\SF\vm0\Fabric\Fabric.Data\InfrastructureManifest.xml. Copy it to C:\Temp as well.


 


8. Edit C:\Temp\InfrastructureManifest.xml and update it with the new thumbprint.

    a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
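The replacements in steps 6 and 8 can also be scripted rather than done by hand in an editor. The following is only a sketch under stated assumptions: the function name Update-ThumbprintInFile is hypothetical, the thumbprints in the usage comment are placeholders, and the files are assumed to be UTF-8 text (the file is written back as UTF-8 without a byte order mark).

```powershell
# Hypothetical helper: clears the read-only flag on a file and replaces every
# occurrence of the old thumbprint (case-insensitive) with the new one.
# Returns the number of occurrences that were replaced.
function Update-ThumbprintInFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $OldThumbprint,
        [Parameter(Mandatory)] [string] $NewThumbprint
    )
    # Resolve to a full path so the .NET file APIs do not depend on the process directory.
    $fullPath = (Get-Item -LiteralPath $Path).FullName

    # Copies of the manifests are read-only; clear the attribute first.
    Set-ItemProperty -LiteralPath $fullPath -Name IsReadOnly -Value $false

    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    $pattern = [regex]::Escape($OldThumbprint)
    $text    = [System.IO.File]::ReadAllText($fullPath)

    $count   = [regex]::Matches($text, $pattern, $options).Count
    $updated = [regex]::Replace($text, $pattern, $NewThumbprint, $options)

    [System.IO.File]::WriteAllText($fullPath, $updated)
    return $count
}

# Example usage (placeholder thumbprints):
# Update-ThumbprintInFile -Path 'C:\Temp\clusterManifest.xml' -OldThumbprint '<old>' -NewThumbprint '<new>'
# Update-ThumbprintInFile -Path 'C:\Temp\InfrastructureManifest.xml' -OldThumbprint '<old>' -NewThumbprint '<new>'
```

A return value of 0 is worth investigating: it usually means the wrong old thumbprint was supplied or the file was already updated.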


 


9. Run the following cmdlet to update the Service Fabric cluster, replacing the data root and log paths (SvcFab in some deployments) with the actual paths on the node.

    New-ServiceFabricNodeConfiguration -FabricDataRoot "C:\ProgramData\SF" -FabricLogRoot "C:\ProgramData\SF\log" -ClusterManifestPath "C:\Temp\clusterManifest.xml" -InfrastructureManifestPath "C:\Temp\InfrastructureManifest.xml"


 


10. Open "C:\ProgramData\SF\vm0\Fabric\Fabric.Package.current.xml" and note the "Configuration version".




 


Change into the folder that corresponds to that configuration version.




 


Edit "C:\ProgramData\SF\vm0\Fabric\Fabric.Config.0.131572537807340469\Settings.xml" and replace all occurrences of the old certificate thumbprint with the new thumbprint.
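To double-check step 10, a small search can list which Settings.xml files under the node's Fabric folder still contain the old thumbprint. This is only a sketch: the function name Find-SettingsWithThumbprint is hypothetical, and it assumes the configuration folders are named Fabric.Config.<version>, as in the example path above.

```powershell
# Hypothetical helper: returns the full paths of Settings.xml files, inside
# Fabric.Config.* folders, that still contain the given thumbprint.
# Assumption: config folders sit directly under the node's Fabric folder.
function Find-SettingsWithThumbprint {
    param(
        [Parameter(Mandatory)] [string] $FabricRoot,
        [Parameter(Mandatory)] [string] $Thumbprint
    )
    Get-ChildItem -LiteralPath $FabricRoot -Directory -Filter 'Fabric.Config.*' |
        ForEach-Object { Get-ChildItem -LiteralPath $_.FullName -File -Filter 'Settings.xml' } |
        Select-String -Pattern $Thumbprint -SimpleMatch -List |   # case-insensitive literal match
        Select-Object -ExpandProperty Path
}

# Example usage (placeholder thumbprint):
# Find-SettingsWithThumbprint -FabricRoot 'C:\ProgramData\SF\vm0\Fabric' -Thumbprint '<old thumbprint>'
```

After the edit, running the same search for the old thumbprint should return nothing for the active configuration folder.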


 


11. Set the startup type of the "Microsoft Service Fabric Host Service" service back to automatic and start the service again.

    Set-Service -ServiceName FabricHostSvc -StartupType Automatic

    net start FabricHostSvc


 


12. Repeat the above steps on every cluster node. 


 


13. After step 12 you should be able to reconnect to the cluster through Service Fabric Explorer (SFX) and PowerShell.
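For the PowerShell check, a connection attempt with the new certificate might look like the sketch below. The helper name New-ClusterConnectionArgs is hypothetical, and the sketch assumes the new cluster certificate is used as both the client and the server certificate, that it is in LocalMachine\My, and that the client connection endpoint is on the default port 19000.

```powershell
# Hypothetical helper: builds the parameter set for Connect-ServiceFabricCluster
# when connecting with an X.509 certificate identified by thumbprint.
# Assumption: the same certificate acts as client and server certificate.
function New-ClusterConnectionArgs {
    param(
        [Parameter(Mandatory)] [string] $Endpoint,
        [Parameter(Mandatory)] [string] $Thumbprint
    )
    @{
        ConnectionEndpoint   = $Endpoint
        X509Credential       = $true
        ServerCertThumbprint = $Thumbprint
        FindType             = 'FindByThumbprint'
        FindValue            = $Thumbprint
        StoreLocation        = 'LocalMachine'
        StoreName            = 'My'
    }
}

# Example usage on a cluster node (placeholder thumbprint):
# $connArgs = New-ClusterConnectionArgs -Endpoint 'localhost:19000' -Thumbprint '<new thumbprint>'
# Connect-ServiceFabricCluster @connArgs
```

Splatting the hashtable keeps the long parameter list readable and makes it easy to reuse the same arguments on each node.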


 


14. At this point SFX works, and Connect-ServiceFabricCluster run from one of the cluster nodes establishes a secure connection. However, Get-ServiceFabricClusterConfiguration still returns the old, expired cluster certificate thumbprint in the deployment JSON file. This is expected, because the cluster configuration held by the upgrade orchestration service has not been updated yet.


 


15. Use Set-ServiceFabricUpgradeOrchestrationServiceState to update the stored cluster state:

    1. Connect-ServiceFabricCluster

    2. Get-ServiceFabricUpgradeOrchestrationServiceState | Out-File .\state.json

    3. Replace the old thumbprint in the state.json file with the new thumbprint.

    4. Set it back: Set-ServiceFabricUpgradeOrchestrationServiceState -StateFilePath c:\60CU2\state.json


16. Run the Get-ServiceFabricClusterConfiguration cmdlet again; you should now see the updated certificate information.


 
