Workbooks cannot be opened for all Azure-hosted orgs
Incident Report for Sigma Computing
Postmortem

Sigma Engineering has completed the postmortem of this incident. A Root Cause Analysis (RCA) is available on the Sigma Community site: Post Mortem for Sep 6, 2023: Workbooks cannot be opened for all Azure-hosted orgs.

Posted Sep 20, 2023 - 00:13 UTC

Resolved
Current Status: The Production HSM Key Vault returned to a healthy state as of 5:22 PM ET, and we expect that all Azure customers can now access their workbooks and data warehouse connections.

Impacted Cloud Regions & Services: Sigma users hosted on Azure

User Experience: All users on Azure hosted organizations can now access their workbooks and data warehouse connections.

Incident Start Time: Approximately 19:50 UTC Sep 6, 2023

Incident End Time: Approximately 21:45 UTC Sep 7, 2023

We will perform a root cause analysis (RCA) and publish our findings in a postmortem note on the Sigma Community site in the coming days. You will receive a notification when it is published.
Posted Sep 07, 2023 - 21:44 UTC
Update
Engineering is actively debugging our internal Production organization, its data warehouse connection setup, and the new HSM Key Vault to verify that our test workbooks have recovered.
Posted Sep 07, 2023 - 21:35 UTC
Update
Our Infrastructure team is verifying and testing the internal workaround solution on internal Production testing organizations. Further investigation is underway on the scope of other Sigma assets that may require system updates in the event that we proceed with the internal workaround solution.
Posted Sep 07, 2023 - 19:26 UTC
Update
We are continuing to engage with Azure Support and are moving through the escalation paths. In parallel, we are verifying and testing the internal workaround solution in our Staging environments.
Posted Sep 07, 2023 - 17:28 UTC
Update
Our Infrastructure team continues to engage with Azure's Technical Support Engineering while escalating to work directly with a dedicated Azure HSM Engineering team. We have made progress on internal workaround solutions as we verify the impacted underlying credentials and metadata.
Posted Sep 07, 2023 - 15:16 UTC
Update
Our Infrastructure team is actively working with Azure's Technical Support Engineering while continuing to escalate to work directly with the dedicated Azure HSM Engineering team. Additionally, we have begun testing and debugging internal workaround solutions. We apologize for the continued trouble and will provide more information as we have it.
Posted Sep 07, 2023 - 13:19 UTC
Update
Our Infrastructure team continues to work with Azure's Support and Engineering teams to return the Production Azure HSM Key Vault to a healthy state. In parallel, we have started internal efforts on workaround solutions that will allow customers to reconnect to their warehouses and recover their workbooks with limited functionality.
Posted Sep 07, 2023 - 10:39 UTC
Update
Through continued engineering engagement with Azure Support, we have learned that the deployment of the Production Azure HSM Key Vault has entered an inconsistent state. We are working with Azure Support to explore avenues for recovering the keys in this managed HSM Key Vault, which will require engaging Azure Engineering. We appreciate your patience as our Infrastructure team liaises with Azure's APAC team; we remain committed to identifying mitigation measures, working in rotation, until this incident is resolved. Please expect the next update in 3 hours.
Posted Sep 07, 2023 - 07:12 UTC
Update
We're currently collaborating with Azure Support to address an issue with establishing a secure connection to the key vault in our production environment. Initial assessments suggest it may be related to DNS caching on Azure's side. Azure's technical team is exploring possible solutions, including adjusting DNS caching settings. We appreciate your patience and will provide further updates as we make progress.
Posted Sep 07, 2023 - 05:17 UTC
Update
Engineering has verified a process in Staging where deleting the entire HSM Key Vault and restoring it from backup restores access to workbooks. Next, they are working to reproduce in Staging the "Unknown blocking process" issue that we encountered in Production. Once that issue has been reproduced in Staging, they will be able to confirm whether this delete/restore of the Key Vault should resolve the issue in Production. Please stay tuned for further updates.
Posted Sep 07, 2023 - 04:01 UTC
Update
Our engineering team is working continuously to resolve the current issue. We have applied a fix in our Staging environment and verified that it works. However, an "Unknown blocking process" error is preventing us from applying it in the Production environment. We are investigating this with Azure Cloud Support and Engineering, and we anticipate that once the blocking process is removed, we will be able to run our fix in Production.
Posted Sep 07, 2023 - 02:15 UTC
Update
Engineering continues to investigate this issue around the clock. They applied a potential solution in Staging and verified that it worked in that environment. However, this solution cannot be applied to the Production environment because of a conflict with a cloud provider background process. We have escalated this issue to Azure escalation management to get assistance in resolving the conflict. We'll keep you informed about our progress.
Posted Sep 06, 2023 - 23:15 UTC
Identified
Our engineers have identified the root cause of workbooks not loading and are developing mitigation measures to work around it. Workbooks failed to load because Azure Key Vault became inaccessible, which affected our encryption/decryption service. Customer metadata remains secure, and workbook access will be restored when we regain access to Azure Key Vault. We're actively collaborating with Azure Support to address the problem.
We'll provide another update within 60 minutes.
Posted Sep 06, 2023 - 22:06 UTC
Investigating
Current Status: We are investigating an issue causing workbooks to not load for all customers on Azure-hosted orgs. We are treating this as our highest priority. Our apologies for any inconvenience caused.

Impacted Cloud Regions & Services: Sigma users hosted on Azure

User Experience: Workbooks are not loading for any customers on Azure-hosted orgs

Incident Start Time: Approximately 19:50 UTC Sep 6, 2023
Posted Sep 06, 2023 - 20:56 UTC
This incident affected: Azure (Sigma - Azure (US)).