Primary Cloud, Exosphere User Interface, ASU Regional Cloud, Cornell Regional Cloud, Hawaii Regional Cloud, TACC Regional Cloud, Docs Site, Website, Jetstream2 Support
Primary Data Center, ASU Data Center, Cornell Data Center, Hawaii Data Center, TACC Data Center
February 12, 2025 11:27AM EST
February 12, 2025 4:27PM UTC
[Resolved] Jetstream2 experienced network outages on Friday, February 7, 2025, and Saturday, February 8, 2025, each lasting multiple hours.
Log analysis showed a cascading failure stemming from an automated security update that was staged across hosts on both of those mornings. This security update triggered an unattended service restart, which caused routing failures in the networkd and Free Range Routing services used by Jetstream2.
Automated security updates are intended to protect the Jetstream2 cloud and our users. However, we have determined that a more manual update process would be preferable. This approach requires advance planning and could lead to more frequent scheduled downtimes, but it should improve overall stability. We are also exploring supplemental network monitors to complement our existing service monitors. Together, these adjustments will help prevent similar outages going forward.
February 8, 2025 4:44PM EST
February 8, 2025 9:44PM UTC
[Monitoring] The networking issues have been resolved, and the network appears to have stabilized. We are continuing to monitor the situation. Thank you for your patience and understanding throughout this service disruption.
February 8, 2025 9:32AM EST
February 8, 2025 2:32PM UTC
[Investigating] We are aware of networking issues across the system and are currently investigating. Users may be unable to connect to Cacao or their instances. We appreciate your patience and understanding as we work toward resolving this issue.
February 7, 2025 10:05AM EST
February 7, 2025 3:05PM UTC
[Monitoring] The networking issues have been resolved, and the network appears to have stabilized. We are continuing to monitor the situation. Thank you for your patience and understanding throughout this service disruption.
February 7, 2025 8:46AM EST
February 7, 2025 1:46PM UTC
[Investigating] We are currently aware of networking issues across the system. Users may be unable to connect to Exosphere, Horizon, Cacao, or their instances. We appreciate your patience and understanding as we work toward resolving this issue.