On 15 October, a deployment introduced a fault in our configuration service. Initial testing showed everything working normally, but the system was serving cached data at the time. Once this cache expired, some clients began receiving the default configuration instead of their intended, client-specific configuration.
Impact:
- Some users may have experienced missing or incorrect configuration-driven functionality.
- The issue lasted until 08:16, when corrective actions were completed.
What Happened:
- A deployment introduced a fault in the part of our system responsible for fetching configuration data.
- Testing did not detect the fault because cached data was still being served.
- Once the cache expired overnight, incorrect configuration was delivered to clients.
- The issue was reported early the following morning and resolved shortly afterwards.
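The sequence above can be illustrated with a minimal sketch. It is not our production code: the class, the function names, the 60-second expiry, and the silent fallback to defaults on a failed fetch are all assumptions made for illustration.

```python
import time

# Hypothetical default configuration, for illustration only.
DEFAULT_CONFIG = {"theme": "default"}


class ConfigService:
    """Illustrative cache in front of a configuration fetch (all names assumed)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # client_id -> (expires_at, config)

    def get(self, client_id):
        entry = self._cache.get(client_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # still fresh: the fetch path is never exercised
        try:
            config = self._fetch(client_id)
        except Exception:
            # Assumed failure mode: fall back silently to the defaults.
            return DEFAULT_CONFIG
        self._cache[client_id] = (self._clock() + self._ttl, config)
        return config


def working_fetch(client_id):
    return {"theme": f"custom-{client_id}"}


def broken_fetch(client_id):
    raise RuntimeError("simulated fetch fault")


now = [0.0]  # controllable clock so the example needs no real waiting
service = ConfigService(working_fetch, ttl_seconds=60, clock=lambda: now[0])

before = service.get("acme")   # populates the cache
service._fetch = broken_fetch  # simulate the faulty deployment
during = service.get("acme")   # served from cache, so the fault stays hidden
now[0] = 61.0                  # the cache entry expires
after = service.get("acme")    # fetch fails, defaults are returned
```

In this sketch `during` still matches `before`, which is why a check made shortly after deployment looks healthy, while `after` is the default configuration.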
Resolution:
- We rolled back the change to the last stable version.
- We cleared and rebuilt the caches so that the correct configuration was delivered.
- Normal service was fully restored by 08:16.
Preventative Actions:
We are putting in place the following improvements:
- Stronger testing – all future testing of configuration will include cache clearing to ensure live data is always checked.
- Deployment safeguards – cache refresh steps will now be part of our deployment process.
- Updated documentation – engineering guidelines have been updated to make cache management clearer.
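As a rough sketch of what "testing with cache clearing" can look like, the example below compares a naive check with one that clears the cache first. The `FakeConfigService` class, its `clear_cache` method, and `check_live_config` are hypothetical names used only for illustration and do not describe our actual test suite.

```python
class FakeConfigService:
    """Hypothetical stand-in for a cached configuration service."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}

    def get(self, client_id):
        # Only call the fetch function on a cache miss.
        if client_id not in self._cache:
            self._cache[client_id] = self._fetch(client_id)
        return self._cache[client_id]

    def clear_cache(self):
        self._cache.clear()


def check_live_config(service, client_id, expected):
    """Clear the cache first so the check exercises the live fetch path."""
    service.clear_cache()
    return service.get(client_id) == expected


service = FakeConfigService(lambda cid: {"theme": f"custom-{cid}"})
expected = service.get("acme")  # cache now holds the correct value

# Simulate a faulty deployment: the fetch path now returns defaults.
service._fetch = lambda cid: {"theme": "default"}

naive_ok = service.get("acme") == expected              # cache hides the fault
live_ok = check_live_config(service, "acme", expected)  # fault is exposed
```

Here the naive check passes because it only ever sees cached data, while the cache-clearing check fails and surfaces the fault before the cache would have expired on its own.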
Lessons Learned:
- This incident highlighted that reliance on cached data during testing can hide underlying issues. We are adjusting our processes so that similar problems are caught before changes reach production.
We apologise for any disruption caused and thank you for your patience while we resolved this issue.