Earlier this week, several Google services, including Gmail, YouTube, and Drive, stopped working for users around the world. Days after the massive outage, the technology giant has explained what went wrong.
The company is blaming its own automated storage quota management system for the outage. Google says it was migrating its User ID Service, the tool that verifies and tracks logged-in users, to a new quota system, and that the migration caused the issue.
In a blog post, the company wrote: “As part of an ongoing migration of the User ID Service to a new quota system, a change was made in October to register the User ID Service with the new quota system, but parts of the previous quota system were left in place which incorrectly reported the usage for the User ID Service as 0. An existing grace period on enforcing quota restrictions delayed the impact, which eventually expired, triggering automated quota systems to decrease the quota allowed for the User ID service and triggering this incident.”
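The failure chain Google describes (usage wrongly reported as 0, a grace period that hides the problem, then an automated quota cut) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Google's actual system: the `ServiceQuota` class, the `enforce` and `simulate` functions, the daily enforcement pass, and the 120% headroom rule are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceQuota:
    """Hypothetical quota record for one service (illustrative only)."""
    allocated: int        # storage units currently granted to the service
    reported_usage: int   # usage as seen by the quota system
    grace_days_left: int  # days remaining before enforcement begins


def enforce(q: ServiceQuota, headroom_pct: int = 120) -> ServiceQuota:
    """One daily pass of an assumed automated quota adjuster.

    While the grace period is running, nothing is enforced. Once it has
    expired, the allocation is shrunk to reported usage plus headroom.
    """
    if q.grace_days_left > 0:
        return ServiceQuota(q.allocated, q.reported_usage, q.grace_days_left - 1)
    target = q.reported_usage * headroom_pct // 100
    return ServiceQuota(min(q.allocated, target), q.reported_usage, 0)


def simulate(q: ServiceQuota, days: int) -> list[int]:
    """Return the allocation after each daily enforcement pass."""
    history = []
    for _ in range(days):
        q = enforce(q)
        history.append(q.allocated)
    return history


# Usage is wrongly reported as 0, but a 3-day grace period masks it at first.
broken = simulate(ServiceQuota(allocated=1000, reported_usage=0, grace_days_left=3), 5)
# With correct usage reporting, the same enforcement leaves room to operate.
healthy = simulate(ServiceQuota(allocated=1000, reported_usage=600, grace_days_left=3), 5)

print(broken)   # [1000, 1000, 1000, 0, 0]
print(healthy)  # [1000, 1000, 1000, 720, 720]
```

In this toy model the misreported service runs normally for the length of the grace period, then loses its entire allocation in a single pass, which mirrors the delayed impact described in the blog post.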
While a team of engineers at Google was able to address the problem relatively quickly, the company has said that it plans to implement new measures to prevent a similar situation in the future. It also plans to improve its monitoring systems to catch incorrect configurations sooner.
Tendering an apology, Google said, “We would like to apologize for the scope of impact that this incident had on our customers and their businesses. We take any incident that affects the availability and reliability of our customers extremely seriously, particularly incidents which span multiple regions.”
After the massive multi-region outage on Monday, Gmail faced issues again on Tuesday. Google said a subset of users saw error messages, high latency, and/or other unexpected behavior, and that the problem was resolved within a few hours.