IT News

UPDATE: Summary:...

Google Apps Status - 2 hours 15 min ago

Incident began at 2024-09-16 17:11 (times are in Coordinated Universal Time (UTC)).

Summary:

Google Drive users may experience increased error rates.

Description:

We are experiencing an intermittent issue with Google Drive beginning on Monday, 2024-09-16 10:11 US/Pacific.
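
These posts give incident windows in UTC and narrative times in US/Pacific. Python's standard `zoneinfo` module can convert between the two and compute an incident's duration. The sketch below is minimal and assumes the host has IANA time zone data. It uses `America/Los_Angeles`, the canonical zone that `US/Pacific` aliases. The helper names are illustrative only.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Canonical IANA zone; "US/Pacific" is a backward-compatible alias for it.
PACIFIC = ZoneInfo("America/Los_Angeles")
STAMP_FORMAT = "%Y-%m-%d %H:%M"  # matches stamps like "2024-09-16 17:11" in the posts


def utc_to_pacific(stamp: str) -> datetime:
    """Parse a UTC timestamp string and convert it to Pacific time."""
    utc = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return utc.astimezone(PACIFIC)


def incident_duration(start_utc: str, end_utc: str) -> timedelta:
    """Elapsed time between two UTC timestamps (no zone conversion needed)."""
    start = datetime.strptime(start_utc, STAMP_FORMAT)
    end = datetime.strptime(end_utc, STAMP_FORMAT)
    return end - start


start = utc_to_pacific("2024-09-16 17:11")
print(start.strftime("%A, %Y-%m-%d %H:%M %Z"))  # Monday, 2024-09-16 10:11 PDT

# The ChromeOS login incident later in this feed lasted 19 hours 29 minutes.
print(incident_duration("2024-08-06 15:00", "2024-08-07 10:29"))  # 19:29:00
```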

Our engineering team continues to investigate the issue.

We will provide an update by Monday, 2024-09-16 11:45 US/Pacific with current details.

We apologize to all who are affected by the disruption.

Diagnosis:

Google Drive users may experience increased error rates.

Workaround:

Retrying failed operations may succeed.
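
Clients that retry usually use capped exponential backoff with jitter, so that many clients do not all retry at the same moment. The sketch below is minimal. `TransientError` is a hypothetical stand-in for a retryable failure such as an HTTP 5xx response. The attempt limit and delays are illustrative values, not values Google recommends.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 500 or 503."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying TransientError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # "Full jitter": wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


attempts = []


def flaky_request() -> str:
    """Hypothetical request that fails twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("server unavailable")
    return "ok"


# sleep is stubbed out here so the demonstration does not actually wait.
print(retry_with_backoff(flaky_request, sleep=lambda _: None))  # ok
```

Injecting `sleep` keeps the function testable without real delays. In production code the default `time.sleep` would be used.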

Affected products: Google Drive

Categories: IT News

RESOLVED: We're investigating reports of an issue with Gmail. We will provide more information shortly.

Google Apps Status - 22 August, 2024 - 11:51

Incident began at 2024-08-19 18:10 and ended at 2024-08-19 18:30 (times are in Coordinated Universal Time (UTC)).

Incident Report Summary

On Monday, 19 August 2024, at 11:10 AM US/Pacific, Gmail experienced an increase in server unavailability (500/502) errors for 20 minutes. Impacted users may have encountered errors when attempting to access Gmail via IMAP, web, or mobile clients.

To our affected customers, we sincerely apologize for the inconvenience this disruption may have caused. Our engineers have been focused on the investigation and analysis that followed, in order to ensure this issue does not recur.

Root Cause

Our engineering team has conducted a thorough investigation of the underlying cause and has identified a combination of factors that led to the issue:

  • A previously unknown condition existed in our system that, when triggered, could result in performance slowdowns and errors when the service instances are restarted during high-load situations.
  • A recent change introduced to improve the efficiency of our system interacted negatively with an ongoing, unrelated canary test of another system configuration. This combination resulted in a high-load situation.
  • At the same time, a routine rollout of a new software version caused a restart of the canary test area during a peak usage period, further increasing the load.

Remediation and Prevention

Google engineers were alerted to the service disruption on Monday, 19 August 2024 at 11:19 AM US/Pacific by our proactive monitoring system and immediately started an investigation. The team quickly determined that the errors were caused by instance restarts and halted all deployments at 11:30 US/Pacific to mitigate the issue.

Google is committed to preventing a repeat of the issue in the future and is completing the following actions:

  • Fix the latent issue that causes performance slowdowns and system errors when service instances are restarted under high load.
  • Introduce a step in our efficiency rollout process that forces a system restart under high load.

If your service or application was affected, we apologize — this is not the level of quality and reliability we strive to offer you, and we have taken and are taking immediate steps to improve the platform’s performance and availability.

Detailed Description of Impact

On Monday, 19 August 2024 at 11:10 AM US/Pacific, a small percentage of Gmail customers experienced increased server unavailability errors for 20 minutes. Impacted customers saw Gmail account unavailability errors in their IMAP, web, and mobile app clients, as well as Internal Server Error (500) responses and other in-app errors.

Affected products: Gmail

Categories: IT News

RESOLVED: We're investigating reports of an issue with Gmail. We will provide more information shortly.

Google Apps Status - 19 August, 2024 - 16:05

Incident began at 2024-08-19 18:10 and ended at 2024-08-19 18:30 (times are in Coordinated Universal Time (UTC)).

Mini Incident Report

We apologize for the inconvenience this service disruption/outage may have caused. We would like to provide some information about this incident below. Please note, this information is based on our best knowledge at the time of posting and is subject to change as our investigation continues. If you have experienced impact outside of what is listed below, please reach out to Google Workspace Support using help article https://support.google.com/a/answer/1047213.

(All Times US/Pacific)

Incident Start: 19 August, 2024 11:10

Incident End: 19 August, 2024 11:30

Duration: 20 minutes

Affected Services and Features:

Gmail

Regions/Zones: Global

Description:

On 19 August 2024 at 11:10 US/Pacific, Gmail experienced increased server unavailability errors for 20 minutes. Preliminary analysis indicates that the issue was caused by resource contention on the Gmail backend.

Google will publish a full Incident Report in the coming days with the complete root cause.

Customer Impact:

Customers may have been unable to access Gmail via IMAP, web or mobile clients.

Affected products: Gmail

Categories: IT News

RESOLVED: We're investigating reports of an issue with Gmail. We will provide more information shortly.

Google Apps Status - 19 August, 2024 - 12:17

Incident began at 2024-08-19 18:10 and ended at 2024-08-19 18:30 (times are in Coordinated Universal Time (UTC)).

The issue with Gmail has been resolved for all affected users as of Monday, 2024-08-19 12:03 US/Pacific.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

Affected products: Gmail

Categories: IT News

UPDATE: We're investigating reports of an issue with Gmail. We will provide more information shortly.

Google Apps Status - 19 August, 2024 - 12:04

Incident began at 2024-08-19 18:10 and ended at 2024-08-19 18:30 (times are in Coordinated Universal Time (UTC)).

We're investigating reports of an issue with Gmail. We will provide more information shortly.

Affected products: Gmail

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 15 August, 2024 - 14:49

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time (UTC)).

A full Incident Report has been published at https://www.google.com/appsstatus/dashboard/incidents/5AjHWPCTCP5uenZRjuwz

Affected products: Google Calendar

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 15 August, 2024 - 14:47

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time (UTC)).

A full Incident Report has been published at https://www.google.com/appsstatus/dashboard/incidents/5AjHWPCTCP5uenZRjuwz

Affected products: Google Tasks

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 15 August, 2024 - 14:46

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time (UTC)).

A full Incident Report has been published at https://www.google.com/appsstatus/dashboard/incidents/5AjHWPCTCP5uenZRjuwz

Affected products: Google Drive

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 15 August, 2024 - 14:45

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time (UTC)).

A full Incident Report has been published at https://www.google.com/appsstatus/dashboard/incidents/5AjHWPCTCP5uenZRjuwz

Affected products: Gmail

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 15 August, 2024 - 14:42

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 15:32 (times are in Coordinated Universal Time (UTC)).

Incident Report Summary

On 12 August 2024 at 06:20 US/Pacific, multiple Google Cloud and Google Workspace products experienced connectivity issues in europe-west2 for 40 minutes. During this time, ingress traffic to europe-west2 and egress traffic from europe-west2 experienced elevated latencies, connection timeouts, and connection failures.

Root Cause

On 12 August 2024 at 06:20 US/Pacific, primary and backup power feeds were both lost in a Google Point of Presence (POP) due to a substation switchgear failure. The affected POP hosts about ⅓ of the serving first-layer Google Front Ends (GFEs) located in europe-west2 and some distributed networking equipment for that region. The power loss impacted the following Google products and services that depend on GFEs in that region:

  • Google Cloud APIs, Google Workspace, and other Google services such as YouTube.
  • Customer-created global external application and proxy network load balancers, including Cloud CDN.

The power loss also impacted the following Google Cloud products which depended on impacted networking equipment:

  • Customer-created regional external application, proxy network, and passthrough network load balancers in the europe-west2 region.
  • External protocol forwarding and VM external IP address connectivity for VMs in the europe-west2 region.
  • Google Cloud Interconnect connections in some LHR colocation facilities.

Impact was limited to situations where either or both of the following were true:

  • Inbound requests or connections from the Internet were routed into the europe-west2 region of Google’s network and depended on networking equipment that was offline, or unreachable pending reconvergence.
  • Outbound responses were routed from the europe-west2 region of Google’s network to the Internet and depended on networking equipment that was without power.

The power outage caused Internet routes advertised by Google to be withdrawn in networks connected to Google’s network. The withdrawn routes were automatically replaced by other Google-advertised routes that didn’t depend on impacted networking equipment. Withdrawing and replacing routes relies on BGP and its timers, so convergence on replacement routes is not instantaneous. In addition, overload of the first-layer GFEs reached through the automatically selected replacement routes extended the duration of the incident.
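
The route replacement described above can be illustrated with a heavily simplified model of how a connected network picks a path after some routes are withdrawn. This sketch is not BGP. It keeps only two tie-breakers: highest local preference, then shortest AS path. It ignores the other best-path steps and the timers that make real convergence gradual. The prefix (a documentation range) and the POP names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    via_pop: str          # hypothetical POP whose equipment this path depends on
    local_pref: int = 100
    as_path_len: int = 1


def best_route(routes: Iterable[Route],
               failed_pops: FrozenSet[str] = frozenset()) -> Optional[Route]:
    """Select a route after withdrawing every route that depends on a failed POP.

    Simplified ordering: highest local_pref wins, then the shortest AS path.
    """
    candidates = [r for r in routes if r.via_pop not in failed_pops]
    if not candidates:
        return None  # prefix unreachable until some route is re-advertised
    return max(candidates, key=lambda r: (r.local_pref, -r.as_path_len))


routes = [
    Route("203.0.113.0/24", "lhr-pop-a", local_pref=200),
    Route("203.0.113.0/24", "lhr-pop-b"),
    Route("203.0.113.0/24", "ams-pop", as_path_len=2),
]
print(best_route(routes).via_pop)                            # lhr-pop-a
print(best_route(routes, frozenset({"lhr-pop-a"})).via_pop)  # lhr-pop-b
```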

Detailed Description of Impact

  • Google Workspace: Gmail, Google Calendar, Google Chat, Google Docs, Google Drive, Google Meet, and Google Tasks users connecting to Workspace services from the UK region and surrounding areas experienced the connectivity issues described in the next point.
  • GFE-based products and services: Customers on the Internet experienced a spike of broken connections followed by elevated latencies or HTTP error responses when communicating with GFE-powered Google APIs and services or customer-created global external application and proxy network load balancers. At roughly 06:23 US/Pacific, Google automatically redirected connections to the nearest available first-layer GFEs, with some latency penalty. However, some of those first-layer GFEs were overloaded until 06:48 US/Pacific, when Google engineers made adjustments to distribute incoming requests more efficiently among nearby first-layer GFEs. Depending on the Google API or service, or on the customer-created global external load balancer, elevated latencies could have persisted until about 08:30 US/Pacific. Elevated latencies could also have affected customer-created global external load balancers with Cloud CDN enabled.
  • Regional Google Cloud products and services: Until replacement routes were in effect, customers on the Internet experienced connection failures to the following GCP resources in the europe-west2 region:
    • Regional external application, proxy network, and passthrough network load balancers.
    • External protocol forwarding and VM external IP addresses.
  • Google Cloud Interconnect: Google Cloud Interconnect connections in some LHR colocation facilities (lhr-zone1-47, lhr-zone1-832, lhr-zone1-2262, lhr-zone1-4885, lhr-zone1-99051 and lhr-zone2-47) remained offline from 06:20 US/Pacific until at least 06:57 US/Pacific, after power was restored.
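
The overload of the nearest first-layer GFEs shows what happens when displaced traffic goes entirely to the closest site. One generic remedy is capacity-aware spillover: fill sites in order of proximity, but only up to each site's capacity, then move on to the next site. The sketch below is a generic illustration that uses hypothetical site names and capacities in arbitrary request-rate units. It is not a description of how GFEs actually distribute load.

```python
def spillover(demand: float, sites: list[tuple[str, float]]) -> dict[str, float]:
    """Assign demand to sites nearest-first, filling each only up to its capacity.

    `sites` holds (name, capacity) pairs already sorted from nearest to farthest.
    Demand left over once every site is full is reported under "unserved".
    """
    assignment: dict[str, float] = {}
    remaining = demand
    for name, capacity in sites:
        take = min(remaining, capacity)
        assignment[name] = take
        remaining -= take
    assignment["unserved"] = remaining
    return assignment


sites = [("gfe-near", 60.0), ("gfe-mid", 50.0), ("gfe-far", 40.0)]
print(spillover(100.0, sites))
# {'gfe-near': 60.0, 'gfe-mid': 40.0, 'gfe-far': 0.0, 'unserved': 0.0}
```

Sending all 100 units to `gfe-near` alone would exceed its capacity of 60 by 40. Spillover avoids that kind of overload, at the cost of higher latency for the traffic served farther away.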

At 06:43 US/Pacific, power was restored to the impacted networking equipment. Google networking equipment was fully operational by 06:57 US/Pacific, and connectivity to GFE-based products and services, regional Google Cloud products and services, and Google Cloud Interconnect resumed shortly thereafter.

Remediation and Prevention

Multiple Google engineering teams were alerted, and automated recovery tooling was triggered as expected; however, manual adjustments were required to address the subsequent first-layer GFE overload. Google is reviewing how to automate the tasks that required manual intervention, to reduce the impact duration of future power events. Similarly, Google is working to increase Cloud Interconnect control plane resilience and to reduce mitigation time through automated reaction to isolation events.

Additionally, the Google partner that maintains power for the affected facility in LHR (London) is conducting a full root cause analysis with the switchboard manufacturer and the substation owner(s) involved in supplying power, including follow-up on why stored or generated on-site emergency power did not carry the loads.

Affected products: Google Docs

Categories: IT News

RESOLVED: Summary:...

Google Apps Status - 15 August, 2024 - 12:46

Incident began at 2024-08-06 15:00 and ended at 2024-08-07 10:29 (times are in Coordinated Universal Time (UTC)).

Incident Report Summary

On 06 August 2024 at 08:00 US/Pacific, ChromeOS devices globally experienced login issues for 19 hours 29 minutes. During this time, users on ChromeOS devices could complete the full sign-in process, including any multi-factor authentication when required; however, some users were immediately redirected back to the login page after logging in.

Root Cause

Google’s internal authentication system uses an HTTP endpoint microservice that provides clients with a list of accounts currently in the authentication cookies. This endpoint is mainly used for various account switcher user interfaces, providing a list of accounts the current session has access to.

The root cause was an incorrect configuration made to the Google authentication service, which, under a particular condition, resulted in malformed responses to queries from this microservice. As a result, ChromeOS was unable to get the account details from the internal Google authentication service, which caused the sign-in failures.
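
The sketch below illustrates why a malformed response breaks sign-in. It shows a minimal client that validates an account-list response before using it. The JSON shape (`accounts`, `email`, `session_index`) and all names are hypothetical, because the report does not describe the endpoint's actual format. Raising an explicit error is one possible design choice: it makes such a problem visible, so the client does not just quietly return the user to the login screen.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    email: str
    session_index: int


class MalformedResponse(ValueError):
    """Raised when the account-list payload does not match the expected shape."""


def parse_account_list(body: str) -> list[Account]:
    """Parse a hypothetical {"accounts": [{"email": ..., "session_index": ...}]} body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise MalformedResponse(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("accounts"), list):
        raise MalformedResponse("missing 'accounts' list")
    accounts = []
    for i, entry in enumerate(data["accounts"]):
        if not isinstance(entry, dict):
            raise MalformedResponse(f"account {i} is not an object")
        email, index = entry.get("email"), entry.get("session_index")
        if not isinstance(email, str) or not isinstance(index, int):
            raise MalformedResponse(f"account {i} has missing or mistyped fields")
        accounts.append(Account(email, index))
    return accounts


good = '{"accounts": [{"email": "user@example.com", "session_index": 0}]}'
print(parse_account_list(good))
# [Account(email='user@example.com', session_index=0)]
```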

Remediation and Prevention

Google engineers were alerted to the outage via a support case on Tuesday, 06 August at 15:51 US/Pacific, and immediately started an investigation. Once the nature and scope of the issue became clear, the configuration change was reverted and Google engineers ensured that sign-in worked again.

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Migrate the endpoints used in Google authentication systems to a different format so that the issue is not triggered
  • Expand the integration testing environment to cover more clients, including Google Chrome, so that all scenarios are identified
  • Include additional control mechanisms to ensure that configuration changes are approved appropriately

Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.

Detailed Description of Impact

  • On devices that were already configured, users were unable to add new users from the login screen.
  • Users could authenticate successfully, but were sent back to the login screen before a session started.
  • Users were unable to set up the first user account on their device (the device owner), resulting in the devices being reset to factory defaults.

Affected products: ChromeOS

Categories: IT News

RESOLVED: Summary:...

Google Apps Status - 14 August, 2024 - 14:42

Incident began at 2024-08-08 19:06 and ended at 2024-08-08 23:16 (times are in Coordinated Universal Time (UTC)).

Incident Report Summary

On 8 August 2024, Gmail and Google Drive experienced service degradation globally for a duration of 4 hours and 10 minutes between 12:06 and 16:16 US/Pacific. During the incident, affected users experienced issues with email attachment and delivery functionalities in Gmail, and upload operations in Google Drive.

To our Gmail and Google Drive customers, we apologize for the impact this service disruption had on your organization. We have completed an internal investigation and are taking immediate steps to improve the quality and reliability of our services.

Root Cause

As part of a standard data operation, we restored a large volume of data into an internal Bigtable database. This operation overloaded the Bigtable servers.

During the restore operation, Bigtable identified certain tablets as ‘unloadable’ and added the corresponding entries to a Chubby [1] file, as expected. As the file grew, it exceeded the memory limit allocated for Chubby, leaving Bigtable unable to write to the file.

The cumulative impact of these events led to a subset of Bigtable instances entering a degraded state, impacting read/write operations. These instances store internal data for Gmail and Google Drive, which consequently affected the performance of those services.

[1] - https://research.google/pubs/the-chubby-lock-service-for-loosely-coupled-distributed-systems
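
In this root cause, a file grew until it hit a fixed memory limit. Quota-usage monitoring aims to catch that kind of condition early. The sketch below is minimal. It assumes hypothetical byte counts and illustrative warning/paging thresholds, not Google's actual values. It classifies current usage and projects how long remains before the limit at the current growth rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuotaUsage:
    used_bytes: int
    limit_bytes: int

    @property
    def fraction(self) -> float:
        return self.used_bytes / self.limit_bytes


def classify(usage: QuotaUsage, warn_at: float = 0.75, page_at: float = 0.90) -> str:
    """Return 'ok', 'warn', or 'page' depending on how close usage is to the limit."""
    if usage.fraction >= page_at:
        return "page"
    if usage.fraction >= warn_at:
        return "warn"
    return "ok"


def seconds_until_full(usage: QuotaUsage, growth_bytes_per_s: float) -> Optional[float]:
    """Project time until the limit is reached; None if usage is not growing."""
    if growth_bytes_per_s <= 0:
        return None
    return max(0.0, (usage.limit_bytes - usage.used_bytes) / growth_bytes_per_s)


usage = QuotaUsage(used_bytes=80_000_000, limit_bytes=100_000_000)
print(classify(usage))                       # warn
print(seconds_until_full(usage, 500_000.0))  # 40.0
```

The current fraction alone can look healthy during a fast-growing operation such as a large restore. Alerting on the projected time to the limit as well gives earlier warning in that case.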

Remediation and Prevention

Google engineers were alerted to the issue on Thursday, 8 Aug 2024 at 11:58 US/Pacific via our monitoring system. Once the nature and scope of the issue were understood, our engineers devised and executed a multi-pronged approach to address the root cause and mitigate impact. The Chubby quota was increased to address the memory issue, while traffic from the affected instances was re-routed to avoid further impact. To ensure that any degraded instances were addressed, Bigtable master servers for the affected instances were successfully restarted. Impact was fully mitigated by 16:16 US/Pacific.

Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.

We are committed to preventing a repeat of this issue in the future and are completing the following actions:

  • We have completed a deep analysis of our Bigtable instances and ensured that the Chubby space quota limits and configurations are optimal across all instances.
  • We are working on enhancing our monitoring for Chubby quota usage in Bigtable to enable early detection and prevention of potential issues.
  • We will establish clear guidelines on the recommended volume of data that can be safely restored at any given time, minimizing the risk of service disruptions.
  • We have enhanced the logic for diverting traffic from Bigtable clusters to ensure smoother transitions and minimal impact on users.

Detailed Description of Impact

On Thursday, 8 August 2024 from 12:06 to 16:16 US/Pacific, Gmail and Google Drive experienced service degradation for a duration of 4 hours and 10 minutes.

Gmail

Affected users were unable to send emails or save drafts containing attachments. Emails sent without attachments were not affected.

Google Drive

Affected users may have observed degraded performance while performing upload operations.

Affected products: Google Drive

Categories: IT News

RESOLVED: Summary:...

Google Apps Status - 14 August, 2024 - 14:39

Incident began at 2024-08-08 19:06 and ended at 2024-08-08 23:16 (times are in Coordinated Universal Time (UTC)).

Affected products: Gmail

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 12:33

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 14:00 (times are in Coordinated Universal Time (UTC)).

A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF

Affected products: Google Docs

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 12:32

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 14:00 (times are in Coordinated Universal Time (UTC)).

A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF

Affected products: Gmail

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 12:31

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 14:00 (times are in Coordinated Universal Time (UTC)).

A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF

Affected products: Google Drive

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 12:31

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 14:00 (times are in Coordinated Universal Time (UTC)).

A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF

Affected products: Google Tasks

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 12:29

Incident began at 2024-08-12 13:20 and ended at 2024-08-12 14:00 (times are in Coordinated Universal Time (UTC)).

A mini incident report has been posted to https://status.cloud.google.com/incidents/ETJGhvY9Xaktw7tgi8dF

Affected products: Google Calendar

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 08:41

Incident began at 2024-08-12 13:38 and ended at 2024-08-12 13:56 (times are in Coordinated Universal Time (UTC)).

The issue with Gmail, Google Calendar, Google Chat, Google Docs, Google Drive, and Google Tasks has been resolved for all affected users as of Monday, 2024-08-12 08:35 US/Pacific.

During the issue, users connecting to Workspace services from the UK region may have experienced connectivity issues.

We will publish an analysis of this incident once we have completed our internal investigation.

We thank you for your patience while we worked on resolving the issue.

Affected products: Google Calendar

Categories: IT News

RESOLVED: SUMMARY...

Google Apps Status - 12 August, 2024 - 08:41

Incident began at 2024-08-12 13:38 and ended at 2024-08-12 13:56 (times are in Coordinated Universal Time (UTC)).

Affected products: Google Tasks

Categories: IT News