Users encountered a large number of 500 server errors starting at 2:46 AM UTC on 10 October 2022. SRE was paged at 4:31 AM UTC and identified a storage issue: the storage returned errors when new files, such as cache entries or message content, were being created.
Our SRE team began cleaning up the storage at 4:57 AM UTC and put mitigation measures in place so that storage utilization is better managed and alerts are sent in case of failure.
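
For illustration only, a storage utilization check of the kind described above could be sketched in Python as follows. The function names, the 80% threshold, and the checked path are assumptions made for this example; they do not describe the mitigation that was actually deployed.

```python
import shutil
from typing import Tuple


def should_alert(used_bytes: int, total_bytes: int, threshold: float = 0.8) -> bool:
    """Return True when the used fraction reaches the (assumed) alert threshold."""
    if total_bytes <= 0:
        # A volume reporting no capacity is treated as a failure worth alerting on.
        return True
    return used_bytes / total_bytes >= threshold


def check_storage(path: str, threshold: float = 0.8) -> Tuple[float, bool]:
    """Measure utilization of the filesystem holding `path`.

    Returns (used_fraction, alert_flag). The path and threshold are hypothetical.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    used_fraction = usage.used / usage.total if usage.total > 0 else 1.0
    return used_fraction, should_alert(usage.used, usage.total, threshold)


if __name__ == "__main__":
    fraction, alert = check_storage(".")
    print(f"storage used: {fraction:.1%}, alert: {alert}")
```

A check like this would typically run on a schedule and feed a paging system, so that rising utilization is noticed before file creation starts to fail.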