On April 8th, in preparation for our scheduled deployment, we started an index build. This caused the database to become unresponsive, resulting in login failures for a few customers.
Our monitoring systems alerted us to the issue. In response, we rolled back the index build to restore database responsiveness and mitigate customer impact.
To support upcoming changes in the new deployment, we followed best practices and suggestions from MongoDB and began index creation ahead of time.
However, high I/O activity on the target collection caused both index and data storage to consume significantly more space than anticipated. The increased storage and index size led to poor database performance. This was a result of how our managed MongoDB service provider handles storage management internally.
As a result, the database became unresponsive, leading to login failures. We are currently awaiting a root cause analysis (RCA) from our managed MongoDB service provider to understand the underlying cause on their side.
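
The storage growth described above is the kind of signal a simple headroom check could flag while an index build is running. The following is a minimal sketch, not our production tooling. It assumes the byte counts come from the `storageSize` and `totalIndexSize` fields of MongoDB's `collStats` output. The names `StorageSnapshot`, `growth_ratio`, and `should_abort_build` are hypothetical, as is the 1.5x threshold.

```python
from dataclasses import dataclass


@dataclass
class StorageSnapshot:
    """Collection footprint in bytes.

    Assumed to be populated from the `storageSize` and `totalIndexSize`
    fields of collStats; fetching those values is outside this sketch.
    """
    storage_size: int
    total_index_size: int

    @property
    def total(self) -> int:
        # Combined on-disk footprint of data and indexes.
        return self.storage_size + self.total_index_size


def growth_ratio(baseline: StorageSnapshot, current: StorageSnapshot) -> float:
    """Return the current footprint divided by the pre-build footprint."""
    if baseline.total <= 0:
        raise ValueError("baseline footprint must be positive")
    return current.total / baseline.total


def should_abort_build(
    baseline: StorageSnapshot,
    current: StorageSnapshot,
    max_ratio: float = 1.5,  # hypothetical threshold, tune per collection
) -> bool:
    """True when the footprint has grown past the allowed ratio."""
    return growth_ratio(baseline, current) > max_ratio


if __name__ == "__main__":
    before = StorageSnapshot(storage_size=100, total_index_size=20)
    during = StorageSnapshot(storage_size=150, total_index_size=60)
    print(should_abort_build(before, during))
```

A check like this would be run periodically during the build, with the baseline captured before index creation starts. It only reports that growth exceeded the threshold and does not by itself explain why the growth occurred.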