Earlier today, a serious problem affected customers using Harvest between 4 a.m. and 9 a.m. EDT. During this time, Harvest users temporarily lost access to their assigned projects and tasks, and were not able to track time or view time associated with those projects. By 9:18 a.m. EDT, all accounts were restored and all services were back to normal. No data was lost.
To customers who experienced the problem this morning: we are very sorry for the trouble it caused. We know how important Harvest is to you, and we’re extremely disappointed with this incident given our core focus on stability and reliability.
In the name of transparency, we want to share with you what happened and let you know the safeguards we’re putting in place against situations like this in the future.
The Details
Our investigation started at 6:00 a.m. EDT after learning that customers could not see some of their assigned projects and tasks. We took the application offline at 7:25 a.m. EDT to fix the problem and restore data from our backup. We were able to bring back service at 8:46 a.m. EDT, and the system was fully restored and back to normal by 9:18 a.m. EDT.
For the technical audience, the root of this morning’s troubles was part of the locking interaction update we released yesterday. An automated task changed data it was not designed to change, and disassociated some records from customers’ companies in the database.
We have put many safeguards in place – redundancy at every server level, backups upon backups, and an exceptionally strong development team that takes care of the infrastructure, performance and security of your data. Accidents like this morning’s can still happen, and we are prepared to deal with them, as we did today.
Can we do better? Definitely. We wish we had been alerted to the problem much earlier, had diagnosed the cause more quickly, and had restored the database in less time. Our monitoring tools are designed to detect issues like this, but they failed in this case, which lengthened the time it took to restore service. We will be reviewing this incident closely in the coming days and will make all necessary adjustments to our development process, support cycles and alert protocols.
Harvest is known for its stable and reliable service. As a cloud service, we are susceptible to occasional issues – this is a byproduct of a constantly evolving application. However, this isn’t an excuse for outages of any kind. Uptime is core to our business and we will continue to strive for excellence.
Thank you for your understanding and patience while we resolved this issue as quickly as possible. If you have any further questions or concerns, please let us know. You can reach us at support@getharvest.com or via phone at +1 212-226-4160.
Danny Wen & Shawn Liu, Co-founders