Over the past two days, Harvest has had two very short outages. On both Monday September 23rd at 5:30am EDT and Tuesday September 24th at 7:40am EDT, Harvest was unresponsive for around 3 minutes. Both events were caused by the same underlying problem, and we are working to resolve it as quickly as possible. At 10pm EDT on Tuesday September 24th (what time is that for you?), we’ll be performing a brief database maintenance to resolve the issue. We don’t expect any service impact from this maintenance. Let me get into some of the technical issues behind these outages.
Over time our main database has grown steadily in size, and at a certain point the database becomes larger than the memory allocated to the database software on the servers it runs on. There is an ancient art involved in getting this memory allocation just right, and we’ve found over time that it is possible to allocate too much memory to the database and suffer poor performance as a result. Gradually increasing the allocated memory as the database grows has worked well for us. Recently, however, our databases have grown large enough that restarting a database server to increase its memory allocation, and then putting that server straight back into action, has become more involved. A database server with cold caches no longer performs well when put directly back into production. We need to warm the server’s cache gradually before it can become a master server in our database cluster.
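To make the cache-warming idea a bit more concrete, here is a minimal sketch of the general technique, not our exact tooling: before a freshly restarted server takes production traffic, you read through its hottest tables so their pages are pulled from disk into the database’s in-memory cache. The example assumes, for illustration only, a MySQL-style server reached from Python via pymysql; the table names and connection details are hypothetical.

```python
# Illustrative cache-warming sketch (not Harvest's actual procedure).
# Assumes a MySQL-style server reachable via pymysql; table names are hypothetical.
import time
import pymysql

HOT_TABLES = ["time_entries", "projects", "invoices"]  # hypothetical "hot" tables

def warm_cache(host, user, password, database, pause_seconds=5):
    """Gradually pull frequently-read data into the database's in-memory cache
    by scanning each hot table, pausing between scans to keep I/O load gentle."""
    conn = pymysql.connect(host=host, user=user, password=password, database=database)
    try:
        with conn.cursor() as cur:
            for table in HOT_TABLES:
                # A full scan forces the table's pages to be read from disk into memory.
                cur.execute(f"SELECT COUNT(*) FROM `{table}`")
                rows = cur.fetchone()[0]
                print(f"warmed {table}: {rows} rows")
                time.sleep(pause_seconds)  # stagger scans so the warm-up stays gradual
    finally:
        conn.close()
```

The key point is the staggering: warming happens slowly, in the background, so the server only rejoins the cluster as a master once its cache already holds the data it will be asked to serve.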
So we are left with a slightly more challenging situation than we had previously, and we have had to adapt. The net result is that increasing the database memory allocation now requires a new, staggered procedure, and the recent availability issues were the result of working through it.
I apologize for the two issues with Harvest yesterday and this morning. We are taking the final steps to resolve this issue in a brief database maintenance tonight, Tuesday September 24th at 10pm EDT. We don’t expect the Harvest service to be impacted by this maintenance. Thanks for your patience, folks!