With rapid growth come challenges, and during the Apple event today we stumbled. We disappointed our customers, who expressed their unhappiness in no uncertain terms.
Since the last major Apple event in October, we have devoted significant resources to preparing for today. We believed we had solved the problem. We hadn’t.
Here’s what happened today:
Throughout the morning, we saw a steady climb in traffic. Our servers performed well until 12:45 p.m., when CPU usage spiked on all servers responsible for serving our customers’ websites. Our team began investigating immediately and discovered that the processes serving our websites were using excessive resources. We traced the issue to a problem connecting to our database cluster from each web server, which caused the sites to lock up and become unresponsive.
Over the next hour, we tracked down a configuration setting that limited the number of database connections. We never hit this limit during our load tests, but it became a factor under the load of an Apple event, which is significantly higher than our average daily traffic. We immediately changed our load-balancing configuration to spread the traffic across all available servers, and our sites quickly began to stabilize.
[Graph: CPU utilization on one of our web server nodes. The peaks represent the CPU spikes, followed by a recovery once we changed our load-balancing configuration]
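As a rough illustration of how a connection limit can go unnoticed in a load test yet become a factor under event traffic, the sketch below uses Little's law (average concurrent connections = request rate × time each request holds a connection). The function names, the limit of 100, the request rates, and the 50 ms hold time are all hypothetical values chosen for the example; none of them come from the platform's actual configuration.

```python
def connections_needed(requests_per_second: float, db_seconds_per_request: float) -> float:
    """Estimate average concurrent database connections via Little's law."""
    return requests_per_second * db_seconds_per_request


def exceeds_limit(requests_per_second: float,
                  db_seconds_per_request: float,
                  connection_limit: int) -> bool:
    """Return True if the estimated demand is above the configured limit."""
    return connections_needed(requests_per_second, db_seconds_per_request) > connection_limit


# Hypothetical figures for illustration only.
CONNECTION_LIMIT = 100        # assumed configured cap on database connections
DB_SECONDS_PER_REQUEST = 0.05 # assumed time each request holds a connection

for label, rate in [("load test", 1500), ("live event", 4000)]:
    needed = connections_needed(rate, DB_SECONDS_PER_REQUEST)
    status = "over limit" if exceeds_limit(rate, DB_SECONDS_PER_REQUEST, CONNECTION_LIMIT) else "within limit"
    print(f"{label}: about {needed:.0f} connections needed ({status})")
```

With these assumed numbers the load test stays under the cap while the event-level rate does not, which is why a test needs to reach or exceed the peak rate expected in production before a limit like this shows up.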
The platform is now up and running normally to meet the needs of our customers.
Some of our customers may have lost confidence in us. We’re committed to identifying and repairing all problems, and to restoring their faith in a platform that we firmly believe is second to none. To demonstrate our commitment to this goal, we will announce a date for a public load test within the next two weeks. Anyone interested can watch a real-time test of our technology under load equivalent to an Apple event.
We know we have to work hard to win back our customers’ trust. This will be a long-term effort, but we are committed to doing whatever it takes to prove to our customers and their customers that ScribbleLive is the world’s leader in real-time content delivery.
We value our customers and the trust they place in us every day. We look forward to our next opportunity to prove that trust is well placed.
Michael De Monte, CEO