OpenStack IO performance unreliable due to DiskScrubbing
After a bit of reading (not enough, it turns out), including an article on cloud performance on thebitsource, we concluded that our small-scale development app, which relies on a single MySQL server, would be well served by a Rackspace cloud server. Our needs were modest: a small but consistent IO load and moderate memory and CPU.
The first month of service was a complete success, and we began to consider migrating live systems to cloud servers. Then we were suddenly hit by a dramatic drop in IO performance that lasted 6 hours and made the instance unusable during our online day.
The average IO wait time rose to 60ms, compared with less than 1ms during normal service. The Rackspace support team responded quickly to our ticket and told us the cause was DiskScrubbing.
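If you want to watch for this on your own instance, the average IO wait can be derived on Linux from two samples of `/proc/diskstats`, in the same way `iostat` computes its `await` column: the milliseconds spent on reads and writes divided by the number of IOs completed between the samples. The sketch below assumes a Linux guest, and the device name `xvda` is a hypothetical placeholder (Xen guests often use it, but check `lsblk` on your own server).

```python
import os
import time


def parse_diskstats(text):
    """Return {device: (completed_ios, ms_spent_on_ios)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip malformed or truncated lines
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])     # reads completed, ms reading
        writes, write_ms = int(fields[7]), int(fields[10])  # writes completed, ms writing
        stats[name] = (reads + writes, read_ms + write_ms)
    return stats


def average_wait_ms(before, after, device):
    """Average time per completed IO between two samples (iostat's 'await')."""
    ios = after[device][0] - before[device][0]
    ms = after[device][1] - before[device][1]
    return ms / ios if ios else 0.0  # no IOs completed means nothing to average


def sample(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    device = "xvda"  # hypothetical device name; adjust for your instance
    first = sample()
    time.sleep(1)
    second = sample()
    if device in first and device in second:
        print(f"{device}: average IO wait {average_wait_ms(first, second, device):.1f} ms")
```

Sampled every minute or so, a figure that jumps from under 1ms to tens of milliseconds is the kind of change described above.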
OpenStack runs a DiskScrubbing procedure each time an OS image is deleted, to ensure your data is not still lurking on the disk after you leave. Writing lots of zeros across an area of the disk kills IO on that disk for everyone else sharing it.
So I guess for the first month we were either alone on the host or just had quiet neighbours. We soon realised that this problem was bound to recur, since APIs make it easy to create and destroy instances quickly. We were not wrong. The problem came back over and over; the worst period was 18 hours of dreadful IO. We asked to be moved to a new host. The move was seamless, but our new neighbours were just as noisy and our service was often unusable.
Rackspace’s only suggestion was to redesign our app for the cloud, which would be fine if we were ready to scale beyond a single cloud server instance; for this particular app we are not. I had wrongly assumed that storage could be provisioned from outside the server you are on, removing this IO bottleneck. At the time of writing, however, the only such option was Rackspace Cloud Files, which Rackspace clearly states is not suitable for database workloads. Feeling more than a little burnt by the whole experience, we switched to a dedicated host from UK2, which meets our needs for cost and performance. (Rackspace’s entry-level dedicated servers were a significant jump in price from the cloud offering.)
I’d like to try the experiment again. Having re-read the comments on the bitsource article, which recommend Amazon’s Elastic Block Store (EBS), I think we would have a very different experience. However, with billing per IO and by disk size, I suspect the cost could quickly approach that of a dedicated host. If I manage to convince the developers to risk the pain again, I will give it a go. Watch this space!
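To check whether per-IO billing really would approach the price of a dedicated host, a rough break-even estimate helps. The sketch below assumes a volume billed per GB-month plus per million IO requests, a 30-day month, and a constant average IO rate. It quotes no real prices: every rate and the function names are hypothetical, so plug in current figures from the providers' price lists.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assume a 30-day billing month


def monthly_cost(instance_monthly, size_gb, avg_iops,
                 price_per_gb_month, price_per_million_ios):
    """Instance + volume storage + IO request charges for one month (hypothetical rates)."""
    storage = size_gb * price_per_gb_month
    io_requests = avg_iops * SECONDS_PER_MONTH
    return instance_monthly + storage + io_requests / 1_000_000 * price_per_million_ios


def break_even_iops(dedicated_monthly, instance_monthly, size_gb,
                    price_per_gb_month, price_per_million_ios):
    """Average IOPS at which the cloud setup costs as much as the dedicated host."""
    remaining = dedicated_monthly - instance_monthly - size_gb * price_per_gb_month
    if remaining <= 0:
        return 0.0  # instance and storage alone already cost as much as the dedicated host
    cost_per_iops_month = SECONDS_PER_MONTH / 1_000_000 * price_per_million_ios
    return remaining / cost_per_iops_month


if __name__ == "__main__":
    # Placeholder numbers only; substitute real quotes before drawing conclusions.
    print(break_even_iops(dedicated_monthly=100.0, instance_monthly=50.0, size_gb=100,
                          price_per_gb_month=0.10, price_per_million_ios=0.10))
```

If the IO rate a database actually sustains sits well above the break-even figure, the dedicated host is the cheaper option.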