Master-Worker with Terracotta Case Study - Performance Results
Written by James Heanly   
Friday, 28 September 2007 00:00

Performance Results

The Terracotta proof-of-concept showed much promise, but did it actually deliver? To find out, we devised a scenario to test the scalability of the application: loading 89 data files containing a total of 872,998 records. The files were taken from a real production system, which gave us confidence that they constituted a realistic data set.

The Terracotta server and single master process were running on one machine. A varying number of distributed worker machines were used to process the data, with each machine running four workers. The results were as follows:

Worker Machines    Workers    Time (seconds)
1                  4          416
2                  8          261
3                  12         214
4                  16         194
5                  20         193

These can be graphed as follows:

[Figure: time taken in seconds to load the 89 files, plotted against the number of worker machines]

With only one worker machine running four workers, the total time taken to load the 89 files was 416 seconds. Simply adding one more worker machine to our distributed computing system cut the load time to 261 seconds, a reduction of about 37%. A third and fourth machine brought further, smaller improvements (214 and 194 seconds), while the fifth made almost no difference (193 seconds).
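To make the scaling figures easier to compare, the timings in the table can be converted into speedup, parallel efficiency and throughput. The following is a minimal sketch rather than part of the original application: the class and method names are hypothetical, and it takes the single-machine run (416 seconds) as the baseline.

```java
import java.util.Locale;

// Hypothetical helper for summarising the measured load times.
final class ScalingMetrics {

    // Speedup relative to the baseline run (here: one worker machine).
    static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    // Parallel efficiency: speedup divided by the number of worker machines.
    static double efficiency(double baselineSeconds, double seconds, int machines) {
        return speedup(baselineSeconds, seconds) / machines;
    }

    // Throughput in records per second.
    static double throughput(long records, double seconds) {
        return records / seconds;
    }

    public static void main(String[] args) {
        // Figures taken from the results table above.
        int[] machines = {1, 2, 3, 4, 5};
        double[] seconds = {416, 261, 214, 194, 193};
        long records = 872_998L;

        for (int i = 0; i < machines.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%d machine(s): speedup %.2f, efficiency %.0f%%, %.0f records/s%n",
                    machines[i],
                    speedup(seconds[0], seconds[i]),
                    100 * efficiency(seconds[0], seconds[i], machines[i]),
                    throughput(records, seconds[i]));
        }
    }
}
```

On these numbers, two machines give a speedup of roughly 1.59 over one machine, and five machines roughly 2.16, which is where the diminishing returns show up most clearly.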

As can be seen from the graph above, the scalability does begin to plateau. As more workers are added, the database server comes under increased load, and at some point it most likely becomes the bottleneck.

Conclusion

By expanding upon Jonas Bonér’s work with Terracotta and the Master/Worker pattern, we were able to build a distributed computing component into our application.

At this stage, our application is running in a production environment for a customer whose data-processing requirements only warrant the use of a single machine. However, having performance-tested across multiple machines using a much larger data set taken from another production environment, we remain confident that Terracotta will be able to scale our application when the time comes.

The database ultimately proved to be the bottleneck, but given the performance we are getting, this is something we can live with. Indeed, if you were running a process that was solely CPU-bound, who knows what scalability improvements could be achieved?

Currently in our architecture, the Masters (i.e. the processes creating the tasks to be performed) all run on the same machine. With Terracotta, it is now a trivial exercise for us to add additional Master machines to spread the Master load, should the current single Master machine become overloaded.
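The idea of several Masters feeding one pool of Workers can be illustrated with a small single-JVM sketch. This is not our production code and it uses no Terracotta API: it relies only on java.util.concurrent, with threads standing in for machines, and all names (MultiMasterSketch, STOP, the task strings) are hypothetical. The assumption is that in a Terracotta deployment the shared queue would be the clustered object, so Masters and Workers on different machines would see the same queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Single-JVM illustration: several masters, one shared task queue, several workers.
final class MultiMasterSketch {

    // Hypothetical "poison pill" that tells a worker to stop.
    private static final String STOP = "__STOP__";

    // Runs the sketch and returns the number of tasks the workers processed.
    static int run(int masters, int workers, int tasksPerMaster) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();

        // Workers take tasks from the shared queue until they see STOP.
        List<Thread> workerThreads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String task = queue.take();
                        if (task.equals(STOP)) {
                            return;
                        }
                        processed.incrementAndGet(); // stand-in for loading one file
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workerThreads.add(t);
        }

        // Each master independently adds its own tasks to the same queue.
        List<Thread> masterThreads = new ArrayList<>();
        for (int m = 0; m < masters; m++) {
            final int id = m;
            Thread t = new Thread(() -> {
                for (int i = 0; i < tasksPerMaster; i++) {
                    queue.add("master-" + id + "-task-" + i);
                }
            });
            t.start();
            masterThreads.add(t);
        }

        joinAll(masterThreads);
        // All real tasks are queued; add one STOP per worker behind them.
        for (int w = 0; w < workers; w++) {
            queue.add(STOP);
        }
        joinAll(workerThreads);
        return processed.get();
    }

    private static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Tasks processed: " + run(2, 4, 10));
    }
}
```

Because the Workers only ever see the queue, adding a Master changes nothing on the Worker side, which is the property that makes spreading the Master load straightforward.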

Terracotta has given our application some real grunt. We are looking forward to watching its performance over the next few months with additional real-world datasets and seeing how our other applications can benefit from our experience.