Programming models have long been an important research question in parallel computing. They have become even more important because programmers must now exploit several levels of parallelism at the same time, from machine instructions up to the global software design.
We observe that certain characteristic features of programming models have remained valid for a long time, while others change more quickly. This stability made it possible to develop very sophisticated simulation programs that accumulated know-how over many years. However, we feel increased pressure to refactor such applications, caused by the more fundamental change from single-core to multi-core and many-core processors. This justifies the need for programming models that support evolutionary refactoring, which preserves the valuable code base while making modern computer hardware usable for those codes.
One example of such a refactoring is the improvement of scalability in Dalton, a computational chemistry application whose history reaches back to the 1980s. More and more computational methods have been added to it over time. They were implemented with a very flexible software design based on a master-worker pattern, which supports good load balancing and the use of parallel computers with widely varying processor counts. The classical design pattern provides for a single master that controls all the workers executing the computational tasks.
A performance analysis showed that this centralised communication became a bottleneck as the number of available processor cores increased. At the same time, the computational model of the application combines the desired final result from independently computed contributions, so the work does not inherently require a single coordinator.
I could therefore restructure the application so that the single master is replaced by a team of masters, each of which controls a smaller group of worker processes. Each master takes responsibility for one of the independent computations and exchanges the contribution of its group with the other masters. This approach leads to collective communication among smaller worker groups, which can be coordinated with less overhead, and to a faster combination of the different contributions into the final result.
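The two-level structure can be illustrated with a small, purely sequential sketch. It is not Dalton code and uses no message passing library. The function names, the round-robin assignment of workers to masters, and the placeholder contribution values are all assumptions made for illustration. The sketch only shows that the final result is unchanged when the reduction is split across several masters, and it counts how many incoming results the busiest master has to handle under this simple model.

```python
from typing import List, Tuple


def split_into_groups(worker_ids: List[int], n_masters: int) -> List[List[int]]:
    """Assign workers to masters round-robin (an assumed policy, not Dalton's)."""
    groups: List[List[int]] = [[] for _ in range(n_masters)]
    for index, worker in enumerate(worker_ids):
        groups[index % n_masters].append(worker)
    return groups


def worker_contribution(worker_id: int) -> float:
    """Hypothetical stand-in for an independently computed partial result."""
    return float(worker_id + 1)


def hierarchical_sum(worker_ids: List[int], n_masters: int) -> Tuple[float, int]:
    """Combine all contributions through a team of masters.

    Returns the final result and the number of incoming results that the
    busiest master handles in this toy model.
    """
    groups = split_into_groups(worker_ids, n_masters)

    # Step 1: each master reduces the contributions of its own worker group.
    partials = [sum(worker_contribution(w) for w in group) for group in groups]

    # Step 2: the masters exchange their group results and combine them.
    total = sum(partials)

    # Toy load measure: results from the master's own workers plus one
    # partial result from each of the other masters.
    busiest = max(len(group) for group in groups) + (n_masters - 1)
    return total, busiest
```

Calling `hierarchical_sum(list(range(8)), 1)` models the classical single-master design, while `hierarchical_sum(list(range(8)), 2)` models two master-worker groups. Both return the same total, but the busiest master receives fewer results in the second case. A real implementation would replace the two steps with collective operations inside each group and among the masters.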
The illustration shows a scheme of the refactored master-worker design and an execution trace of two iterations. The trace demonstrates that overall performance more than doubled, owing to the use of two master-worker groups together with improved single-node performance in the master computations.
First results have recently been published and will be extended by applying dynamic load balancing and possibly hybrid programming models.
 Xavier Aguilar, Michael Schliephake, Olav Vahtras, Judit Gimenez, Erwin Laure: Scaling Dalton, a molecular electronic structure program. IEEE 7th International Conference on E-Science, e-Science 2011, Stockholm, Sweden, December 5-8, 2011. DOI: 10.1109/eScience.2011.43.