The January 2013 issue of “Software & Systems Modeling” contains a new empirical study on UML quality modeling. The paper is titled “Effects of stability on model composition effort: an exploratory study” by Kleinner Farias, Alessandro Garcia, and Carlos Lucena. The full paper is available for download. Here is my review.
The paper investigates the use of model composition to evolve system design. With model composition, you start from some model A which you want to extend or change. For this modification, you have a model B that describes the extension or change. You use one of several established model composition heuristics such as “Merge”, “Override”, or “Union” to combine the two models A and B. This creates a so-called “output composed model” CM. Typically, there will be semantic and syntactic discrepancies between A and B. You need to identify and resolve these inconsistencies in CM to produce the final, “intended model” AB. According to the authors, this last step from CM to AB is particularly error-prone. Therefore, they set out to provide some guidance for this last step: can we identify model compositions with high inconsistency rates and high resolution effort?
The authors define and measure inconsistency rate and resolution effort as follows:

Inconsistency rate: the percentage of elements in CM that have an inconsistency due to the model composition. These inconsistencies are partly detected automatically (e.g. violations of UML well-formedness rules as checked by IBM RSA), and partly the result of manual verification.

Resolution effort: the number of operations (additions, changes, deletions of model elements) necessary to resolve the inconsistencies in CM to produce AB. In programming language terms, this would be called the “code churn” between two revisions of a program.
As an influencing factor for inconsistency rate and resolution effort, the authors consider the stability of the model. Stability is measured by applying a selection of the size, inheritance, and coupling metrics calculated by SDMetrics to both CM and AB, and counting which metrics change their values by less than 20%. Let’s call these metrics “stable” with respect to CM and AB. The stability of CM with respect to AB then is simply the percentage of stable metrics.
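To make the stability measure concrete, here is a minimal Python sketch of how I read it. The function names, the metric names, and the sample values are all made up for illustration; they are not taken from the paper or from SDMetrics. I also assume that the relative change is measured against the value in AB and that a metric whose AB value is zero counts as stable only if it is zero in CM as well. The paper may handle these details differently.

```python
def relative_change(reference: float, value: float) -> float:
    """Relative deviation of `value` from `reference` (assumption: AB is the reference)."""
    if reference == 0:
        # Assumption: no change at all is stable, any change from zero is unstable.
        return 0.0 if value == 0 else float("inf")
    return abs(value - reference) / abs(reference)


def stability(metrics_cm: dict, metrics_ab: dict, threshold: float = 0.20) -> float:
    """Percentage of metrics that change by less than `threshold` between CM and AB."""
    shared = metrics_cm.keys() & metrics_ab.keys()
    if not shared:
        raise ValueError("no common metrics to compare")
    stable = sum(
        1
        for name in shared
        if relative_change(metrics_ab[name], metrics_cm[name]) < threshold
    )
    return 100.0 * stable / len(shared)


# Hypothetical metric values for a composed model CM and the intended model AB.
cm = {"classes": 50, "attributes": 120, "inheritance_depth": 3, "outgoing_dependencies": 10}
ab = {"classes": 52, "attributes": 90, "inheritance_depth": 3, "outgoing_dependencies": 16}

print(stability(cm, ab))  # two of the four metrics stay within 20%
```

In this made-up example, the class count and the inheritance depth stay within the 20% band, while the attribute count and the outgoing dependencies do not, so half of the metrics are stable.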
The authors hypothesize that a high variation of the design characteristics of the design models increases the chance for incorrect manipulations, hence more inconsistencies. Similarly, high instability demands more restructuring modifications, hence unstable models should require higher resolution effort than stable models.
To test these hypotheses, the authors carry out a case study in which 180 model compositions are performed on three software systems (60 model compositions on each system). For each model composition, the authors measure model stability, inconsistency rate, and resolution effort as described above. Both Mann-Whitney U tests and Spearman’s rho confirm that stable models have significantly lower inconsistency rates and resolution effort.
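For readers unfamiliar with Spearman’s rho: it is the Pearson correlation computed on the ranks of the observations rather than on the raw values. The following sketch shows the idea in plain Python. The helper names and the five data points are invented for illustration and have nothing to do with the study’s data; the sketch also computes only the coefficient, not the significance test the authors report.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sample is constant")
    return cov / sqrt(var_x * var_y)


# Invented example: stability (%) and inconsistency rate (%) of five compositions.
stability_pct = [90, 80, 70, 60, 50]
inconsistency_pct = [5, 10, 12, 20, 30]

print(spearman_rho(stability_pct, inconsistency_pct))
```

A value close to -1 means that higher stability goes along with a lower inconsistency rate, which is the direction of the relationship the authors report.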
My take on the study
As reports on empirical studies go, this one is very well-written. The data analysis was carried out carefully, and the discussion of the results and confounding factors is quite complete. But for my taste, the design of the study is a bit too “academic”. By that I mean that the practical value of the results is limited. My reasons for this are twofold.
 Rigorous modeling is not much used in practice, and evolving models using model composition heuristics probably constitutes only a small percentage of that. My guess is that for extending a model, most modelers will just make the necessary changes, going directly from A to AB, without any explicit B or CM in between. But that is just my uneducated guess, not based on any data, and I could be wrong in that assessment. Or things might change in the future.
 From a practical point of view, the choice to measure the independent variable (IV) “stability” between models CM and AB is unfortunate: you need model AB to measure the IV. By the time you have AB, you can just as well measure the dependent variables directly. So you cannot use the IV in a predictive way, say, for decision making. I’m aware that the focus of the paper was to understand influencing factors for inconsistency resolution effort, not to build prediction models. However, researchers should keep the practical applicability of the results in mind.
A more practically oriented choice of IV would be to measure instability between models A and CM, that is, the amount of change introduced just by the model composition. This data is available before inconsistency resolution. The hypotheses would be similar: the larger the changes in design characteristics when applying the model composition heuristic, the larger the potential for inconsistencies, and the more changes will be required to resolve the inconsistencies. Actually, I’m fairly optimistic that one would find similar results using the stability of A with respect to CM as the independent variable.