Gualtiero Bazzana* G. Ru S. Scotto di Vettimo
Massimo Giunchi Italtel SIT BUCT, Siemens Telecomunicazioni
Etnoteam S.p.A. Via G. Reiss Romoli, Italia
Via A. Bono Cairoli, 34 20019 Castelletto di SS Padana Superiore,
20127 - MILANO (ITALY) Settimo Milanese, (ITALY) Cassina de Pecchi (ITALY)
This paper presents pragmatic approaches and experiences aiming at software product quality improvement based on quantitative measures.
Building on the experience gained in the SCOPE Project, the paper presents the fundamentals of measurement-based SW product quality improvement and two case studies from the telecom application domain.
The former focuses on the improvement in maintainability achieved through a data collection and analysis campaign based on source code static analysis techniques.
The latter describes the improvement in efficiency derived from a simple yet profitable statistical analysis of the operational profile of the systems installed in the field.
Last but not least, the paper gives hints on the relationship between process quality and product quality, in terms of:
ISO/IEC 9126 is the result of the work of the joint committee of ISO and IEC (International Electrotechnical Commission) and is published under the title "Information technology - Software product evaluation - Quality characteristics and guidelines for their use" [ISO 9126].
This standard defines software quality in terms of factors and is meant to close a long-lasting discussion on the subject ([GILL 92], [ARTH 85], [BOWE 85], [BOEH 78], [DEUT 88], [FORS 89], [GRAD 92] and [VMAR 90]). Its main content is a representation of software quality as seen by software users. The standard defines six characteristics as the building blocks of software product quality: functionality, reliability, usability, efficiency, maintainability and portability.
Figure 1: The ISO 9126 quality model
In addition, an informative appendix gives a short description of an "evaluation process model" and defines a set of sub-characteristics that detail the six characteristics mentioned above. The evaluation process model proposed by ISO 9126 has been designed so that it may in principle be applied to any phase of the development life cycle, for each component of the software product.
It consists of three main stages:
The purpose of the initial stage ("Quality requirement definition") is to specify requirements in terms of quality characteristics (and possibly sub-characteristics). Since a software product is composed of different components, the requirements may differ for the various components.
The purpose of the second stage ("Evaluation preparation") is to set up evaluation and to prepare its basis. It is refined into three steps:
An Esprit project has been dealing with the issues of software product quality evaluation and certification. This project is called SCOPE (Software CertificatiOn Programme in Europe, Esprit project n. 2151) [DENE 92]. The objectives of the project can be summarised as follows:
An evaluation module should be composed of two parts plus some optional annexes. The first part is the evaluation specification, which must give the following information: general and specific definitions; the target characteristic (among the six proposed by ISO 9126) and its optional refinement into sub-characteristics; the evaluation technique to be used (inspection, execution analysis, static analysis or modelling); the documents required (i.e. the product parts needed, such as the user's manual or the source code); details of the assessment method and of its underlying "theory"; the factors and metrics (unambiguous questions and formulae together with their target scales or units); a clear identification of the data to be collected; cost information; the required structure and elements of the evaluation report; and references (to standards, to recognised theoretical work, etc.).
A specification is necessary for performing the assessment but might not be sufficient; this is why examples of interpretation are needed as a second part. These examples should give guidelines on thresholds, score computation and anything related to the pass/fail decision process.
Some evaluation modules might also have optional annexes needed for tailoring to specific environments in terms of definitions (that is, how the defined items apply to a particular product developed, for instance, in C++ and specified using SADT), data collection and tool selection.
During the lifetime of the project more than one hundred such evaluation modules were defined; the most valuable ones were refined and packaged in a set of about twenty that covers all six ISO 9126 characteristics and is publicly available.
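The structure of an evaluation module can be sketched as a small data model. This is only an illustration of the two mandatory parts and the optional annexes; the class and field names are assumptions made here and are not taken from the SCOPE deliverables, and only a subset of the specification items is modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# The six quality characteristics defined by ISO 9126.
ISO_9126_CHARACTERISTICS = (
    "functionality", "reliability", "usability",
    "efficiency", "maintainability", "portability",
)


class Technique(Enum):
    """Evaluation techniques named in the text."""
    INSPECTION = "inspection"
    EXECUTION_ANALYSIS = "execution analysis"
    STATIC_ANALYSIS = "static analysis"
    MODELLING = "modelling"


@dataclass
class EvaluationSpecification:
    """First part of a module (hypothetical field names, subset of items)."""
    target_characteristic: str
    technique: Technique
    required_documents: List[str]
    metrics: List[str]
    sub_characteristics: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The target must be one of the six ISO 9126 characteristics.
        if self.target_characteristic not in ISO_9126_CHARACTERISTICS:
            raise ValueError(f"unknown characteristic: {self.target_characteristic}")


@dataclass
class EvaluationModule:
    """Specification, examples of interpretation, and optional annexes."""
    specification: EvaluationSpecification
    interpretation_examples: List[str]
    annexes: List[str] = field(default_factory=list)
```

A module for maintainability by static analysis would then be built as `EvaluationModule(EvaluationSpecification("maintainability", Technique.STATIC_ANALYSIS, ["source code"], ["cyclomatic complexity"]), ["threshold guidelines"])`.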
In addition, co-ordinated standardisation work is currently under way within ISO to promote the application of the Evaluator's Guide [ISO 95], which defines the phases that compose the software evaluation process, the evaluation levels that correspond to different quality requirements and the evaluation techniques that can be applied depending on the evaluation level considered. This guide has been submitted to ISO/IEC/JTC1 SC7/WG6 and is intended to support the application of ISO 9126.
As an outcome of the SCOPE Project, several services have been set up for SW product quality evaluation with respect to ISO 9126; in particular, a European agreement (known as "Euroscope") has been established among several companies in Europe in order to offer harmonised services for software product quality evaluation. Specific efforts are currently devoted to off-the-shelf packages and object-oriented code certification [CHEE 95].
For an extensive presentation of both technical and managerial aspects of SW product quality evaluation, the interested reader is referred to [BACH 94]; that book covers a wide range of topics on the subject, including:
During the SCOPE Project a significant number of case studies were performed, as summarised in [BACH 94], demonstrating the practical feasibility and the cost-effectiveness of the proposed approach. After the end of the project, the assessment techniques defined were brought to industry, both as third-party evaluation services and for internal use by QA. Two case studies of measurement-based product evaluation are summarised in the following; both were carried out in 1994 outside the context of the SCOPE Project, as part of the activities of two large software producing units engaged in the development of challenging products in the telecoms application domain.
The experience described in this section was gained at Siemens Telecomunicazioni Italia (STI), in the context of process/product improvement activities focused on telecommunication systems for mobile phone handling, in accordance with the GSM International Standard, phase 2. In particular, maintainability evaluation was applied to a SW development project characterised by the following data:
The goals of the product evaluation activity can be summarised as follows:
The activities undertaken are summarised with reference to the steps of the evaluation procedure proposed by ISO 9126.
In accordance with ISO 9126 and with the requirements expressed in the Quality Plan of the project, the target of evaluation was defined as the "Maintainability" characteristic, in terms of all its sub-characteristics: Analysability, Testability, Stability, Changeability.
Table 1: Static analysis metrics for maintainability tracking
As far as rating level definition is concerned, two steps had to be performed: the definition of lower and upper thresholds for the metrics, and the backward integration of results (metrics --> sub-characteristics --> characteristics).
The first step was accomplished by starting from a set of reference threshold values and customising them on a statistically valid sample (about 30 KLOC), for which the metrics were collected automatically and their meaningfulness was then validated manually by means of code inspection.
Table 2: Thresholds for static analysis metrics
As far as integration algorithms are concerned, these were defined considering all possible combinations of values and defining a rating at characteristic level subdivided into five classes (as suggested by ISO 9126): poor, average, fair, good, excellent. In order to take into account the different importance of metrics, weighted composition methods were also used.
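The rating step can be illustrated with a minimal sketch. The metric names, thresholds and weights below are purely hypothetical (the values of Table 2 are not reproduced here), and the integration rule is a simple stand-in chosen for illustration: the weighted share of metrics falling within their thresholds is mapped onto the five classes. It is not the integration algorithm actually used in the project.

```python
from dataclasses import dataclass

RATING_CLASSES = ["poor", "average", "fair", "good", "excellent"]


@dataclass(frozen=True)
class MetricRule:
    lower: float   # hypothetical lower threshold
    upper: float   # hypothetical upper threshold
    weight: float  # relative importance of the metric


# Hypothetical metrics, thresholds and weights, for illustration only.
RULES = {
    "cyclomatic_complexity": MetricRule(1, 10, 3.0),
    "statements": MetricRule(1, 50, 2.0),
    "nesting_levels": MetricRule(0, 4, 1.0),
}


def rate_function(metrics):
    """Map the metric values of one function onto a rating class."""
    total = sum(rule.weight for rule in RULES.values())
    within = sum(
        rule.weight
        for name, rule in RULES.items()
        if rule.lower <= metrics[name] <= rule.upper
    )
    score = within / total  # weighted share of metrics inside thresholds
    # Split [0, 1] into five equal bands; a score of 1.0 falls in the top band.
    index = min(int(score * len(RATING_CLASSES)), len(RATING_CLASSES) - 1)
    return RATING_CLASSES[index]
```

A table-driven rule covering all combinations of values, as described in the text, could replace the banding step without changing the interface.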
The metrics (calculated for each function contributing to the source code of the product) were integrated using weighted composition algorithms in order to derive a single "maintainability index" at various levels of granularity (function, process, functional area, processor, network element, product).
The maintainability index was defined as $MI = \sum_i w_i \, n_i$, where $w_i$ is the weight associated with each class (poor = 0, average = 0.25, fair = 0.5, good = 0.75, excellent = 1) and $n_i$ is the percentage of functions falling in that class (expressed as a fraction of 1). MI therefore spans the range [0 .. 1], with 0 meaning bad maintainability and 1 meaning optimal maintainability. With a target level for MI greater than 0.70, the Quality Plan of the project stated that subsystems with a maintainability index below 0.6 had to be manually inspected in order to decide whether reverse engineering activities were needed.
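As a worked illustration of the index, the sketch below computes MI from the number of functions rated in each class and applies the inspection rule of the Quality Plan. The class weights and the 0.6 limit come from the text; the function names and the example counts are assumptions made here.

```python
# Class weights as given in the text.
CLASS_WEIGHTS = {
    "poor": 0.0,
    "average": 0.25,
    "fair": 0.5,
    "good": 0.75,
    "excellent": 1.0,
}


def maintainability_index(class_counts):
    """Weighted sum of the fraction of functions falling in each class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("no rated functions")
    return sum(CLASS_WEIGHTS[name] * count / total
               for name, count in class_counts.items())


def needs_manual_inspection(mi, limit=0.6):
    """Subsystems below the limit are candidates for manual inspection."""
    return mi < limit


# Hypothetical subsystem with 100 rated functions.
counts = {"poor": 10, "average": 10, "fair": 20, "good": 30, "excellent": 30}
mi = maintainability_index(counts)  # 0.025 + 0.1 + 0.225 + 0.3, about 0.65
```

In this hypothetical case the subsystem misses the 0.70 target but stays above the 0.6 limit, so it would not be selected for manual inspection.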
The selected metrics were automatically extracted from the software product, using the Logiscope static analyser; in order to keep the human overhead to a minimum, several batch programs were developed to run and control the jobs and to produce the reports.
In order to produce outputs suitable for various classes of users (development supervisors, designers and QA team) the following reports were produced at different levels of granularity: overall quality report, list of functions subdivided by rating class, Kiviat graphs of average metric values, distribution of metrics, distribution of sub-characteristics, calling graphs among functions and, for each function falling either in the poor or in the average classes, Kiviat graphs of metrics and calling graph.
The final assessment stage involved several aspects: identification of critical components for which re-engineering activities were needed, selection of the sample for manual code inspection, analysis of the MI with respect to the defined target, study of the variation of the maintainability index as development progressed, and validation of metrics and models. These aspects are dealt with in more detail in the next section.
The analysis of data showed that:
Figure 2: Maintainability index for several subsystems of the product under analysis
As already mentioned, the development of the system under analysis was planned in several builds; for this reason, it was interesting to analyse the trend of the MI across the various builds. Fig. 3 shows this trend across the three most significant builds for a number of selected design parts; the following aspects can be observed:
Figure 3. Maintainability index for some areas across various builds
The analysis of control graphs also provided useful feedback to designers, pointing out situations like:
Figure 4: Multicollinearity of static analysis metrics
The correlation between MI and fault density was also analysed (see Fig. 5), resulting in a weak inverse correlation: the higher the MI, the better the product also in terms of reliability. However, this correlation is not considered fully statistically valid and will have to be analysed in more detail in the future.
(included in the original copy in the ESI-ISCN´95 proceedings)
Figure 5: Correlation between Maintainability Index and Failure Density
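A correlation check of this kind can be sketched with the Pearson coefficient. The text does not state which statistic was used, so the choice of Pearson is an assumption, and the input values in the usage note are invented for illustration only, not project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)
```

Called as `pearson(mi_values, fault_densities)` with one pair per subsystem, a negative result corresponds to the inverse relationship described above; with only a handful of subsystems the coefficient should be read with caution, which is consistent with the authors' remark on statistical validity.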
It is possible to say that the adoption of static analysis techniques was positive since designers focused their reverse engineering efforts on troublesome modules, in accordance with a Pareto strategy.
As a consequence of the results of the evaluation activities, the following steps have been planned:
The experience described in this section was gained at Italtel SIT BUCT Linea UT, in the context of a long-lasting process/product improvement effort [DAME 95]. In this context the main goal of the improvement program is the application of a Plan-Do-Check-Act scheme to check and measure the products and the development process quantitatively, in order to single out, implement and monitor improvement opportunities (as shown in Fig. 6).
(included in the original copy in the ESI-ISCN´95 proceedings)
Figure 6: Application of PDCA to process/ product improvement
The improvement program has its roots in the Quality Management System and is constantly kept under control by means of a measurement system [DAME 93].
Among the many improvement activities undertaken in recent years, the following are directly related to product quality evaluation/improvement:
The analysis started from the collection of alarm logs produced during one month (October 1994) by eight major switches operating in field, characterised by varying size and typology.
Data was then statistically analyzed, paying attention in particular to the following aspects:
A first interesting finding was that, among 900 types of events, a group of only 5 accounted for about 40% of the alarms notified, as shown in Fig. 7.
Figure 7: Pareto distribution of alarm logs
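The Pareto analysis of the alarm logs can be sketched as follows. The log format is an assumption: each record is reduced here to its event type, and the function name is hypothetical.

```python
from collections import Counter


def top_share(event_types, top_n):
    """Fraction of all alarms produced by the top_n most frequent event types."""
    counts = Counter(event_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(top_n)) / total
```

Applied to a month of logs, `top_share(events, 5)` gives the kind of figure quoted in the text (about 0.40 for the five most frequent event types).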
Moreover, among the 20 most frequently notified events, only one was related to software failures, whereas the others were related to periodic audits or to anomalies in the network.
The distribution of events during the day (see Fig. 8) was also very helpful to discriminate the events that caused the biggest overhead.
(included in the original copy of the ESI-ISCN´95 proceedings)
Figure 8: Distribution of events during the day
Fig. 9 shows the improvement in operability (in terms of number of printed pages per month) that could be obtained simply by removing a very limited number of alarms that made no major contribution to monitoring the status of the equipment.
Figure 9: Effects of improvement actions onto operability issues
It is possible to say that the adoption of the described techniques proved to be very useful since it provided valuable insights at both organisational and technical levels at a very low cost.
As a consequence, activities will be pursued with the following aims:
The relationship between product and process evaluation/ improvement is somewhat controversial [VOLL 93]: which one gives the highest return on investment? Are both necessary? Is one the pre-condition for the other?
The feeling of IT representatives was captured by a Europe-wide awareness survey (reported in [BAZ1 93]), which had the following goals:
Figure 10: Appraisal of ISO 9000 certification with respect to the quality of delivered products
(included in the original copy in the ESI-ISCN´95 proceedings)
Figure 11: Software product evaluation and other issues
This concept was strongly underlined when ISO 9000 certification was considered: a very low percentage of the interviewees declared that such certification can give a sufficient guarantee of the quality of the delivered products. Product and process are closely linked and cannot be separated when quality is analysed: this is confirmed by Fig. 12, which shows that most people ask for a combined assessment (both process and product).
(included in the original copy in the ESI-ISCN´95 proceedings)
Figure 12: Reasons why ISO 9000 is felt necessary but not sufficient for guaranteeing software product quality
We stress the tight relationship that is perceived by all interviewed people, considering that no major difference in the judgement is evident looking at different roles or application domains or country (indeed, where ISO 9000 registration is more widespread - e.g. UK - the awareness of the need to combine it with product evaluation is particularly strong).
Process improvement is sometimes advocated as the new silver bullet for software engineering. Indeed, many companies are devoting a great deal of effort and investment to setting up quality systems and raising the maturity of the software development process [HERB 94]. These efforts are based on the assumption that the best practices of software development have a positive impact on the ultimate goals of a software producing unit, namely timeliness, productivity and quality. We feel that a problem might exist: it is very difficult to quantify the gains and make them tangible; moreover, it is even harder to find quantitative relationships between process maturity level and achievements at company-wide level. This hinders the widespread adoption of process improvement.
Several experiences (for instance [BAZ2 93], [HERB 94], [DAME 95], [CUSU 91], [SEL 91]) report the impact on timeliness, productivity and quality of adopting best software engineering practices. The reader, however, always remains a bit sceptical, since the effects are indirectly inferred: there is no quantitative evidence that the good results were in fact a consequence of process improvement. They might have been due to a change of project manager, to good luck, or to some other cause. The point is that the software engineering community needs quantitative data showing evidence of a positive correlation between process maturity levels and project results. This would be much more effective than any theoretical assumption about good engineering practices. The reader will immediately recognise that this approach has an intrinsic problem: it is difficult to obtain quantitative values for the maturity and stability of the development process, in order to track improvements and their effects on final goals. This might be due to the following reasons:
Bringing improvements to the development process of a complex software producing unit is a time consuming, trial-and-error activity. Unless the organisation is very rigid, you cannot think that process improvements become consolidated in a short time interval; rather, different attitudes will co-exist for quite a long time. Thus some projects might fully adopt new practices (these are commonly known as 'pilot projects'), some others could incorporate new issues to a limited extent, some others will keep on with the old practices; this can be due to various reasons, like: conclusion of a big project, severe schedule pressure, psychological resistance, etc.
The SEI has proposed a set of software measures [BAUM 92] that are compatible with the measurement practices of the Capability Maturity Model (CMM); within this set there are two very interesting process related indicators which provide information on the stability of the process by monitoring the number of requests to change the process and the number of waivers to the process. Such indicators are "Process Change Requests" and "Waivers from Process Standards", and their interpretation is as follows:
Tab. 3: Behaviours and Waivers - an example
Based on the aforementioned assumptions, the "Process Standardisation" indicator is computed as follows:
$$PS = \frac{\sum_i (w_i - 1)\, c_i}{\sum_i (\mathit{maxw} - 1)\, c_i}$$
where:
$w_i$ = weight associated with the adopted behaviour (in Table 3: [1..5]);
$c_i$ = number of activities carried out;
$\mathit{maxw}$ = weight corresponding to full adherence to the QMS and process improvement (in Table 3: the value 5).
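The computation can be sketched as below. The sketch assumes that the sum runs over the behaviours of Table 3 in both numerator and denominator, and that the count is the number of activities carried out with each behaviour; the function name and the example counts are hypothetical.

```python
def process_standardisation(activities_by_weight, max_weight=5):
    """Process Standardisation indicator.

    activities_by_weight maps a behaviour weight (1..max_weight) to the
    number of activities carried out with that behaviour.
    """
    total = sum(activities_by_weight.values())
    if total == 0:
        raise ValueError("no activities recorded")
    for weight in activities_by_weight:
        if not 1 <= weight <= max_weight:
            raise ValueError(f"weight out of range: {weight}")
    numerator = sum((w - 1) * c for w, c in activities_by_weight.items())
    return numerator / ((max_weight - 1) * total)


# Hypothetical phase: 6 activities fully adherent, 2 partially, 2 unregulated.
ps = process_standardisation({5: 6, 3: 2, 1: 2})  # (24 + 4 + 0) / 40 = 0.7
```

Computing the indicator per phase and per project, as suggested in the text, only requires calling the function on the corresponding subset of activities.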
Process Standardisation ranges between 0 and 1, where 1 represents complete adherence to the QMS and 0 corresponds to complete deregulation, while intermediate values measure the gap between what is defined in the QMS and what is actually applied. The indicator can be computed at various levels of granularity. For the purpose of keeping track of the effects of process improvement, it is useful to compute it for each project both at global level and for each of the major phases of the development process. The approach has been applied to three major releases of the Linea UT switching system, and has proven very fruitful for the following reasons:
The fundamental concepts presented in this paper can be summarised as follows:
Concerning the SCOPE Project, it is worthwhile to remember the former partners of the Consortium: Atomic Energy Authority (UK), The City University (UK), Dublin City University (Ireland), ElektronikCentralen (Denmark), Etnoteam (Italy), Gesellschaft fur Reaktorsicherheit (Germany), Glasgow Polytechnic (UK), GMD (Germany), Institut Catala de Tecnologia (Spain), University of Strathclyde (UK), Verilog (Prime Contractor, France) and VTT (Finland). The SCOPE Project was supported by the Commission of European Communities - DG XIII; in particular we are indebted to the project officers David Callahan and Brice Le Pape.
The experiences gained at Italtel SIT BUCT Linea UT are part of a long-lasting process/product improvement effort that has been supported and sponsored by S. Dal Monte, U. Ferrari and G. Damele. For the specific improvement actions described in the paper we have to thank: MC. Aletti, F. Aquilio, MG. Corti, L. Giovanelli, G. Panzeri, G. Pisano, F. Pompili, D. Scrignaro, G. Vailati.
The experiences matured at Siemens Telecomunicazioni Italia fall within a major effort in software quality management applied to a mobile phone development project, supported and sponsored by G. Cecchetto, E. Pietralunga and G. Vulpetti. For the specific experience described in the paper we are indebted to: O. Balestrini, L. Barbieri, P. Bettoni, R. Delmiglio, G. Falzoni, B. Ferri, S. Finetti, A. Lora, A. Manini, B. Marelli, B. Montanari, L. Travaglini.
Finally we have to thank O. Fouillouze and M. Maiocchi for the support given in the set-up and supervision of activities.
[AMI 92] A. Kuntzman-Combelles, P. Comer, J. Holdsworth, S. Shirlaw
"Metrics Users' Handbook"
AMI Project, Cambridge, 1992
[ARTH 85] L.J. Arthur
"Measuring Programmer Productivity and Software Quality"
John Wiley and Sons, 1985
[AZUM 93] M Azuma
"Information Technology - Software Product Evaluation - Indicators
and metrics", Working Draft within ISO/JTC1/SC7/WG6, Project
7.13.3, 1993
[BACH 94] R. Bache, G. Bazzana
"Software metrics for product assessment"
McGraw-Hill, 1994
[BASI 81] V. Basili, D. Weiss
"A methodology for collecting valid software engineering
data"
IEEE Trans. on Software Engineering, Vol. se-10, November 1981
[BAUM 92] J.H. Baumert, M.S. McWhinney
"Software measures and the Capability Maturity Model"
CMU/SEI-92-TR-25, September 1992
[BAZ1 93] G. Bazzana, R. Brigliadori, O. Andersen, T. Jokela
"ISO 9000 and ISO 9126: friends or foes?"
Proceedings of IEEE Software Engineering Standards Symposium,
Brighton, September 1993
[BAZ2 93] G. Bazzana, P. Caliman, D. Gandini, R. Lancellotti,
P. Marino
"Software management by metrics: practical experiences in
Italy"
10th CSR Workshop, Amsterdam, October 1993
Published in:
Norman Fenton, Robin Whitty and Yoshinori Iizuka
"Software Quality Assurance and Measurement - A worldwide
perspective"
International Thomson Computer Press, 1995
[BAZ3 93] G. Bazzana, G. Damele, M. Maiocchi, G. Zontini
"Applying Software Reliability Models to a large industrial
dataset"
Information and Software Technology, Dec. 1993
[BAZZ 95] G. Bazzana, R. Delmiglio, A. Lora, O. Balestrini, S. Finetti
"Quantifying the Benefits of Software Testing: an Experience
Report from the GSM Application Domain",
Proceedings of Objective Quality Conference, Florence, May 1995
Published by Springer-Verlag in "Lecture Notes in Computer
Science", N° 926
[BOEH 78] B.W. Boehm et al.
"Characteristics of Software Quality"
TRW series of Software Technologies, Vol. 1, North Holland, 1978
[BOLL 92] T.B. Bollinger, C. McGowan
"A Critical Look at Software Capability Evaluations"
IEEE Software, July 1992
[BOOT 94] P. Kuvaja, J. Similä, L. Krzanik, A. Bicego, S. Saukkonen,
G. Koch
"Software process assessment & improvement: the Bootstrap
Approach", Blackwell, 1994
[BOWE 85] T.P. Bowen, G.B. Wigle, J.T. Tsai
"Specification of Software Quality Attributes" Volumes
I, II and III
Rome Air Development Centre, RADC-TR-85-37, 1985
[CHEE 95] M. Cheek
"ESPRIT's Legacy of Software Evaluation and Certification",
IEEE Software, May 1995, pages 90-91
[CUSU 91] M.A. Cusumano
"Japan's software factories"
Oxford University Press 1991
[DAME 93] G. Damele, G. Bazzana, M. Giunchi, G. Rumi
"Setting-up and using a metrics program for process improvement"
AQuIS'93, Venice, October 1993
[DAME 95] G. Damele, G. Bazzana, M. Giunchi, G. Caielli, M. Maiocchi,
F. Andreis
"Quantifying the Benefits of Software Process Improvement
in Italtel Linea UT Exchange"
International Switching Symposium 95, Berlin, April 1995
[DAME 96] G. Damele, G. Bazzana, F. Andreis, F. Aquilio, S. Arnoldi,
M. Giunchi
"Process Improvement through Root Cause Analysis"
Accepted for presentation at AQUIS '96, Florence, January 1996
[DENE 92] B. De Neumann, G. Bazzana
"A Methodology for the Evaluation / Certification of Software"
3rd European Conference on Software Quality Assurance, Madrid,
November 1992
[DEUT 88] M.S. Deutsch, R.R. Willis
"Software Quality Engineering"
Prentice-Hall, 1988
[FORS 89] T. Forse
"Qualimetrie des systèmes complexes"
Les Editions d'Organisation, 1989
[GILL 92] A. C. Gillies
"Software quality - Theory and management"
Chapman & Hall Computing, 1992
[GRAD 92] R.B. Grady
"Practical software metrics for project management and process
improvement"
Prentice-Hall, 1992
[GRAD 94] R.B. Grady
"Successfully applying software metrics"
IEEE Computer, Vol. 27, No. 9, September 1994
[HERB 94] J. Herbsleb et al.
"Benefits of CMM-based Software Process Improvement: Initial
Results"
SEI Technical Report, August 1994
[HUMP 89] W.S. Humphrey
"Managing the Software Process"
Addison-Wesley, 1989
[ISO 9000-3] ISO 9000-3
"Quality Management and Quality Assurance Standards - Part
3: Guidelines for the Application of ISO 9001 to the Development,
Supply and Maintenance of Software"
ISO, September 1990
[ISO 9126] ISO/IEC 9126
"Information technology - Software product evaluation - Quality characteristics
and guidelines for their use"
ISO, December 1991
[ISO 95] ISO/ IEC JTC1/ SC7 N1317 (Editor: P. Robert)
"ISO/ IEC CD 14598 - 5.2 Information Technology - Evaluation
of software product - Part 5: Evaluator's Guide"
ISO Committee Draft, Jan 1995
[KUMI 93] H. Kumi
"Quality Management by ISO-9000 and by TQM"
Proceedings of EOQ 93, Helsinki
[MCCA 76] T.J. McCabe
"A complexity measure"
IEEE Transactions on SW Engineering, 1976
[MOLL 92] K.H. Moeller, D. Paulish
"Software Metrics: a practitioner's approach to improved
software development"
Chapman & Hall, 1992
[PAUL 91] M.C. Paulk, B. Curtis, M.B. Chrissis, E. Laverill, J.
Bamberg, T.C. Kasse, C. Timothy, K. Konrad, J.R. Perdue, C.V.
Weber, J.V. Withey
"Capability Maturity Model for Software"
SEI, Carnegie Mellon University, Pittsburgh, 1991
CMU/SEI-91-TR-24, ADA240603
[PQMI 88] AT&T
"Process Quality Management and Improvement Guidelines"
AT&T Quality Steering Committee, Issue 1.1, 1988
[RAE 95] A. Rae, P. Robert, H. L. Hausen
"Software Evaluation for Certification - Principles, Practice
and Legal Liability"
McGraw-Hill, 1995
[ROBE 91] P. Robert, F. Seigneur
"A modular approach for software product assessment - The
brick concept"
In 2ème Rencontre Qualité Logiciel & Eurometrics
91. Actes et catalogue de l'exposition (1991)
[SCHN 92] N. Schneidewind
"Methodology for Validating Software Metrics",
IEEE Transactions on Sw Engineering, Vol.18, No.5, May 1992, pp.
410-422
[SEL 91] SEL, NASA Goddard Space Flight Center
"Software Engineering Laboratory Relationships, models and
management rules"
SEL-91-001, February 1991
[VMAR 90] A. von Mayrhauser
"Software engineering methods and management"
Academic Press, 1990
[VOLL 93] T.E. Vollman
"Software quality assessment and standards"
IEEE Computer, June 1993