A software experiment and an analysis of CICS

I read and took notes on two papers today. The first was Kate Finney and Norman Fenton's 1996 paper Evaluating the Effectiveness of Z: Claims Made About CICS and Where to Go From Here. It basically critiques the claims made about the benefits IBM saw from using Z in the respecificatino or parts of CICS. The more impressive claims were that the parts of the system which used Z saw 2.5 times fewer defects and that the overall project cost was reduced by 9%. Because of this, CICS is (or, at least, was) often viewed as the poster-project for formal methods success.

Basically, the paper points out that this optimism is unfounded. There is (or was - I've not seen any follow-up studies) insufficient evidence to evaluate these claims. Basically, this is because CICS was not a proper experiment. The literature is unclear on what was being compared to what and how valid those comparisons were. For example, the 9% cost saving figure seems to appear pretty much out of thin air, with no indication how it was calculated. So while CICS seems to be a success story, it should be viewed with caution.

The second paper is a 2002 article by Carol Smidts, Xin Huang, and James C. Widmaier entitled Producing Reliable Software: An Experiment. This one is interesting because it involves having two different teams independently develop the same system using different methods. There was a formal methods team that used Haskell and Specware and a conventional team the using Carnegie Melon SEI Capability Maturity Model (CMM) level 4. They were each separately contracted to build a card access system from the same specification. The contract specified reliability levels and required process documentation, design documents, and test plans. The reliability of the actual software was tested by a third party using automated test suites.

The results of both teams were pretty dismal. Neither one got close to the target 0.99 reliability for "level 1," i.e. critical, defects, with the formal team getting 0.77 and the CMM team 0.56. Furthermore, the formal methods team failed to produce much of anything in the way of documentation. A disappointing showing all around.