I spent a couple of hours today working up a (hopefully) fuller summary of the ideas I posted yesterday. I tried to come up with a more drawn out summary of what I'm thinking. The idea is that I can develop these into a research outline and start working from there.
The two topics fit well into the research I already have. The research-oriented topic calls for case studies, of which I have over a dozen. I can also probably also use some of the general discussion type papers I have. I also have lots of papers on formalization of language constructs, language definitions, and even a few examples of formal verifications.
Anyway, below is an adaptation of the topic write-ups I sent to my advisor. I'll see what he things of them next week. I'm leaning toward the first topic as the better option, but I'm not certain if it has a sufficient theoretical aspect. In the mean time, I'll start reading up on my case studies this weekend.
Lessons Derived from Industrial Case Studies for Other Problem Domains
An analysis of various case studies on use of formal methods in industry, the results, and an attempt to draw conclusions regarding how these methods can be applied to other domains. This will be a three-part analysis, with the first part being a summary and analysis of the available data, the second part a comparison of the methods involved, and the third an attempt to generalize from the conclusions from the preceding parts.
Formal methods are normally applied to either very large and complex systems (CICS, for example) or to systems where safety and/or security are of paramount importance (SHOLIS, CDIS). Many case studies of such projects have
been published, as well as many papers discussing the lessons that can be drawn from these studies. In the first part of the thesis, I will go through many of the available studies (as well as commentaries on them) and select a promising subset for analysis. I will examine the type of project, the formal methodology used for it, and the results of the projects, with emphasis on the effects of formal methods.
The second part will be an analysis of the various formal methods involved in the case studies. There will probably be at least three distinct formal methods involved, maybe more. These will most likely include Z, which has been widely used and studied for some time, and SPARK Ada, because of the sheer volume of papers Praxis Critical Systems has published about it. These two are also likely candidates because of their differences: Z is a specification language, whereas SPARK uses verification conditions in the source code. The idea of this portion is to compare and contrast the different methods, the mathematical basis of each, and to what extent they are compatible or complementary. Basically, this will be the more mathematical part of the paper.
The last portion will be an examination of how the lessons learned in the first two parts can be applied to projects outside the typical formal methods domain. In particular, I am thinking of small to medium sized (as compared to, say, CICS) business and consumer type projects. The idea is to analyze the possible benefits of applying the previously discussed formal methods and what difficulties would likely arise. Areas of investigation would probably include the availability and complexity of tools, the mathematical background required to effectively use the formalism, suitability of the formalism for capturing business requirements, and how likely the target systems are to benefit from formalism. In other words, a discussion or how the formalisms can be applied to business problems and how difficult it would be for a business programming team to apply them. The goal of this section would be to draw conclusions on the appropriateness of the studied methods for business type projects and develop some guidelines for their use.
If required and/or warranted, an interesting addendum to this might be to actually attempt to apply a particular formal method to a small business project and analyze the results. I can think of a few small business or home
user type applications that I could probably build in a reasonable time frame, even with the overhead of learning to apply a formal analysis method. However, I'm not sure if that would be going overboard or not. Since the main point would be discussing the issues of applying the formal methods, I suspect mini-projects or examples might serve better than a full-blown application.
Development of a Formal Semantics for a Popular Programming Language
An application of existing work in the formal semantics of imperative programming languages to a meaningful subset of a language in popular use in the commercial development sector. This would attempt to define a formal semantics for a usable portion of a popular scripting language such as PHP (my current first choice) or Python (if I decided to focus on object orientation). The semantics would most likely be an axiomatic semantics based on the first-order predicate calculus (as that's what I'm most familiar with). However, I've also seen some some set-based approaches that look like they might be usefule.
Since Hoare Logics were introduced in 1969, there has been a great deal of work done in mathematically defining the meaning of various programming language constructs. Many methods have been used, from Dijkstra's weakest
precondition predicate transformer approach, to fixpoint treatments of recursion (which I need to read up on), to axiomatic definitions of object-oriented languages. However, relatively little of this has filtered down from academia to programming practitioners in the business sector. In part, this is due to the fact that much of the work on formal semantics has been done on toy languages, i.e. "Algol-like" and "Java-like" languages, as opposed to languages that are actually in widespread use.
My idea is to select a "non-trivial subset" of my language which can be used to build relatively simple, but real-world, systems. I already have a fair amount of source material to draw on for this, including Gries and Dijkstra's work on the basic control structures, Cavalcanti and Naumann's work on the axiomatic semantics of object-oriented languages, as well as various papers on particular language structure, formal verifications of other systems, and formal definitions of "toy" languages. I would apply and adapt these to my target language to try to build a complete semantics for my subset.
My first choices of language is probably PHP, because (1) I am quite familiar with it, (2) it is one of the most popular languages currently used for web applications, making the formalization of it of "practical" use, and (3) because it is not particularly complicated, using a fairly simple model for object orientation and lacking such features as runtime exceptions. I understand that PHP 5 has added some of this, but I'll most likely stick to PHP 4, since it is still being actively maintained and is still widely deployed, makeing it a reasonable target.
With regard to the structured programing type constructs of PHP, I believe a great deal of work can be transferred more or less directly from the semantics of Algol-like languages (Gries and Dijkstra) and from work done on
C. Non-critical structure (perhaps switch blocks and break and continue statements) can probably be dispensed with. I will probably leave out object oriented programming as well, since it complicates things and doesn't seem especially popular in the PHP community anyway. As for the standard library and various different configuration settings, I think the best approach would be to formalize only what I need for examples and such.
There are several aspects of the language of which I have not seen much, if any, formal discussion. One is that it is a weakly and dynamically typed language. Apart from object-orientation, I have not seen a great deal of work on axiomatic formalizations of data typing. Because PHP is weakly typed, there will probably be some interesting work to do with defining a formal semantics for type conversion. Also, I do not think I have seen any formalization of a dynamically typed language. I am not yet positive what impact this feature would have on the semantics.
Another interesting feature would be PHP's array structures, which are a strange hybrid of a dynamically sized array and a dictionary/hash table. Trying to capture the semantics of this dual-purpose structure could make for an interesting challenge. Of course, if this proved too complicated, I could always place restrictions on its use, e.g. formalize only the array aspect or disallow the joining of the two aspects in the same variable.
You can reply to this entry by leaving a comment below. You can send TrackBack pings to this URL. This entry accepts Pingbacks from other blogs. You can follow comments on this entry by subscribing to the RSS feed.