Reasonable programs

Context

In search of some constituents of an ideal programming language last week I finally quickly looked at Haskell. It was a bit of a disappointment, though I definitely need to actually try it on a real task.

Yet Another Haskell Tutorial (which, BTW, I didn’t like) pointed me to OCaml. The latter had both some advantages and shortcomings compared to Haskell. In any case it was evident that OCaml’s power is far below what I have in mind.

While trying to verify that my impression of OCaml was not unique I looked for some critique of it and came across a mention of Idris. A superficial study of its documentation didn’t reveal anything promising either.

After such a short excursion into the realm of functional programming languages I gave a little thought over my own ideas, some of which can now be summarized in this post.

Programs and their reasonable properties

In order to write software programs or modify them one must be able first of all to reason about them. Functional languages try to facilitate it via purity. Yet a fully pure program is impractical so they anyway have to deal with state and IO.

Looking at the problem more broadly, we should realize that the purity status is just one property of a program and other properties can contain useful information about it too.

In this post I use the term “program” for any self-contained piece of code (e.g. a function, procedure, etc).

So what information about a program can help to reason about it? An incomplete list follows:

Is it pure?
What side effects does it have?
Which global variables does it access?
What IO operations does it perform?
From a security audit point of view this point can be elaborated with a lot of detail, such as:
- Can it access files outside its working directory?
- Does it execute code generated from unsantized user input?
What is its time complexity?
What is its space complexity?
Does it allocate/deallocate dynamic memory or works entirely off of the stack?
What are its preconditions and postconditions?
What are its invariants?
Under what conditions is it re-entrant?
Under what conditions is it tread-safe?
Can it throw/raise an exception?
Does it contain imprecise numerical computations?
Is it deterministic?
Is it platform independent?
Can it be safely interrupted?

For a good programming language answers to those questions should be automatically obtainable. It must be possible to enforce constraints on any of those properties:

require that the function’s time or space complexity be O(1);
require that the function is devoid of imprecise arithmetic;
require that the function doesn’t open files;
etc.

Next, one could track a certain set of properties of each function and notice any effect on them caused by the changes made to the source code.

Let’s call such properties of programs that can be useful when reasoning about them reasonable properties.

Reasonable programming language

In an ideal programming language one should be able to query any reasonable property of a program interactively in the REPL shell. Of course it would be too much to ask from the compiler to be able to automatically evaluate the time or space complexity of a program, but at least the language must enable the programmer to derive or prove such properties of a program inside the programming environment using the actual source code (instead of the abstract algorithm and/or the data structures that the program is supposed to implement). It must be possible to state the desired reasonable properties of the functions in the same language (the only difference is that this part is evaluated at compile time). The language should only provide some basic introspection API and means to code the proofs that will be executed during compilation. The halting problem is not a problem - no one cares if compilation will eventually finish if it doesn’t complete in reasonable time, thus programs and their proofs must simply be written in such a way as to meet the compilation time limit requirements. The library of reasonable properties and related facilities should be implemented in the language itself and should grow over time as new properties are introduced.

Automatically derived or verified reasonable properties of program components (functions/procedures, classes, modules, etc) should be part of the compilation artifacts so that the compiler can take advantage of that information when compiling the higher level components.