Friday, May 17, 2013

Formal systems, axioms, inference rules, formal proofs

I'll start this post by describing a simple formal system that I made up.

formal system is comprised of axioms and inference rules. Each axiom and inference rule is defined syntactically, that is, by ordered sequences of symbols that follow strict syntactic rules but do not (necessarily) have any meaning. The set of all symbols allowed in a formal system are explicitly listed in its alphabet, simply, a finite, non-empty set of symbols.

My formal system, let's call it FS,  uses the alphabet {E,0,1} and has only one axiom and two inference rules. Here is the axiom (in the box below):

Axiom 1 [A1]:
 E0 

And here are the two inference rules in FS (in the boxes below):

Inference rule 1 [IR1]:
(where α is any non-empty finite sequence of 0's and 1's)

E0α

Inference rule 2 [IR2]:
Eα
(where α is any non-empty finite sequence of 0's and 1's)

E1α

It is important to realize that, in the two inference rules above, the symbol α is NOT an acceptable sequence of symbols in FS. It cannot be, since α does not belong to the alphabet of FS. Instead, α is a placeholder that stands for any non-empty finite sequence of 0's and 1's. So, for example, α could stand for 0. Or it could stand for 1. Or it could stand for 1101. In fact, in each application of the inference rules above, α will stand for a single finite sequence of 0's and 1's chosen from the infinite set of such sequences. 

But what do we mean by an "application" of an inference rule? To answer this question, let's first discuss what formal systems are used for. The goal of a formal system is to formalize or mechanize the proofs of a well-defined set of theorems.  Each sequence (or string) of symbols produced by the formal system is a theorem

There are two ways for a formal system to "produce" or "prove" a theorem: using an axiom or using an inference rule. 

First, an axiom is, by definition, a theorem. So, going back to our FS, its first theorem is the string E0. Why? Because E0 is an axiom of FS! That is the proof of E0 in FS. There is no other justification needed. Each axiom of a formal system is a theorem of that formal system.

Second, and more interestingly, theorems can be produced (or proved) using inference rules and already proven theorems. As you can tell from the two examples above, an inference rule has two parts, separated by a horizontal line. Above the horizontal line comes the premise of the inference rule (there could be more than one premise). Below the horizontal line is the conclusion of the inference rule (there is exactly one conclusion). Each inference rule can be interpreted as: "If the premise is a theorem of FS, then the conclusion is also a theorem." So each application of an inference rule is a way of proving a new theorem given previously proven theorems.

Let's illustrate this point with several applications of the rules [IR1] and [IR2] in FS.

Assume that we start the process with an empty set of theorems, denoted by { }.

At this point, neither [IR1] nor [IR2] can be applied because we have not yet proved any theorems, let alone any theorems that match the form of their premise. However, we do have an axiom to kick start the theorem proving process. Indeed, axiom [A1] proves the theorem E0.

After this first proof, our set of theorems is as follows: { E0 }

Now that our set of theorems is not empty any more, we can check if any of its elements matches the premises of [IR1] or [IR2]. It turns out that E0 does match the premise of [IR1], since 1) both E0 and Eα start with the symbol E and 2) 0, being a non-empty finite sequence of 0's and 1's, can replace α. Since the premise of [IR1] is true (that is, it is the theorem of FS that we prove in the previous step), we can apply the inference rule [IR1] and, doing so, we prove its conclusion, namely the theorem E00. How did we obtain this new theorem? We started with E0α and replaced α with 0, which was the match in the premise: α must be replaced uniformly throughout all parts of the inference rule being applied.

After this second proof, our set of theorems is as follows: { E0 , E00 }.

What choices do we have for the next proof? We could:
  1. Use axiom [A1]
  2. Apply [IR1] to E0
  3. Apply [IR2] to E0 (with α replaced by 0)
  4. Apply [IR1] to E00 (with α replaced by 00)
  5. Apply [IR2] to E00 (with α replaced by 00)
Since we already used the first two options, there is no point in choosing them again: each would yield a theorem that we have already proved. But we can pick any one of the three other options. Let's choose option 5. Its premise matches E00 with α replaced by 00; therefore its conclusion yields the theorem E100.

After this third proof, our set of theorems is as follows: { E0 , E00 , E100 }.


What UNUSED options do we have for the next proof? We could:
  1. Apply [IR2] to E0 (with α replaced by 0)
  2. Apply [IR1] to E00 (with α replaced by 00)
  3. Apply [IR1] to E100 (with α replaced by 100)
  4. Apply [IR2] to E100 (with α replaced by 100)
Why don't you pick one of these options and determine which new theorem it proves? (the answers are given below) As you can tell, the number of theorems in our set increases fast. To every new theorem we prove, we can apply the two inference rules of FS to produce two new theorems.

So, how many theorems does FS generate in total?

That's right! This simple formal system can prove an infinite number of theorems, even though each one of these theorems is comprised of a finite number of symbols from the alphabet of FS. By the way, here are the theorems produced by the four options above, in order: E10E000E0100, and E1100.

So, after these four additional proofs, our set of theorems is as follows:

E0 , E00 , E100 E10 E000,  E0100 ,  E1100 }

Now, let's be more precise about the concept of proof in a formal system. A formal proof is a numbered sequence of theorems, each of which is justified by the use of an axiom or the application of an inference rule to one or more previous theorems in the sequence. Each step or theorem in the proof must be justified. The last line in the proof is the theorem that the proof establishes, while the previous lines are intermediate theorems, sometimes called lemmas. 

As an example, here is the proof of the theorem E0100 in FS
  1. E0 [by A1]
  2. E00 [by IR1 applied to theorem on line 1]
  3. E100 [by IR2 applied to theorem on line 2]
  4. E0100 [by IR1 applied to theorem on line 3]
In the sample proof given above, the justifications are given in square brackets.

Now your turn. Here are a few exercises:
  1. Is E1010 a theorem of FS? Prove your answer.
  2. Is E0101 a theorem of FS? Prove your answer (Watch out: this one is tricky).
  3. What is common to all of the theorems of FS ?
  4. What is common to all of the non-theorems of FS, that is, to all of the strings built upon the alphabet of FS that are not theorems of FS ?
  5. Can you find an interpretation or meaning for FS, that is, a way to interpret the theorems of FS as true sentences in a restricted domain and the non-theorems of FS as false or nonsensical sentences in the same domain?
The main learning outcome for these exercises is for you to realize when you are working INSIDE the formal system and when you are working OUTSIDE the formal system.

More on this in a later post.

No comments:

Post a Comment