Theorem proving, resolution
Luger: 2.3, 13, (15)
Why theorem proving in an AI course?

proving theorems is considered to require high intelligence

if knowledge is represented by logic, theorem proving is reasoning

theorem proving uses AI techniques, such as (heuristic) search

(one can also study how people prove theorems; they do it quite differently!)
What is theorem proving?
Reasoning by theorem proving is a weak method compared to expert
systems, because it does not make use of domain knowledge. On the
other hand, this may be a strength when no domain heuristics are available
(reasoning from first principles). Theorem proving is usually limited to sound
reasoning.
Differentiate between

theorem provers: fully automatic

proof assistants: require steps as input, take care of bookkeeping and
sometimes 'easy' proofs.
Theorem proving requires

a logic (syntax)

a set of axioms and inference rules

a strategy for when and how to search through the possible applications of the
axioms and rules
Examples of axioms
p > (q>p)
(p>(q>r)) > ((p>q) >(p>r))
p \/ ~p
p>(~p>q)
Notation: I use ~ for "not", since it's on my keyboard.
Examples of inference rules
name                   from                     derive
modus ponens           p, p>q                   q
modus tollens          p>q, ~q                  ~p
and elimination        p/\q                     p
and introduction       p, q                     p/\q
or introduction        p                        p\/q
instantiation          for all X p(X)           p(a)
rename                 for all X phi(X)         for all Y phi(Y)
exists introduction    p(a)                     exists X p(X)
substitution           phi(p)                   phi(psi)
replacement            p>q                      ~p\/q
implication            assume p ..., q          p>q
contradiction          assume ~p ..., false     p
resolution             p\/phi, ~p\/psi          phi \/ psi
(special case)         p, ~p\/psi               psi
(more special case)    p, ~p                    false
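The resolution rule (and its special cases) can be sketched in Python, representing a clause as a frozenset of literals and a literal as a string, with ~ for negation. The function names here are illustrative, not from the notes:

```python
# A minimal sketch of propositional resolution. A clause is a frozenset of
# literal strings; "~p" is the negation of "p".

def negate(lit):
    """Return the complementary literal: p <-> ~p."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2, lit):
    """Resolve clause c1 (containing lit) with c2 (containing ~lit)."""
    assert lit in c1 and negate(lit) in c2
    return (c1 - {lit}) | (c2 - {negate(lit)})

# p \/ phi  and  ~p \/ psi  resolve to  phi \/ psi
print(resolve(frozenset({"p", "phi"}), frozenset({"~p", "psi"}), "p"))
# the most special case: p and ~p resolve to the empty clause, i.e. false
print(resolve(frozenset({"p"}), frozenset({"~p"}), "p"))
```

Deriving the empty clause is what "deriving false" means in the refutation procedure below.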
Strategies
forwards: start from axioms, apply rules
backwards: start from the theorem (in general: a set of goals), work
backwards to the axioms
depth-first or breadth-first
when to apply which rule
general questions:
are the rules correct (sound)?
is there a proof for every logical consequence (complete)?
can we remove rules (redundant)?
Having redundant rules may allow shorter proofs, but a larger search
space.
Resolution and refutation
We want to prove theory > goal.
The theory is usually a set of facts and rules that can be
treated as axioms (by the rule called implication above). A theory
is usually quite stable, and is used to prove various goals.

we use contradiction, add ~goal to the axioms, and try to
prove false.

the theory and ~goal are put in clausal form: a set (conjunction) of
clauses [see Luger pp. 558-560]

use resolution and unification to derive false
Clauses
a clause is a universally quantified disjunction of literals
a literal is an atomic formula or its negation
examples of clauses: p \/ q, p(X) \/ q(Y), ~p(X) \/ q(X).
A special case is Horn clauses: they have at most one positive (not
negated) literal.
Three subclasses:
facts (1 pos, 0 neg): p(a,b).
rules (1 pos, > 0 neg): ~p \/ ~q \/ r, often written as p /\ q > r
goals (0 pos): ~p \/ ~q. If we want to prove p/\q from the theory,
we add this clause; it can also be written as p /\ q > false
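The three subclasses can be told apart purely by counting positive literals. A small Python sketch (the function name is illustrative), with clauses as lists of literal strings:

```python
# Classify a Horn clause by counting its positive (non-negated) literals.

def classify(clause):
    pos = [lit for lit in clause if not lit.startswith("~")]
    if len(pos) > 1:
        return "not a Horn clause"      # more than one positive literal
    if len(pos) == 0:
        return "goal"                   # 0 pos: ~p \/ ~q
    return "fact" if len(clause) == 1 else "rule"

print(classify(["p(a,b)"]))             # fact: 1 pos, 0 neg
print(classify(["~p", "~q", "r"]))      # rule: p /\ q > r
print(classify(["~p", "~q"]))           # goal: p /\ q > false
```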
Skolemization (p. 559)
How to remove existential quantification? Use function symbols!
Example: every person has a mother: (for all X) person(X) >
(exists Y) mother(X,Y)
Give a name to Y: the mother of X. (for all X) person(X) >
mother(X,mother_of(X))
This allows us to define data structures in the logic.
Example: if X is an element and Y a list, then there is a list with
head X and tail Y.
(for all X,Y) elt(X) /\ list(Y) > (exists Z) list(Z) /\ head(Z,X)
/\ tail(Z,Y)
First we name Z the cons of X and Y; then we get 3 rules:
elt(X) /\ list(Y) > list(cons(X,Y)).
elt(X) /\ list(Y) > head(cons(X,Y), X).
elt(X) /\ list(Y) > tail(cons(X,Y), Y).
Unification
Unification (2.3.2) is used to perform instantiation before
(during) resolution.
Simple case:
Resolve list(nil) with ~elt(X) \/ ~list(Y) \/ list(cons(X,Y)).
First we must instantiate Y to nil, then we can derive
~elt(X) \/ list(cons(X,nil)).
Hard case:
I want to prove that there is a list that has the head mother_of(john).
So I take as a goal its negation ~(exists X)(list(X) /\ head(X,mother_of(john)))
This gives the clause ~list(X) \/ ~head(X,mother_of(john))
We can resolve with ~elt(X) \/ ~list(Y) \/ head(cons(X,Y),X).
Problem 1: the X's in both clauses "accidentally" have the
same name.
Solution 1: rename to ~elt(Z) \/ ~list(Y) \/ head(cons(Z,Y),Z).
Problem 2: what is the common instance of head(cons(Z,Y),Z)
and head(X,mother_of(john))?
Answer  the result of unification: head(cons(mother_of(john),Y),mother_of(john))
The unifier is: {Z/mother_of(john), X/cons(mother_of(john),Y)}
The resolvent (resulting clause) is
(~list(X) \/ ~elt(Z) \/ ~list(Y)){Z/mother_of(john),
X/cons(mother_of(john),Y)} =
~list(cons(mother_of(john),Y)) \/ ~elt(mother_of(john))
\/ ~list(Y)
Exercise: use the first list-rule and the facts elt(mother_of(john))
and list(nil) to complete the proof.
Unification as rewriting
the algorithm maintains a set of equalities. Each time, remove
an equality and apply the applicable rule.

Var1 = Var1: do nothing (just remove the equality)

Var1 = Expr: in all other equalities, replace Var1 by Expr; keep Var1
= Expr
exception: if Var1 occurs in Expr, then FAIL

f(A_1,..,A_n) = f(B_1,..,B_n):
add the equalities {A_1 = B_1, .., A_n = B_n}

f(A_1,..,A_n) = g(B_1,..,B_m):
FAIL (different function symbols)
Example (at each step, one equality is selected and the matching rule applied)
{head(cons(Z,Y),Z) = head(X,mother_of(john))}: use
the first f-rule, f=head, n=2.
{cons(Z,Y) = X, Z = mother_of(john)}
{cons(mother_of(john),Y) = X, Z = mother_of(john)}
no more applicable rules, and no FAIL, so we have a unifier.
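The rewriting rules above can be sketched in Python. In this sketch (all names illustrative) a compound term f(t1,..,tn) is a tuple ("f", t1, .., tn), a variable is a capitalized string, and a constant is a lowercase string:

```python
# Unification by rewriting a set of equalities, following the rules above.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def substitute(t, var, expr):
    """Replace every occurrence of var in term t by expr."""
    if t == var:
        return expr
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(a, var, expr) for a in t[1:])
    return t

def occurs(var, t):
    """Occur check: does var occur inside term t?"""
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, a) for a in t[1:])
    return False

def unify(eqs):
    """Return a unifier (dict) for a list of equalities, or None on FAIL."""
    subst = {}
    eqs = list(eqs)
    while eqs:
        a, b = eqs.pop()
        if a == b:                       # Var1 = Var1: just remove it
            continue
        if not is_var(a) and is_var(b):  # orient: variable on the left
            a, b = b, a
        if is_var(a):
            if occurs(a, b):             # occur check: FAIL
                return None
            # replace a by b everywhere, keep a = b
            eqs = [(substitute(l, a, b), substitute(r, a, b)) for l, r in eqs]
            subst = {v: substitute(t, a, b) for v, t in subst.items()}
            subst[a] = b
        elif isinstance(a, tuple) and isinstance(b, tuple) \
                and a[0] == b[0] and len(a) == len(b):
            eqs.extend(zip(a[1:], b[1:]))  # f(..) = f(..): decompose
        else:
            return None                  # f(..) = g(..): FAIL
    return subst

# the example above: head(cons(Z,Y),Z) = head(X,mother_of(john))
t1 = ("head", ("cons", "Z", "Y"), "Z")
t2 = ("head", "X", ("mother_of", "john"))
print(unify([(t1, t2)]))
# {'Z': ('mother_of', 'john'), 'X': ('cons', ('mother_of', 'john'), 'Y')}
```

The result is exactly the unifier {Z/mother_of(john), X/cons(mother_of(john),Y)} computed in the example.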
Resolution and Prolog
A Prolog program consists of Horn clauses: facts and rules.
We give the program a goal, which it turns into a Horn clause with
only negative literals.
Prolog uses linear resolution:
it takes the goal and a rule/fact, and produces a new goal, until it
derives false. This proves the goal.
(Other outcomes are that Prolog runs out of options and cannot prove
the goal, or that it goes into an infinite loop.)
Linear resolution is not complete for predicate logic in general, but
it is complete in this special case.
Prolog uses depth-first search in a search tree: choices arise
when more than one rule is applicable.
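Prolog's linear resolution can be sketched in Python for the propositional case (so that unification can be left out). A rule (head, body) stands for the Horn clause head \/ ~b1 \/ .. \/ ~bn; the goal is a list of atoms. All names and the example program are illustrative:

```python
# Sketch of Prolog-style linear resolution on propositional Horn clauses:
# repeatedly resolve the current goal clause with a program clause,
# producing a new goal, until the empty goal (false) is derived.

def solve(goal, program, depth=25):
    if not goal:
        return True                  # empty goal: false derived, goal proved
    if depth == 0:
        return False                 # crude guard against infinite loops
    first, rest = goal[0], goal[1:]
    for head, body in program:       # depth-first: try clauses in order
        if head == first:            # resolve ~first against head \/ ~body
            if solve(body + rest, program, depth - 1):
                return True
    return False                     # out of options: cannot prove the goal

program = [("p", ["q", "r"]),        # p :- q, r.
           ("q", []),                # q.
           ("r", [])]                # r.
print(solve(["p"], program))         # True
print(solve(["s"], program))         # False
```

The depth bound stands in for the third possible outcome mentioned above: without it, a left-recursive program would send this search into an infinite loop, just as in Prolog.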