Artificial
Intelligence Using Statistical Methods
Summer Class 2011 - Uppsala University
Student Presentation Schedule
Greetings!
If you
are reading this, then you are considering taking this class this
summer. A warning: this URL is under construction and some content may
change, but the overall structure of the class as well as student
expectations are explained below. Please feel free to contact me (Bill
Sverdlik ) at wsverdlik@emich.edu.
So what's this
class all about ?
I hope the title explains it all. The content of this course came about
after several discussion between Roland Bol and myself. Roland thought
a class that was a continuation of the class Artificial Intelligence
(offered in the Fall) would be appropriate and together we came up with
topics not covered in that course that we would like to see in this
couse. Most of the topics are probabilistic/statistically oriented,
with some planning (probabilistic planning) thrown in.
Is the
material from Artificial Intelligence a prerequisite for this course ?
For the most part, the answer is no. When we discuss planning, I will
assume you are familar with state space search.
How much Math
is required ?
Not much really. I assume that students have an elementary
background in probability theory and we will review it the first few
lectures. There will be some use of derivatives, integrals,
vector spaces, gradients , etc. We will deal almost exclusively with
discrete random varibles and various applications of Bayes Rule.
An alternative name for the second half of this class could have
been "Applications of
Dynamic Programming",
as we will use the dynamic programming paradigm to solve various
problems in Hidden Markov Models and Markov Decision Processes. In
addition, you should be familiar with simple concepts from graph theory
(undirected/directed paths, trees, etc), matrix algebra (matrix
manipulation) and elementary analysis of algorithms (big-O ).
Student
Expectations and Produceables
I'll be a little vague here, because this is class size specific.
Students will be organized into working groups of 3-5 students and you
must remain in the same group for all work performed. Each group is
required to complete all of the following:
1) A software project and short paper write up describing the project to be presented to the class
2) A research paper to be presented to the class
3) Act as an "opponent" to some other group presenting a research
paper. You must act as an opponent in a topic different than the one
you selected in 2)
In all cases, I will provide samples you can choose from, or you can
find topics and papers of your own interest. Below, you will find a
quick overview of topics we will cover. In selecting projects/papers to
do, you must pick from topics 2 through 5, and your produceables must come from
at least 2 different topics.
So for example, a group may select to do a software project in Bayesian
Belief Networks (topic 3) , present a paper on Applications of Hidden
Markov Models to Linguistics (topic 4), and act as an opponent to some
other group presenting a paper on Markov Decision Processes (topic 5).
Important!
For each
produceable, you must clearly indicate each group members specific
contribution to the project. I will meet with each group individually
to assess this.
Student/Team Evaluation Form
In assigning you a final grade, the following criteria will be considered
- have you demonstrated that you understand the material presented in
the class. Do you know what dynamic programming is, do you understand
the Viterbi algorithm, do you know the value iteration algorithm (these
are just a few examples).
- did you contribute to your groups success?
- was the quality of your group's work good ?
- did your group provide a broad range of topics in its produceables ?
- did your group do one of the more difficult projects ,indicated by an (!!!)
Schedule
We will meet for approximately 8 hours per week. Classes will meet on
the weeks June 6-10, June 13-17, and June 20-24. We will have a 4 week
break and then meet again for two weeks for presentations and projects
: July 25-29 and August 1-5. We will have a guest speaker for topic 3
(Bayesian Belief Networks) - Dr. Michael Ashcroft.
Important Date: Thursday June 23, 2011 (last
class before vacation). You must present to me a 1 page paper with the
names of your team, your software project, and which team you will be
an opponent to. We will spend some time at the end of the class where
each group will describe to the class what they will be doing.
Topics and Notes:
1) The Preliminaries (a quick intro to probability theory)
Class Notes
- random variables, the axioms, Bayes Theorem (read Bayes Nets slides 1-31)
- money in the envelope problem
- Monty Hall problem
- Joint Distributions (read Bayes
Nets slides 32-41)
2)
Naive Bayesian Inferences
Class Notes
- Naive
Bayesian Inference(pdf) Naive
Bayesian Inferences
(PowerPoint)
- MAP (Maximal a Posteriori) versus MLE (Maximum Likelihood
Estimate) and a quick and dirty introducion to EM
(Expectation-Maximization) algorithm
PLEASE NOTE: too many subscipts, formulas, etc for me to provide a
Powerpoint. Please take good notes! (OK, here's a .pdf I found on the
internet but it contains errors!)
Paper Presentation:
The early
(very early) medical diagnosis programs MYCIN
and CADEUSUS
employed Naive Bayesian inference as a main component of the search
engine. Interestingly, the accuracy of MYCIN was shown to be better
than many doctors. A paper presentation here would include a
discusssion of the inference mechanism and search engine. Please note
that the wikipedia enrty for MYCIN contains a link to a very
good
e-book
.
Programming
Project: Spam Detection
We
discussed the details of a
Bayesian Spam filter in class. You should implement a text based
labeller. It is not necessary to lable e-mails as Spam/noSpam, it can
be any boolean valued test you like (e.g. Interesting/NotInteresting ,
Funny/notFunny , etc.) and does not need to be
restricted
to e-mails (e.g. newspaper articles, blog postings). However, you
should pick media where many eaxmples are easy to procure. Spam e-mail
databases are easy to find on the interenet. Some papers describing
bayesian approaches to spam filtering are provided here , here
, and here .
3) Bayesian Belief Networks
Class Notes
Bayesian Belief Networks (from our guest lecturer Michael
Ashcroft) [Powerpoint] [ .pdf]
Bayesian Belief Networks (from
Andrew Moore)
Paper Presentation:
A google search on "Bayesian Belief Networks" +applications
gives many many interesting links. A few are provided here.
1) Social
Networks
2) Food design !!!
(This one sounds a little weird)
3) Ecology
4) Somewhat more technical reading can be found by googling "Bayesian
Networks learned from data". The idea is that the parameters to the
network can be derived from small data sets (see below).
5) Inference in Belief Networks: A
Procedural Guide, Huang and Darwiche 1994 (IJAR95.pdf)
A very clear explanation of how to produce a junction tree for a
Bayesian Network, and perform inference on it.
It also includes a number of good methods for making inference
implementations more efficient.
6) Quantifying the uncertainty of a
belief net response: Bayesian error-bars for belief net inference, Van
Allen et al 2007 (Variance.pdf)
Though the paper as a whole is difficult, it has a very good and simple
explanation of the Variable Elimination Algorithm. If you are
feeling
courageous it also explains how to calculate error bars for the
probabilities produced.
7) Learning Equivalent Classes of
Bayesian Network Structures, Chickering 2002a and Optimal Structure
Identification with Greedy Search, Chickering 2002b (Chickering02a.pdf
and Chickering02b.pdf)
The number one and two algorithms for (equivalent) state space search
when learning BNs from Data. Warning: The PDAG-to-DAG algorithm
presented in 2002a is wrongly implemented. Refer to the original
Dor and Tarsi (1992) for the correct alogorithm.
8) Learning Bayesian Networks,
Neapolitan 2004
(Not sure if Uppsala has a copy, but it is available via inter-library
loan and I notice he also has a video lecture site:
http://videolectures.net/kdd07_neapolitan_lbn/)
A good overview of Bayesian Networks from no presumed knowledge,
although lacking actual implementation instructions for many
algorithms. It does
include implementation instructions for sampling and a good explanation
of Dirichlet probability density functions. It also includes a
discussion/advocacy of causal BNs in certain circumstances.
9) Probabilty Structures and Decision
Graph, Jensen and Neilson 2007
(Available online through Uppsala University.)
Good overview of the relation between Bayesian Networks and Influence
Diagrams. Difficult to understand and poorly implemented junction
tree
algorithm for Influence Diagrams... look at below instead!
10) Solving influence diagrams: Exact
algorithms, Shachter 2010 (Shachter10.pdf)
A good, if concise, generalized junction tree algorithm to include
decision and utility nodes.
11) Choose your own paper.
Instructor approval is required.
Programming Project:
Credit Card Fraud
Part A
Create a program that includes a Bayesian Network constructed from the
conditional probabilities given in Fraud Purchases Report. The network
should be able to predict the probability of a purchase being fraudulent given a (possibly incomplete) vector of inputs for the
remaining variables.
Part B
Use the program in part A to categorise the input vectors in the Credit
Card Purchases data-set into those that should be investigated further,
and those that should not be.
Data Files:
Fraud_.rtf
Fraud_20000_Full
Fraud_20000_Missing.txt
4) Hidden Markov Models (HMM)
Class Notes :
Hidden Markov Models
by Andrew Moore (modified slides)
[.pdf]
The Rabiner paper (.pdf)
Training HMM -
Yemini (Columbia University)
Related References:
nice
tutorials on dynamic programming (from MIT)
quick
and easy guide to vector quantization
speech
recognition and vector quantization
easy(not) intro to EM algorithm
(Baum-Welsh)
Paper Presentation:
(!!!)1) Sequence Matching (or......things don't alway happen when they are
supposed to)
In our discussion of automated speech recognition (ASR), training was
based upon spectrograms, which employ time as an axis. Unfortunately,
we all speak at different rates. Thus, while we all may exhibit similar
speech patterns when pronouncing a word (as revealed by a spectrogram),
the distinguishing characteristics in the spectrogram may occur at
different time intervals. The dynamic
time warping
(DTW) algorithm addresses this issue. A similar problem occurs in
DNA/protein sequence matching: given two sequences over a given
alphabet, the problem is to find subsequences within each parent
sequence that closely match. This can occur at different places within
each paren sequence. Two well known algorithms for addressing this are Needleman-Wunsch
algorithm and the Smith-Waterman
algorithm. Note that DTW is "deforming" time, while the sequence
alignment algorithms are deforming space. A presentation here would
include discussion of these algorithms, analysis of the respective
algorithms, application domains, and possibly related algorithms.
(!!!)2) Other Markov Models
The Markov models we have looked at are of order 1, that is, we need
memory of 1 state to speak about the probability of a next state. A
Markov model (or Markov Chain) of order 2 would require memory of the 2
most previous states, that is P(qt+1| qt,qt-1,qt-2,.....q2,q1)
= P(qt+1| qt,qt-1)
. A presentation here should include theory and applications of Markov
chains of higher order.
3) Those #!ck@ng HMMs
Got your attention ? There has been ongoing research into applying HMMs
to recognition of emotions in speech. I don't know much about this, but
a google of "hidden markov model emotion" gives *many* references.
4) Model estimation with
Baum-Welsh algorithm
(!!!) This one is a little messy , but not too messy. We already saw one
method for training an HMM: using unlabelled data with the Viterbi
algorithm and deriving the required probabilities. Baum-Welsh is
another method employing EM (Expectation-Maximization). First, you will
need to learn about the backward probabilities from the
forward-backward algorithm (we managed to do without them until now),
and then use EM. Some good references here would be Andrew Moore's pdf on HMM
(towards the end) or the Rabiner
paper. Google searches will yield many references.
5) Choose your own paper
Please get my approval on this.
Programming
Project:
(!!!) 1) Handwriting recognition
There can be an easier version of this and a more difficult one.
Easy: design a system based on HMMs to recognize a finite vocabulary of
hand written words were the input is printed by the user. Letters are
assumed to be seperate.
Hard: same as the easy project but input is not printed (i.e. your own
handwriting).
For either version of this project, assume a vocabulary of 20 words.
For training and input, you might want to use graph paper.
2) Choose your own project
There are many models of the noisy channel problem. Please get my
approval on this.
5) Planning and Markov Decision Processes (MDP)
Planning is the activity that preceeds acting. Not all acting require
planning , however , reflexive or reactive behavior requires no
planning (do you plan when you pick up a hot fry pan?). Thus,
we
will informally say planning is the activity that occurs before
embarking on a "difficult" task ( we won't even try to define
"difficult); its a deliberate activity where one attempts to enumerate
the actions required to attain a specific goal.
We'll begin our discussion with some definitions and describe two
models of planning . Next we'll consider representational issues,
followed by inference methods. We'll look in some detail at the STRIPS
model and then consider Partially Ordered Planners (POP). Finally ,
we'll consider Markov Decision Processes, the Value Iteration Algorithm
(some more Dynamic Programming!!) and some applications.
Class Notes:
1) Introduction
2) Markov Decision Processes (MDPPlanning)
Paper Presentation:
1)Modeling Customer Retention as a Markov Decision Process -
How much should a company spend on retaining a customer? What other
actions might a company consider?
2) A Planning System Based on
Markov Decision Processes to Guide People with Dementia Through
Activities of Daily Living -
this paper looks really interesting. It is not very technical; perhaps
presenters could look for other applications of Markov Decision
Processes in health care.
3) Real Applications of MDPs
The AI world is full of lots of claims about glorious things that
systems can do. Except they don't always work as advertised. These
three papers , by the same author, are a general survey of real life
applications of MDP. I would like a presentation of these papers to
answer the question: do they (MDPs) really work??
Real Applications of MDPs 1
Real Applications of MDPs 2
Real Applications of MDPs 3
(!!!)4) Robot Navigation with
POMDP (Partially Observable Markov Decision Processes)
- presenters will need to give a little foundation on POMDP. Good
paper! For some reason , the authors names and affiliations have not
appeared in the pdf. The citation should be "Robot navigation with
markov models: A framework for path planning and learning with limited
computational resources" by Sven Koenig, Richard Goodwin and Reid G.
Simmons in "Reasoning with Uncertainty in Robotics:
Lecture Notes in Computer Science", 1996, Volume 1093/1996
Programming Project:
(!!!)Implement a Partial Order Planner (POP) or (!) Implement a non-interleaved (sequential) planner
You may select the specification language as you like
(propositional logic or first order predicate calculus). Specific
details and algorithms may be found in the following excerpt from
Russell and Norvig "Artificial Intelligence: A Modern Approach",
Prentice-Hall, 1995 :
The POP planner should be able to address Sussmans anomaly and similar
problems. Either version of this assignment should be able to solve
problems where a linear (total) order exist like changing a flat tire
or putting on shoes.
POP 1
POP 2
The following will be helpful reading.