Artificial Intelligence Using Statistical Methods
Summer Class 2011 - Uppsala University



Student Presentation Schedule


Greetings! If you are reading this, then you are considering taking this class this summer. A warning: this URL is under construction and some content may change, but the overall structure of the class as well as student expectations are explained below. Please feel free to contact me (Bill Sverdlik ) at wsverdlik@emich.edu.

So what's this class all about ? I hope the title explains it all. The content of this course came about after several discussion between Roland Bol and myself. Roland thought a class that was a continuation of the class Artificial Intelligence (offered in the Fall) would be appropriate and together we came up with topics not covered in that course that we would like to see in this couse. Most of the topics are probabilistic/statistically oriented, with some planning (probabilistic planning) thrown in.

Is the material from Artificial Intelligence a prerequisite for this course ? For the most part, the answer is no. When we discuss planning, I will assume you are familar with state space search.

How much Math is required ?  Not much really. I assume that students have an elementary background in probability theory and we will review it the first few lectures. There will be some use of derivatives, integrals, vector spaces, gradients , etc. We will deal almost exclusively with discrete random varibles and various applications of Bayes Rule. An alternative name for the second half of this class could have been "Applications of Dynamic Programming", as we will use the dynamic programming paradigm to solve various problems in Hidden Markov Models and Markov Decision Processes. In addition, you should be familiar with simple concepts from graph theory (undirected/directed paths, trees, etc), matrix algebra (matrix manipulation) and elementary analysis of algorithms (big-O ).

Student Expectations and Produceables

I'll be a little vague here, because this is class size specific. Students will be organized into working groups of 3-5 students and you must remain in the same group for all work performed. Each group is required to complete all of the following:
1) A software project and short paper write up describing the project to be presented to the class
2) A research paper to be presented to the class
3) Act as an "opponent" to some other group presenting a research paper. You must act as an opponent in a topic different than the one you selected in 2)

In all cases, I will provide samples you can choose from, or you can find topics and papers of your own interest. Below, you will find a quick overview of topics we will cover. In selecting projects/papers to do, you must pick from topics 2 through 5, and your produceables must come from at least 2 different topics. So for example, a group may select to do a software project in Bayesian Belief Networks (topic 3) , present a paper on Applications of Hidden Markov Models to Linguistics (topic 4), and act as an opponent to some other group presenting a paper on Markov Decision Processes (topic 5).

Important! For each produceable, you must clearly indicate each group members specific contribution to the project. I will meet with each group individually to assess this.

Student/Team Evaluation Form

In assigning you a final grade, the following criteria will be considered
- have you demonstrated that you understand the material presented in the class. Do you know what dynamic programming is, do you understand the Viterbi algorithm, do you know the value iteration algorithm (these are just a few examples).
- did you contribute to your groups success?
- was the quality of your group's work good ?
- did your group provide a broad range of topics in its produceables ?
- did your group do one of the more difficult projects ,indicated by an (!!!)


Schedule

We will meet for approximately 8 hours per week. Classes will meet on the weeks June 6-10, June 13-17, and June 20-24. We will have a 4 week break and then meet again for two weeks for presentations and projects : July 25-29 and August 1-5. We will have a guest speaker for topic 3 (Bayesian Belief Networks) - Dr. Michael Ashcroft.

Important Date: Thursday June 23, 2011 (last class before vacation). You must present to me a 1 page paper with the names of your team, your software project, and which team you will be an opponent to. We will spend some time at the end of the class where each group will describe to the class what they will be doing.


Topics and Notes:

1) The Preliminaries (a quick intro to probability theory)
Class Notes
- random variables, the axioms, Bayes Theorem (read Bayes Nets slides 1-31)
- money in the envelope problem
- Monty Hall problem
- Joint Distributions (read Bayes Nets slides 32-41)


2) Naive Bayesian Inferences 
Class Notes  
Naive Bayesian Inference(pdf)   Naive Bayesian Inferences (PowerPoint)
-  MAP (Maximal a Posteriori) versus MLE (Maximum Likelihood Estimate) and a quick and dirty introducion to EM (Expectation-Maximization) algorithm
PLEASE NOTE: too many subscipts, formulas, etc for me to provide a Powerpoint. Please take good notes! (OK, here's a .pdf I found on the internet but it contains errors!)

Paper Presentation:
         The early (very early) medical diagnosis programs MYCIN and CADEUSUS employed Naive Bayesian inference as a main component of the search engine. Interestingly, the accuracy of MYCIN was shown to be better than many doctors. A paper presentation here would include a discusssion of the inference mechanism and search engine. Please note that the wikipedia enrty for MYCIN contains a link to a very good  e-book  .

 Programming Project: Spam Detection
         We discussed the details of a Bayesian Spam filter in class. You should implement a text based labeller. It is not necessary to lable e-mails as Spam/noSpam, it can be any boolean valued test you like (e.g. Interesting/NotInteresting ,  Funny/notFunny ,  etc.) and does not need to be restricted to e-mails (e.g. newspaper articles, blog postings). However, you should pick media where many eaxmples are easy to procure. Spam e-mail databases are easy to find on the interenet. Some papers describing bayesian approaches to spam filtering are provided here , here , and here .

3) Bayesian Belief Networks

Class Notes

Bayesian Belief Networks (from our guest lecturer Michael Ashcroft) [Powerpoint] [ .pdf]
Bayesian Belief Networks (from Andrew Moore)

Paper Presentation:
A google search on "Bayesian Belief Networks" +applications gives many many interesting links. A few are provided here.
1) Social Networks
2) Food design !!! (This one sounds a little weird)
3) Ecology

4) Somewhat more technical reading can be found by googling "Bayesian Networks learned from data". The idea is that the parameters to the network can be derived from small data sets (see below).

5) Inference in Belief Networks: A Procedural Guide, Huang and Darwiche 1994 (IJAR95.pdf)
A very clear explanation of how to produce a junction tree for a Bayesian Network, and perform inference on it.
It also includes a number of good methods for making inference implementations more efficient.

6) Quantifying the uncertainty of a belief net response: Bayesian error-bars for belief net inference, Van Allen et al 2007 (Variance.pdf)
Though the paper as a whole is difficult, it has a very good and simple explanation of the Variable Elimination Algorithm.  If you are feeling
courageous it also explains how to calculate error bars for the probabilities produced.

7) Learning Equivalent Classes of Bayesian Network Structures, Chickering 2002a and Optimal Structure Identification with Greedy Search, Chickering 2002b (Chickering02a.pdf and Chickering02b.pdf)

The number one and two algorithms for (equivalent) state space search when learning BNs from Data.  Warning: The PDAG-to-DAG algorithm presented in 2002a is wrongly implemented.  Refer to the original Dor and Tarsi (1992) for the correct alogorithm.

8) Learning Bayesian Networks, Neapolitan 2004
(Not sure if Uppsala has a copy, but it is available via inter-library loan and I notice he also has a video lecture site:
http://videolectures.net/kdd07_neapolitan_lbn/)

A good overview of Bayesian Networks from no presumed knowledge, although lacking actual implementation instructions for many algorithms.  It does
include implementation instructions for sampling and a good explanation of Dirichlet probability density functions.  It also includes a discussion/advocacy of causal BNs in certain circumstances.

9) Probabilty Structures and Decision Graph, Jensen and Neilson 2007
(Available online through Uppsala University.)

Good overview of the relation between Bayesian Networks and Influence Diagrams.  Difficult to understand and poorly implemented junction tree
algorithm for Influence Diagrams... look at below instead!

10) Solving influence diagrams: Exact algorithms, Shachter 2010 (Shachter10.pdf)

A good, if concise, generalized junction tree algorithm to include decision and utility nodes.

11) Choose your own paper. Instructor approval is required.


Programming Project: Credit Card Fraud
Part A
Create a program that includes a Bayesian Network constructed from the conditional probabilities given in Fraud Purchases Report. The network should be able to predict the probability of a purchase being fraudulent given a (possibly incomplete) vector of inputs for the remaining variables.

Part B
Use the program in part A to categorise the input vectors in the Credit Card Purchases data-set into those that should be investigated further, and those that should not be.

Data Files:
Fraud_.rtf
Fraud_20000_Full
Fraud_20000_Missing.txt



4) Hidden Markov Models (HMM)

Class Notes :
Hidden Markov Models by Andrew Moore (modified slides)   [.pdf]
The Rabiner paper (.pdf)
Training HMM - Yemini (Columbia University)

Related References:
nice tutorials on dynamic programming (from MIT)
quick and easy guide to vector quantization
speech recognition and vector quantization
easy(not) intro to EM algorithm (Baum-Welsh)

Paper Presentation:
(!!!)1) Sequence Matching (or......things don't alway happen when they are supposed to)
In our discussion of automated speech recognition (ASR), training was based upon spectrograms, which employ time as an axis. Unfortunately, we all speak at different rates. Thus, while we all may exhibit similar speech patterns when pronouncing a word (as revealed by a spectrogram), the distinguishing characteristics in the spectrogram may occur at different time intervals. The dynamic time warping (DTW) algorithm addresses this issue. A similar problem occurs in DNA/protein sequence matching: given two sequences over a given alphabet, the problem is to find subsequences within each parent sequence that closely match. This can occur at different places within each paren sequence. Two well known algorithms for addressing this are Needleman-Wunsch algorithm and the Smith-Waterman algorithm. Note that DTW is "deforming" time, while the sequence alignment algorithms are deforming space. A presentation here would include discussion of these algorithms, analysis of the respective algorithms, application domains, and possibly related algorithms.

(!!!)2) Other Markov Models
The Markov models we have looked at are of order 1, that is, we need memory of 1 state to speak about the probability of a next state. A Markov model (or Markov Chain) of order 2 would require memory of the 2 most previous states, that is P(qt+1| qt,qt-1,qt-2,.....q2,q1) = P(qt+1| qt,qt-1) . A presentation here should include theory and applications of Markov chains of higher order.

3) Those #!ck@ng HMMs
Got your attention ? There has been ongoing research into applying HMMs to recognition of emotions in speech. I don't know much about this, but a google of "hidden markov model emotion" gives *many* references.

4) Model estimation with Baum-Welsh algorithm
(!!!) This one is a little messy , but not too messy. We already saw one method for training an HMM: using unlabelled data with the Viterbi algorithm and deriving the required probabilities. Baum-Welsh is another method employing EM (Expectation-Maximization). First, you will need to learn about the backward probabilities from the forward-backward algorithm (we managed to do without them until now), and then use EM. Some good references here would be Andrew Moore's pdf on HMM (towards the end) or  the Rabiner paper. Google searches will yield many references.

5) Choose your own paper
Please get my approval on this.

Programming Project:
(!!!) 1) Handwriting recognition
There can be an easier version of this and a more difficult one.
Easy: design a system based on HMMs to recognize a finite vocabulary of hand written words were the input is printed by the user. Letters are assumed to be seperate.
Hard: same as the easy project but input is not printed (i.e. your own handwriting).
For either version of this project, assume a vocabulary of 20 words. For training and input, you might want to use graph paper.

2) Choose your own project
There are many models of the noisy channel problem. Please get my approval on this.

5) Planning and Markov Decision Processes (MDP)

Planning is the activity that preceeds acting. Not all acting require planning , however , reflexive or reactive behavior requires no planning (do you plan when you pick up a hot fry pan?).  Thus, we will informally say planning is the activity that occurs before embarking on a "difficult" task ( we won't even try to define "difficult); its a deliberate activity where one attempts to enumerate the actions required to attain a specific goal.

We'll begin our discussion with some definitions and describe two models of planning . Next we'll consider representational issues, followed by inference methods. We'll look in some detail at the STRIPS model and then consider Partially Ordered Planners (POP). Finally , we'll consider Markov Decision Processes, the Value Iteration Algorithm (some more Dynamic Programming!!) and some applications.

Class Notes:  
1) Introduction
2) Markov Decision Processes    (MDPPlanning)

Paper Presentation:

1)Modeling Customer Retention as a Markov Decision Process
- How much should a company spend on retaining a customer? What other actions might a company consider? 

2) A Planning System Based on Markov Decision Processes to Guide People with Dementia Through Activities of Daily Living - this paper looks really interesting. It is not very technical; perhaps presenters could look for other applications of Markov Decision Processes in health care.

3) Real Applications of MDPs
The AI world is full of lots of claims about glorious things that systems can do. Except they don't always work as advertised. These three papers , by the same author, are a general survey of real life applications of MDP. I would like a presentation of these papers to answer the question: do they (MDPs) really work??
Real Applications of MDPs 1
Real Applications of MDPs 2
Real Applications of MDPs 3

(!!!)4) Robot Navigation with POMDP (Partially Observable Markov Decision Processes) - presenters will need to give a little foundation on POMDP. Good paper! For some reason , the authors names and affiliations have not appeared in the pdf. The citation should be "Robot navigation with markov models: A framework for path planning and learning with limited computational resources" by Sven Koenig, Richard Goodwin and Reid G. Simmons in "Reasoning with Uncertainty in Robotics:
Lecture Notes in Computer Science", 1996, Volume 1093/1996

Programming Project:
(!!!)Implement a Partial Order Planner (POP) or (!) Implement a non-interleaved (sequential) planner

You may select the specification language as you like (propositional logic or first order predicate calculus). Specific details and algorithms may be found in the following excerpt from Russell and Norvig "Artificial Intelligence: A Modern Approach", Prentice-Hall, 1995 :

The POP planner should be able to address Sussmans anomaly and similar problems. Either version of this assignment should be able to solve problems where a linear (total) order exist like changing a flat tire or putting on shoes.

POP 1
POP 2

The following will be helpful reading.