List of Tables |
List of Figures |
Table of Notations |
Preface |
Road Map |
Preliminaries / I: |
Introduction / 1: |
Rationalist and Empiricist Approaches to Language / 1.1: |
Scientific Content / 1.2: |
Questions that linguistics should answer / 1.2.1: |
Non-categorical phenomena in language / 1.2.2: |
Language and cognition as probabilistic phenomena / 1.2.3: |
The Ambiguity of Language: Why NLP Is Difficult / 1.3: |
Dirty Hands / 1.4: |
Lexical resources / 1.4.1: |
Word counts / 1.4.2: |
Zipf's laws / 1.4.3: |
Collocations / 1.4.4: |
Concordances / 1.4.5: |
Further Reading / 1.5: |
Exercises / 1.6: |
Mathematical Foundations / 2: |
Elementary Probability Theory / 2.1: |
Probability spaces / 2.1.1: |
Conditional probability and independence / 2.1.2: |
Bayes' theorem / 2.1.3: |
Random variables / 2.1.4: |
Expectation and variance / 2.1.5: |
Notation / 2.1.6: |
Joint and conditional distributions / 2.1.7: |
Determining P / 2.1.8: |
Standard distributions / 2.1.9: |
Bayesian statistics / 2.1.10: |
Essential Information Theory / 2.2: |
Entropy / 2.2.1: |
Joint entropy and conditional entropy / 2.2.2: |
Mutual information / 2.2.3: |
The noisy channel model / 2.2.4: |
Relative entropy or Kullback-Leibler divergence / 2.2.5: |
The relation to language: Cross entropy / 2.2.6: |
The entropy of English / 2.2.7: |
Perplexity / 2.2.8: |
Linguistic Essentials / 3: |
Parts of Speech and Morphology / 3.1: |
Nouns and pronouns / 3.1.1: |
Words that accompany nouns: Determiners and adjectives / 3.1.2: |
Verbs / 3.1.3: |
Other parts of speech / 3.1.4: |
Phrase Structure / 3.2: |
Phrase structure grammars / 3.2.1: |
Dependency: Arguments and adjuncts / 3.2.2: |
X' theory / 3.2.3: |
Phrase structure ambiguity / 3.2.4: |
Semantics and Pragmatics / 3.3: |
Other Areas / 3.4: |
Corpus-Based Work / 4: |
Getting Set Up / 4.1: |
Computers / 4.1.1: |
Corpora / 4.1.2: |
Software / 4.1.3: |
Looking at Text / 4.2: |
Low-level formatting issues / 4.2.1: |
Tokenization: What is a word? / 4.2.2: |
Morphology / 4.2.3: |
Sentences / 4.2.4: |
Marked-up Data / 4.3: |
Markup schemes / 4.3.1: |
Grammatical tagging / 4.3.2: |
Words / II: |
Collocations / 5: |
Frequency / 5.1: |
Mean and Variance / 5.2: |
Hypothesis Testing / 5.3: |
The t test / 5.3.1: |
Hypothesis testing of differences / 5.3.2: |
Pearson's chi-square test / 5.3.3: |
Likelihood ratios / 5.3.4: |
Mutual Information / 5.4: |
The Notion of Collocation / 5.5: |
Statistical Inference: n-gram Models over Sparse Data / 6: |
Bins: Forming Equivalence Classes / 6.1: |
Reliability vs. discrimination / 6.1.1: |
n-gram models / 6.1.2: |
Building n-gram models / 6.1.3: |
Statistical Estimators / 6.2: |
Maximum Likelihood Estimation (MLE) / 6.2.1: |
Laplace's law, Lidstone's law and the Jeffreys-Perks law / 6.2.2: |
Held out estimation / 6.2.3: |
Cross-validation (deleted estimation) / 6.2.4: |
Good-Turing estimation / 6.2.5: |
Briefly noted / 6.2.6: |
Combining Estimators / 6.3: |
Simple linear interpolation / 6.3.1: |
Katz's backing-off / 6.3.2: |
General linear interpolation / 6.3.3: |
Language models for Austen / 6.3.4: |
Conclusions / 6.4: |
Word Sense Disambiguation / 7: |
Methodological Preliminaries / 7.1: |
Supervised and unsupervised learning / 7.1.1: |
Pseudowords / 7.1.2: |
Upper and lower bounds on performance / 7.1.3: |
Supervised Disambiguation / 7.2: |
Bayesian classification / 7.2.1: |
An information-theoretic approach / 7.2.2: |
Dictionary-Based Disambiguation / 7.3: |
Disambiguation based on sense definitions / 7.3.1: |
Thesaurus-based disambiguation / 7.3.2: |
Disambiguation based on translations in a second-language corpus / 7.3.3: |
One sense per discourse, one sense per collocation / 7.3.4: |
Unsupervised Disambiguation / 7.4: |
What Is a Word Sense? / 7.5: |
Lexical Acquisition / 8: |
Evaluation Measures / 8.1: |
Verb Subcategorization / 8.2: |
Attachment Ambiguity / 8.3: |
Hindle and Rooth (1993) / 8.3.1: |
General remarks on PP attachment / 8.3.2: |
Selectional Preferences / 8.4: |
Semantic Similarity / 8.5: |
Vector space measures / 8.5.1: |
Probabilistic measures / 8.5.2: |
The Role of Lexical Acquisition in Statistical NLP / 8.6: |
Grammar / III: |
Markov Models / 9: |
Hidden Markov Models / 9.2: |
Why use HMMs? / 9.2.1: |
General form of an HMM / 9.2.2: |
The Three Fundamental Questions for HMMs / 9.3: |
Finding the probability of an observation / 9.3.1: |
Finding the best state sequence / 9.3.2: |
The third problem: Parameter estimation / 9.3.3: |
HMMs: Implementation, Properties, and Variants / 9.4: |
Implementation / 9.4.1: |
Variants / 9.4.2: |
Multiple input observations / 9.4.3: |
Initialization of parameter values / 9.4.4: |
Part-of-Speech Tagging / 10: |
The Information Sources in Tagging / 10.1: |
Markov Model Taggers / 10.2: |
The probabilistic model / 10.2.1: |
The Viterbi algorithm / 10.2.2: |
Variations / 10.2.3: |
Hidden Markov Model Taggers / 10.3: |
Applying HMMs to POS tagging / 10.3.1: |
The effect of initialization on HMM training / 10.3.2: |
Transformation-Based Learning of Tags / 10.4: |
Transformations / 10.4.1: |
The learning algorithm / 10.4.2: |
Relation to other models / 10.4.3: |
Automata / 10.4.4: |
Summary / 10.4.5: |
Other Methods, Other Languages / 10.5: |
Other approaches to tagging / 10.5.1: |
Languages other than English / 10.5.2: |
Tagging Accuracy and Uses of Taggers / 10.6: |
Tagging accuracy / 10.6.1: |
Applications of tagging / 10.6.2: |
Probabilistic Context Free Grammars / 11: |
Some Features of PCFGs / 11.1: |
Questions for PCFGs / 11.2: |
The Probability of a String / 11.3: |
Using inside probabilities / 11.3.1: |
Using outside probabilities / 11.3.2: |
Finding the most likely parse for a sentence / 11.3.3: |
Training a PCFG / 11.3.4: |
Problems with the Inside-Outside Algorithm / 11.4: |
Probabilistic Parsing / 12: |
Some Concepts / 12.1: |
Parsing for disambiguation / 12.1.1: |
Treebanks / 12.1.2: |
Parsing models vs. language models / 12.1.3: |
Weakening the independence assumptions of PCFGs / 12.1.4: |
Tree probabilities and derivational probabilities / 12.1.5: |
There's more than one way to do it / 12.1.6: |
Phrase structure grammars and dependency grammars / 12.1.7: |
Evaluation / 12.1.8: |
Equivalent models / 12.1.9: |
Building parsers: Search methods / 12.1.10: |
Use of the geometric mean / 12.1.11: |
Some Approaches / 12.2: |
Non-lexicalized treebank grammars / 12.2.1: |
Lexicalized models using derivational histories / 12.2.2: |
Dependency-based models / 12.2.3: |
Discussion / 12.2.4: |
Applications and Techniques / IV: |
Statistical Alignment and Machine Translation / 13: |
Text Alignment / 13.1: |
Aligning sentences and paragraphs / 13.1.1: |
Length-based methods / 13.1.2: |
Offset alignment by signal processing techniques / 13.1.3: |
Lexical methods of sentence alignment / 13.1.4: |
Word Alignment / 13.2: |
Statistical Machine Translation / 13.3: |
Clustering / 14: |
Hierarchical Clustering / 14.1: |
Single-link and complete-link clustering / 14.1.1: |
Group-average agglomerative clustering / 14.1.2: |
An application: Improving a language model / 14.1.3: |
Top-down clustering / 14.1.4: |
Non-Hierarchical Clustering / 14.2: |
K-means / 14.2.1: |
The EM algorithm / 14.2.2: |
Topics in Information Retrieval / 15: |
Some Background on Information Retrieval / 15.1: |
Common design features of IR systems / 15.1.1: |
Evaluation measures / 15.1.2: |
The probability ranking principle (PRP) / 15.1.3: |
The Vector Space Model / 15.2: |
Vector similarity / 15.2.1: |
Term weighting / 15.2.2: |
Term Distribution Models / 15.3: |
The Poisson distribution / 15.3.1: |
The two-Poisson model / 15.3.2: |
The K mixture / 15.3.3: |
Inverse document frequency / 15.3.4: |
Residual inverse document frequency / 15.3.5: |
Usage of term distribution models / 15.3.6: |
Latent Semantic Indexing / 15.4: |
Least-squares methods / 15.4.1: |
Singular Value Decomposition / 15.4.2: |
Latent Semantic Indexing in IR / 15.4.3: |
Discourse Segmentation / 15.5: |
TextTiling / 15.5.1: |
Text Categorization / 16: |
Decision Trees / 16.1: |
Maximum Entropy Modeling / 16.2: |
Generalized iterative scaling / 16.2.1: |
Application to text categorization / 16.2.2: |
Perceptrons / 16.3: |
k Nearest Neighbor Classification / 16.4: |
Tiny Statistical Tables |
Bibliography |
Index |