Statistical Inference/Probability Theory


Set Theory

Sample Space

The set, S, of all possible outcomes of a particular experiment is called the sample space for the experiment. [Definition 1.1.1]

Event

An event is any collection of possible outcomes of an experiment, that is, any subset of S (including S itself). [Definition 1.1.2]

Let A be an event, a subset of S. We say the event A occurs if the outcome of the experiment is in the set A. When speaking of probabilities, we generally speak of the probability of an event, rather than a set. But we may use the terms interchangeably.
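As a small illustration (an example added here, not from the text): for the experiment of tossing a coin twice, the sample space and some events can be written down explicitly, for instance as Python sets.

```python
# Sample space for two tosses of a coin: each outcome records both results.
S = {"HH", "HT", "TH", "TT"}

# Events are simply subsets of S.
A = {"HH", "HT"}   # "the first toss is heads"
B = {"HH", "TT"}   # "both tosses agree"

# If the experiment produces the outcome "HT", event A occurs but B does not.
outcome = "HT"
print(outcome in A)  # True  -> A occurs
print(outcome in B)  # False -> B does not occur
```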

Base relationships

We first need to define formally the following two relationships, which allow us to order and equate sets:

Containment: $A \subset B \iff x \in A \Rightarrow x \in B$.

Equality: $A = B \iff A \subset B$ and $B \subset A$.

Elementary operations

Given any two events (or sets) A and B , we have the following elementary set operations:

Union: The union of A and B, written $A \cup B$, is the set of elements that belong to either A or B or both: $A \cup B = \{x : x \in A \text{ or } x \in B\}$.

Intersection: The intersection of A and B, written $A \cap B$, is the set of elements that belong to both A and B: $A \cap B = \{x : x \in A \text{ and } x \in B\}$.

Complementation: The complement of A, written $A^c$, is the set of all elements that are not in A: $A^c = \{x : x \notin A\}$.
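These operations map directly onto Python's built-in set type. The following sketch, with a made-up sample space, is only an illustration; note that the complement must be taken relative to S.

```python
# A small sample space and two events, chosen only for illustration.
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {3, 4}

union = A | B         # A ∪ B = {1, 2, 3, 4}
intersection = A & B  # A ∩ B = {3}
A_complement = S - A  # A^c = {4, 5, 6}, relative to S

print(union, intersection, A_complement)
```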

Event operations

The elementary set operations can be combined: for any three events, A, B, and C, defined on a sample space S, the following relationships hold [Theorem 1.1.4].

Commutativity: $A \cup B = B \cup A$, $\quad A \cap B = B \cap A$;

Associativity: $A \cup (B \cup C) = (A \cup B) \cup C$, $\quad A \cap (B \cap C) = (A \cap B) \cap C$;

Distributive Laws: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $\quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;

DeMorgan's Laws: $(A \cup B)^c = A^c \cap B^c$, $\quad (A \cap B)^c = A^c \cup B^c$.
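These identities can be spot-checked by brute force for any particular choice of sets. The following sketch (an added illustration, not a proof) verifies them for one arbitrary choice of A, B, and C:

```python
# One arbitrary choice of three events inside a small sample space.
S = set(range(10))
A, B, C = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}

def comp(X):
    """Complement relative to the sample space S."""
    return S - X

# Commutativity and associativity
assert A | B == B | A and A & B == B & A
assert A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# DeMorgan's laws
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)

print("all identities hold for this choice of A, B, C")
```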

Disjoint events

Two events A and B are disjoint (or mutually exclusive) [Definition 1.1.5] if $A \cap B = \emptyset$.

The events $A_1, A_2, \ldots$ are pairwise disjoint (or mutually exclusive) if $A_i \cap A_j = \emptyset$ for all $i \neq j$.

Disjoint sets are sets with no points in common. If we draw a Venn diagram for two disjoint sets, the sets do not overlap. The collection $A_i = [i, i+1)$, $i = 0, 1, 2, \ldots$, consists of pairwise disjoint sets. Note further that $\bigcup_{i=0}^{\infty} A_i = [0, \infty)$.

Event space partitions

If $A_1, A_2, \ldots$ are pairwise disjoint and $\bigcup_{i=1}^{\infty} A_i = S$, then the collection $A_1, A_2, \ldots$ forms a partition of S. [Definition 1.1.6]

The sets $A_i = [i, i+1)$ form a partition of $[0, \infty)$. In general, partitions are very useful, allowing us to divide the sample space into small, non-overlapping pieces.
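For a finite sample space, whether a collection of events forms a partition can be checked mechanically: the events must be pairwise disjoint and their union must equal S. The helper below, is_partition, is a hypothetical function written just for this illustration:

```python
from itertools import combinations

def is_partition(parts, S):
    """True if the sets in `parts` are pairwise disjoint and cover S."""
    pairwise_disjoint = all(p & q == set() for p, q in combinations(parts, 2))
    covers_S = set().union(*parts) == set(S)
    return pairwise_disjoint and covers_S

S = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2}, {3, 4}, {5, 6}], S))     # True
print(is_partition([{1, 2}, {2, 3}, {4, 5, 6}], S))  # False: {1,2} and {2,3} overlap
```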

Basics of Probability Theory

Axiomatic foundations

For each event A in the sample space S we want to associate with A a number between zero and one that will be called the probability of A, denoted by P(A).

Sigma Algebra

A collection of subsets of S is called a sigma algebra (or Borel field) [Definition 1.2.1], denoted by $\mathcal{B}$, if it satisfies the following three properties:

  • (a) $\emptyset \in \mathcal{B}$ (the empty set is an element of $\mathcal{B}$).
  • (b) if $A \in \mathcal{B}$, then $A^c \in \mathcal{B}$ ($\mathcal{B}$ is closed under complementation).
  • (c) if $A_1, A_2, \ldots \in \mathcal{B}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$ ($\mathcal{B}$ is closed under countable unions).

The empty set is a subset of any set. Thus, $\emptyset \subset S$. Property (a) states that this subset is always in a sigma algebra. Since $S = \emptyset^c$, properties (a) and (b) imply that S is always in $\mathcal{B}$ also. In addition, from DeMorgan's Laws it follows that $\mathcal{B}$ is closed under countable intersections. If $A_1, A_2, \ldots \in \mathcal{B}$, then $A_1^c, A_2^c, \ldots \in \mathcal{B}$ by property (b), and therefore $\bigcup_{i=1}^{\infty} A_i^c \in \mathcal{B}$. However, using DeMorgan's Law, we have $\left(\bigcup_{i=1}^{\infty} A_i^c\right)^c = \bigcap_{i=1}^{\infty} A_i$. Thus, again by property (b), $\bigcap_{i=1}^{\infty} A_i \in \mathcal{B}$.

Associated with sample space S we can have many different sigma algebras. For example, the collection of the two sets $\{\emptyset, S\}$ is a sigma algebra, usually called the trivial sigma algebra. The only sigma algebra we will be concerned with is the smallest one that contains all of the open sets in a given sample space S.
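As an added illustration: for a finite sample space, the collection of all subsets of S (its power set) is a sigma algebra, and countable unions reduce to finite unions. The sketch below enumerates the power set of a three-point space and verifies properties (a)-(c) directly.

```python
from itertools import chain, combinations

S = frozenset({1, 2, 3})

# The power set of S: every subset, represented as a frozenset.
B = {frozenset(c) for c in chain.from_iterable(
        combinations(S, r) for r in range(len(S) + 1))}

# (a) the empty set belongs to B
assert frozenset() in B
# (b) closed under complementation
assert all(S - A in B for A in B)
# (c) closed under unions (finite unions suffice in the finite case)
assert all(A | C in B for A in B for C in B)

print(f"power set of S has {len(B)} members and is a sigma algebra")  # 8 members
```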

Probability Function

Given a sample space S and an associated sigma algebra $\mathcal{B}$, a probability function [Definition 1.2.4] is a function P with domain $\mathcal{B}$ that satisfies

  • 1. $P(A) \geq 0$ for all $A \in \mathcal{B}$.
  • 2. $P(S) = 1$.
  • 3. If $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$. [Axiom of Countable Additivity]

These three properties are usually referred to as the Axioms of Probability (or the Kolmogorov Axioms, after A. Kolmogorov, one of the fathers of probability theory). Any function P that satisfies the Axioms of Probability is called a probability function. The axiomatic definition makes no attempt to tell what particular function P to choose; it merely requires P to satisfy the axioms. For any sample space many different probability functions can be defined. Which one(s) reflects what is likely to be observed in a particular experiment is still to be discussed.

We need general methods of defining probability functions that we know will always satisfy Kolmogorov's Axioms. We do not want to have to check the Axioms for each new probability function. The following gives a common method of defining a legitimate probability function.

Defining Probability Functions

Let $S = \{s_1, \ldots, s_n\}$ be a finite set. Let $\mathcal{B}$ be any sigma algebra of subsets of S. Let $p_1, \ldots, p_n$ be nonnegative numbers that sum to 1. For any $A \in \mathcal{B}$, define $P(A) = \sum_{\{i : s_i \in A\}} p_i$. (The sum over an empty set is defined to be 0.) Then P is a probability function on $\mathcal{B}$. This remains true if $S = \{s_1, s_2, \ldots\}$ is a countable set [Theorem 1.2.6].

The physical reality of the experiment might dictate the probability assignment.
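A minimal sketch of Theorem 1.2.6, using made-up weights for a loaded die: attach nonnegative numbers summing to 1 to the six points and define P(A) as the sum of the weights of the points in A. A few axiom checks are included.

```python
# A loaded die: nonnegative weights p_i that sum to 1 (hypothetical values).
p = {1: 0.10, 2: 0.10, 3: 0.15, 4: 0.15, 5: 0.20, 6: 0.30}
S = set(p)

def P(A):
    """P(A) = sum of p_i over the points s_i in A (0 for the empty set)."""
    return sum(p[s] for s in A)

# Spot-check the axioms for this particular assignment.
assert all(P(A) >= 0 for A in [set(), {1}, {2, 4, 6}, S])  # nonnegativity
assert abs(P(S) - 1.0) < 1e-12                             # P(S) = 1
A, B = {1, 2}, {5, 6}                                      # disjoint events
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12               # additivity

print(P({2, 4, 6}))  # probability of an even roll: 0.55
```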

The Calculus of Probabilities

Properties of the probability function applied to a single event

If P is a probability function and A is any set in $\mathcal{B}$, then [Theorem 1.2.8]

  • (a) $P(\emptyset) = 0$, where $\emptyset$ is the empty set;
  • (b) $P(A) \leq 1$;
  • (c) $P(A^c) = 1 - P(A)$.
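As a quick worked instance of (b) and (c) (a small example added here for concreteness): for a fair die, $S = \{1, 2, 3, 4, 5, 6\}$ with each outcome having probability $1/6$, and $A = \{1, 2\}$,

$$P(A^c) = 1 - P(A) = 1 - \tfrac{2}{6} = \tfrac{2}{3},$$

which agrees with summing $1/6$ over the four outcomes in $A^c = \{3, 4, 5, 6\}$, and clearly $P(A) = 1/3 \leq 1$.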

Properties of the probability function applied to pairs of sets

If P is a probability function and A and B are any sets in $\mathcal{B}$, then [Theorem 1.2.9]

  • (a) $P(B \cap A^c) = P(B) - P(A \cap B)$;
  • (b) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$;
  • (c) If $A \subset B$, then $P(A) \leq P(B)$.
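The identities in Theorem 1.2.9 can also be spot-checked numerically. The sketch below uses a uniform probability on a six-point space (a hypothetical example) and exact rational arithmetic to avoid rounding noise:

```python
from fractions import Fraction

# Uniform probability on a six-point sample space.
S = {1, 2, 3, 4, 5, 6}

def P(A):
    """Probability of event A under the uniform assignment on S."""
    return Fraction(len(A), len(S))

A, B = {1, 2, 3}, {3, 4}

# (a) P(B ∩ A^c) = P(B) - P(A ∩ B)
assert P(B & (S - A)) == P(B) - P(A & B)
# (b) inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)
# (c) monotonicity: {1, 2} ⊂ A implies P({1, 2}) ≤ P(A)
assert P({1, 2}) <= P(A)

print(P(A | B))  # 2/3
```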

Bonferroni's Inequality

Formula (b) of Theorem 1.2.9 gives a useful inequality for the probability of an intersection. Since $P(A \cup B) \leq 1$, we have, after some rearranging, $P(A \cap B) \geq P(A) + P(B) - 1$.

This inequality is a special case of what is known as Bonferroni's Inequality. Bonferroni's Inequality allows us to bound the probability of a simultaneous event (the intersection) in terms of the probabilities of the individual events.
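For example (with numbers chosen here purely for illustration), if $P(A) = 0.95$ and $P(B) = 0.90$, then

$$P(A \cap B) \geq P(A) + P(B) - 1 = 0.95 + 0.90 - 1 = 0.85,$$

so the two events occur simultaneously with probability at least 0.85, no matter how A and B overlap. Note that the bound is informative only when $P(A) + P(B) > 1$; otherwise the right-hand side is negative and the inequality is trivially true.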

Properties of the probability function applied to collections of sets

If P is a probability function, then [Theorem 1.2.11]

  • (a) $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition $C_1, C_2, \ldots$;
  • (b) $P\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} P(A_i)$ for any sets $A_1, A_2, \ldots$ (Boole's Inequality)

There is a similarity between Boole's Inequality and Bonferroni's Inequality. In fact, they are essentially the same thing. We could have used Boole's Inequality to derive $P(A \cap B) \geq P(A) + P(B) - 1$ (the special case of Bonferroni's Inequality presented above). If we apply Boole's Inequality to the complements $A_1^c, \ldots, A_n^c$, we have $P\left(\bigcup_{i=1}^{n} A_i^c\right) \leq \sum_{i=1}^{n} P(A_i^c)$, and using the facts that $\bigcup_{i=1}^{n} A_i^c = \left(\bigcap_{i=1}^{n} A_i\right)^c$ and $P(A_i^c) = 1 - P(A_i)$, we obtain $1 - P\left(\bigcap_{i=1}^{n} A_i\right) \leq n - \sum_{i=1}^{n} P(A_i)$.

This becomes, on rearranging terms, $P\left(\bigcap_{i=1}^{n} A_i\right) \geq \sum_{i=1}^{n} P(A_i) - (n - 1)$, which is a more general version of the Bonferroni Inequality presented before.
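As a quick illustration of the general bound (numbers made up for this example): for $n = 3$ events with $P(A_1) = P(A_2) = P(A_3) = 0.9$,

$$P(A_1 \cap A_2 \cap A_3) \geq 0.9 + 0.9 + 0.9 - (3 - 1) = 0.7,$$

and taking $n = 2$ recovers the special case $P(A \cap B) \geq P(A) + P(B) - 1$ given earlier.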

Counting

Most often, methods of counting are used in order to construct probability assignments on finite sample spaces, although they can be used to answer other questions also.

Counting problems, in general, sound complicated, and often we must do our counting subject to many restrictions. The way to solve such problems is to break them down into a series of simple tasks that are easy to count, and employ known rules of combining tasks. The following theorem is a first step in such a process and is sometimes known as the Fundamental Theorem of Counting.

Fundamental Theorem of Counting

If a job consists of $k$ separate tasks, the $i$-th of which can be done in $n_i$ ways, $i = 1, \ldots, k$, then the entire job can be done in $n_1 \times n_2 \times \cdots \times n_k$ ways.
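As a quick sketch with made-up numbers, the product count from the theorem can be compared against brute-force enumeration:

```python
from itertools import product
from math import prod

# A hypothetical job made of three tasks with 4, 3, and 2 possible choices.
task_choices = [["a", "b", "c", "d"], [1, 2, 3], ["x", "y"]]
n = [len(t) for t in task_choices]

# Fundamental Theorem of Counting: n_1 * n_2 * n_3 ways in total.
print(prod(n))                            # 24
# Brute-force enumeration of all (choice_1, choice_2, choice_3) triples agrees.
print(len(list(product(*task_choices))))  # 24
```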

Possible Methods of Counting

Counting can be done with or without replacement. Also, the order in which the tasks are performed may matter. Four types of counting scenarios are thus possible: ordered or unordered, each of which may be done with or without replacement.

Factorial

For a positive integer n, n! (read n factorial) is the product of all of the positive integers less than or equal to n [Definition 1.2.16]. That is, $n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1$. Furthermore, we define $0! = 1$.
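The factorial is available directly in Python's standard library, and together with the binomial coefficient it covers the four counting scenarios listed above. The counts used below follow the standard formulas (quoted from common usage, since this section does not display them): $n^r$ for ordered sampling with replacement, $n!/(n-r)!$ for ordered sampling without replacement, $\binom{n}{r}$ for unordered sampling without replacement, and $\binom{n+r-1}{r}$ for unordered sampling with replacement.

```python
from math import factorial, comb, perm

n, r = 6, 3  # choose r items from n (hypothetical sizes)

print(factorial(4))        # 4! = 24, and factorial(0) == 1 by convention

print(n ** r)              # ordered, with replacement:      216
print(perm(n, r))          # ordered, without replacement:   n!/(n-r)! = 120
print(comb(n, r))          # unordered, without replacement: C(n, r) = 20
print(comb(n + r - 1, r))  # unordered, with replacement:    C(n+r-1, r) = 56
```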

Enumerating Outcomes

Conditional Probability and Independence

Random Variables

Distribution Functions

Density and Mass Functions