Statistics
Textbooks
Boundless Statistics
A Closer Look at Tests of Significance
Concept Version 5
Created by Boundless

The Role of the Model

A statistical model is a set of assumptions concerning the generation of the observed data and similar data.

Learning Objective

  • Explain the significance of valid models in statistical inference


Key Points

    • Statisticians distinguish between three levels of modeling assumptions: fully-parametric, non-parametric, and semi-parametric.
    • Descriptions of statistical models usually emphasize the role of population quantities of interest, about which we wish to draw inference. Descriptive statistics are typically used as a preliminary step before more formal inferences are drawn.
    • Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be correct; i.e., that the data-generating mechanisms have been correctly specified.

Terms

  • Simple Random Sampling

    A sampling method in which each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage of the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals.

  • covariate

    a variable that is possibly predictive of the outcome under study


Full Text

Any statistical inference requires assumptions. A statistical model is a set of assumptions concerning the generation of the observed data and similar data. Descriptions of statistical models usually emphasize the role of population quantities of interest, about which we wish to draw inference. Descriptive statistics are typically used as a preliminary step before more formal inferences are drawn.
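As a minimal illustration of that preliminary step, the sketch below computes descriptive statistics for a small sample before any formal inference would be attempted. The sample values here are simulated and purely hypothetical.

```python
import random
import statistics

# Hypothetical sample: 20 simulated observations standing in for a
# simple random sample from some population (values are made up).
random.seed(0)
sample = [random.gauss(50_000, 12_000) for _ in range(20)]

# Descriptive statistics summarize the observed data; they make no
# modeling assumptions and precede any formal inference.
print(f"n       = {len(sample)}")
print(f"mean    = {statistics.mean(sample):.1f}")
print(f"median  = {statistics.median(sample):.1f}")
print(f"std dev = {statistics.stdev(sample):.1f}")
```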

Degrees of Models

Statisticians distinguish between three levels of modeling assumptions:

  • Fully-parametric. The probability distributions describing the data-generation process are assumed to be fully described by a family of probability distributions involving only a finite number of unknown parameters. For example, one may assume that the distribution of population values is truly normal, with unknown mean and variance, and that data sets are generated by simple random sampling. The family of generalized linear models is a widely used and flexible class of parametric models.
  • Non-parametric. The assumptions made about the process generating the data are much fewer than in parametric statistics and may be minimal. For example, every continuous probability distribution has a median that may be estimated using the sample median, which has good properties when the data arise from simple random sampling.
  • Semi-parametric. This term typically implies assumptions between the fully-parametric and non-parametric approaches. For example, one may assume that a population distribution has a finite mean. Furthermore, one may assume that the mean response level in the population depends in a truly linear manner on some covariate (a parametric assumption), but make no parametric assumption describing the variance around that mean. More generally, semi-parametric models can often be separated into 'structural' and 'random variation' components; one component is treated parametrically and the other non-parametrically.
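The contrast between the fully-parametric and non-parametric levels can be sketched with the median example above. If we assume a normal model, the median equals the mean, so a parametric analyst would estimate the median by the sample mean; the non-parametric sample median needs no such assumption. The simulation below (a hedged illustration, not from the original text) shows what happens when the parametric assumption is wrong: the data are actually exponential with rate 1, whose true median is ln 2 ≈ 0.693 while its mean is 1.

```python
import random
import statistics

random.seed(1)

# Data actually drawn from a skewed exponential(1) distribution:
# true median = ln(2) ≈ 0.693, true mean = 1.0.
data = [random.expovariate(1.0) for _ in range(10_000)]

# Fully-parametric estimate: under a (wrong) normal assumption the
# median equals the mean, so we would use the sample mean.
parametric_est = statistics.mean(data)

# Non-parametric estimate: the sample median, which requires only that
# the data arise by simple random sampling from a continuous distribution.
nonparametric_est = statistics.median(data)

print(f"parametric (normal) estimate: {parametric_est:.3f}")
print(f"non-parametric estimate     : {nonparametric_est:.3f}")
```

With these skewed data the parametric estimate lands near 1.0, well above the true median, while the sample median stays close to 0.693; this is the sense in which weaker assumptions buy robustness.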

Importance of Valid Models

Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be correct (i.e., that the data-generating mechanisms have been correctly specified).

Incorrect assumptions of simple random sampling can invalidate statistical inference. More complex semi-parametric and fully-parametric assumptions also give cause for concern. For example, incorrectly assuming normality in the population invalidates some forms of regression-based inference. Most experts in sampling human populations view the use of any parametric model skeptically; in particular, a normal distribution would be an entirely unrealistic and unwise assumption to make for any kind of economic population. Here, the central limit theorem states only that the distribution of the sample mean for very large samples is approximately normal, provided the population distribution is not heavy-tailed.
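The central limit theorem's role can be checked by simulation. The sketch below (an illustrative example, not from the original text) repeatedly draws simple random samples from a skewed exponential population and records the spread of the sample means: as the sample size grows, the sample-mean distribution tightens around the population mean and its standard deviation shrinks roughly like 1/√n.

```python
import random
import statistics

random.seed(2)

def sample_mean(n):
    # Mean of a simple random sample of size n from a skewed
    # exponential(1) population (population mean = 1.0).
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# For each sample size, draw 2,000 sample means and measure their spread;
# by the CLT the spread falls roughly like 1/sqrt(n).
spread = {}
for n in (2, 30, 500):
    means = [sample_mean(n) for _ in range(2_000)]
    spread[n] = statistics.stdev(means)
    print(f"n={n:4d}  mean of means={statistics.mean(means):.3f}  "
          f"sd of means={spread[n]:.3f}")
```

Note that this convergence is what licenses normal-based inference for large samples; for heavy-tailed populations, the convergence can be far too slow to rely on at realistic sample sizes.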


Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.