data snooping

(noun)

the inappropriate (sometimes deliberately so) use of data mining to uncover misleading relationships in data

Related Terms

  • Type I error

Examples of data snooping in the following topics:

  • Data Snooping: Testing Hypotheses Once You've Seen the Data

    • Testing a hypothesis after you've already seen the data may lead to inaccurate conclusions.
    • The error is particularly prevalent in data mining and machine learning.
    • Data snooping (also called data fishing or data dredging) is the inappropriate (sometimes deliberately so) use of data mining to uncover misleading relationships in data.
    • Data-snooping bias is a form of statistical bias that arises from this misuse of statistics.
    • Although data-snooping bias can occur in any field that uses data mining, it is of particular concern in finance and medical research, both of which rely heavily on data mining.
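The way data dredging produces misleading relationships can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: every variable is pure random noise, and the sample size, number of predictors, and cutoff (roughly the two-sided 5% critical value of |r| for 30 observations) are hypothetical choices, not values from any study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)   # fixed seed so the run is repeatable
n_obs = 30               # observations per variable (assumed)
n_predictors = 200       # candidate predictors, all pure noise (assumed)
r_critical = 0.361       # approximate two-sided 5% cutoff for |r| when n = 30

# The outcome and every predictor are independent noise, so any
# "relationship" found here is spurious by construction.
outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
correlations = [
    pearson_r([rng.gauss(0, 1) for _ in range(n_obs)], outcome)
    for _ in range(n_predictors)
]

spurious = [r for r in correlations if abs(r) > r_critical]
print(f"{len(spurious)} of {n_predictors} noise predictors look 'significant'")
print(f"largest |r| found by snooping: {max(abs(r) for r in correlations):.2f}")
```

Because about 5% of unrelated predictors are expected to pass a 5% test by chance, reporting only the ones that pass gives a misleading picture of the data.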
  • Is batting performance related to player position in MLB?

    • We will use a data set called bat10, which includes batting records of 327 Major League Baseball (MLB) players from the 2010 season.
    • The primary issue here is that we are inspecting the data before picking the groups that will be compared.
    • It is inappropriate to examine all data by eye (informal testing) and only afterwards decide which parts to formally test.
    • This is called data snooping or data fishing.
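The cost of choosing which groups to compare only after looking at the data can be sketched with a simulation. This is a minimal sketch under assumptions, not an analysis of the bat10 data: the groups are hypothetical, every group is drawn from the same normal distribution with known standard deviation 1, and a simple z-test stands in for whatever formal test would actually be used.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
n_groups = 4             # hypothetical number of groups
n_per_group = 20         # hypothetical observations per group
n_sims = 2000            # number of simulated data sets
z_cutoff = 1.96          # two-sided 5% cutoff for a z-test

# Standard error of the difference between two group means (sigma = 1).
se_diff = math.sqrt(2 / n_per_group)

planned_hits = 0   # pair of groups fixed before seeing the data
snooped_hits = 0   # pair chosen because it looked most different

for _ in range(n_sims):
    means = [
        sum(rng.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
        for _ in range(n_groups)
    ]
    # Planned comparison: always test group 0 against group 1.
    if abs(means[0] - means[1]) / se_diff > z_cutoff:
        planned_hits += 1
    # Snooped comparison: inspect all groups, then test the extreme pair.
    if (max(means) - min(means)) / se_diff > z_cutoff:
        snooped_hits += 1

planned_rate = planned_hits / n_sims
snooped_rate = snooped_hits / n_sims
print(f"false positive rate, planned pair: {planned_rate:.3f}")
print(f"false positive rate, snooped pair: {snooped_rate:.3f}")
```

All groups share the same true mean, so every rejection is a Type I error. The planned comparison should reject at close to the nominal 5% rate, while the snooped comparison rejects noticeably more often.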
  • Distorting the Truth with Descriptive Statistics

    • Reporting bias involves a skew in the availability of data, such that observations of a certain kind may be more likely to be reported and consequently used in research.
    • Descriptive statistics is a powerful form of research because it collects and summarizes vast amounts of data and information in a manageable and organized manner.
    • correlate (associate) data or create any type of statistical model of the relationships among variables;
    • In other words, every time you try to describe a large set of observations with a single descriptive statistic, you run the risk of distorting the original data or losing important detail.
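The risk of summarizing observations with a single descriptive statistic can be shown with a tiny worked example. The two data sets below are made up for illustration; they use only Python's standard `statistics` module.

```python
import statistics

# Two made-up data sets with the same mean but very different shapes.
clustered = [48, 49, 50, 51, 52]
skewed = [10, 10, 10, 10, 210]

for name, values in (("clustered", clustered), ("skewed", skewed)):
    print(
        name,
        "mean =", statistics.mean(values),
        "median =", statistics.median(values),
        "range =", max(values) - min(values),
    )
```

Both sets have a mean of 50, yet the median of the second is 10 and a single extreme value accounts for the difference. Reporting the mean alone would hide that detail.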
  • Gender Messages in Mass Media

    • The music video for "Pimp," a song by 50 Cent, Snoop Dogg, and G-Unit, demonstrates how harmful gender messages can be disseminated through mass media.
  • Exercises

    • Alan, while snooping around his grandmother's basement, stumbled upon a shiny object protruding from under a stack of boxes.
  • Data and Information

    • Data consists of nothing but facts, which can be manipulated to make it useful; the analytical process turns the data into information.
    • Binary files (readable by a computer but not a human) are sometimes called "data" and are distinguishable from human-readable data, referred to as "text".
    • Once data is in digital format, various procedures can be applied to the data to get useful information.
    • Data processing may involve various processes, including:
    • Data processing may or may not be distinguishable from data conversion, which involves changing data into another format, and does not involve any data manipulation.
  • Analyzing Data

    • Data analysis is an important step in the marketing research process, in which data is organized, reviewed, verified, and interpreted.
    • Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes.
    • In statistical applications, some people divide data analysis into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA).
    • All are varieties of data analysis.
    • Summarize the characteristics of data preparation and the methodology of data analysis
  • MLA: Reporting Data

  • APA: Reporting Data

  • Chicago/Turabian: Reporting Data

Except where noted, content and user contributions on this site are licensed under CC BY-SA 4.0 with attribution required.