Glossary

Each entry includes a definition, why the idea matters, a short example, and links to related terms, guides, and curated external resources.

  • Data literacy

    The ability to read, interpret, question, and communicate about data in context.

  • Dataset

    A structured collection of values—often rows and columns—that can be analyzed or visualized.

  • Metadata

    Data about data: who collected it, when, how, definitions of fields, and limitations.

  • Data quality

    How well data fits its intended use across dimensions like accuracy, completeness, and timeliness.

  • Variable

    A measurable attribute that can differ across records or time (for example, age or revenue).

  • Observation

    A single recorded instance—often one row—representing an entity at a point in time.

  • Sample

    A subset drawn from a larger population and used to estimate characteristics of the whole.

  • Population

    The full set of individuals, cases, or events a study or dataset aims to describe.

  • Bias

    Systematic distortion that pushes results away from the truth—through collection, processing, or interpretation.

  • Aggregation

    Combining many values into summaries such as totals, averages, or rates.

  • Distribution

    The shape of how values are spread: where they cluster, how far the tails extend, and which values stand apart as outliers.

  • Outlier

    A value unusually far from most other values in a dataset or chart.

  • Rate

    A ratio that compares a count to a baseline such as population size, often expressed per thousand or per hundred thousand.

  • Categorical data

    Values that represent groups or labels, such as country names or survey responses.

  • Numeric data

    Quantities measured as numbers, whether discrete counts or continuous measurements.

  • Missing data

    Fields intentionally or unintentionally left blank or unknown.

  • Validation

    Checks that data meets expected formats, ranges, and business rules.

  • Schema

    The planned structure of a dataset: table names, columns, types, and relationships.

  • Provenance

    The origin and history of a dataset—sources, transformations, and versions.

  • Data documentation

    Written materials—README files, data dictionaries, methodology notes—that explain how to use data.

  • Ethics (data)

    Principles for fair, transparent, and privacy-respecting collection and use of data.

  • Margin of error

    A range expressing the sampling uncertainty around a survey estimate, usually stated at a confidence level such as 95%.

  • Visualization

    Graphical representations—charts and maps—that encode data values for quick comparison.
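The Aggregation and Rate entries work well together: totals are aggregated first, then turned into a rate so that groups of different sizes can be compared. The Python sketch below is a minimal illustration, assuming a hypothetical list of (region, cases, population) records; the region names, the figures, and the helper names `aggregate_by_region` and `rate_per` are invented for this example.

```python
# Hypothetical illustrative records: (region, cases, population).
# Region names and figures are invented for this sketch.
records = [
    ("North", 120, 250_000),
    ("North", 80, 150_000),
    ("South", 45, 90_000),
]


def aggregate_by_region(rows):
    """Aggregation: sum cases and population for each region."""
    totals = {}
    for region, cases, population in rows:
        prev_cases, prev_pop = totals.get(region, (0, 0))
        totals[region] = (prev_cases + cases, prev_pop + population)
    return totals


def rate_per(count, baseline, per=100_000):
    """Rate: express a count relative to a baseline, scaled to `per` units."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    # Multiply before dividing to keep integer inputs exact where possible.
    return count * per / baseline


totals = aggregate_by_region(records)
for region, (cases, population) in sorted(totals.items()):
    print(f"{region}: {cases} cases, {rate_per(cases, population):.1f} per 100,000")
```

With these made-up numbers, North has more than four times as many cases as South (200 versus 45), yet both regions work out to the same rate of 50.0 per 100,000, which is exactly the kind of comparison raw counts would hide.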
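For the Margin of error entry, a common textbook calculation for a survey proportion is z × sqrt(p(1 − p) / n). The sketch below assumes a simple random sample, the normal approximation, and a 95% confidence level (z = 1.96); real surveys with weighting or clustering use more involved methods, and the survey figures shown are hypothetical.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to a 95% confidence level.
    """
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 52% of 1,000 respondents answer "yes".
moe = margin_of_error(0.52, 1000)
print(f"52% ± {moe * 100:.1f} percentage points")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error, which is why precision gets expensive quickly.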
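The Validation, Schema, and Missing data entries can be illustrated with a small rule table that checks each record for presence, type, and range. This is a minimal sketch under stated assumptions: the field names, types, and limits in `RULES` are hypothetical, and production pipelines usually rely on a dedicated validation library rather than hand-written checks.

```python
# Hypothetical schema-style rules: field -> (expected type, (low, high) or None).
# Field names and limits are illustrative only.
RULES = {
    "respondent_id": (str, None),
    "age": (int, (0, 120)),
    "income": (float, (0.0, None)),
}


def validate(row):
    """Return a list of problems found in one record (a dict)."""
    problems = []
    for field, (expected_type, bounds) in RULES.items():
        value = row.get(field)
        if value is None:
            # Missing data: the field is absent or explicitly None.
            problems.append(f"{field}: missing")
            continue
        if not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
            continue
        if bounds is not None:
            low, high = bounds
            if (low is not None and value < low) or (high is not None and value > high):
                problems.append(f"{field}: {value} out of range")
    return problems


rows = [
    {"respondent_id": "r1", "age": 34, "income": 52000.0},
    {"respondent_id": "r2", "age": 150, "income": None},
]
for row in rows:
    print(row["respondent_id"], validate(row) or "ok")
```

Returning a list of problems instead of stopping at the first failure lets one pass report everything wrong with a record, which makes data-quality summaries easier to build.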
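The Outlier entry leaves "unusually far" open, and there is no single definition. One widely used convention, swapped in here as an assumption, is Tukey's fences: flag values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch uses Python's standard `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different fences, and the order counts are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily order counts with one unusual day.
orders = [42, 38, 45, 40, 44, 39, 41, 43, 120]
print(iqr_outliers(orders))  # → [120]
```

A flagged value is a prompt to investigate, not an automatic deletion: it may be a data-entry error or a genuine, important event.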