Sources

Public signals we use and how we use them

This catalog documents the source classes, example datasets, licenses, update cadence, coverage, known biases, and how each source feeds our models. All metrics on Lyell Data carry the Modeled + Public Data badge.

We do not process PII. We use aggregated signals only. Vendor telemetry is included only if it is public or licensed, and it is labeled as such.

Overview

Source taxonomy

  • Official open data — national statistics, sector regulators, central banks.
  • Labor market signals — public job posting indices by role/skill.
  • Developer activity — open-source repositories, package registries, Q&A activity.
  • Traffic & query signals — web/app traffic aggregates, search interest.
  • Sector reports (public) — non-paywalled, with documented methods.

Fit-for-purpose. Adoption estimates emphasize enterprise-weighted signals and official anchors; usage estimates emphasize traffic and query signals, which are then calibrated to panels where available.

Catalog

Master source list

Source | Class | License | Update cadence | Coverage & known biases | Used for
“Used for” indicates which models the source informs: Adoption, Usage, or Calibration.

How it flows

From source to estimate

Signal → Normalize → Weight → Calibrate

  1. Ingest (public CSV/API scrape, rate-limit aware; docstring saved).
  2. Normalize by region/sector to reduce cross-country scale effects.
  3. Weight by learned reliability (coverage, recency, variance, license clarity).
  4. Calibrate using official anchors or analogous benchmark markets.
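
As a rough illustration of steps 2 to 4 above, the Python sketch below normalizes two made-up signals by a per-region scale, blends them with reliability weights, and rescales the result to an official anchor. The function names, the input figures, the weights, and the choice of a simple multiplicative calibration are all assumptions made for this example, not a description of the production pipeline; ingestion (step 1) is omitted.

```python
def normalize(raw, scale):
    """Step 2: divide each region's raw count by a per-region scale."""
    return {region: raw[region] / scale[region] for region in raw}


def blend(signals, weights):
    """Step 3: reliability-weighted average of normalized signals, per region."""
    total_weight = sum(weights[name] for name in signals)
    regions = next(iter(signals.values())).keys()
    return {
        region: sum(weights[name] * signals[name][region] for name in signals)
        / total_weight
        for region in regions
    }


def calibrate(blended, anchor_region, anchor_value):
    """Step 4: rescale all regions so the anchor region matches its official value."""
    factor = anchor_value / blended[anchor_region]
    return {region: value * factor for region, value in blended.items()}


# Hypothetical inputs for two regions, "A" and "B" (not real data).
jobs = normalize({"A": 200.0, "B": 50.0}, {"A": 1000.0, "B": 500.0})
traffic = normalize({"A": 30.0, "B": 40.0}, {"A": 100.0, "B": 100.0})
weights = {"jobs": 3.0, "traffic": 1.0}  # assumed reliability weights

blended = blend({"jobs": jobs, "traffic": traffic}, weights)
calibrated = calibrate(blended, anchor_region="A", anchor_value=0.45)
print({region: round(value, 3) for region, value in calibrated.items()})
```

With these inputs the blended values are roughly 0.225 for A and 0.175 for B, and anchoring region A at 0.45 scales both by about two. A single anchor region is the simplest case; with several anchors a regression-style fit would be one alternative.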

Reliability scoring rubric (0–5)

Dimension | Description
Coverage | Geographic & sector breadth, sample size
Recency | Update frequency; lag vs. event time
Method clarity | Public methodology & reproducibility
License | Open reuse (CC/OGL) vs. restrictive
Stability | URL/format stability; backfill support
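
The rubric does not state how the five 0–5 scores combine into a single number. A minimal sketch, assuming an unweighted mean; the class name, field names, and averaging rule are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReliabilityScore:
    """One 0-5 score per rubric dimension (field names are illustrative)."""
    coverage: int
    recency: int
    method_clarity: int
    license_openness: int
    stability: int

    def __post_init__(self):
        # Reject scores outside the rubric's 0-5 range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name} must be in 0..5, got {value}")

    def overall(self):
        """Assumed aggregation: unweighted mean of the five dimensions."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)


# Hypothetical scores for an imaginary source.
example = ReliabilityScore(
    coverage=4, recency=3, method_clarity=5, license_openness=5, stability=3
)
print(example.overall())
```

Validating the range at construction time keeps an out-of-range score from silently skewing any weight derived from it.
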

Evidence

Signal weight by model (illustrative)

Signal class | AI Adoption | GenAI Usage | Role
Official open data | High | Medium | Calibration anchors; SME/Large splits
Labor market signals | Medium | Low | Enterprise intent; role mix
Developer activity | Medium | Low | Ecosystem maturity
Traffic & queries | Low | High | Usage direction & scale
Sector reports | Low | Low | Contextual cross-check
Actual weights vary by country/industry; see each dataset’s “Sources” box.
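
To show how qualitative levels like these could become numeric weights, the sketch below maps High/Medium/Low to points and normalizes them so each model's weights sum to 1. The point values (3, 2, 1) and the function name are assumptions for illustration only; the levels themselves are copied from the illustrative table.

```python
# Hypothetical numeric mapping for the qualitative levels (not from the source).
LEVEL_POINTS = {"High": 3, "Medium": 2, "Low": 1}

# Levels copied from the illustrative signal-weight table.
SIGNAL_LEVELS = {
    "Official open data": {"adoption": "High", "usage": "Medium"},
    "Labor market signals": {"adoption": "Medium", "usage": "Low"},
    "Developer activity": {"adoption": "Medium", "usage": "Low"},
    "Traffic & queries": {"adoption": "Low", "usage": "High"},
    "Sector reports": {"adoption": "Low", "usage": "Low"},
}


def normalized_weights(model):
    """Return per-signal weights for `model` ("adoption" or "usage") summing to 1."""
    points = {
        name: LEVEL_POINTS[levels[model]] for name, levels in SIGNAL_LEVELS.items()
    }
    total = sum(points.values())
    return {name: value / total for name, value in points.items()}


for model in ("adoption", "usage"):
    print(model, {k: round(v, 3) for k, v in normalized_weights(model).items()})
```

Under this assumed mapping, official open data carries the largest share for adoption and traffic and query signals carry the largest share for usage, matching the ordering in the table.
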

Policy

Inclusion & exclusion criteria

Included when…

  • Dataset is publicly accessible, documented, and legally reusable (or citation permitted).
  • Methodology and refresh cadence are stated or inferable.
  • Coverage suffices for our geography/industry cut (>= 60% desirable).

Excluded when…

  • License prohibits reuse/derivation and no permission is granted.
  • Methodology is opaque or unverifiable, or extreme volatility is unexplained.
  • Dataset contains personal data or disaggregated rows that risk re-identification.
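
The criteria above can be read as a screening checklist. The sketch below is one hedged way to encode them: the `Candidate` fields and the `screen` function are hypothetical names, and treating the 60% coverage figure as a warning rather than a hard cutoff is an interpretation of "desirable", not a documented rule.

```python
from dataclasses import dataclass

MIN_COVERAGE = 0.60  # the ">= 60% desirable" figure from the inclusion criteria


@dataclass
class Candidate:
    """Facts about a candidate dataset (field names are illustrative)."""
    publicly_accessible: bool
    documented: bool
    reuse_permitted: bool        # legally reusable, or permission/citation granted
    methodology_clear: bool      # stated or inferable, and verifiable
    cadence_known: bool          # refresh cadence stated or inferable
    coverage: float              # share of the geography/industry cut, 0..1
    has_personal_data: bool      # PII or re-identifiable disaggregated rows
    unexplained_volatility: bool


def screen(c):
    """Return (included, exclusion_reasons, warnings) for a candidate."""
    reasons, warnings = [], []
    if not (c.publicly_accessible and c.documented):
        reasons.append("not publicly accessible or not documented")
    if not c.reuse_permitted:
        reasons.append("license prohibits reuse and no permission granted")
    if not (c.methodology_clear and c.cadence_known):
        reasons.append("methodology or refresh cadence unclear")
    if c.unexplained_volatility:
        reasons.append("extreme volatility unexplained")
    if c.has_personal_data:
        reasons.append("contains personal or re-identifiable data")
    if c.coverage < MIN_COVERAGE:
        warnings.append("coverage below the desirable 60%")
    return (not reasons, reasons, warnings)


# Hypothetical candidate: acceptable on every criterion but thin on coverage.
print(screen(Candidate(True, True, True, True, True, 0.5, False, False)))
```

Returning the reasons alongside the verdict makes each exclusion auditable, which fits a catalog that documents why sources are in or out.
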
Important: We cite proprietary reports sparingly, for context only, and never as the sole quantitative anchor.

Contribute

Submit a source or correction

Spotted an issue or have a high-quality public dataset? Send it here.

Last updated: Aug 2025 • Badge: Modeled + Public Data