Sources

Public signals we use and how we use them

This catalog documents the source classes, example datasets, licenses, update cadence, coverage, known biases, and how each source feeds our models. All metrics on Lyell Data carry the Modeled + Public Data badge.

We do not process PII; we use aggregated signals only. Vendor telemetry is included only when it is public or licensed, and it is labeled as such.

Overview

Source taxonomy

Official open data — national statistics, sector regulators, central banks.
Labor market signals — public job posting indices by role/skill.
Developer activity — open-source repositories, package registries, Q&A activity.
Traffic & query signals — web/app traffic aggregates, search interest.
Sector reports (public) — non-paywalled, with documented methods.

Fit-for-purpose. Adoption estimates emphasize enterprise-weighted signals and official anchors; usage estimates emphasize traffic and query signals, which are then calibrated to panels where available.

Catalog

Master source list

Source	Class	License	Update cadence	Coverage & known biases	Used for

“Used for” indicates which models the source informs: Adoption, Usage, or Calibration.

How it flows

From source to estimate

Signal → Normalize → Weight → Calibrate

Ingest (public CSV/API scrape, rate-limit aware; docstring saved).
Normalize by region/sector to reduce cross-country scale effects.
Weight by learned reliability (coverage, recency, variance, license clarity).
Calibrate using official anchors or analogous benchmark markets.
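
The four steps above can be illustrated with a minimal sketch. It is not the production pipeline: the `Signal` fields, the per-unit normalization, the reliability-weighted mean, and the ratio-based calibration are all assumptions chosen to make the flow concrete.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str           # hypothetical source label
    raw: float          # ingested raw value for one region/sector
    scale: float        # assumed normalization base (e.g. number of firms)
    reliability: float  # assumed learned reliability weight in [0, 1]


def normalize(sig: Signal) -> float:
    """Express the raw value per unit of regional/sector scale."""
    if sig.scale <= 0:
        raise ValueError(f"non-positive scale for {sig.name}")
    return sig.raw / sig.scale


def weighted_estimate(signals: list[Signal]) -> float:
    """Reliability-weighted mean of the normalized signals."""
    total_weight = sum(s.reliability for s in signals)
    if total_weight <= 0:
        raise ValueError("reliability weights must sum to a positive value")
    return sum(normalize(s) * s.reliability for s in signals) / total_weight


def calibrate(estimate: float, model_at_anchor: float, official_anchor: float) -> float:
    """Rescale by the ratio of an official anchor to the model's value
    in the anchor (or analogous benchmark) market."""
    if model_at_anchor <= 0:
        raise ValueError("model value at anchor must be positive")
    return estimate * (official_anchor / model_at_anchor)


signals = [
    Signal("job_postings", raw=120.0, scale=1000.0, reliability=0.6),
    Signal("search_interest", raw=300.0, scale=1000.0, reliability=0.4),
]
estimate = weighted_estimate(signals)
calibrated = calibrate(estimate, model_at_anchor=0.20, official_anchor=0.25)
```

With these made-up inputs the weighted estimate is about 0.192, and calibration against the anchor lifts it to about 0.24.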

Reliability scoring rubric (0–5)

Dimension	Description
Coverage	Geographic & sector breadth, sample size
Recency	Update frequency; lag vs. event time
Method clarity	Public methodology & reproducibility
License	Open reuse (CC/OGL) vs. restrictive
Stability	URL/format stability; backfill support
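
As a rough sketch of how the rubric could be turned into a single number: the code below assumes each of the five dimensions is scored from 0 to 5 and that they are weighted equally. Neither assumption is stated in the rubric, and the function name is hypothetical.

```python
DIMENSIONS = ("coverage", "recency", "method_clarity", "license", "stability")
MAX_SCORE = 5  # assumed per-dimension maximum


def reliability_score(scores: dict[str, int]) -> float:
    """Return an equal-weight reliability score in [0, 1]."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= MAX_SCORE:
            raise ValueError(f"{d} must be between 0 and {MAX_SCORE}")
    return sum(scores[d] for d in DIMENSIONS) / (MAX_SCORE * len(DIMENSIONS))


example = {
    "coverage": 4,
    "recency": 3,
    "method_clarity": 5,
    "license": 5,
    "stability": 2,
}
score = reliability_score(example)
```

For the example scores (19 points out of a possible 25) the result is about 0.76.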

Evidence

Signal weight by model (illustrative)

Signal class	AI Adoption	GenAI Usage	Role
Official open data	High	Medium	Calibration anchors; SME/Large splits
Labor market signals	Medium	Low	Enterprise intent; role mix
Developer activity	Medium	Low	Ecosystem maturity
Traffic & queries	Low	High	Usage direction & scale
Sector reports	Low	Low	Contextual cross-check

Actual weights vary by country/industry; see each dataset’s “Sources” box.
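
To make the table concrete, the sketch below converts the High/Medium/Low labels into normalized weights per model. The numeric mapping (Low = 1, Medium = 2, High = 3) is purely hypothetical; the table is illustrative and actual weights vary by country and industry.

```python
LEVELS = {"Low": 1.0, "Medium": 2.0, "High": 3.0}  # hypothetical numeric mapping

# Labels copied from the illustrative table above.
SIGNAL_WEIGHTS = {
    "Official open data": {"adoption": "High", "usage": "Medium"},
    "Labor market signals": {"adoption": "Medium", "usage": "Low"},
    "Developer activity": {"adoption": "Medium", "usage": "Low"},
    "Traffic & queries": {"adoption": "Low", "usage": "High"},
    "Sector reports": {"adoption": "Low", "usage": "Low"},
}


def normalized_weights(model: str) -> dict[str, float]:
    """Map each signal class to a weight that sums to 1 for the given model."""
    raw = {cls: LEVELS[labels[model]] for cls, labels in SIGNAL_WEIGHTS.items()}
    total = sum(raw.values())
    return {cls: value / total for cls, value in raw.items()}


adoption = normalized_weights("adoption")
usage = normalized_weights("usage")
```

Under this mapping, official open data carries a third of the adoption weight (3 of 9), while traffic and queries carry 3 of 8 for usage.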

Policy

Inclusion & exclusion criteria

Included when…

Dataset is publicly accessible, documented, and legally reusable (or citation permitted).
Methodology and refresh cadence are stated or inferable.
Coverage is sufficient for our geography/industry cut (at least 60% is desirable).

Excluded when…

License prohibits reuse/derivation and no permission is granted.
Methodology is opaque or unverifiable, or extreme volatility is unexplained.
Contains personal data or disaggregated rows that risk re-identification.

Important: We cite proprietary reports sparingly, for context only, and never as the sole quantitative anchor.
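
The criteria above could be expressed as a simple screening check. This is a sketch only: the `Candidate` fields and the `screen` function are hypothetical names, and because the 60% coverage figure is described as desirable rather than required, it is treated here as a warning instead of a hard exclusion.

```python
from dataclasses import dataclass

DESIRED_COVERAGE = 0.60  # "desirable" threshold from the inclusion criteria


@dataclass
class Candidate:
    publicly_accessible: bool
    documented: bool
    reuse_or_citation_allowed: bool
    cadence_stated_or_inferable: bool
    coverage_share: float               # share of the geography/industry cut, 0-1
    method_verifiable: bool = True
    volatility_explained: bool = True
    reidentification_risk: bool = False


def screen(c: Candidate) -> tuple[bool, list[str], list[str]]:
    """Return (included, exclusion_reasons, warnings) for a candidate source."""
    exclusions: list[str] = []
    warnings: list[str] = []
    if not (c.publicly_accessible and c.documented):
        exclusions.append("not publicly accessible or not documented")
    if not c.reuse_or_citation_allowed:
        exclusions.append("license prohibits reuse and no permission granted")
    if not c.cadence_stated_or_inferable:
        exclusions.append("methodology or refresh cadence not stated or inferable")
    if not c.method_verifiable:
        exclusions.append("opaque or unverifiable methodology")
    if not c.volatility_explained:
        exclusions.append("unexplained extreme volatility")
    if c.reidentification_risk:
        exclusions.append("personal data or re-identification risk")
    if c.coverage_share < DESIRED_COVERAGE:
        warnings.append("coverage below the desirable 60%")
    return (not exclusions, exclusions, warnings)


ok = screen(Candidate(True, True, True, True, coverage_share=0.72))
risky = screen(Candidate(True, True, True, True, coverage_share=0.45,
                         reidentification_risk=True))
```

Returning the reasons alongside the decision keeps the outcome auditable, which matters when a source is rejected on license or privacy grounds.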

Contribute

Submit a source or correction

Spotted an issue or have a high-quality public dataset? Send it here.

Last updated: Aug 2025 • Badge: Modeled + Public Data