Sources
Public signals we use and how we use them
This catalog documents the source classes, example datasets, licenses, update cadence, coverage, known biases, and how each source feeds our models. All metrics on Lyell Data carry the Modeled + Public Data badge.
We do not process PII. We use aggregated signals only. Vendor telemetry is included only if it is public or licensed, and it is labeled as such.
Overview
Source taxonomy
- Official open data — national statistics, sector regulators, central banks.
- Labor market signals — public job posting indices by role/skill.
- Developer activity — open-source repositories, package registries, Q&A activity.
- Traffic & query signals — web/app traffic aggregates, search interest.
- Sector reports (public) — non-paywalled, with documented methods.
Fit-for-purpose. Adoption estimates emphasize enterprise-weighted signals and official anchors; usage estimates emphasize traffic and query signals, which are then calibrated to panels where available.
Catalog
Master source list
Source | Class | License | Update cadence | Coverage & known biases | Used for |
---|---|---|---|---|---|
“Used for” indicates which models the source informs: Adoption, Usage, or Calibration.
How it flows
From source to estimate
Signal → Normalize → Weight → Calibrate
- Ingest (public CSV/API scrape, rate-limit aware; docstring saved).
- Normalize by region/sector to reduce cross-country scale effects.
- Weight by learned reliability (coverage, recency, variance, license clarity).
- Calibrate using official anchors or analogous benchmark markets.
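The normalize, weight, and calibrate steps above can be sketched roughly as follows. This is a minimal illustration and not the production pipeline: the `Signal` fields, the use of a rubric score as the reliability weight (the text says weights are learned), and the simple ratio calibration are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Signal:
    source: str
    raw_value: float     # e.g. a count taken from a public index
    region_scale: float  # hypothetical regional denominator, e.g. workforce size
    reliability: float   # hypothetical stand-in for the learned weight (0-5)


def normalize(signal: Signal) -> float:
    """Divide by a regional denominator to reduce cross-country scale effects."""
    return signal.raw_value / signal.region_scale


def weighted_estimate(signals: Sequence[Signal]) -> float:
    """Reliability-weighted mean of the normalized signals."""
    total_weight = sum(s.reliability for s in signals)
    if total_weight == 0:
        raise ValueError("no signal carries a non-zero reliability weight")
    return sum(normalize(s) * s.reliability for s in signals) / total_weight


def calibrate(estimate: float, anchor: float, modeled_at_anchor: float) -> float:
    """Ratio calibration (an assumption): rescale by official / modeled anchor value."""
    return estimate * (anchor / modeled_at_anchor)


signals = [
    Signal("job_posting_index", raw_value=200.0, region_scale=1000.0, reliability=4.0),
    Signal("traffic_aggregate", raw_value=300.0, region_scale=1000.0, reliability=1.0),
]
estimate = weighted_estimate(signals)
calibrated = calibrate(estimate, anchor=0.30, modeled_at_anchor=0.25)
```

In this made-up example the higher-reliability source pulls the blended estimate toward its own normalized value (about 0.22), and the calibration step then scales it up because the model under-shoots the official anchor.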
Reliability scoring rubric (0–5)
Dimension | Description |
---|---|
Coverage | Geographic & sector breadth, sample size |
Recency | Update frequency; lag vs. event time |
Method clarity | Public methodology & reproducibility |
License | Open reuse (CC/OGL) vs. restrictive |
Stability | URL/format stability; backfill support |
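One way the rubric could be turned into a single number is sketched below. The rubric does not say how the five dimensions are combined, so the equal-weight mean here is purely an assumption, as are the function and key names.

```python
from typing import Mapping

# The five rubric dimensions, each rated on the 0-5 scale.
DIMENSIONS = ("coverage", "recency", "method_clarity", "license", "stability")


def reliability_score(ratings: Mapping[str, float]) -> float:
    """Equal-weight mean of the five ratings (the equal weighting is an assumption)."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for dim in DIMENSIONS:
        if not 0 <= ratings[dim] <= 5:
            raise ValueError(f"{dim} rating must be between 0 and 5")
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)


# Hypothetical ratings for an illustrative source.
example = {
    "coverage": 4,
    "recency": 3,
    "method_clarity": 5,
    "license": 5,
    "stability": 3,
}
score = reliability_score(example)
```

Rejecting missing or out-of-range ratings up front keeps a partially scored source from silently receiving an inflated or deflated score.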
Evidence
Signal weight by model (illustrative)
Signal class | AI Adoption | GenAI Usage | Role |
---|---|---|---|
Official open data | High | Medium | Calibration anchors; SME/Large splits |
Labor market signals | Medium | Low | Enterprise intent; role mix |
Developer activity | Medium | Low | Ecosystem maturity |
Traffic & queries | Low | High | Usage direction & scale |
Sector reports | Low | Low | Contextual cross-check |
Actual weights vary by country/industry; see each dataset’s “Sources” box.
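To make the illustrative table concrete, the sketch below converts the High/Medium/Low labels into normalized weights per model. The point values (3, 2, 1) and the function name are hypothetical; the table itself gives only qualitative levels, and actual weights vary as noted above.

```python
# Hypothetical numeric mapping for the qualitative levels in the table.
LEVEL_POINTS = {"High": 3.0, "Medium": 2.0, "Low": 1.0}

# Levels copied from the illustrative table.
SIGNAL_LEVELS = {
    "Official open data": {"ai_adoption": "High", "genai_usage": "Medium"},
    "Labor market signals": {"ai_adoption": "Medium", "genai_usage": "Low"},
    "Developer activity": {"ai_adoption": "Medium", "genai_usage": "Low"},
    "Traffic & queries": {"ai_adoption": "Low", "genai_usage": "High"},
    "Sector reports": {"ai_adoption": "Low", "genai_usage": "Low"},
}


def normalized_weights(model: str) -> dict:
    """Return per-class weights for one model, scaled so they sum to 1."""
    points = {
        signal_class: LEVEL_POINTS[levels[model]]
        for signal_class, levels in SIGNAL_LEVELS.items()
    }
    total = sum(points.values())
    return {signal_class: p / total for signal_class, p in points.items()}


adoption_weights = normalized_weights("ai_adoption")
usage_weights = normalized_weights("genai_usage")
```

Under this mapping, official open data carries the largest share for AI Adoption and traffic and query signals carry the largest share for GenAI Usage, which mirrors the ordering in the table.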
Policy
Inclusion & exclusion criteria
Included when…
- Dataset is publicly accessible, documented, and legally reusable (or citation permitted).
- Methodology and refresh cadence are stated or inferable.
- Coverage suffices for our geography/industry cut (>= 60% desirable).
Excluded when…
- License prohibits reuse/derivation and no permission is granted.
- Methodology is opaque or unverifiable, or extreme volatility is unexplained.
- Contains personal data or disaggregated rows that risk re-identification.
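The criteria above could be applied as a simple screening check along these lines. The `Candidate` fields and the `screen` function are hypothetical names, and treating the 60% coverage figure as a soft flag instead of a hard cutoff is an assumption based on the word "desirable".

```python
from dataclasses import dataclass
from typing import List, Tuple

COVERAGE_TARGET = 0.60  # ">= 60% desirable" in the inclusion criteria


@dataclass
class Candidate:
    publicly_accessible: bool
    documented: bool
    reuse_or_citation_permitted: bool
    method_and_cadence_known: bool  # stated or inferable
    coverage_share: float           # share of the geography/industry cut, 0-1
    unexplained_volatility: bool
    reidentification_risk: bool     # personal data or disaggregated rows


def screen(c: Candidate) -> Tuple[bool, List[str]]:
    """Return (included, notes); notes hold failed criteria and soft flags."""
    failures = []
    if not (c.publicly_accessible and c.documented):
        failures.append("not publicly accessible or not documented")
    if not c.reuse_or_citation_permitted:
        failures.append("license prohibits reuse and no permission is granted")
    if not c.method_and_cadence_known:
        failures.append("methodology or refresh cadence unknown")
    if c.unexplained_volatility:
        failures.append("extreme volatility unexplained")
    if c.reidentification_risk:
        failures.append("risk of re-identification")
    notes = list(failures)
    # Assumption: low coverage is flagged for review but does not exclude.
    if c.coverage_share < COVERAGE_TARGET:
        notes.append("coverage below the 60% target (soft flag)")
    return (not failures, notes)


included, notes = screen(
    Candidate(True, True, False, True, 0.9, False, True)
)
```

Returning the reasons alongside the decision makes it easier to explain to a submitter why a source was not added.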
Important: We cite proprietary reports sparingly, for context only, and never as the sole quantitative anchor.
Contribute
Submit a source or correction
Spotted an issue or have a high-quality public dataset? Send it here.
Last updated: Aug 2025 • Badge: Modeled + Public Data