How Lyell Data models AI adoption & usage
Transparent, repeatable, and auditable methods for all metrics: data sources, modeling, uncertainty, QA, and revision policy.
Executive summary
What we publish. Country- and industry-level metrics for: (a) AI adoption — “% of firms using AI in ≥1 process”; (b) GenAI usage — e.g., “ChatGPT monthly active users (MAU)”. For each metric we show a point estimate and a modeled range.
What this is not. Not official statistics. Estimates are modeled from public datasets and directional proxies, documented on each page.
How to read a range. Ranges combine signal variance, coverage uncertainty, and revision risk. Bands are clipped to valid bounds (e.g., 0–100%).
Scope & definitions
AI adoption
Share of firms that report or demonstrate using AI in at least one business process (analytics, automation, customer service, coding assistance, etc.). Industry pages show SME vs Large splits.
SME vs Large. Unless a national standard exists, SME ≈ 10–249 employees; Large ≥ 250. Country pages state deviations.
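The default thresholds above can be expressed as a small classifier. This is a minimal sketch: the function name `size_segment`, its parameters, and the "Micro" label are illustrative assumptions, and national standards would override the defaults.

```python
def size_segment(employees: int, sme_min: int = 10, large_min: int = 250) -> str:
    """Classify a firm by headcount using the default thresholds stated above.

    The thresholds are parameters because country pages may state deviations.
    """
    if employees < 0:
        raise ValueError("employees must be non-negative")
    if employees < sme_min:
        # Micro-firms are out of scope unless a survey anchor covers them.
        return "Micro"
    if employees < large_min:
        return "SME"
    return "Large"


print(size_segment(120))
print(size_segment(250))
```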
GenAI/ChatGPT usage
Population-level or professional user metrics modeled from traffic, app telemetry (where public), job/task signals, and triangulation from multiple panels. “MAU” is unique users per month after de-duplication heuristics.
Industry classes
Industries are mapped to ISIC/NAICS families. Country pages may show local classifications with a mapping table. Example groups: IT Services; BFSI; Telecom; Manufacturing (and splits); Retail & E-commerce; Logistics; Healthcare; Education; Public Sector; Agriculture.
Out of scope
- Micro-firms (<10 employees) unless explicitly covered by a survey anchor.
- Hobby/individual usage for adoption metrics (still captured in usage/MAU where relevant).
- Proprietary vendor telemetry not available as public signals.
Data sources (public signals)
We blend multiple open signals. Each source is assessed for coverage, bias, and update cadence.
Source class | What it adds | Coverage & bias | Update cadence |
---|---|---|---|
Official releases (national stats, sector bodies) | Anchors, SME/Large splits, baseline adoption | Lagged; sector granularity varies; enterprise-skew | Quarterly–annual |
Labor market signals (job postings) | Demand & organizational intent; role mix | Urban/large-firm bias; duplicate posts | Weekly–monthly |
Developer activity (public repos) | Capability & ecosystem maturity | Open-source skew; private code unseen | Daily–monthly |
Traffic & query signals | Usage direction; consumer/pro adoption tilt | Bot noise; ISP/censorship effects; device skew | Daily–monthly |
Sector reports (public) | Context; cross-checks | Method variance; survivorship bias | Ad-hoc |
Modeling framework
4.1 Adoption score (by country × industry)
- Normalize signals. For each signal s_k we compute a regional/sectoral normalization (z-score or min-max) to reduce cross-country scale effects: ŝ_k = norm(s_k | region, sector).
- Weighted fusion. Combine normalized signals by learned weights w_k: A* = Σ (w_k · ŝ_k).
- Anchors & calibration. Scale and shift using anchor jurisdictions/sectors with official stats: A = α_sector · A* + β_country.
- Size split. Apply firm-size offset: A_SME = A − δ_size, A_Large = A + δ_size (δ depends on sector and country structure).
- Bounds. Clamp to [0,100].
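The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the production pipeline: it uses the z-score variant of the normalization, and the function names, peer values, weights, and calibration constants are all hypothetical numbers chosen only to show the arithmetic.

```python
from statistics import mean, pstdev


def zscore(value, peers):
    """Z-score of `value` against peer values from the same region/sector."""
    sd = pstdev(peers)
    return 0.0 if sd == 0 else (value - mean(peers)) / sd


def clamp(x, lo=0.0, hi=100.0):
    """Clip a percentage to its valid domain."""
    return max(lo, min(hi, x))


def adoption_score(signals, peers, weights, alpha_sector, beta_country, delta_size):
    # 1. Normalize each signal within its region/sector peer group.
    s_hat = {k: zscore(v, peers[k]) for k, v in signals.items()}
    # 2. Weighted fusion: A* = sum of w_k * s_hat_k.
    a_star = sum(weights[k] * s_hat[k] for k in s_hat)
    # 3. Calibrate against anchors: A = alpha_sector * A* + beta_country.
    a = alpha_sector * a_star + beta_country
    # 4 and 5. Apply the firm-size offset, then clamp everything to [0, 100].
    return {
        "total": clamp(a),
        "sme": clamp(a - delta_size),
        "large": clamp(a + delta_size),
    }


# Hypothetical inputs, for illustration only.
example = adoption_score(
    signals={"postings": 14.0, "dev": 30.0},
    peers={"postings": [8, 10, 12, 14, 16], "dev": [10, 20, 30, 40, 50]},
    weights={"postings": 0.6, "dev": 0.4},
    alpha_sector=10.0,
    beta_country=30.0,
    delta_size=8.0,
)
print(example)
```

Clamping after the size split, rather than before, keeps the SME and Large values inside the valid domain even when the offset pushes them past a bound.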
4.2 GenAI/ChatGPT usage (MAU)
- Traffic triangulation. Web/app traffic, device shares, and session heuristics → preliminary user counts.
- Deduplication heuristics. Cross-device, VPN/shared IP adjustments, and app/web overlap correction.
- Panel calibration. Where available, calibrate to public panel reference points; otherwise regional analogues.
- Professional tilt. Adjust for enterprise usage share using postings mix and enterprise signals.
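The steps above do not fix exact formulas, so the following is only a rough sketch of how the pieces could fit together. Every name and parameter here (`estimate_mau`, `overlap_share`, `shared_ip_factor`, `panel_ratio`, `pro_share`) is a hypothetical stand-in, and the multiplicative form is an assumption made for illustration.

```python
def estimate_mau(web_users, app_users, overlap_share,
                 shared_ip_factor=1.0, panel_ratio=1.0, pro_share=0.0):
    """Illustrative MAU estimate from preliminary channel counts.

    web_users, app_users: preliminary monthly user counts per channel
    overlap_share: assumed fraction of app users already counted on web (0..1)
    shared_ip_factor: assumed multiplier correcting for VPN/shared-IP effects
    panel_ratio: assumed scaling toward a public panel reference point
    pro_share: assumed share of users attributed to professional usage (0..1)
    """
    # De-duplicate app/web overlap so shared users are counted once.
    deduped = web_users + app_users * (1.0 - overlap_share)
    # Adjust for cross-device and shared-IP distortions.
    adjusted = deduped * shared_ip_factor
    # Calibrate to a panel reference (or a regional analogue).
    total = adjusted * panel_ratio
    return {"mau": total, "professional_mau": total * pro_share}


print(estimate_mau(1000, 500, overlap_share=0.4, pro_share=0.25))
```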
Uncertainty & modeled ranges
We reflect uncertainty as a band around the point estimate.
- Signal variance. Each signal contributes a variance σ_k² estimated from rolling windows or cross-section dispersion.
- Composite standard error (SE). SE = √( Σ (w_k² · σ_k²) ).
- Range width. [A − γ·SE, A + γ·SE] with γ in [1.0, 1.4] depending on coverage; widen if coverage < 60%.
- Clamping. Clip to valid domain (e.g., 0–100%).
- Trend bands. Charts use the same SE plus a small revision-risk premium for newer months.
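As a sketch of the band calculation above: the SE and range formulas follow the text, but the rule mapping coverage to γ is an assumption (a simple step from 1.0 to 1.4 at the 60% threshold), since the exact mapping is not specified here. Coverage is assumed to be a fraction between 0 and 1, and the function name is hypothetical.

```python
import math


def modeled_range(a, weights, sigmas, coverage, lo=0.0, hi=100.0):
    """Return (low, high, se) for point estimate `a`."""
    # Composite SE = sqrt(sum of w_k^2 * sigma_k^2).
    se = math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))
    # Assumed step rule: the widest gamma below 60% coverage, the narrowest otherwise.
    gamma = 1.4 if coverage < 0.60 else 1.0
    # Clip the band to the valid domain.
    return max(lo, a - gamma * se), min(hi, a + gamma * se), se


# Hypothetical inputs: two signals with weights 0.6 and 0.4.
print(modeled_range(34.0, [0.6, 0.4], [5.0, 10.0], coverage=0.8))
print(modeled_range(34.0, [0.6, 0.4], [5.0, 10.0], coverage=0.5))
```

With these inputs the weighted standard deviations are 3 and 4, so the composite SE is 5 and the low-coverage band is 40% wider than the high-coverage one.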
Temporal methods
- Smoothing. Light EMA or Holt smoothing for noisy signals; preserve turning points.
- Nowcasting. When anchors lag, extrapolate with short-term signals; replace with actuals upon release.
- YoY & revisions. YoY computed on revised history; we maintain a changelog of back-revisions.
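A minimal sketch of the EMA smoothing and YoY steps above, assuming a monthly series expressed in percent. The smoothing constant `alpha` and both function names are illustrative choices, not documented parameters.

```python
def ema(series, alpha=0.3):
    """Exponential moving average, seeded with the first observation."""
    smoothed = []
    for x in series:
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed


def yoy_pp(series, periods_per_year=12):
    """Year-over-year change in percentage points.

    Entries without a value one year earlier are returned as None.
    """
    return [
        None if i < periods_per_year else series[i] - series[i - periods_per_year]
        for i in range(len(series))
    ]


print(ema([10, 20], alpha=0.5))
print(yoy_pp([1, 2, 3, 5], periods_per_year=2))
```

A larger `alpha` follows the raw signal more closely, which helps preserve turning points at the cost of passing through more noise.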
Quality assurance & governance
Automated checks
- Outliers: |z| > 3.5 → flag
- Trend breaks: |ΔYoY| > 15 pp → review
- Coverage < 60% → widen band / lower weight
- SME-Large gap sanity: |Δ| > 30 pp → recalibrate
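The four automated checks above translate directly into threshold tests. In this sketch the thresholds come from the list; the function name, flag labels, and the convention that coverage is a fraction between 0 and 1 are assumptions.

```python
def qa_flags(z, yoy_change_pp, coverage, sme, large):
    """Return the list of QA flags raised for one country x industry estimate."""
    flags = []
    if abs(z) > 3.5:
        flags.append("outlier")                # flag
    if abs(yoy_change_pp) > 15:
        flags.append("trend_break_review")     # send to manual review
    if coverage < 0.60:
        flags.append("low_coverage")           # widen band / lower weight
    if abs(large - sme) > 30:
        flags.append("size_gap_recalibrate")   # recalibrate the size split
    return flags


print(qa_flags(z=4.0, yoy_change_pp=5.0, coverage=0.7, sme=20.0, large=55.0))
```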
Manual review
- Country/sector expert pass (context, policy/regulatory changes)
- Source audit: sampling bias and stale anchors
- Reproducibility spot-checks (code & parameters)
Versioning & revisions
- Semantic versioning: major.minor.patch
- Back-revisions documented with reasons (new anchors, bugfix, re-weight)
- Stable URLs for CSVs; metadata includes method_version
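For consumers who compare releases, a method_version string can be parsed and bumped as in the sketch below. The helper names are hypothetical, and which kind of change triggers a major, minor, or patch bump is a policy decision not specified here.

```python
def parse_version(version):
    """Split 'major.minor.patch' into a tuple of ints for easy comparison."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump(version, part):
    """Return the next version string after bumping the given part."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.0.0", "minor"))
print(parse_version("1.1.0") > parse_version("1.0.0"))
```

Comparing the parsed tuples avoids the string-ordering trap where "1.10.0" would sort before "1.9.0".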
Ethics & privacy
- No PII; only aggregated, public or derivative signals
- Respect for national data laws (e.g., GDPR/LGPD); remove or mask where required
- Vendor telemetry used only if publicly disclosed or licensed
Known limitations & biases
- Access bias. Internet connectivity, censorship, and payment rails can under/over-represent users.
- Enterprise skew. Job postings lean toward larger firms; SME adoption may be understated.
- Language/culture. GenAI prompts and product choices vary by language; cross-country comparability is approximate.
- Early-adopter optics. Low-base countries can show large YoY changes that overstate maturity.
How to cite & reuse
Content and CSVs are licensed under CC BY 4.0. Please attribute as follows:
Lyell Data (2025). AI Adoption in <Country> — 2025. https://example.com/ai/industry/<slug>
Academic (APA):
Lyell Data. (2025). AI Adoption in <Country> — 2025 (Version <x.y.z>) [Dataset]. CC BY 4.0. https://example.com/ai/industry/<slug>
CSV schema
Column | Description |
---|---|
indicator | e.g., adoption_overall , adoption_sme , usage_mau |
value | Point estimate (numeric) |
lo , hi | Modeled range bounds |
segment | SME / Large / Total |
industry | Industry label (if applicable) |
country , year | Geography & reference year |
source_class | Always “Modeled + Public Data” |
updated | ISO date of last change |
method_version | SemVer of this methodology |
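A consumer might validate a downloaded CSV against this schema as sketched below. The column names come from the table above; the column order, the sample row (whose values are made up), and the `load_rows` helper are assumptions for illustration.

```python
import csv
import io

REQUIRED = [
    "indicator", "value", "lo", "hi", "segment", "industry",
    "country", "year", "source_class", "updated", "method_version",
]


def load_rows(text):
    """Parse CSV text, check required columns, and sanity-check each range."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for key in ("value", "lo", "hi"):
            row[key] = float(row[key])
        # The point estimate should sit inside its modeled range.
        if not row["lo"] <= row["value"] <= row["hi"]:
            raise ValueError(f"value outside range: {row}")
        rows.append(row)
    return rows


# Illustrative sample only; these numbers are not published estimates.
SAMPLE = (
    "indicator,value,lo,hi,segment,industry,country,year,"
    "source_class,updated,method_version\n"
    "adoption_sme,26.2,21.2,31.2,SME,Retail,Brazil,2025,"
    "Modeled + Public Data,2025-08-08,1.1.0\n"
)
rows = load_rows(SAMPLE)
print(rows[0]["indicator"], rows[0]["value"], rows[0]["method_version"])
```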
Authorship & review
Responsible authors
- Maarja Veskimägi — Industry AI Adoption Lead Author
- Siti Norhayati Omar — GenAI Usage & MAU Modeling Author
Review & feedback
Internal review before publication; external expert review on major updates. Feedback: /contact.
FAQ
Is the “modeled range” a confidence interval?
No. It’s a modeled uncertainty band that blends statistical variance with coverage and revision risk.
Why does my country show a wide range?
Likely due to sparse anchors or high disagreement among signals. As coverage improves, ranges narrow.
How often do you revise datasets?
Minor revisions are rolling; major revisions are monthly or quarterly, depending on anchors. See the changelog.
Can I get raw signal slices?
Yes—contact us for a license where permissible; we only ship aggregated, non-PII views by default.
Appendix
Formulas (compact)
- Normalize: ŝ_k = norm(s_k | region, sector)
- Fusion: A* = Σ w_k · ŝ_k
- Calibrate: A = α_sector · A* + β_country
- Size split: A_SME = A − δ_size ; A_Large = A + δ_size
- Uncertainty: SE = √( Σ w_k² · σ_k² ), Range = [A − γ·SE, A + γ·SE]
ISIC ↔ NAICS quick mapping (examples)
Group | ISIC examples | NAICS examples |
---|---|---|
IT Services | J62–63 | 518, 519, 5415 |
BFSI | K64–66 | 52 |
Telecom | J61 | 517 |
Manufacturing | C10–33 | 31–33 |
Retail | G47 | 44–45 |
Public sector | O84 | 92 |
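The NAICS column of the quick mapping can be used as a prefix lookup, as in this sketch. It covers only the example rows above, expands the ranges 31–33 and 44–45 into their two-digit sectors, and uses a hypothetical helper name; a full crosswalk would need the official concordance tables.

```python
# Prefix-to-group table built from the example rows above.
NAICS_PREFIXES = {
    "518": "IT Services", "519": "IT Services", "5415": "IT Services",
    "52": "BFSI",
    "517": "Telecom",
    "31": "Manufacturing", "32": "Manufacturing", "33": "Manufacturing",
    "44": "Retail", "45": "Retail",
    "92": "Public sector",
}


def group_for_naics(code):
    """Return the industry group for a NAICS code, or None if unmapped.

    The longest matching prefix wins, so more specific entries take priority.
    """
    matches = [p for p in NAICS_PREFIXES if code.startswith(p)]
    if not matches:
        return None
    return NAICS_PREFIXES[max(matches, key=len)]


print(group_for_naics("541511"))
print(group_for_naics("111110"))
```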
Evidence table (template)
Country | Industry | Signals used | Anchor? | Coverage | Notes |
---|---|---|---|---|---|
India | IT | postings, dev, traffic | Yes | High | Export-oriented IT; enterprise skew |
Brazil | Retail | postings, traffic | Partial | Medium | LGPD considerations; FX sensitivity |
Nigeria | Telecom | postings, traffic | No | Medium | Power/connectivity constraints |
Changelog (method_version)
- 1.1.0 — Added revision-risk premium in trend bands; clarified SME threshold text. 2025-08-08
- 1.0.0 — Initial public release: adoption & usage frameworks, range model, QA checks. 2025-07-20
method_version is recorded in metadata and CSV.