Methodology

How Lyell Data models AI adoption & usage

Transparent, repeatable, and auditable methods for all metrics: data sources, modeling, uncertainty, QA, and revision policy.

Each dataset carries a Modeled + Public Data badge, a point estimate, and a modeled range.

1. Executive summary

What we publish. Country- and industry-level metrics for: (a) AI adoption — “% of firms using AI in ≥1 process”; (b) GenAI usage — e.g., “ChatGPT monthly active users (MAU)”. For each metric we show a point estimate and a modeled range.

What this is not. Not official statistics. Estimates are modeled from public datasets and directional proxies, documented on each page.

How to read a range. Ranges combine signal variance, coverage uncertainty, and revision risk. Bands are clipped to valid bounds (e.g., 0–100%).

2. Scope & definitions

AI adoption

Share of firms that report or demonstrate using AI in at least one business process (analytics, automation, customer service, coding assistance, etc.). Industry pages show SME vs Large splits.

SME vs Large. Unless a national standard exists, SME ≈ 10–249 employees; Large ≥ 250. Country pages state deviations.

GenAI/ChatGPT usage

Population-level or professional user metrics modeled from traffic, app telemetry (where public), job/task signals, and triangulation across multiple panels. “MAU” means unique users per month after de-duplication heuristics are applied.

Industry classes

Industries are mapped to ISIC/NAICS families. Country pages may show local classifications with a mapping table. Example groups: IT Services; BFSI; Telecom; Manufacturing (and splits); Retail & E-commerce; Logistics; Healthcare; Education; Public Sector; Agriculture.

Out of scope

  • Micro-firms (<10 employees) unless explicitly covered by a survey anchor.
  • Hobby/individual usage for adoption metrics (still captured in usage/MAU where relevant).
  • Proprietary vendor telemetry not available as public signals.

3. Data sources (public signals)

We blend multiple open signals. Each source is assessed for coverage, bias, and update cadence.

Source class | What it adds | Coverage & bias | Update cadence
Official releases (national stats, sector bodies) | Anchors, SME/Large splits, baseline adoption | Lagged; sector granularity varies; enterprise skew | Quarterly–annual
Labor market signals (job postings) | Demand & organizational intent; role mix | Urban/large-firm bias; duplicate posts | Weekly–monthly
Developer activity (public repos) | Capability & ecosystem maturity | Open-source skew; private code unseen | Daily–monthly
Traffic & query signals | Usage direction; consumer/pro adoption tilt | Bot noise; ISP/censorship effects; device skew | Daily–monthly
Sector reports (public) | Context; cross-checks | Method variance; survivorship bias | Ad-hoc
Every dataset page lists the specific sources used and their role in the estimate.

4. Modeling framework

4.1 Adoption score (by country × industry)

  1. Normalize signals. For each signal s_k we compute a regional/sectoral normalization (z-score or min-max) to reduce cross-country scale effects: ŝ_k = norm(s_k | region, sector).
  2. Weighted fusion. Combine normalized signals by learned weights w_k: A* = Σ (w_k · ŝ_k).
  3. Anchors & calibration. Scale and shift using anchor jurisdictions/sectors with official stats: A = α_sector · A* + β_country.
  4. Size split. Apply firm-size offset: A_SME = A − δ_size, A_Large = A + δ_size (δ depends on sector and country structure).
  5. Bounds. Clamp to [0,100].
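
The five steps above can be summarized in a few lines of Python. This is a minimal illustrative sketch, not production code: the signal arrays, weights, and the α, β, δ values below are hypothetical placeholders.

```python
import numpy as np

def adoption_score(signals, weights, alpha_sector, beta_country, delta_size):
    """Illustrative adoption pipeline: normalize -> fuse -> calibrate -> size split -> clamp."""
    # 1. Normalize each signal across its region/sector peer group (z-score variant).
    normed = {k: (v - v.mean()) / v.std() for k, v in signals.items()}

    # 2. Weighted fusion of normalized signals: A* = sum(w_k * s_hat_k).
    a_star = sum(weights[k] * normed[k] for k in weights)

    # 3. Calibrate against official anchors: A = alpha_sector * A* + beta_country.
    a_total = alpha_sector * a_star + beta_country

    # 4. Firm-size split: A_SME = A - delta_size, A_Large = A + delta_size.
    a_sme, a_large = a_total - delta_size, a_total + delta_size

    # 5. Clamp all estimates to the valid 0-100 domain.
    return tuple(np.clip(x, 0.0, 100.0) for x in (a_total, a_sme, a_large))

# Hypothetical example: two signals observed for three countries in one sector.
signals = {"postings": np.array([0.8, 1.4, 2.1]), "traffic": np.array([12.0, 30.0, 55.0])}
total, sme, large = adoption_score(signals, {"postings": 0.6, "traffic": 0.4},
                                   alpha_sector=15.0, beta_country=40.0, delta_size=6.0)
```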

4.2 GenAI/ChatGPT usage (MAU)

  1. Traffic triangulation. Web/app traffic, device shares, and session heuristics → preliminary user counts.
  2. Deduplication heuristics. Cross-device, VPN/shared IP adjustments, and app/web overlap correction.
  3. Panel calibration. Where available, calibrate to public panel reference points; otherwise regional analogues.
  4. Professional tilt. Adjust for enterprise usage share using postings mix and enterprise signals.
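
As a rough illustration of steps 1–4, the sketch below combines web and app audiences into a MAU estimate. All adjustment factors (overlap share, shared-IP discount, professional share) are invented placeholders, not the calibrated values used in production.

```python
def estimate_mau(web_visitors, app_users, overlap_share=0.35,
                 shared_ip_discount=0.05, panel_ratio=None, pro_share=0.20):
    """Toy MAU triangulation: triangulate -> de-duplicate -> calibrate -> professional tilt."""
    # 1. Traffic triangulation: naive union of web and app audiences.
    raw = web_visitors + app_users

    # 2. Deduplication heuristics: subtract app/web overlap, discount shared-IP inflation.
    deduped = (raw - overlap_share * min(web_visitors, app_users)) * (1.0 - shared_ip_discount)

    # 3. Panel calibration: rescale toward a public panel reference when one exists.
    calibrated = deduped * panel_ratio if panel_ratio is not None else deduped

    # 4. Professional tilt: carve out an enterprise/professional slice.
    return {"mau_total": calibrated, "mau_professional": calibrated * pro_share}

# Hypothetical country with 9M web visitors and 4M app users, no panel reference available.
print(estimate_mau(9_000_000, 4_000_000))
```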
Transparency. Each country page specifies the active signals, anchor sets, and the version of this methodology (see Changelog).

5. Uncertainty & modeled ranges

We reflect uncertainty as a band around the point estimate.

  1. Signal variance. Each signal contributes a variance σ_k² estimated from rolling windows or cross-section dispersion.
  2. Composite standard error (SE). SE = √( Σ (w_k² · σ_k²) ).
  3. Range width. [A − γ·SE, A + γ·SE] with γ in [1.0, 1.4] depending on coverage; widen if coverage < 60%.
  4. Clamping. Clip to valid domain (e.g., 0–100%).
  5. Trend bands. Charts use the same SE + a small revision-risk premium for newer months.
Interpretation. Bands are not confidence intervals in a strict frequentist sense; they are modeled uncertainty ranges combining statistical and coverage considerations.
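
A compact sketch of steps 2–4 above, assuming per-signal weights and variances are already available; the γ default and the 60% coverage rule mirror the text, while the example inputs are hypothetical.

```python
import math

def modeled_range(point, weights, variances, gamma=1.2, coverage=0.8,
                  lower=0.0, upper=100.0):
    """Composite SE and modeled range for one point estimate."""
    # Composite standard error: SE = sqrt(sum(w_k^2 * sigma_k^2)).
    se = math.sqrt(sum(w * w * v for w, v in zip(weights, variances)))

    # Widen the band when signal coverage is thin, per the coverage < 60% rule.
    if coverage < 0.60:
        gamma = max(gamma, 1.4)

    # Range = [A - gamma*SE, A + gamma*SE], clipped to the valid domain.
    return max(lower, point - gamma * se), min(upper, point + gamma * se)

# Hypothetical adoption estimate of 42% built from three signals.
print(modeled_range(42.0, weights=[0.5, 0.3, 0.2], variances=[9.0, 16.0, 25.0], coverage=0.55))
```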

6. Temporal methods

  • Smoothing. Light EMA or Holt smoothing for noisy signals; preserve turning points.
  • Nowcasting. When anchors lag, extrapolate with short-term signals; replace with actuals upon release.
  • YoY & revisions. YoY computed on revised history; we maintain a changelog of back-revisions.
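
As one concrete example of the “light EMA” option, a plain exponential moving average looks like the sketch below; the α value is an illustrative choice, not a published parameter (smaller α smooths more, larger α tracks turning points more closely).

```python
def ema(series, alpha=0.3):
    """Exponential moving average used as a light smoother for noisy monthly signals."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical noisy monthly signal.
print(ema([10, 14, 9, 18, 17, 25]))
```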

7. Quality assurance & governance

Automated checks

  • Outliers: |z| > 3.5 → flag
  • Trend breaks: |ΔYoY| > 15 pp → review
  • Coverage < 60% → widen band / lower weight
  • SME-Large gap sanity: |Δ| > 30 pp → recalibrate
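
The thresholds above translate directly into boolean flags; the function below is a minimal sketch using those published cut-offs (the input names are hypothetical).

```python
def qa_flags(z_score, yoy_delta_pp, coverage, sme_large_gap_pp):
    """Automated QA checks with the thresholds listed above (pp = percentage points)."""
    return {
        "outlier": abs(z_score) > 3.5,                        # |z| > 3.5 -> flag
        "trend_break": abs(yoy_delta_pp) > 15,                # |dYoY| > 15 pp -> review
        "low_coverage": coverage < 0.60,                      # widen band / lower weight
        "size_gap_recalibrate": abs(sme_large_gap_pp) > 30,   # |SME-Large| > 30 pp -> recalibrate
    }

print(qa_flags(z_score=2.1, yoy_delta_pp=18.0, coverage=0.72, sme_large_gap_pp=12.0))
```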

Manual review

  • Country/sector expert pass (context, policy/regulatory changes)
  • Source audit: sampling bias and stale anchors
  • Reproducibility spot-checks (code & parameters)

Versioning & revisions

  • Semantic versioning: major.minor.patch
  • Back-revisions documented with reasons (new anchors, bugfix, re-weight)
  • Stable URLs for CSVs; metadata includes method_version

Ethics & privacy

  • No PII; only aggregated, public or derivative signals
  • Respect for national data laws (e.g., GDPR/LGPD); remove or mask where required
  • Vendor telemetry used only if publicly disclosed or licensed

8. Known limitations & biases

  • Access bias. Internet connectivity, censorship, and payment rails can under/over-represent users.
  • Enterprise skew. Job postings lean toward larger firms; SME adoption may be understated.
  • Language/culture. GenAI prompts and product choices vary by language; cross-country comparability is approximate.
  • Early adopter optics. Low base countries can show large YoY changes that overstate maturity.

9. How to cite & reuse

Content and CSVs are licensed under CC BY 4.0. Please attribute as follows:

Lyell Data (2025). AI Adoption in <Country> — 2025. https://example.com/ai/industry/<slug>

Academic (APA):

Lyell Data. (2025). AI Adoption in <Country> — 2025 (Version <x.y.z>) [Dataset]. CC BY 4.0. https://example.com/ai/industry/<slug>

CSV schema

Column | Description
indicator | e.g., adoption_overall, adoption_sme, usage_mau
value | Point estimate (numeric)
lo, hi | Modeled range bounds
segment | SME / Large / Total
industry | Industry label (if applicable)
country, year | Geography & reference year
source_class | Always “Modeled + Public Data”
updated | ISO date of last change
method_version | SemVer of this methodology
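
For reference, a CSV with this schema can be consumed as in the sketch below; the file name is a placeholder, and pandas is only one way to read it.

```python
import pandas as pd

# Placeholder file name; real CSVs are served at the stable URLs mentioned above.
df = pd.read_csv("ai_adoption_example.csv")

# Total-segment adoption rows, with the modeled range next to the point estimate.
adoption = df[(df["indicator"] == "adoption_overall") & (df["segment"] == "Total")]
print(adoption[["country", "year", "value", "lo", "hi", "method_version"]])
```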

10. Authorship & review

Responsible authors

  • Maarja Veskimägi — Industry AI Adoption Lead Author
  • Siti Norhayati Omar — GenAI Usage & MAU Modeling Author

Review & feedback

Internal review before publication; external expert review on major updates. Feedback: /contact.

11. FAQ

Is the “modeled range” a confidence interval?

No. It’s a modeled uncertainty band that blends statistical variance with coverage and revision risk.

Why does my country show a wide range?

Likely due to sparse anchors or high disagreement among signals. As coverage improves, ranges narrow.

How often do you revise datasets?

Minor: rolling; Major: monthly/quarterly depending on anchors. See the changelog.

Can I get raw signal slices?

Yes—contact us for a license where permissible; we only ship aggregated, non-PII views by default.

12. Appendix

Formulas (compact)

Normalize:   ŝ_k = norm(s_k | region, sector)
Fusion:      A*  = Σ w_k · ŝ_k
Calibrate:   A   = α_sector · A* + β_country
Size split:  A_SME = A − δ_size ; A_Large = A + δ_size
Uncertainty: SE = √( Σ w_k² · σ_k² ), Range = [A − γ·SE, A + γ·SE]
        

ISIC ↔ NAICS quick mapping (examples)

Group | ISIC examples | NAICS examples
IT Services | J62–63 | 518, 519, 5415
BFSI | K64–66 | 52
Telecom | J61 | 517
Manufacturing | C10–33 | 31–33
Retail | G47 | 44–45
Public sector | O84 | 92

Evidence table (template)

Country | Industry | Signals used | Anchor? | Coverage | Notes
India | IT | postings, dev, traffic | Yes | High | Export-oriented IT; enterprise skew
Brazil | Retail | postings, traffic | Partial | Medium | LGPD considerations; FX sensitivity
Nigeria | Telecom | postings, traffic | No | Medium | Power/connectivity constraints

13. Changelog (method_version)

  • 1.1.0 — Added revision-risk premium in trend bands; clarified SME threshold text. 2025-08-08
  • 1.0.0 — Initial public release: adoption & usage frameworks, range model, QA checks. 2025-07-20
Each dataset includes method_version in metadata and CSV.
Last updated: Aug 2025 • Badge: Modeled + Public Data