Methodology - Lyell Data

1 Executive summary

What we publish. Country- and industry-level metrics for: (a) AI adoption — “% of firms using AI in ≥1 process”; (b) GenAI usage — e.g., “ChatGPT monthly active users (MAU)”. For each metric we show a point estimate and a modeled range.

What this is not. Not official statistics. Estimates are modeled from public datasets and directional proxies, documented on each page.

How to read a range. Ranges combine signal variance, coverage uncertainty, and revision risk. Bands are clipped to valid bounds (e.g., 0–100%).

2 Scope & definitions

AI adoption

Share of firms that report or demonstrate using AI in at least one business process (analytics, automation, customer service, coding assistance, etc.). Industry pages show SME vs Large splits.

SME vs Large. Unless a national standard exists, SME ≈ 10–249 employees; Large ≥ 250. Country pages state deviations.
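The default size thresholds, together with the micro-firm exclusion listed under Out of scope, can be sketched as a small classifier. This assumes integer headcounts. `size_class` and its threshold parameters are hypothetical names, and a country page with a national standard would pass its own thresholds.

```python
from typing import Optional


def size_class(employees: int, sme_min: int = 10, large_min: int = 250) -> Optional[str]:
    """Return 'SME' or 'Large' for a headcount, or None for out-of-scope micro-firms.

    Defaults follow the text (SME = 10-249 employees, Large >= 250). The thresholds
    are hypothetical parameters so that a national standard can be substituted.
    """
    if employees < 0:
        raise ValueError("employee count cannot be negative")
    if employees < sme_min:
        return None  # micro-firm (<10): excluded unless a survey anchor covers it
    if employees < large_min:
        return "SME"
    return "Large"


for n in (5, 10, 249, 250):
    print(n, size_class(n))
# 5 None
# 10 SME
# 249 SME
# 250 Large
```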

GenAI/ChatGPT usage

Population-level or professional user metrics modeled from traffic, app telemetry (where public), job/task signals, and triangulation from multiple panels. “MAU” is unique users per month after de-duplication heuristics.

Industry classes

Industries are mapped to ISIC/NAICS families. Country pages may show local classifications with a mapping table. Example groups: IT Services; BFSI; Telecom; Manufacturing (and splits); Retail & E-commerce; Logistics; Healthcare; Education; Public Sector; Agriculture.

Out of scope

Micro-firms (<10 employees) unless explicitly covered by a survey anchor.
Hobby/individual usage for adoption metrics (still captured in usage/MAU where relevant).
Proprietary vendor telemetry not available as public signals.

3 Data sources (public signals)

We blend multiple open signals. Each source is assessed for coverage, bias, and update cadence.

Source class	What it adds	Coverage & bias	Update cadence
Official releases (national stats, sector bodies)	Anchors, SME/Large splits, baseline adoption	Lagged; sector granularity varies; enterprise-skew	Quarterly–annual
Labor market signals (job postings)	Demand & organizational intent; role mix	Urban/large-firm bias; duplicate posts	Weekly–monthly
Developer activity (public repos)	Capability & ecosystem maturity	Open-source skew; private code unseen	Daily–monthly
Traffic & query signals	Usage direction; consumer/pro adoption tilt	Bot noise; ISP/censorship effects; device skew	Daily–monthly
Sector reports (public)	Context; cross-checks	Method variance; survivorship bias	Ad-hoc

Every dataset page lists the specific sources used and their role in the estimate.

4 Modeling framework

4.1 Adoption score (by country × industry)

Normalize signals. For each signal s_k we compute a regional/sectoral normalization (z-score or min-max) to reduce cross-country scale effects: ŝ_k = norm(s_k | region, sector).
Weighted fusion. Combine normalized signals by learned weights w_k: A* = Σ (w_k · ŝ_k).
Anchors & calibration. Scale and shift using anchor jurisdictions/sectors with official stats: A = α_sector · A* + β_country.
Size split. Apply firm-size offset: A_SME = A − δ_size, A_Large = A + δ_size (δ depends on sector and country structure).
Bounds. Clamp to [0,100].
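The five steps above can be sketched end to end for a single country × industry cell. This is a minimal illustration, not the production model: it uses the z-score variant of norm(·), assumes the weights w_k and the calibration terms α_sector, β_country and δ_size are already known, and the signal names and numbers are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def clamp(x: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Clip a value to the valid percentage domain."""
    return min(hi, max(lo, x))


def zscore(value: float, peers: List[float]) -> float:
    """Normalize a signal against its region/sector peer group (z-score variant)."""
    mu = statistics.fmean(peers)
    sd = statistics.pstdev(peers)
    return 0.0 if sd == 0 else (value - mu) / sd


def adoption_estimate(
    signals: Dict[str, float],        # raw s_k for one country x industry cell
    peers: Dict[str, List[float]],    # s_k across the same region/sector
    weights: Dict[str, float],        # w_k, assumed already fitted
    alpha_sector: float,
    beta_country: float,
    delta_size: float,
) -> Tuple[float, float, float]:
    """Return (A, A_SME, A_Large), each clamped to [0, 100]."""
    # Normalize: s_hat_k = norm(s_k | region, sector)
    s_hat = {k: zscore(v, peers[k]) for k, v in signals.items()}
    # Weighted fusion: A* = sum_k w_k * s_hat_k
    a_star = sum(weights[k] * s_hat[k] for k in s_hat)
    # Anchor calibration: A = alpha_sector * A* + beta_country
    a = alpha_sector * a_star + beta_country
    # Symmetric size split, then clamp every value to the valid bounds
    return clamp(a), clamp(a - delta_size), clamp(a + delta_size)


est = adoption_estimate(
    signals={"postings": 12.0, "dev": 30.0},
    peers={"postings": [8.0, 10.0, 12.0], "dev": [20.0, 30.0, 40.0]},
    weights={"postings": 0.6, "dev": 0.4},
    alpha_sector=10.0, beta_country=40.0, delta_size=5.0,
)
print([round(x, 1) for x in est])  # [47.3, 42.3, 52.3]
```

Clamping is applied after the size split so that a large δ_size cannot push either segment outside 0–100%.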

4.2 GenAI/ChatGPT usage (MAU)

Traffic triangulation. Web/app traffic, device shares, and session heuristics → preliminary user counts.
Deduplication heuristics. Cross-device, VPN/shared IP adjustments, and app/web overlap correction.
Panel calibration. Where available, calibrate to public panel reference points; otherwise regional analogues.
Professional tilt. Adjust for enterprise usage share using postings mix and enterprise signals.
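The MAU steps can be illustrated with a toy calculation. Everything here is an assumption for illustration: the overlap share, the shared-IP multiplier, the panel ratio and the enterprise share are hypothetical inputs, the adjustments are applied multiplicatively, and cross-device matching is not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TrafficInputs:
    web_users: float         # preliminary unique web users from traffic triangulation
    app_users: float         # preliminary unique app users
    app_web_overlap: float   # assumed share of app users also counted on web (0-1)
    shared_ip_factor: float  # assumed multiplier (>= 1) for users behind VPN/shared IPs


def mau_estimate(t: TrafficInputs, panel_ratio: float = 1.0) -> float:
    """Toy MAU: de-duplicate app/web, adjust for shared IPs, calibrate to a panel."""
    # App/web overlap correction (inclusion-exclusion on the overlapping users)
    unique = t.web_users + t.app_users - t.app_web_overlap * t.app_users
    # VPN/shared-IP adjustment
    unique *= t.shared_ip_factor
    # Panel calibration: hypothetical ratio of a panel reference point to the raw
    # estimate in a reference market; 1.0 means no panel is available
    return unique * panel_ratio


def professional_split(mau: float, enterprise_share: float) -> Tuple[float, float]:
    """Split MAU into (professional, consumer) using an assumed enterprise share."""
    pro = mau * enterprise_share
    return pro, mau - pro


mau = mau_estimate(TrafficInputs(10.0, 4.0, 0.5, 1.05), panel_ratio=0.9)  # millions
print(round(mau, 2))  # 11.34
print([round(x, 2) for x in professional_split(mau, 0.3)])  # [3.4, 7.94]
```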

Transparency. Each country page specifies the active signals, anchor sets, and the version of this methodology (see Changelog).

5 Uncertainty & modeled ranges

We reflect uncertainty as a band around the point estimate.

Signal variance. Each signal contributes a variance σ_k² estimated from rolling windows or cross-section dispersion.
Composite standard error (SE). SE = √( Σ (w_k² · σ_k²) ), which treats the signal errors as independent.
Range. [A − γ·SE, A + γ·SE], with γ in [1.0, 1.4] depending on coverage; the band is widened when coverage is below 60%.
Clamping. Clip to valid domain (e.g., 0–100%).
Trend bands. Charts use the same SE + a small revision-risk premium for newer months.
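The band construction above can be sketched as follows. The text gives γ only as a range, so the linear mapping from coverage to γ and the 1.25 widening factor below 60% coverage are hypothetical choices. The sketch also assumes each σ_k is expressed in the same percentage-point units as A, so that the band can be added to A directly.

```python
import math
from typing import Sequence, Tuple


def composite_se(weights: Sequence[float], sigmas: Sequence[float]) -> float:
    """SE = sqrt(sum_k w_k^2 * sigma_k^2); treats signal errors as independent."""
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))


def modeled_range(
    a: float,
    weights: Sequence[float],
    sigmas: Sequence[float],           # assumed to be in percentage points, like A
    coverage: float,                   # share of expected signals present (0-1)
    revision_premium: float = 0.0,     # extra SE for recent months (trend bands)
    low_coverage_widen: float = 1.25,  # hypothetical widening factor below 60%
) -> Tuple[float, float]:
    se = composite_se(weights, sigmas) + revision_premium
    # Hypothetical mapping: gamma falls linearly from 1.4 (no coverage) to 1.0 (full)
    gamma = 1.4 - 0.4 * min(1.0, max(0.0, coverage))
    if coverage < 0.60:
        gamma *= low_coverage_widen
    lo, hi = a - gamma * se, a + gamma * se
    # Clip to the valid domain [0, 100]
    return max(0.0, lo), min(100.0, hi)


print([round(x, 2) for x in modeled_range(47.3, [0.6, 0.4], [2.0, 3.0], coverage=0.8)])
# [45.47, 49.13]
```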

Interpretation. Bands are not confidence intervals in a strict frequentist sense; they are modeled uncertainty ranges combining statistical and coverage considerations.

6 Temporal methods

Smoothing. Exponential moving average (EMA) or Holt smoothing for noisy signals, kept light enough to preserve turning points.
Nowcasting. When anchors lag, extrapolate with short-term signals; replace with actuals upon release.
YoY & revisions. YoY computed on revised history; we maintain a changelog of back-revisions.
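A minimal sketch of the two smoothers and a naive nowcast. The smoothing parameters α and β are hypothetical, and Holt's level-plus-trend extrapolation stands in for the signal-driven nowcast described above, which would instead use short-term signals until the anchor is released.

```python
from typing import List, Sequence, Tuple


def ema(xs: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average; larger alpha means lighter smoothing."""
    out: List[float] = []
    for x in xs:
        out.append(x if not out else alpha * x + (1 - alpha) * out[-1])
    return out


def holt(xs: Sequence[float], alpha: float = 0.5, beta: float = 0.3) -> Tuple[List[float], float]:
    """Holt's linear (level + trend) smoothing. Returns fitted levels and final trend."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    level, trend = xs[0], xs[1] - xs[0]
    fitted = [level]
    for x in xs[1:]:
        prev = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
        fitted.append(level)
    return fitted, trend


def nowcast(xs: Sequence[float], horizon: int = 1, **kw: float) -> float:
    """Extrapolate the Holt level/trend `horizon` periods past the last observation."""
    fitted, trend = holt(xs, **kw)
    return fitted[-1] + horizon * trend


print(ema([10.0, 20.0, 20.0], alpha=0.5))                  # [10.0, 15.0, 17.5]
print(round(nowcast([1.0, 2.0, 3.0, 4.0], horizon=2), 6))  # 6.0
```

Once the official anchor for the nowcast period is released, it replaces the extrapolated value and YoY is recomputed on the revised history.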

7 Quality assurance & governance

Automated checks

Outliers: |z| > 3.5 → flag
Trend breaks: |ΔYoY| > 15 pp → review
Coverage < 60% → widen band / lower weight
SME-Large gap sanity: |Δ| > 30 pp → recalibrate
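The automated checks translate directly into threshold rules. In this sketch `CellStats` and `qa_flags` are hypothetical names, coverage is expressed as a 0-1 share, and ΔYoY is read as the year-over-year change of the metric in percentage points.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CellStats:
    z: float                       # z-score of the latest value vs. its peer group
    yoy_change_pp: float           # year-over-year change, percentage points
    coverage: float                # share of expected signals present (0-1)
    sme: Optional[float] = None    # SME adoption, %
    large: Optional[float] = None  # Large adoption, %


def qa_flags(c: CellStats) -> List[str]:
    """Apply the automated checks and return the triggered actions."""
    flags: List[str] = []
    if abs(c.z) > 3.5:
        flags.append("outlier -> flag")
    if abs(c.yoy_change_pp) > 15:
        flags.append("trend break -> review")
    if c.coverage < 0.60:
        flags.append("low coverage -> widen band / lower weight")
    if c.sme is not None and c.large is not None and abs(c.large - c.sme) > 30:
        flags.append("SME-Large gap -> recalibrate")
    return flags


print(qa_flags(CellStats(z=1.2, yoy_change_pp=4.0, coverage=0.8, sme=30.0, large=45.0)))  # []
print(qa_flags(CellStats(z=3.9, yoy_change_pp=18.0, coverage=0.5, sme=10.0, large=48.0)))  # all four fire
```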

Manual review

Country/sector expert pass (context, policy/regulatory changes)
Source audit: sampling bias and stale anchors
Reproducibility spot-checks (code & parameters)

Versioning & revisions

Semantic versioning: major.minor.patch
Back-revisions documented with reasons (new anchors, bugfix, re-weight)
Stable URLs for CSVs; metadata includes method_version
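Because method_version follows major.minor.patch, versions should be compared numerically rather than as strings. `parse_semver` and `same_major` are hypothetical helpers; pre-release and build tags are not handled, and reading a shared major version as "same method generation" is an assumption by analogy with SemVer, where a major bump signals an incompatible change.

```python
from typing import Tuple


def parse_semver(v: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' into integers (pre-release/build tags not handled)."""
    major, minor, patch = (int(p) for p in v.strip().split("."))
    return major, minor, patch


def same_major(a: str, b: str) -> bool:
    """True if two method_version strings share a major version."""
    return parse_semver(a)[0] == parse_semver(b)[0]


pair = ["1.0.9", "1.0.10"]
print(max(pair))                     # 1.0.9  (string comparison: wrong)
print(max(pair, key=parse_semver))   # 1.0.10 (numeric comparison)
print(same_major("1.0.0", "1.1.0"))  # True
```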

Ethics & privacy

No PII; only aggregated, public or derivative signals
Respect for national data laws (e.g., GDPR/LGPD); remove or mask where required
Vendor telemetry used only if publicly disclosed or licensed

8 Known limitations & biases

Access bias. Internet connectivity, censorship, and payment rails can under- or over-represent users.
Enterprise skew. Job postings lean toward larger firms; SME adoption may be understated.
Language/culture. GenAI prompts and product choices vary by language; cross-country comparability is approximate.
Early-adopter optics. Low-base countries can show large YoY changes that overstate maturity.

9 How to cite & reuse

Content and CSVs are licensed under CC BY 4.0. Please attribute as follows:

Lyell Data (2025). AI Adoption in <Country> — 2025. https://example.com/ai/industry/<slug>

Academic (APA):

Lyell Data. (2025). AI Adoption in <Country> — 2025 (Version <x.y.z>) [Dataset]. CC BY 4.0. https://example.com/ai/industry/<slug>

CSV schema

Column	Description
`indicator`	e.g., `adoption_overall`, `adoption_sme`, `usage_mau`
`value`	Point estimate (numeric)
`lo`, `hi`	Modeled range bounds
`segment`	SME / Large / Total
`industry`	Industry label (if applicable)
`country`, `year`	Geography & reference year
`source_class`	Always “Modeled + Public Data”
`updated`	ISO date of last change
`method_version`	SemVer of this methodology
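A consumer-side check of a downloaded CSV against the schema above might look like this. The checks (value within [lo, hi], 0-100 bounds for `adoption_*` indicators, known segment labels, ISO dates) are assumptions drawn from the column descriptions, not a published validator, and the sample rows are illustrative values, not Lyell Data estimates.

```python
import csv
import io
from datetime import date
from typing import List, TextIO

COLUMNS = ["indicator", "value", "lo", "hi", "segment", "industry", "country",
           "year", "source_class", "updated", "method_version"]
SEGMENTS = {"SME", "Large", "Total"}


def validate_csv(f: TextIO) -> List[str]:
    """Check a dataset CSV against the schema; return a list of problems found."""
    reader = csv.DictReader(f)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems: List[str] = []
    for i, row in enumerate(reader, start=1):  # i counts data rows, not file lines
        value, lo, hi = float(row["value"]), float(row["lo"]), float(row["hi"])
        if not lo <= value <= hi:
            problems.append(f"row {i}: value outside [lo, hi]")
        if row["indicator"].startswith("adoption_") and not (0 <= lo and hi <= 100):
            problems.append(f"row {i}: adoption bounds outside 0-100")
        if row["segment"] not in SEGMENTS:
            problems.append(f"row {i}: unknown segment {row['segment']!r}")
        date.fromisoformat(row["updated"])  # raises ValueError if not an ISO date
    return problems


# Illustrative rows only (hypothetical values, not published estimates).
sample = """indicator,value,lo,hi,segment,industry,country,year,source_class,updated,method_version
adoption_sme,42.3,38.0,47.0,SME,Retail,XX,2025,Modeled + Public Data,2025-08-08,1.1.0
adoption_large,52.3,55.0,60.0,Large,Retail,XX,2025,Modeled + Public Data,2025-08-08,1.1.0
"""
print(validate_csv(io.StringIO(sample)))  # ['row 2: value outside [lo, hi]']
```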

10 Authorship & review

Responsible authors

Maarja Veskimägi — Industry AI Adoption Lead Author
Siti Norhayati Omar — GenAI Usage & MAU Modeling Author

Review & feedback

Internal review before publication; external expert review on major updates. Feedback: /contact.

11 FAQ

Is the “modeled range” a confidence interval?

No. It’s a modeled uncertainty band that blends statistical variance with coverage and revision risk.

Why does my country show a wide range?

Likely due to sparse anchors or high disagreement among signals. As coverage improves, ranges narrow.

How often do you revise datasets?

Minor updates are rolling; major updates are monthly or quarterly, depending on anchor releases. See the changelog.

Can I get raw signal slices?

Yes, under a license where permissible; contact us. By default we ship only aggregated, non-PII views.

12 Appendix

Formulas (compact)

Normalize:   ŝ_k = norm(s_k | region, sector)
Fusion:      A*  = Σ w_k · ŝ_k
Calibrate:   A   = α_sector · A* + β_country
Size split:  A_SME = A − δ_size ; A_Large = A + δ_size
Uncertainty: SE = √( Σ w_k² · σ_k² ), Range = [A − γ·SE, A + γ·SE]

ISIC ↔ NAICS quick mapping (examples)

Group	ISIC examples	NAICS examples
IT Services	J62–63	518, 519, 5415
BFSI	K64–66	52
Telecom	J61	517
Manufacturing	C10–33	31–33
Retail	G47	44–45
Public sector	O84	92

Evidence table (template)

Country	Industry	Signals used	Anchor?	Coverage	Notes
India	IT	postings, dev, traffic	Yes	High	Export-oriented IT; enterprise skew
Brazil	Retail	postings, traffic	Partial	Medium	LGPD considerations; FX sensitivity
Nigeria	Telecom	postings, traffic	No	Medium	Power/connectivity constraints

13 Changelog (method_version)

1.1.0 — Added revision-risk premium in trend bands; clarified SME threshold text. 2025-08-08
1.0.0 — Initial public release: adoption & usage frameworks, range model, QA checks. 2025-07-20

Each dataset includes method_version in metadata and CSV.