ABSTRACT

TruthForge: AI-Powered Enterprise Verification Intelligence

TruthForge is a real-time enterprise verification system that aggregates, cross-references, and scores corporate claims using three independent AI engines — SignalForge, GreenwashGuard, and ClaimWire — all powered by Bright Data's web intelligence infrastructure and Groq's Llama 3.3-70B model. The system produces a single composite TruthScore (0–100) enabling procurement teams, fund managers, compliance officers, and analysts to make data-backed decisions in under 30 seconds per company.

Keywords: ESG verification, AI trust scoring, greenwashing detection, corporate intelligence, Bright Data MCP, Groq Llama 3.3, FastAPI, Celery, Supabase, AI/ML API (gpt-4o)

SECTION 1

Problem Statement

Modern enterprises face a critical asymmetry of information. Corporate sustainability claims, financial disclosures, and ESG commitments are largely self-reported with minimal independent verification. The result is a $1.2 trillion annual loss from greenwashing fraud, with 40% of ESG claims found misleading by EU regulators in 2024.

Existing solutions fall short: manual due diligence consumes 40+ analyst hours per company; third-party rating agencies lag months behind real events; and 67% of financial analysts report distrust in current ESG ratings. There is no real-time, AI-native verification layer for enterprise decision-making.

TruthForge solves this by automating verification across three independent analytical dimensions, delivering a live trust score backed by crawled public evidence — not delayed surveys or self-submitted data.

SECTION 2

System Architecture

TruthForge is built on a fully async Python backend using FastAPI + Uvicorn. Analysis requests trigger parallel execution of three independent engines via asyncio.gather(), each powered by Bright Data for data collection and Groq for AI synthesis. Results are persisted to Supabase (PostgreSQL), with Celery + Redis handling scheduled re-analysis and alerting.

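    import asyncio

    # Parallel engine execution: core analysis loop.
    # signal_engine, esg_engine, and claims_engine are the engine instances
    # described in Section 3.
    async def run_full_analysis(company_name: str) -> dict:
        results = await asyncio.gather(
            signal_engine.analyze(company_name),
            esg_engine.analyze(company_name),
            claims_engine.analyze(company_name),
            return_exceptions=True,
        )
        # With return_exceptions=True a failed engine is returned as an
        # exception object, so surface it before indexing into the results.
        for result in results:
            if isinstance(result, BaseException):
                raise result
        signal, esg, claims = results
        truth_score = (
            signal["score"] * 0.35 +
            esg["score"] * 0.35 +
            claims["score"] * 0.30
        )
        # Nest each engine's output under its own key so the three "score"
        # fields do not overwrite one another.
        return {
            "truth_score": round(truth_score),
            "signal": signal,
            "esg": esg,
            "claims": claims,
        }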
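
One detail of asyncio.gather() is worth illustrating: with return_exceptions=True, a failing engine does not raise inside gather() but is returned as an exception object alongside the successful results. The sketch below shows this with hypothetical stub engines (ok_engine and failing_engine are illustrative names, not part of TruthForge), so the orchestrator can decide whether to abort or score with partial data.

```python
import asyncio


async def ok_engine(company: str) -> dict:
    # Hypothetical stub standing in for a real engine.
    await asyncio.sleep(0)
    return {"score": 70}


async def failing_engine(company: str) -> dict:
    # Hypothetical stub that simulates a scrape or model failure.
    raise RuntimeError("upstream timeout")


async def gather_engines(company: str) -> tuple[list, list]:
    """Run the stub engines concurrently and split successes from failures."""
    results = await asyncio.gather(
        ok_engine(company),
        failing_engine(company),
        ok_engine(company),
        return_exceptions=True,
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return ok, failed


ok, failed = asyncio.run(gather_engines("ExampleCorp"))
print(len(ok), "engines succeeded;", len(failed), "failed:", repr(failed[0]))
```

Whether a single engine failure should abort the whole analysis or produce a partial score is a policy choice the source does not specify.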

The architecture is horizontally scalable: additional Celery workers can be spun up to parallelize multi-company batch analyses. The Supabase layer provides row-level security, real-time subscriptions, and automatic backups without infrastructure overhead.
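
As a rough illustration of scheduled re-analysis, the fragment below builds a Celery beat schedule as a plain dictionary. The task path, watchlist, and six-hour interval are assumptions chosen for the example; the source does not state them. In a Celery application this dictionary would be assigned to app.conf.beat_schedule.

```python
from datetime import timedelta

# Hypothetical watchlist; the real list would come from the database.
WATCHLIST = ["Google", "Microsoft"]

# One periodic entry per company. "task", "schedule", and "args" are the
# standard keys of a Celery beat schedule entry.
beat_schedule = {
    f"reanalyze-{name.lower()}": {
        "task": "tasks.reanalyze_company",  # assumed task path
        "schedule": timedelta(hours=6),     # assumed interval
        "args": (name,),
    }
    for name in WATCHLIST
}

for entry_name, entry in beat_schedule.items():
    print(entry_name, "->", entry["task"], entry["args"])
```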

SECTION 3

Engine Design

Each engine operates independently, using Bright Data for data collection and Groq Llama 3.3-70B for AI-powered analysis. Engine weights (35%, 35%, and 30%) reflect signal reliability and verifiability.

3.1 — SIGNALFORGE

Market Signal Engine (35%)

Aggregates news, press releases, and analyst reports. Uses SERP API for discovery and Web Unlocker for full-text access. Groq synthesizes sentiment polarity and controversy density into a 0–100 signal score.

3.2 — GREENWASHGUARD

ESG Integrity Engine (35%)

Validates ESG disclosures against third-party audits, regulatory filings, and carbon registry databases. Flags claim-to-reality gaps. Scraping Browser handles JS-rendered ESG dashboards.

3.3 — CLAIMWIRE

Claim Verification Engine (30%)

Extracts specific corporate claims from IR pages and press releases, then cross-references each against independent sources, fact-checkers, and public datasets. Returns per-claim confidence scores.

All three engines share a common interface: async def analyze(company: str) → EngineResult. This uniformity enables future engines to be plugged in without changes to the orchestration layer.
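
The shared interface could be expressed with a typing.Protocol, roughly as sketched below. The fields of EngineResult and the StubEngine class are assumptions made for illustration; the source names only the analyze signature and the EngineResult type.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class EngineResult:
    # Field names are assumptions; the source only names the type.
    score: float  # normalized to the 0-100 range
    evidence: list[str] = field(default_factory=list)


@runtime_checkable
class Engine(Protocol):
    async def analyze(self, company: str) -> EngineResult: ...


class StubEngine:
    """Hypothetical engine used only to show the interface shape."""

    async def analyze(self, company: str) -> EngineResult:
        return EngineResult(score=50.0, evidence=[f"placeholder for {company}"])


async def run_all(engines: list[Engine], company: str) -> list[EngineResult]:
    # The orchestrator depends only on the shared analyze() signature,
    # so adding an engine means appending to the list.
    return list(await asyncio.gather(*(e.analyze(company) for e in engines)))


results = asyncio.run(run_all([StubEngine(), StubEngine()], "ExampleCorp"))
print([r.score for r in results])
```

Because the protocol is structural, a new engine class needs no base class or registration; it only has to provide a matching analyze method.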

SECTION 3.4

SignalJobs — Multi-Source Hiring Intelligence Layer

Job postings are among the earliest publicly observable signals of a company's strategic direction — typically preceding press releases, earnings guidance, and analyst coverage by weeks. When a company quietly opens ten machine-learning roles while publicly describing an AI initiative as concluded, that discrepancy is operationally significant. SignalJobs captures this layer by aggregating live listings across LinkedIn, Indeed, Glassdoor, and company career pages in a single parallelized query, transforming raw hiring data into strategic intelligence.

Data Collection. SignalJobs uses Bright Data's SERP API to query all four sources simultaneously. Each query is constructed with the company name plus source-specific modifiers, returning result sets that include role title, seniority level, location, and posting recency. The SERP-based approach delivers sub-15-second response times without requiring a dedicated scraper per platform. A raw-URL fallback is implemented: when the SERP layer returns fewer than three results, the engine falls back to direct URL construction for each platform, ensuring results are never empty when the underlying web has data.
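
The SERP-first collection with a fallback can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold of three comes from the text, but applying it per source, replacing (rather than merging) sparse SERP results, and the fetcher functions themselves are all hypothetical. Real fetchers would call Bright Data; here they are injected stubs.

```python
import asyncio
from typing import Awaitable, Callable

MIN_SERP_RESULTS = 3  # threshold stated in the text
SOURCES = ["linkedin", "indeed", "glassdoor", "careers"]

# A fetcher takes (company, source) and returns a list of job dicts.
Fetcher = Callable[[str, str], Awaitable[list[dict]]]


async def collect_source(company: str, source: str,
                         serp_fetch: Fetcher, direct_fetch: Fetcher) -> list[dict]:
    """Try the SERP path first; use the direct-URL path if results are sparse."""
    results = await serp_fetch(company, source)
    if len(results) < MIN_SERP_RESULTS:
        results = await direct_fetch(company, source)
    return results


async def collect_all(company: str, serp_fetch: Fetcher,
                      direct_fetch: Fetcher) -> list[dict]:
    # Query every source concurrently, then flatten into one list.
    per_source = await asyncio.gather(
        *(collect_source(company, s, serp_fetch, direct_fetch) for s in SOURCES)
    )
    return [job for jobs in per_source for job in jobs]


async def fake_serp(company: str, source: str) -> list[dict]:
    # Stub: Glassdoor returns too few results, triggering the fallback.
    n = 1 if source == "glassdoor" else 3
    return [{"title": f"Role {i}", "source": source, "via": "serp"} for i in range(n)]


async def fake_direct(company: str, source: str) -> list[dict]:
    return [{"title": f"Role {i}", "source": source, "via": "direct"} for i in range(4)]


jobs = asyncio.run(collect_all("ExampleCorp", fake_serp, fake_direct))
print(len(jobs), "listings;", sum(j["via"] == "direct" for j in jobs), "from fallback")
```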

AI Layer — AI/ML API (gpt-4o). SignalJobs is the only engine in TruthForge that uses AI/ML API's gpt-4o model rather than Groq Llama 3.3-70B. This is a deliberate architectural choice: gpt-4o's stronger instruction-following makes it better suited for structured data transformation — taking heterogeneous raw SERP snippets from four sources with differing formats and normalizing them into clean, consistently shaped role listings. Groq Llama 3.3-70B remains the AI for all three core verification engines and the TruthScore synthesis step; AI/ML API is scoped strictly to the SignalJobs layer. For each role, gpt-4o also generates a per-role intelligence brief describing what that specific hire signals about the company's strategic direction — for example, a surge in DevSecOps postings suggesting an upcoming compliance certification, or executive CFO searches indicating pre-IPO preparation.

API Endpoints. Two endpoints expose the SignalJobs layer:

    # Retrieve structured live listings for a company
    GET /api/jobs/{company}
    → { company, total_jobs, sources, jobs: [{ title, company, location, type, source, url }] }

    # Generate a per-role AI intelligence brief
    POST /api/jobs/detail
    body: { "job_title": "Senior DevSecOps Engineer", "company_name": "Tesla" }
    → { brief, signals, implications }

The GET /api/jobs/{company} endpoint aggregates all sources, deduplicates listings by title and location, and returns them sorted by recency. The POST /api/jobs/detail endpoint invokes gpt-4o with the role title and company context, returning a structured intelligence brief scoped to that hire's strategic implications. Briefs are generated on-demand, so the initial listing load remains fast regardless of the total number of roles returned.
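
The deduplicate-and-sort step might look like the sketch below. The posted date field and the case-insensitive, whitespace-trimmed comparison are assumptions; the response schema in the source does not list a date field, and the sample listings are invented for illustration.

```python
from datetime import date


def dedupe_and_sort(jobs: list[dict]) -> list[dict]:
    """Drop repeat listings (same title and location) and return newest first."""
    seen: dict[tuple[str, str], dict] = {}
    for job in jobs:
        key = (job["title"].strip().lower(), job["location"].strip().lower())
        # Keep the most recent copy of each (title, location) pair.
        if key not in seen or job["posted"] > seen[key]["posted"]:
            seen[key] = job
    return sorted(seen.values(), key=lambda j: j["posted"], reverse=True)


raw = [
    {"title": "ML Engineer", "location": "Austin", "source": "linkedin",
     "posted": date(2026, 5, 20)},
    {"title": "ml engineer ", "location": "Austin", "source": "indeed",
     "posted": date(2026, 5, 22)},
    {"title": "Data Analyst", "location": "Berlin", "source": "glassdoor",
     "posted": date(2026, 5, 25)},
]

unique = dedupe_and_sort(raw)
for job in unique:
    print(job["posted"], job["title"].strip(), "via", job["source"])
```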

Design Rationale. Separating SignalJobs from the three core engines rather than folding it into SignalForge preserves the TruthScore formula's integrity: job-posting volume is a leading indicator of business activity, but it does not directly measure corporate honesty or ESG compliance. Keeping it as a standalone intelligence layer means it can be queried independently, surfaced in the dashboard without triggering a full analysis, and extended with per-source deep-scraping in a future iteration without touching the core scoring pipeline.

SECTION 4

Bright Data Integration

Bright Data serves as TruthForge's data backbone. Five Bright Data products are used, each addressing a specific collection challenge:

TOOL | USE CASE | ENGINE(S) | CALLS / ANALYSIS
SERP API | News discovery, controversy detection, analyst coverage | SignalForge | ~15
Web Unlocker | Bypass bot protection on corporate and regulatory sites | All engines | ~20
Scraping Browser | JavaScript-rendered ESG dashboards, dynamic investor portals | GreenwashGuard | ~8
MCP Server | Direct AI-to-Bright Data queries inside the Groq pipeline | All engines | ~5
Web Scraper API | Structured extraction of PDFs, sustainability reports | ClaimWire | ~10
SERP API | Multi-source job listing aggregation (LinkedIn, Indeed, Glassdoor, careers) | SignalJobs | ~4 / query

The Bright Data MCP Server integration is particularly significant: it allows Groq's Llama 3.3 model to query live web data mid-synthesis, rather than working only with pre-scraped static content. This creates a tighter, more accurate analysis loop.

SECTION 5

TruthScore Algorithm

TruthScore is a weighted composite of the three engine outputs. Weights were determined empirically based on signal reliability, latency, and correlation with known ground-truth cases:

TruthScore = (Signal × 0.35) + (ESG × 0.35) + (Claims × 0.30)

Signal: market reputation
ESG: integrity score
Claims: verification rate

Each engine score is normalized to the 0–100 range before aggregation. The final TruthScore is rounded to the nearest integer. Interpretation bands:

RANGE | LABEL | DESCRIPTION | RECOMMENDED ACTION
80 – 100 | VERIFIED | Strong evidence base, consistent claims, clean ESG record | Proceed with confidence
60 – 79 | TRUSTED | Minor inconsistencies, largely verified, low controversy | Proceed, monitor flagged areas
40 – 59 | CAUTION | Mixed signals, unverified ESG claims, controversy present | Request additional disclosure
0 – 39 | RISK | Multiple red flags, greenwashing indicators, disputed claims | Escalate for manual review
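
The weighted composite and the band lookup can be written out as a short sketch. The function and constant names are illustrative; the weights and thresholds are taken from this section, and the worked example uses the Google component scores reported in Section 6.

```python
WEIGHTS = {"signal": 0.35, "esg": 0.35, "claims": 0.30}

# (lower bound, label) pairs, checked from the highest band down.
BANDS = [(80, "VERIFIED"), (60, "TRUSTED"), (40, "CAUTION"), (0, "RISK")]


def truth_score(signal: float, esg: float, claims: float) -> int:
    """Weighted composite of three 0-100 engine scores, rounded to an integer."""
    for value in (signal, esg, claims):
        if not 0 <= value <= 100:
            raise ValueError("engine scores must be in the 0-100 range")
    composite = (
        signal * WEIGHTS["signal"]
        + esg * WEIGHTS["esg"]
        + claims * WEIGHTS["claims"]
    )
    return round(composite)


def band(score: int) -> str:
    """Map a TruthScore to its interpretation band."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise ValueError("score must be non-negative")


# Google row from Section 6: 26.25 + 25.2 + 18.0 = 69.45
score = truth_score(75, 72, 60)
print(score, band(score))
```

Applied to the integer component scores shown in Section 6, this formula reproduces Google's TruthScore of 69. The other three rows come out one point above the reported values, which would be consistent with the live system aggregating unrounded engine scores or truncating rather than rounding, though the source does not say which.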

SECTION 6

Empirical Results

The following results were produced by the live TruthForge system against four publicly traded companies. Analysis was performed on 2026-05-26 using real-time Bright Data scrapes and Groq Llama 3.3-70B synthesis:

COMPANY | SIGNAL | ESG | CLAIMS | TRUTHSCORE | VERDICT
Google | 75 | 72 | 60 | 69 | TRUSTED
Microsoft | 59 | 60 | 67 | 61 | TRUSTED
Ola | 74 | 42 | 44 | 53 | CAUTION
Tesla | 74 | 42 | 28 | 48 | CAUTION

Observation: Tesla's high signal score (74) indicates strong market presence, but the low claims verification score (28) reflects a high ratio of unverified corporate claims — consistent with public scrutiny of its manufacturing and safety disclosures. Google's balanced profile across all three dimensions supports its TRUSTED classification.

SECTION 7

Conclusion

TruthForge demonstrates that reliable enterprise verification is achievable in real time. By combining Bright Data's web intelligence infrastructure with Groq's inference speed and a multi-engine scoring architecture, the system delivers actionable trust scores without the latency or opacity of traditional ESG rating services.

The modular engine architecture is designed for extension: additional specialized engines (e.g., a FinancialForge engine for earnings claim verification, or a SupplyChain engine for Scope 3 emissions tracking) can be plugged in without changing the orchestration layer.

TruthForge's ultimate aim is to become the verification layer that enterprise software has been missing — a continuously updated, AI-native trust graph for the corporate world.
