TruthForge is a real-time enterprise verification system that aggregates, cross-references, and scores corporate claims using three independent AI engines — SignalForge, GreenwashGuard, and ClaimWire — all powered by Bright Data's web intelligence infrastructure and Groq's Llama 3.3-70B model. The system produces a single composite TruthScore (0–100) enabling procurement teams, fund managers, compliance officers, and analysts to make data-backed decisions in under 30 seconds per company.
Keywords: ESG verification, AI trust scoring, greenwashing detection, corporate intelligence, Bright Data MCP, Groq Llama 3.3, FastAPI, Celery, Supabase, AI/ML API (gpt-4o)
Modern enterprises face a critical asymmetry of information. Corporate sustainability claims, financial disclosures, and ESG commitments are largely self-reported with minimal independent verification. The result is a $1.2 trillion annual loss from greenwashing fraud, with 40% of ESG claims found misleading by EU regulators in 2024.
Existing solutions fall short: manual due diligence consumes 40+ analyst hours per company; third-party rating agencies lag months behind real events; and 67% of financial analysts report distrust in current ESG ratings. There is no real-time, AI-native verification layer for enterprise decision-making.
TruthForge solves this by automating verification across three independent analytical dimensions, delivering a live trust score backed by crawled public evidence — not delayed surveys or self-submitted data.
TruthForge is built on a fully async Python backend using FastAPI + Uvicorn. Analysis requests trigger parallel execution of three independent engines via asyncio.gather(), each powered by Bright Data for data collection and Groq for AI synthesis. Results are persisted to Supabase (PostgreSQL), with Celery + Redis handling scheduled re-analysis and alerting.
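The parallel fan-out described above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the three coroutine names, the placeholder scores, and the `run_engines` helper are assumptions, and the real engines would call Bright Data and Groq instead of sleeping.

```python
import asyncio

# Hypothetical stand-ins for the three engines. Each sleeps briefly to
# simulate network-bound work and returns a placeholder score.
async def signalforge(company: str) -> float:
    await asyncio.sleep(0.01)
    return 70.0

async def greenwashguard(company: str) -> float:
    await asyncio.sleep(0.01)
    return 50.0

async def claimwire(company: str) -> float:
    await asyncio.sleep(0.01)
    return 40.0

async def run_engines(company: str) -> dict[str, float]:
    # gather() runs the three coroutines concurrently and returns their
    # results in the same order the awaitables were passed in.
    signal, esg, claims = await asyncio.gather(
        signalforge(company),
        greenwashguard(company),
        claimwire(company),
    )
    return {"signal": signal, "esg": esg, "claims": claims}

scores = asyncio.run(run_engines("ExampleCorp"))
```

Because the engines are I/O-bound, total latency in this arrangement is roughly that of the slowest engine rather than the sum of all three.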
The architecture is horizontally scalable: additional Celery workers can be spun up to parallelize multi-company batch analyses. The Supabase layer provides row-level security, real-time subscriptions, and automatic backups without infrastructure overhead.
Each engine operates independently, using Bright Data for data collection and Groq Llama 3.3-70B for AI-powered analysis. Engines are weighted asymmetrically based on signal reliability and verifiability.
3.1 — SIGNALFORGE
Market Signal Engine (35%)
Aggregates news, press releases, and analyst reports. Uses Bright Data's SERP API for discovery and Web Unlocker for full-text access. Groq synthesizes sentiment polarity and controversy density into a 0–100 signal score.
3.2 — GREENWASHGUARD
ESG Integrity Engine (35%)
Validates ESG disclosures against third-party audits, regulatory filings, and carbon registry databases. Flags claim-to-reality gaps. Scraping Browser handles JS-rendered ESG dashboards.
3.3 — CLAIMWIRE
Claim Verification Engine (30%)
Extracts specific corporate claims from IR pages and press releases, then cross-references each against independent sources, fact-checkers, and public datasets. Returns per-claim confidence scores.
All three engines share a common interface: async def analyze(company: str) -> EngineResult. This uniformity enables future engines to be plugged in without changes to the orchestration layer.
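One way to express that shared interface is a structural `typing.Protocol`. The sketch below is illustrative only: the fields of `EngineResult`, the `Engine` protocol, the `StubEngine` class, and the `orchestrate` helper are assumptions, and `analyze` is modeled as a method here although the source gives only a bare function signature.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol

@dataclass
class EngineResult:
    # Field names are illustrative; the source names only the type.
    engine: str
    score: float  # expected range 0-100

class Engine(Protocol):
    # Any object with a matching async analyze() satisfies this protocol.
    async def analyze(self, company: str) -> EngineResult: ...

class StubEngine:
    """Placeholder engine that returns a fixed score."""

    def __init__(self, name: str, score: float) -> None:
        self.name = name
        self.score = score

    async def analyze(self, company: str) -> EngineResult:
        return EngineResult(engine=self.name, score=self.score)

async def orchestrate(company: str, engines: list[Engine]) -> list[EngineResult]:
    # The orchestrator relies only on analyze(), so it does not change
    # when the list of engines grows.
    return await asyncio.gather(*(e.analyze(company) for e in engines))

engines: list[Engine] = [
    StubEngine("SignalForge", 70.0),
    StubEngine("GreenwashGuard", 50.0),
]
# Adding a hypothetical new engine is a one-line change.
engines.append(StubEngine("FinancialForge", 65.0))
results = asyncio.run(orchestrate("ExampleCorp", engines))
```

With structural typing, a new engine does not need to inherit from a base class; it only needs an `analyze` coroutine with the matching signature.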
Job postings are among the earliest publicly observable signals of a company's strategic direction — typically preceding press releases, earnings guidance, and analyst coverage by weeks. When a company quietly opens ten machine-learning roles while publicly describing an AI initiative as concluded, that discrepancy is operationally significant. SignalJobs captures this layer by aggregating live listings across LinkedIn, Indeed, Glassdoor, and company career pages in a single parallelized query, transforming raw hiring data into strategic intelligence.
Data Collection. SignalJobs uses Bright Data's SERP API to query all four sources simultaneously. Each query is constructed with the company name plus source-specific modifiers, returning result sets that include role title, seniority level, location, and posting recency. The SERP-based approach delivers sub-15-second response times without requiring a dedicated scraper per platform. A raw-URL fallback is implemented: when the SERP layer returns fewer than three results, the engine falls back to direct URL construction for each platform, ensuring results are never empty when the underlying web has data.
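The fallback rule can be sketched as follows. Several details are assumptions rather than facts from the source: the threshold is applied to the combined result count (not per source), SERP results are kept and the direct results appended, and `serp_search` and `direct_fetch` are hypothetical callables standing in for the Bright Data calls. The sketch also queries sources sequentially for brevity, whereas the engine queries them in parallel.

```python
from typing import Callable

SOURCES = ["linkedin", "indeed", "glassdoor", "careers"]
MIN_SERP_RESULTS = 3  # threshold stated in the text

Listing = dict[str, str]
Fetcher = Callable[[str, str], list[Listing]]

def collect_listings(
    company: str, serp_search: Fetcher, direct_fetch: Fetcher
) -> list[Listing]:
    listings: list[Listing] = []
    for source in SOURCES:
        listings.extend(serp_search(company, source))
    # Fall back to direct URL construction only when SERP is too sparse.
    if len(listings) < MIN_SERP_RESULTS:
        for source in SOURCES:
            listings.extend(direct_fetch(company, source))
    return listings

# Stub fetchers for illustration only.
def sparse_serp(company: str, source: str) -> list[Listing]:
    if source == "linkedin":
        return [{"title": "ML Engineer", "source": source}]
    return []

def rich_serp(company: str, source: str) -> list[Listing]:
    return [{"title": f"{source} role", "source": source}]

def direct(company: str, source: str) -> list[Listing]:
    return [{"title": f"{source} direct role", "source": f"{source}-direct"}]

with_fallback = collect_listings("ExampleCorp", sparse_serp, direct)
without_fallback = collect_listings("ExampleCorp", rich_serp, direct)
```

In the first call SERP yields a single listing, so the four direct fetches run; in the second, four SERP listings clear the threshold and the fallback is skipped.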
AI Layer (AI/ML API, gpt-4o). SignalJobs is the only engine in TruthForge that uses AI/ML API's gpt-4o model rather than Groq Llama 3.3-70B. This is a deliberate architectural choice: gpt-4o's stronger instruction-following makes it better suited to structured data transformation, that is, taking heterogeneous raw SERP snippets from four sources with differing formats and normalizing them into clean, consistently shaped role listings. Groq Llama 3.3-70B remains the model for all three core verification engines and the TruthScore synthesis step; AI/ML API is scoped strictly to the SignalJobs layer. For each role, gpt-4o also generates an intelligence brief describing what that specific hire signals about the company's strategic direction: for example, a surge in DevSecOps postings suggesting an upcoming compliance certification, or a CFO search indicating pre-IPO preparation.
API Endpoints. Two endpoints expose the SignalJobs layer:
The GET /api/jobs/{company} endpoint aggregates all sources, deduplicates listings by title and location, and returns them sorted by recency. The POST /api/jobs/detail endpoint invokes gpt-4o with the role title and company context, returning a structured intelligence brief scoped to that hire's strategic implications. Briefs are generated on-demand, so the initial listing load remains fast regardless of the total number of roles returned.
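The deduplicate-then-sort step of the listing endpoint might look like the sketch below. The record shape (`title`, `location`, `posted`, `source`), the case-insensitive matching, and the keep-first-occurrence policy are all assumptions made for illustration; the source states only that listings are deduplicated by title and location and sorted by recency.

```python
from datetime import date

def dedupe_and_sort(listings: list[dict]) -> list[dict]:
    seen: set[tuple[str, str]] = set()
    unique: list[dict] = []
    for item in listings:
        # Normalize case and whitespace so near-identical postings from
        # different sources collapse into one entry.
        key = (item["title"].strip().lower(), item["location"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    # Most recent postings first.
    return sorted(unique, key=lambda item: item["posted"], reverse=True)

listings = [
    {"title": "ML Engineer", "location": "Berlin",
     "posted": date(2026, 5, 1), "source": "linkedin"},
    {"title": "ml engineer ", "location": "berlin",
     "posted": date(2026, 5, 3), "source": "indeed"},
    {"title": "CFO", "location": "London",
     "posted": date(2026, 5, 20), "source": "careers"},
]
cleaned = dedupe_and_sort(listings)
```

Here the Indeed copy of the ML Engineer role is dropped as a duplicate, and the CFO posting sorts first because it is the most recent.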
Design Rationale. Separating SignalJobs from the three core engines rather than folding it into SignalForge preserves the TruthScore formula's integrity: job-posting volume is a leading indicator of business activity, but it does not directly measure corporate honesty or ESG compliance. Keeping it as a standalone intelligence layer means it can be queried independently, surfaced in the dashboard without triggering a full analysis, and extended with per-source deep-scraping in a future iteration without touching the core scoring pipeline.
Bright Data serves as TruthForge's data backbone. Five Bright Data products are used, each addressing a specific collection challenge; the SERP API is listed twice because SignalJobs uses it separately from SignalForge:
| TOOL | USE CASE | ENGINE(S) | CALLS / ANALYSIS |
|---|---|---|---|
| SERP API | News discovery, controversy detection, analyst coverage | SignalForge | ~15 |
| Web Unlocker | Bypass bot protection on corporate and regulatory sites | All engines | ~20 |
| Scraping Browser | JavaScript-rendered ESG dashboards, dynamic investor portals | GreenwashGuard | ~8 |
| MCP Server | Direct AI-to-Bright Data queries inside the Groq pipeline | All engines | ~5 |
| Web Scraper API | Structured extraction of PDFs, sustainability reports | ClaimWire | ~10 |
| SERP API | Multi-source job listing aggregation (LinkedIn, Indeed, Glassdoor, careers) | SignalJobs | ~4 / query |
The Bright Data MCP Server integration is particularly significant: it allows Groq's Llama 3.3 model to query live web data mid-synthesis, rather than working only with pre-scraped static content. This creates a tighter, more accurate analysis loop.
TruthScore is a weighted composite of the three engine outputs: TruthScore = 0.35 × SignalForge + 0.35 × GreenwashGuard + 0.30 × ClaimWire. The weights were determined empirically based on signal reliability, latency, and correlation with known ground-truth cases.
Each engine score is normalized to the 0–100 range before aggregation. The final TruthScore is rounded to the nearest integer. Interpretation bands:
| RANGE | LABEL | DESCRIPTION | RECOMMENDED ACTION |
|---|---|---|---|
| 80 – 100 | VERIFIED | Strong evidence base, consistent claims, clean ESG record | Proceed with confidence |
| 60 – 79 | TRUSTED | Minor inconsistencies, largely verified, low controversy | Proceed, monitor flagged areas |
| 40 – 59 | CAUTION | Mixed signals, unverified ESG claims, controversy present | Request additional disclosure |
| 0 – 39 | RISK | Multiple red flags, greenwashing indicators, disputed claims | Escalate for manual review |
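The weighting and banding can be written down in a few lines. This is a sketch under stated assumptions: the weights come from the engine descriptions (35% / 35% / 30%) and the thresholds from the table above, while the function names, the input validation, and the use of Python's built-in `round` are illustrative choices rather than the project's implementation.

```python
# Weights as listed for the three engines in the text.
WEIGHTS = {"signal": 0.35, "esg": 0.35, "claims": 0.30}

# Lower bound of each interpretation band, highest first.
BANDS = [(80, "VERIFIED"), (60, "TRUSTED"), (40, "CAUTION"), (0, "RISK")]

def truth_score(signal: float, esg: float, claims: float) -> int:
    scores = {"signal": signal, "esg": esg, "claims": claims}
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be in 0-100, got {value}")
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    # Note: round() rounds exact .5 ties to the nearest even integer.
    return round(weighted)

def verdict(score: int) -> str:
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise ValueError(f"score must be non-negative, got {score}")

example_score = truth_score(75, 72, 60)
example_verdict = verdict(example_score)
```

For engine scores of 75, 72, and 60 the weighted sum is about 69.45, which rounds to 69 and falls in the TRUSTED band. Recomputing from integer engine scores can land a point away from a score derived from unrounded engine outputs, so small differences of that kind are to be expected; that explanation is an assumption, not something the source states.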
The following results were produced by the live TruthForge system against four publicly traded companies. Analysis was performed on 2026-05-26 using real-time Bright Data scrapes and Groq Llama 3.3-70B synthesis:
| COMPANY | SIGNAL | ESG | CLAIMS | TRUTHSCORE | VERDICT |
|---|---|---|---|---|---|
| Google | 75 | 72 | 60 | 69 | TRUSTED |
| Microsoft | 59 | 60 | 67 | 61 | TRUSTED |
| Ola | 74 | 42 | 44 | 53 | CAUTION |
| Tesla | 74 | 42 | 28 | 48 | CAUTION |
Observation: Tesla's high signal score (74) indicates strong market presence, but the low claims verification score (28) reflects a high ratio of unverified corporate claims — consistent with public scrutiny of its manufacturing and safety disclosures. Google's balanced profile across all three dimensions supports its TRUSTED classification.
TruthForge demonstrates that reliable enterprise verification is achievable in real time. By combining Bright Data's web intelligence infrastructure with Groq's inference speed and a multi-engine scoring architecture, the system delivers actionable trust scores without the latency or opacity of traditional ESG rating services.
The modular engine architecture is designed for extension: additional specialized engines (e.g., a FinancialForge engine for earnings claim verification, or a SupplyChain engine for Scope 3 emissions tracking) can be plugged in without changing the orchestration layer.
TruthForge's ultimate aim is to become the verification layer that enterprise software has been missing — a continuously updated, AI-native trust graph for the corporate world.
Start the backend and analyze your first company in under 30 seconds.