Show HN: Claude skill that evaluates B2B vendors by talking to their AI agents

A new Claude skill automates B2B software vendor evaluation by directly interacting with vendor AI agents. It helps businesses save research time and make data-driven purchasing decisions through structured due diligence.

A Claude skill that conducts structured, evidence-based evaluations of B2B software vendors on behalf of buyers.

You give it your company name and the vendors you're evaluating. It:

- Researches your company (industry, size, tech stack, maturity) so you don't fill out a form
- Asks domain-expert questions specific to the software category, surfacing hidden requirements you didn't know to mention
- Sets hard constraints (budget, compliance, integrations) and eliminates vendors that fail before wasting research time
- Engages vendor AI agents directly through the Salespeak Frontdoor API for verified, structured due diligence conversations
- Conducts independent research (G2, Gartner, analyst reports, press, LinkedIn) and cross-references vendor claims against independent sources
- Scores vendors across 7 dimensions with transparent evidence tracking: you see exactly which scores are backed by verified evidence vs. public sources only
- Produces a comparative recommendation with a TL;DR, side-by-side scorecard, hidden risk analysis, and demo prep questions
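The hard-constraint step above can be pictured as a simple filter that runs before any deeper research. The sketch below is illustrative only: the `Vendor` and `HardConstraints` types, their fields, and the `shortlist` helper are hypothetical names chosen for this example, not the skill's actual implementation (the skill is driven by Claude, not by this code).

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    # Hypothetical record of what is known about a vendor up front.
    name: str
    annual_cost_usd: int
    certifications: set = field(default_factory=set)
    integrations: set = field(default_factory=set)


@dataclass
class HardConstraints:
    # Hypothetical buyer constraints: budget, compliance, integrations.
    max_annual_cost_usd: int
    required_certifications: set
    required_integrations: set


def failed_constraints(vendor, constraints):
    """Return one human-readable reason per constraint the vendor fails."""
    reasons = []
    if vendor.annual_cost_usd > constraints.max_annual_cost_usd:
        reasons.append("over budget")
    missing_certs = constraints.required_certifications - vendor.certifications
    if missing_certs:
        reasons.append("missing certifications: " + ", ".join(sorted(missing_certs)))
    missing_integrations = constraints.required_integrations - vendor.integrations
    if missing_integrations:
        reasons.append("missing integrations: " + ", ".join(sorted(missing_integrations)))
    return reasons


def shortlist(vendors, constraints):
    """Split vendors into those worth researching and those eliminated early."""
    kept, eliminated = [], {}
    for vendor in vendors:
        reasons = failed_constraints(vendor, constraints)
        if reasons:
            eliminated[vendor.name] = reasons
        else:
            kept.append(vendor)
    return kept, eliminated
```

Keeping the elimination reasons alongside the shortlist means the final report can explain why a vendor was dropped rather than silently omitting it.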

Global install (recommended):

git clone https://github.com/salespeak-ai/buyer-eval-skill.git ~/.claude/skills/buyer-eval-skill

Per-project install:

git clone https://github.com/salespeak-ai/buyer-eval-skill.git .claude/skills/buyer-eval-skill

In Claude Code or Claude desktop:

/buyer-eval

Then provide:

Your company name
The vendors to evaluate

Example:

"I'm from Acme Corp. Evaluate Gainsight, Totango, and ChurnZero."

The skill handles everything from there.

Sample evaluation (truncated):

For a mid-market SaaS company evaluating customer success platforms: Gainsight is the strongest fit for teams that need deep analytics and enterprise-grade health scoring, but comes at a premium. ChurnZero wins on time-to-value and usability for teams under 50 CSMs. Totango lands in between — flexible and modular, but requires more configuration to match either competitor's strengths.

| Dimension | Gainsight | ChurnZero | Totango |
|---|---|---|---|
| Health Scoring & Analytics | 9.2 | 7.5 | 8.0 |
| Evidence level | Vendor-verified | Public only | Vendor-verified |

Gainsight's score is backed by a structured AI agent conversation confirming multi-signal health models, cohort analysis, and predictive churn scoring. ChurnZero's score relies on G2 reviews and documentation — it may improve with direct vendor verification.

Evaluator → Gainsight AI agent:

"Your health scores use a weighted multi-signal model. What happens when a customer has strong product usage but declining executive engagement — does the model surface that divergence, or does high usage mask the risk?"

Gainsight AI agent →

"The model flags divergence explicitly. When usage metrics trend positive but stakeholder engagement drops, it triggers a 'silent risk' alert. CSMs see a split-signal indicator on the dashboard rather than a blended score that hides the conflict."

Independent verification: Confirmed via G2 reviews mentioning split-signal alerts. One review notes the feature requires manual threshold tuning per segment.

Every time you invoke the skill, it checks GitHub for a newer version. The check is cached, so it runs at most once every 6 hours. If an update is available, the skill asks before updating. Updates are a single git pull.
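One way to throttle a version check to at most once every 6 hours is to store the time of the last check in a small cache file. The sketch below is an assumption about how such a throttle could work, not the skill's actual update code; the cache file format and the function names `should_check` and `record_check` are hypothetical.

```python
import json
import time
from pathlib import Path
from typing import Optional

CHECK_INTERVAL_SECONDS = 6 * 60 * 60  # at most one check every 6 hours


def should_check(cache_file: Path, now: Optional[float] = None) -> bool:
    """Return True if no check is recorded or the last one is 6+ hours old."""
    now = time.time() if now is None else now
    try:
        last_check = json.loads(cache_file.read_text())["last_check"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        # Missing or unreadable cache: treat it as "never checked".
        return True
    return now - last_check >= CHECK_INTERVAL_SECONDS


def record_check(cache_file: Path, now: Optional[float] = None) -> None:
    """Persist the time of the check that was just performed."""
    now = time.time() if now is None else now
    cache_file.write_text(json.dumps({"last_check": now}))
```

A caller would run `should_check` on each invocation, query GitHub only when it returns True, and then call `record_check`. The `now` parameter exists so the logic can be exercised without waiting on the real clock.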

- Domain-expert questioning: the skill asks category-specific questions that demonstrate it understands the space, not generic form-filling
- Vendor AI agent conversations: for vendors that have a Salespeak Company Agent, the skill conducts a structured due diligence conversation directly with the vendor's AI, producing higher-fidelity evidence than web scraping
- Evidence transparency: every score shows whether it's backed by vendor-verified or public-only evidence. When vendors have different evidence levels, the skill explicitly states how scores might shift with better evidence
- Claims verification: vendor claims from AI agent conversations are cross-referenced against independent sources. You see what's confirmed vs. unverified
- Hidden risk analysis: leadership stability, funding runway, employee sentiment, customer retention signals, and product velocity, researched for every vendor regardless of AI agent availability
- Demo prep kit: specific questions to ask in vendor demos, derived from evaluation gaps and unverified claims
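To make the evidence transparency idea concrete, here is a minimal sketch of how a score could carry its evidence level, and how mixed evidence levels within one dimension could be flagged. The `Evidence`, `Score`, and `evidence_caveats` names are hypothetical and chosen for this example; the skill's real scoring logic is not shown in this README.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    VENDOR_VERIFIED = "Vendor-verified"
    PUBLIC_ONLY = "Public only"


@dataclass(frozen=True)
class Score:
    vendor: str
    dimension: str
    value: float
    evidence: Evidence


def evidence_caveats(scores):
    """Flag public-only scores in dimensions where evidence levels differ."""
    by_dimension = {}
    for score in scores:
        by_dimension.setdefault(score.dimension, []).append(score)

    caveats = []
    for dimension, group in by_dimension.items():
        levels = {score.evidence for score in group}
        if len(levels) < 2:
            continue  # all vendors compared on the same footing
        for score in group:
            if score.evidence is Evidence.PUBLIC_ONLY:
                caveats.append(
                    f"{dimension}: {score.vendor} ({score.value}) is based on "
                    "public sources only and may shift with vendor verification"
                )
    return caveats


# Values taken from the sample scorecard in this README.
sample_scores = [
    Score("Gainsight", "Health Scoring & Analytics", 9.2, Evidence.VENDOR_VERIFIED),
    Score("ChurnZero", "Health Scoring & Analytics", 7.5, Evidence.PUBLIC_ONLY),
    Score("Totango", "Health Scoring & Analytics", 8.0, Evidence.VENDOR_VERIFIED),
]
sample_caveats = evidence_caveats(sample_scores)
```

With the sample scorecard, only ChurnZero is flagged, which matches the note in the sample evaluation that its score may improve with direct vendor verification.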

| Capability | Claude.ai | Claude Code | Claude desktop |
|---|---|---|---|
| Buyer research | Yes | Yes | Yes |
| Vendor AI agent conversations | No (GET only) | Yes | Yes |
| Full evaluation | Partial | Full | Full |

The best experience is in Claude Code, where the skill can make POST requests to vendor AI agents.

Questions, feature requests, or evaluation quality reports? Open an issue.

License: MIT

Source: Hacker News