1. The problem keyword alerts can’t solve
Suppose you sell a tool that automates Reddit lead discovery. A keyword alert for “Reddit leads” will fire on every post that mentions Reddit and leads, regardless of whether the author wants to buy something or sell something. The signal-to-noise ratio collapses fast.
A lexical match doesn’t know the difference between “I’m looking for a tool to find Reddit leads” (a buyer) and “5 ways our tool finds Reddit leads” (a competitor’s blog post). A semantic match does. That’s the entire reason SignalPipe exists.
2. Anchor sentences as semantic targets
Each product configures 5–10 anchor sentences: short, natural-language examples of the buying intent it wants to detect. They’re written in the buyer’s voice, not the seller’s. Examples:
- “I need a tool to monitor Reddit for sales leads”
- “Looking for an AI sales agent that can find prospects automatically”
- “My outbound process is too manual and I want to automate lead discovery”
Mantidae embeds each anchor with OpenAI’s text-embedding-3-small and caches the vectors at product-load time. Incoming RSS / Reddit / HN posts are embedded the same way. The cosine similarity to the closest anchor becomes the embedding component of the signal score.
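The closest-anchor step can be sketched in a few lines. This is a minimal illustration, not the production code: the vectors here are stand-in 2-D arrays, whereas the real system would use cached text-embedding-3-small vectors.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embedding_component(post_vec: np.ndarray, anchor_vecs: list) -> float:
    # The embedding component of the score is the similarity
    # to the *closest* anchor, not an average over all anchors.
    return max(cosine(post_vec, a) for a in anchor_vecs)

# Stand-in vectors; real anchors would be 1536-dim embedding outputs.
anchors = [np.array([1.0, 0.0]), np.array([0.6, 0.8])]
post = np.array([0.8, 0.6])
print(round(embedding_component(post, anchors), 2))  # → 0.96
```

Taking the max over anchors is what lets several short, differently-framed anchors outperform one long blurb: a post only has to land near one of them.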
Why anchors instead of a single product description? Because real buyers don’t describe your features — they describe their pain. Several short anchors covering different framings of the same intent outperform a single long product blurb on every dataset we’ve tested.
3. Multi-factor geometric mean
Embedding similarity is necessary but not sufficient. A post can match an anchor semantically but still be irrelevant — a stale repost, a comment from someone with no audience, a sarcastic reply. Mantidae combines five components into one weighted geometric mean:
- Embedding match — cosine similarity to the closest anchor
- Keyword density — buy-signal keywords found in the post
- Freshness — exponential decay on age, faster on social platforms
- Engagement — comment / vote count where available
- Author reputation — follower / karma signal where available
We use a geometric mean instead of a weighted sum because it punishes weak components more aggressively — a post with a great embedding match but zero engagement and a stale timestamp shouldn’t score as high as a fresher, more-engaged post. The geometric mean enforces that all signals matter.
The output is a 0–100 content score — the truth signal before any post-processing.
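A weighted geometric mean over the five components can be sketched as follows. The weights and the epsilon floor here are illustrative assumptions, not the real configuration; only the five component names and the 0–100 scale come from the text above.

```python
import math

# Illustrative weights — the real values are not published here.
WEIGHTS = {
    "embedding": 0.40,
    "keywords": 0.20,
    "freshness": 0.15,
    "engagement": 0.15,
    "reputation": 0.10,
}
EPS = 1e-3  # floor so a zero component punishes hard without zeroing the score

def content_score(components: dict) -> float:
    """Weighted geometric mean of components in (0, 1], scaled to 0-100."""
    total_w = sum(WEIGHTS.values())
    log_sum = sum(w * math.log(max(components[k], EPS)) for k, w in WEIGHTS.items())
    return 100.0 * math.exp(log_sum / total_w)

# A great embedding match with stale, unengaged signals loses to a balanced post:
spiky = content_score({"embedding": 0.95, "keywords": 0.5, "freshness": 0.05,
                       "engagement": 0.05, "reputation": 0.5})
balanced = content_score({"embedding": 0.7, "keywords": 0.6, "freshness": 0.7,
                          "engagement": 0.6, "reputation": 0.6})
assert balanced > spiky  # weak components drag the geometric mean down hard
```

Working in log space keeps the computation numerically stable; a weighted sum with the same weights would have ranked the spiky post above the balanced one.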
4. The competitor-floor heuristic and its honesty cost
If a post mentions one of your configured competitor names, Mantidae enforces a minimum score of 75. Why: someone publicly evaluating your competitor is one of the highest-intent signals there is, and you don’t want a weak embedding match to bury it.
The cost: this rule fires for any mention of the competitor — including the competitor’s own marketers talking about their own product. To stay honest about it, Mantidae preserves the pre-floor content_score alongside the post-floor signal_score. The dashboard surfaces both numbers and visually flags missions where the gap between them is ≥ 30 points.
Crucially, the role assignment for the drafting swarm uses content_score, not signal_score. So a misclassified competitor mention surfaces in the queue (good — operator should still see it) but the draft stays value-first instead of getting upgraded to a closing pitch (good — we don’t want to send hard CTAs to the competitor’s own marketing team).
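The floor logic above can be sketched as a single function. The 75-point floor and the 30-point flag gap come from the text; the field names and substring-matching approach are illustrative assumptions.

```python
COMPETITOR_FLOOR = 75.0
FLAG_GAP = 30.0  # dashboard flags missions where the floor added >= 30 points

def apply_competitor_floor(content_score: float, text: str, competitors: list) -> dict:
    mentions = any(c.lower() in text.lower() for c in competitors)
    signal_score = max(content_score, COMPETITOR_FLOOR) if mentions else content_score
    return {
        "content_score": content_score,  # pre-floor truth signal; drives role assignment
        "signal_score": signal_score,    # post-floor; drives queue ranking
        "floor_flag": signal_score - content_score >= FLAG_GAP,
    }

# Hypothetical competitor name, weak semantic match:
m = apply_competitor_floor(30.0, "Just tried AcmeLeads, not impressed", ["AcmeLeads"])
assert m["signal_score"] == 75.0 and m["content_score"] == 30.0 and m["floor_flag"]
```

Keeping both numbers in the record is what makes the honesty cost auditable: the floor can promote a post in the queue, but it can never silently rewrite the underlying score.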
5. Reinforcement-learning feedback loop
Each product carries an rl_weight (default 1.0, clamped 0.5–2.0) that multiplies the raw signal score before it is compared against the 50-point threshold. The weight is updated on every operator decision:
- Approve a mission → +0.05 (this lead type is worth pursuing)
- Reject a mission → −0.02 (this feed / lead type is noise)
The asymmetry is intentional. False negatives (missing a real prospect) cost much more than false positives (one extra mission to skim). We want the system to nudge generously toward surfacing things, and to back off only after sustained operator rejection.
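The update rule is simple enough to sketch directly. The deltas, default, and clamp range come from the text above; the function shapes and the threshold-check helper are illustrative.

```python
APPROVE_DELTA = 0.05   # approve: this lead type is worth pursuing
REJECT_DELTA = -0.02   # reject: this feed / lead type is noise
CLAMP = (0.5, 2.0)
THRESHOLD = 50.0

def update_rl_weight(weight: float, approved: bool) -> float:
    weight += APPROVE_DELTA if approved else REJECT_DELTA
    return min(max(weight, CLAMP[0]), CLAMP[1])

def surfaces(raw_score: float, weight: float) -> bool:
    # rl_weight multiplies the raw score before the 50-point threshold.
    return raw_score * weight >= THRESHOLD

w = 1.0
for _ in range(30):                 # sustained rejection backs the weight off...
    w = update_rl_weight(w, approved=False)
assert w == 0.5                     # ...but never below the clamp
assert not surfaces(90.0, w)        # even strong raw scores stop surfacing at 0.5? No:
```

Note the last line: 90 × 0.5 = 45, below the threshold, so only genuinely strong matches survive a fully backed-off weight. The clamp exists precisely so that no feed can be permanently silenced or permanently amplified.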
6. The 9-prompt role-aware swarm
Drafting is handled by a 3-judge swarm: a Skeptic, an Analyst, and an Optimist, each running a different system prompt. They produce three drafts in parallel; we fuse them using the geometric mean of their self-scores and pick the candidate with the highest fused score.
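One plausible reading of the fusion step (the exact scoring protocol isn't spelled out above, so treat this as an assumption): each judge scores every draft, and a draft's fused score is the geometric mean of the three judges' scores for it.

```python
import math

def geomean(xs: list) -> float:
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def fuse(scores_per_draft: dict) -> str:
    """Pick the draft with the highest geometric-mean score across judges."""
    return max(scores_per_draft, key=lambda d: geomean(scores_per_draft[d]))

# Hypothetical judge scores in (0, 1] for three parallel drafts:
drafts = {
    "skeptic_draft":  [0.9, 0.4, 0.5],   # one high score, two weak ones
    "analyst_draft":  [0.7, 0.7, 0.7],   # consistently solid
    "optimist_draft": [0.8, 0.6, 0.3],
}
assert fuse(drafts) == "analyst_draft"   # the geometric mean rewards consistency
```

As with the content score, the geometric mean here penalizes a draft that any one judge rates poorly, which is the point of having a Skeptic in the panel at all.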
The role-aware part: each lead’s content_score determines a role — closer (> 80), advisor (61–80), or educator (40–60) — and the role swaps in a different system prompt for each of the three judges. That’s a 3 × 3 = 9-prompt matrix. The voice differences:
- Closer — propose a specific concrete next step (demo, trial, link)
- Advisor — consultative, acknowledge the problem, introduce the product as a natural fit
- Educator — answer the question first, pitch only if it fits naturally
Character budgets are passed in too: 280 for Twitter replies, 500 for Reddit DMs, 300 for manual outreach. The swarm targets the budget at draft time — operators don’t inherit a draft that needs to be cut in half.
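The role thresholds and the 3 × 3 matrix can be sketched as a lookup. Thresholds, judge names, and the budget table come from the text; the prompt strings and dictionary shapes are placeholders.

```python
JUDGES = ("skeptic", "analyst", "optimist")
CHAR_BUDGETS = {"twitter_reply": 280, "reddit_dm": 500, "manual": 300}

def role_for(content_score: float) -> str:
    # Note: role assignment uses the pre-floor content_score, not signal_score.
    if content_score > 80:
        return "closer"      # propose a concrete next step
    if content_score > 60:
        return "advisor"     # consultative, product as a natural fit
    return "educator"        # answer first, pitch only if it fits

# 9-prompt matrix: one system prompt per (role, judge) pair.
PROMPTS = {(role, judge): f"[{role}/{judge} system prompt]"
           for role in ("closer", "advisor", "educator") for judge in JUDGES}

assert role_for(85) == "closer" and role_for(70) == "advisor" and role_for(55) == "educator"
assert len(PROMPTS) == 9
```

At draft time, a given lead resolves to one row of the matrix (its role) and fans out across all three judge columns, with the channel's character budget passed alongside the prompt.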
7. Failure modes we know about
Honest list. Most are mitigated; one or two are open.
- Sarcasm — embedding similarity can’t detect inverted intent. Mitigated by a sarcasm-detection unit-test pass on the sidecar before drafting.
- Stale reposts — same URL crossposted to 5 subreddits. Mitigated by the interactions ledger (unique on URL + product).
- Competitor-marketer false positives — discussed in §4. Surfaces in queue but doesn’t produce a hard pitch.
- Bot accounts and karma farms — open. Author-reputation component helps but doesn’t fully solve. Operator review is the backstop.
- Cross-product anchor leakage — a generic anchor like “I need a sales tool” will match too broadly. Best practice: write anchors that are as specific as the product warrants.