SignalPipe buying-intent corpus
An anonymized dataset of buying-intent signals from public Reddit, Hacker News, and RSS feeds, scored by the SignalPipe pipeline. Free to download, free to cite, free to train on.
What’s in it
- Hashed post identifiers: SHA-256 of the original URL. No raw URLs, no titles, no body text.
- Source platform: rss / reddit / hn (the subreddit or feed name is preserved as a coarse category).
- All five component scores (embedding match, keyword density, freshness, engagement, author reputation), each on a 0–100 scale.
- content_score and signal_score: the pre-floor and post-floor scores.
- Assigned role: closer / advisor / educator.
- Operator outcome (where available): approved / rejected / sent / replied / ignored, with the rejection reason if rejected.
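The fields above can be sketched as a typed record. The release is not yet published, so this is only an illustration: every field name except content_score and signal_score is hypothetical, and the hashing helper assumes the URL is UTF-8 encoded, not normalized, and stored as a hex digest, none of which is confirmed here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the field list above.
SOURCES = {"rss", "reddit", "hn"}
ROLES = {"closer", "advisor", "educator"}
OUTCOMES = {"approved", "rejected", "sent", "replied", "ignored"}

# Hypothetical column names for the five component scores.
COMPONENTS = (
    "embedding_match",
    "keyword_density",
    "freshness",
    "engagement",
    "author_reputation",
)


def hash_post_url(url: str) -> str:
    """SHA-256 hex digest of a URL (assumes UTF-8 and no URL normalization)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SignalRecord:
    post_id: str                 # hashed identifier, never the raw URL
    source: str                  # rss / reddit / hn
    category: str                # subreddit or feed name
    embedding_match: float
    keyword_density: float
    freshness: float
    engagement: float
    author_reputation: float
    content_score: float         # pre-floor
    signal_score: float          # post-floor
    role: str                    # closer / advisor / educator
    outcome: Optional[str] = None           # present only where available
    rejection_reason: Optional[str] = None  # only meaningful if rejected

    def __post_init__(self) -> None:
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.outcome is not None and self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
        # Only the component scores are documented as 0-100, so only
        # those are range-checked here.
        for name in COMPONENTS:
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} out of range: {value}")
```

A loader built on a record like this would reject rows with out-of-range component scores or unknown labels at parse time, which may help catch schema drift between releases.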
What’s NOT in it
- No raw post text, titles, or URLs.
- No author handles, follower counts, or any field that could re-identify a poster.
- No operator-level data (which SignalPipe customer scored which lead).
- No anchor sentences or product configurations.
Get the dataset
The first release is being prepared for upload to Zenodo with a permanent DOI. To be notified when it goes live, join the waitlist and select “dataset” as your interest, or email hello@signalpipe.io.
Citation
SignalPipe (2026). SignalPipe Buying-Intent Corpus. Released under CC BY 4.0. https://signalpipe.io/dataset DOI: pending Zenodo upload
License
Released under Creative Commons Attribution 4.0 International (CC BY 4.0). You can copy, redistribute, remix, and build upon the material for any purpose, including commercial use, as long as you give appropriate credit and indicate if changes were made.
See methodology for how the scores are computed, or /signals for the weekly aggregate trend digest.