Open data · CC BY 4.0

SignalPipe buying-intent corpus

An anonymized dataset of buying-intent signals from public Reddit, Hacker News, and RSS feeds, scored by the SignalPipe pipeline. Free to download, free to cite, free to train on.

What’s in it

  • Hashed post identifiers: SHA-256 of the original URL. No raw URLs, no titles, no body text.
  • Source platform: rss / reddit / hn (the subreddit or feed name is preserved as a coarse category).
  • All five component scores (embedding match, keyword density, freshness, engagement, author reputation), each 0–100.
  • content_score and signal_score: the pre-floor and post-floor scores, respectively.
  • Assigned role: closer / advisor / educator.
  • Operator outcome (where available): approved / rejected / sent / replied / ignored, with the rejection reason if rejected.
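
As a rough illustration of the fields listed above, the following Python sketch shows how a hashed identifier could be derived and how a record might be sanity-checked after download. The field names, the SignalRecord class, the lowercase hex encoding of the digest, and the absence of any URL normalization are all assumptions made for illustration; the published files may differ. content_score, signal_score, and the operator outcome are left out because their exact representation is not specified here.

```python
import hashlib
from dataclasses import dataclass

# Allowed values taken from the field list above.
SOURCES = {"rss", "reddit", "hn"}
ROLES = {"closer", "advisor", "educator"}

# Hypothetical column names for the five component scores.
COMPONENT_FIELDS = (
    "embedding_match",
    "keyword_density",
    "freshness",
    "engagement",
    "author_reputation",
)


def hash_post_url(url: str) -> str:
    """SHA-256 hex digest of a URL (assumed: UTF-8 bytes, no normalization)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SignalRecord:
    """Hypothetical in-memory shape of one corpus row."""

    post_id: str
    source: str
    role: str
    embedding_match: float
    keyword_density: float
    freshness: float
    engagement: float
    author_reputation: float

    def problems(self) -> list:
        """Return human-readable validation problems (empty list if none)."""
        found = []
        is_hex = all(c in "0123456789abcdef" for c in self.post_id)
        if len(self.post_id) != 64 or not is_hex:
            found.append("post_id is not a 64-character lowercase hex digest")
        if self.source not in SOURCES:
            found.append(f"unknown source: {self.source}")
        if self.role not in ROLES:
            found.append(f"unknown role: {self.role}")
        for name in COMPONENT_FIELDS:
            value = getattr(self, name)
            if not 0 <= value <= 100:
                found.append(f"{name} out of range: {value}")
        return found


# Usage: build a record from a made-up URL and check it.
record = SignalRecord(
    post_id=hash_post_url("https://example.com/post/1"),
    source="hn",
    role="advisor",
    embedding_match=72,
    keyword_density=40,
    freshness=90,
    engagement=15,
    author_reputation=55,
)
print(record.problems())  # prints []
```

Because SHA-256 is one-way, the identifier cannot be reversed into a URL, but anyone who already holds a candidate URL can hash it and test for membership, provided they hash it the same way the release does.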

What’s NOT in it

  • No raw post text, titles, or URLs.
  • No author handles, follower counts, or any field that could re-identify a poster.
  • No operator-level data (which SignalPipe customer scored which lead).
  • No anchor sentences or product configurations.

Get the dataset

The first release is being prepared for upload to Zenodo with a permanent DOI. To be notified when it goes live, join the waitlist and select “dataset” as your interest, or email hello@signalpipe.io.


Citation

SignalPipe (2026). SignalPipe Buying-Intent Corpus.
Released under CC BY 4.0. https://signalpipe.io/dataset
DOI: pending Zenodo upload

License

Released under Creative Commons Attribution 4.0 International (CC BY 4.0). You can copy, redistribute, remix, and build upon the material for any purpose, including commercial use, as long as you give appropriate credit and indicate if changes were made.

See methodology for how the scores are computed, or /signals for the weekly aggregate trend digest.