DOT Data Labs
Dataset

DatingProfiles-100K

100,000 de-identified dating-app profiles with free-text bios and structured attributes.

Profiles: 100,000
Avg bio length: 180 words
Languages: EN, ES, PT
Regions: NA, EU, LATAM (coarsened)

A de-identified corpus of 100K dating-app profiles for matching, personalization, and conversational AI work. Each record includes a free-text self-description, a structured attribute set (age band, location coarsened to region, interests, prompt responses), and engagement features (response-rate buckets). All PII has been removed, and the consent and terms-of-service (ToS) chain is documented.

Tags

Dating, Profiles, Personalization, Matching, NLP

Delivery formats

  • JSONL
  • CSV
  • Parquet

License

Perpetual commercial license for AI training. PII is removed at source; no re-identification.

Data sample

What a record looks like

Sample profile record

JSON (illustrative; full sample available under NDA)
{
  "profile_id": "prf_8a3f...",
  "age_band": "25-34",
  "region": "NA-Northeast",
  "bio": "Engineer who loves bouldering and bad puns. Looking for someone who picks the wine.",
  "interests": ["climbing","travel","cooking"],
  "prompts": [{"q":"Two truths and a lie","a":"..."}],
  "engagement": {"response_rate":"medium"}
}
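
For the JSONL delivery, a record like the one above could be read into a typed structure along these lines. This is a minimal sketch, not the vendor's loader: the `Profile` class and the `parse_profile` and `iter_profiles` helpers are hypothetical names, and the field layout is assumed to match the illustrative sample, which may differ from the full dataset.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Profile:
    """Hypothetical typed view of one record, mirroring the illustrative sample."""
    profile_id: str
    age_band: str
    region: str
    bio: str
    interests: list[str]
    prompts: list[dict[str, str]]
    response_rate: str


def parse_profile(line: str) -> Profile:
    """Parse one JSONL line; raises KeyError if an expected field is missing."""
    raw = json.loads(line)
    return Profile(
        profile_id=raw["profile_id"],
        age_band=raw["age_band"],
        region=raw["region"],
        bio=raw["bio"],
        interests=list(raw["interests"]),
        prompts=list(raw["prompts"]),
        # Flatten the nested engagement object (assumed shape from the sample).
        response_rate=raw["engagement"]["response_rate"],
    )


def iter_profiles(lines: Iterable[str]) -> Iterator[Profile]:
    """Yield one Profile per non-blank line, so a large file can be streamed."""
    for line in lines:
        line = line.strip()
        if line:
            yield parse_profile(line)


# The sample record from this page, serialized as a single JSONL line.
sample_line = json.dumps({
    "profile_id": "prf_8a3f...",
    "age_band": "25-34",
    "region": "NA-Northeast",
    "bio": "Engineer who loves bouldering and bad puns. "
           "Looking for someone who picks the wine.",
    "interests": ["climbing", "travel", "cooking"],
    "prompts": [{"q": "Two truths and a lie", "a": "..."}],
    "engagement": {"response_rate": "medium"},
})

profiles = list(iter_profiles([sample_line, ""]))
```

With a real file, the same generator can be fed an open file handle, for example `iter_profiles(open(path, encoding="utf-8"))`, where `path` points at the delivered JSONL file.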