DOT Data Labs
Dataset

MeetingHours-100K

100,000 hours of consented business meeting audio with verbatim transcripts and speaker labels.

Hours: 100,000
Meetings: ~180,000
Avg speakers: 4.2 per meeting
Languages: EN (primary), ES, FR, DE

Long-form business meeting audio recorded with explicit participant consent. Each meeting ships with verbatim transcripts, speaker diarization, role labels (host / presenter / participant), and meeting-type metadata (sales call, standup, customer success, board, etc.). Useful for meeting AI, summarization, action-item extraction, and ASR fine-tuning.

Tags

Speech, Meetings, Diarization, Transcripts, ASR, Summarization

Delivery formats

  • WAV
  • FLAC
  • JSON transcripts
  • RTTM diarization
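
For readers unfamiliar with RTTM, the following is a minimal sketch of reading diarization turns and summing talk time per speaker. It assumes the files follow the common NIST-style layout (space-separated `SPEAKER` records with file ID, channel, onset, and duration, and the speaker name in the eighth field); this page does not confirm the exact layout shipped with the dataset. The example lines, the `mtg_example` file ID, and the helper names `parse_rttm` and `talk_time` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    file_id: str
    start: float      # onset, in seconds from the start of the recording
    duration: float   # length of the turn, in seconds
    speaker: str


def parse_rttm(text: str) -> list[Turn]:
    """Parse SPEAKER records from RTTM text (assumed NIST-style layout)."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record type other than SPEAKER.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return turns


def talk_time(turns: list[Turn]) -> dict[str, float]:
    """Total speaking time in seconds for each speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.duration
    return dict(totals)


# Hypothetical lines, invented for illustration only (not dataset content).
EXAMPLE_RTTM = """\
SPEAKER mtg_example 1 12.40 4.50 <NA> <NA> S1 <NA> <NA>
SPEAKER mtg_example 1 17.25 6.00 <NA> <NA> S2 <NA> <NA>
SPEAKER mtg_example 1 23.50 2.50 <NA> <NA> S1 <NA> <NA>
"""

turns = parse_rttm(EXAMPLE_RTTM)
totals = talk_time(turns)
```

Per-speaker totals like these are one simple way to sanity-check diarization output against the transcript before training.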

License

Commercial AI training license, perpetual, with consent on record per participant.

Data sample

What a record looks like

Sample meeting transcript segment

JSON (illustrative; full sample available under NDA)
{
  "meeting_id": "mtg_2024_q3_18204",
  "meeting_type": "sales_discovery",
  "duration_sec": 2415,
  "speakers": ["S1","S2","S3"],
  "segments": [
    {"t":12.4,"speaker":"S1","text":"Thanks for jumping on. Can you walk me through your current pipeline?"}
  ]
}
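
A record shaped like the illustrative sample could be loaded roughly as follows. This is a sketch, not a documented loader: it assumes `t` is the segment start offset in seconds and that every segment's `speaker` appears in the top-level `speakers` list, neither of which this page states explicitly. The helper name `iter_segments` is hypothetical.

```python
import json

# The illustrative record from this page, embedded as a string.
SAMPLE_RECORD = """
{
  "meeting_id": "mtg_2024_q3_18204",
  "meeting_type": "sales_discovery",
  "duration_sec": 2415,
  "speakers": ["S1","S2","S3"],
  "segments": [
    {"t":12.4,"speaker":"S1","text":"Thanks for jumping on. Can you walk me through your current pipeline?"}
  ]
}
"""


def iter_segments(record: dict):
    """Yield (start_sec, speaker, text) tuples for each transcript segment.

    Assumption: "t" is the segment start offset in seconds.
    """
    known = set(record["speakers"])
    for seg in record["segments"]:
        # Assumption: segment speakers are drawn from the "speakers" list.
        if seg["speaker"] not in known:
            raise ValueError(f"unknown speaker label: {seg['speaker']}")
        yield seg["t"], seg["speaker"], seg["text"]


record = json.loads(SAMPLE_RECORD)
segments = list(iter_segments(record))
duration_min = record["duration_sec"] / 60  # meeting length in minutes
```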