Spoken Archive Engine

Project Overview

The Spoken Archive Engine is a grounded research interface for long-form spoken-word collections. It makes audio archives searchable by meaning and returns answers supported by selected transcript evidence.

The problem

Long-form spoken-word archives (radio programmes, oral histories, public hearings, lectures, and interviews) contain large amounts of knowledge locked inside hours of audio. Traditional search often fails on this material because raw audio cannot be queried by meaning, theme, or public concern.

The result is a gap between what the archive contains and what researchers, journalists, institutions, and the public can practically retrieve.

What the system does

The Spoken Archive Engine turns spoken-word collections into evidence-aware research archives. A user asks a plain-language question. The system retrieves relevant passages, generates a grounded answer, and displays the evidence behind that answer.

The product is designed to feel like consulting a careful research archive, not like using a chatbot or generic search tool.

The first collection

The first public collection is Down to Brass Tacks, a Barbadian radio call-in programme. The broader source archive spans 2021–2026 and includes roughly 1,177 episodes.

The collection currently open to public queries covers Down to Brass Tacks episodes from 2024/5 to early April 2026. Expansion to the full collection is ongoing.

How it works

Audio goes in. Raw episode recordings are stored in cloud storage.
Transcription and processing. Episodes are transcribed, normalized, segmented, and enriched.
Meaning is indexed. Each segment is embedded and loaded into a semantic retrieval layer.
Questions receive grounded answers. Retrieved passages are used to generate answers with visible archive evidence.
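
The indexing and retrieval steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the names Segment, segment_transcript, embed, build_index, and retrieve are hypothetical, the sample transcripts are made up, and the bag-of-words embed function is a toy stand-in for a real embedding model (it matches shared words, not meaning).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    episode_id: str  # hypothetical identifier for the source episode
    start_word: int  # word offset of the segment within the transcript
    text: str


def segment_transcript(episode_id, transcript, max_words=40):
    """Split a transcript into fixed-size word windows (an assumed strategy)."""
    words = transcript.split()
    return [
        Segment(episode_id, i, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(segments):
    """Pair every segment with its vector; a real system would use a vector store."""
    return [(seg, embed(seg.text)) for seg in segments]


def retrieve(index, question, k=3):
    """Return up to k (score, segment) pairs, best first, dropping zero scores."""
    q = embed(question)
    scored = [(cosine(q, vec), seg) for seg, vec in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up sample text, not taken from the archive.
transcripts = {
    "ep-001": "Callers discussed the rising cost of water bills and road repairs in the parish.",
    "ep-002": "The moderator asked about bus fares and public transport schedules.",
}
index = build_index(
    seg for ep, text in transcripts.items() for seg in segment_transcript(ep, text)
)
for score, seg in retrieve(index, "water bills"):
    print(f"{score:.2f}  {seg.episode_id}  {seg.text}")
```

Keeping the episode identifier and offset on every segment is what lets a later step show where a retrieved passage came from.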

Why the architecture matters

Traceability. Answers are tied to retrieved evidence rather than produced by free-form generation.
Repeatability. The same pipeline can process future episodes and future collections consistently.
Scalability. The first collection is Down to Brass Tacks, but the platform is designed for additional spoken-word collections.
Cost discipline. Infrastructure choices are guided by cost and operational practicality.
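
One way to picture the traceability property is an answer record that cannot be presented without the passages behind it. The sketch below is illustrative only: Evidence, GroundedAnswer, compose_answer, and render are hypothetical names, and the fields (including the timestamp range) are assumptions rather than the system's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    collection: str       # assumed field: which collection the passage is from
    episode_id: str       # assumed field: hypothetical episode identifier
    start_seconds: float  # assumed field: where the passage starts in the audio
    end_seconds: float
    text: str


@dataclass(frozen=True)
class GroundedAnswer:
    question: str
    answer: str
    evidence: tuple  # tuple of Evidence; empty when nothing was retrieved


NO_EVIDENCE = "The archive did not return passages that address this question."


def compose_answer(question, draft, evidence):
    """Attach evidence to a drafted answer, or decline when none was retrieved."""
    evidence = tuple(evidence)
    if not evidence:
        # Without supporting passages, the draft is discarded.
        return GroundedAnswer(question, NO_EVIDENCE, ())
    return GroundedAnswer(question, draft, evidence)


def render(result):
    """Format the answer followed by each supporting passage."""
    lines = [result.answer]
    for n, ev in enumerate(result.evidence, start=1):
        lines.append(
            f"Evidence {n}: {ev.collection}, {ev.episode_id}, "
            f"{ev.start_seconds:.0f}s-{ev.end_seconds:.0f}s: \"{ev.text}\""
        )
    return "\n".join(lines)
```

Under these assumptions, a drafted answer with no retrieved passages is replaced by a plain statement that the archive returned nothing relevant, so no answer reaches the user without visible evidence.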

What makes it different

The system is grounded, collection-aware, and archival in tone. It avoids chat-style interaction and avoids presenting generated answers as definitive truth claims. Retrieved passages are called evidence, not citations, because the product’s purpose is to show what the archive supports.

Who it is for

Researchers and journalists working with public discourse.
Institutions and analysts examining civic, economic, or policy themes.
Archives and media organizations making spoken-word collections searchable.