Meta has released an “open” implementation of the viral generate-a-podcast feature in Google’s NotebookLM.
Called NotebookLlama, the project unsurprisingly uses Meta’s own Llama models for much of the processing. Like NotebookLM, it can generate back-and-forth, podcast-style digests of text files uploaded to it.
NotebookLlama first creates a transcript from a file — e.g. a PDF of a news article or blog post. Then, it adds “more dramatization” and interruptions before feeding the transcript to open text-to-speech models.
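For readers who want to picture the flow, here is a minimal sketch of that three-stage pipeline. It is not NotebookLlama's actual code: every function name is hypothetical, and each stage is a stub standing in for where a language model or text-to-speech model would be called.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


def write_transcript(source_text: str) -> list[Turn]:
    """Stage 1 (stub): a language model would draft the transcript here.

    This placeholder simply alternates two speakers over the sentences.
    """
    sentences = [s.strip() for s in source_text.split(".") if s.strip()]
    speakers = ("Speaker 1", "Speaker 2")
    return [Turn(speakers[i % 2], s + ".") for i, s in enumerate(sentences)]


def dramatize(turns: list[Turn]) -> list[Turn]:
    """Stage 2 (stub): a second pass would add dramatization and interruptions.

    This placeholder inserts a canned interjection between consecutive turns.
    """
    out: list[Turn] = []
    for i, turn in enumerate(turns):
        out.append(turn)
        if i + 1 < len(turns):
            out.append(Turn(turns[i + 1].speaker, "[interrupts] Wait, really?"))
    return out


def synthesize(turns: list[Turn]) -> list[str]:
    """Stage 3 (stub): a text-to-speech model would render audio per turn.

    This placeholder returns the lines that would be sent for synthesis.
    """
    return [f"{t.speaker}: {t.text}" for t in turns]


def make_podcast(source_text: str) -> list[str]:
    # Chain the three stages in the order the article describes.
    return synthesize(dramatize(write_transcript(source_text)))


if __name__ == "__main__":
    for line in make_podcast("Meta released a tool. It makes podcasts."):
        print(line)
```

In a real system, each stub would be replaced by a model call, and the last stage would return audio rather than text.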
The results don’t sound nearly as good as NotebookLM’s. In the NotebookLlama samples I’ve listened to, the voices have an obviously robotic quality, and they tend to talk over each other at odd points.
But the Meta researchers behind the project say that the quality could be improved with stronger models.
“The text-to-speech model is the limitation of how natural this will sound,” they wrote on NotebookLlama’s GitHub page. “[Also,] another approach of writing the podcast would be having two agents debate the topic of interest and write the podcast outline. Right now we use a single model to write the podcast outline.”
NotebookLlama isn’t the first attempt to replicate NotebookLM’s podcast feature. Some projects have had more success than others. But none — not even NotebookLM itself — have managed to solve the hallucination problem that dogs all AI. That is to say, AI-generated podcasts are bound to contain some made-up stuff.