Playing with Notebook LM
10/10/2024 1:38 pm | 6 mins.
So, I have been playing around with Notebook LM (requires Google account) recently. The idea is that you give it a series of files, and it can answer questions and do other tasks specific to the documents you supplied. The real "wow" is that it can generate a "podcast" so you can listen to a simulated conversation about the material. The AI voice generation isn't perfect, but it's light-years ahead of most text-to-speech, even managing to insert filler "ums" and adjust the tone of the speakers to a certain degree.
My first exploration with it was to upload some worldbuilding and plot documentation from a fantasy world of mine and see what conversation it generated. It did a summary of the world and some of its key features etc. It lost the plot on the larger storyline I had crafted, but it was entertaining to hear.
This morning I conducted an experiment. The version linked above is the Google version. There is also a non-Google implementation called Open NotebookLM, which uses a Llama-driven backend to process the source file. For the experiment, I took a PDF of a recent GQ article about the restoration of the roof of Notre Dame.
First, here is the audio generated by Google's Notebook LM:
Now, here is the output of Open Notebook LM:
Open Notebook's audio is noticeably worse, and the length limitations of the freely available online version (via huggingface.com) definitely make Google's implementation the better one. It's still quite listenable, though to me it feels more like a fluff piece compared to the article. I also think it's interesting that Open Notebook pronounces Notre Dame correctly, while Google may as well be talking about the university.
A few years ago I wrote a Python tool that pulled my "Watch Later" YouTube playlist and converted the videos into an audio podcast feed, which was useful for commuting and consuming lecture- or podcast-style videos from YT. I don't use it anymore since I no longer have the same commute time. But I can imagine a similar tool that pulls unread articles from my to-be-read list in Wallabag, runs one through Open Notebook LM, then uploads the generated file and creates the RSS feed for it.
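The last step of that imagined tool, building the RSS feed, is easy to sketch with the Python standard library. This is only a rough outline under my own assumptions: the function name build_podcast_feed, the episode dictionary keys, and the "audio/mpeg" default are all made up for illustration, and the Wallabag and Open Notebook LM steps are not shown at all.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_podcast_feed(title, link, description, episodes):
    """Return an RSS 2.0 document as a string, one <item> per episode.

    Hypothetical episode format (not from any real tool): a dict with
    "title", "audio_url", "length_bytes", "published" (a timezone-aware
    datetime) and an optional "mime_type".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        # Reuse the audio URL as the unique identifier for the episode.
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The <enclosure> element is what podcast apps actually download.
        ET.SubElement(
            item,
            "enclosure",
            url=ep["audio_url"],
            length=str(ep["length_bytes"]),
            type=ep.get("mime_type", "audio/mpeg"),  # assumed default
        )

    return ET.tostring(rss, encoding="unicode")
```

Calling it with a list of episode dictionaries and writing the returned string to something like feed.xml on the web server would be enough for a podcast app to subscribe to, assuming the audio files are uploaded to the URLs the feed points at.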
I don't know how much I'd use it, and I'm not yet confident the generated audio would carry enough of the source content for me to feel properly informed. But I can easily believe the quality and depth will get there in the near future. We'll see.
I was also toying with the idea of generating a personal daily overview/summary and feeding that to the system: a sort of personalized morning show that discusses your schedule for the day, headlines, last night's sports scores, upcoming appointments, etc. I might write a concept of what could go into that file and see what Open Notebook spits out. That might make the project more interesting to explore.
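As a first rough concept of that file, something like the following could assemble a plain-text briefing to hand to the system. Everything here is an assumption for illustration: the function name build_briefing, the section headings, and the idea that a simple bulleted text file is a good input format. Where the real schedule, headlines, and scores would come from is left open.

```python
from datetime import date


def build_briefing(day, sections):
    """Render a plain-text daily briefing from (heading, items) pairs.

    Sections with no items are skipped so the source file stays short.
    """
    lines = [f"Daily briefing for {day.isoformat()}", ""]
    for heading, items in sections:
        if not items:
            continue  # nothing to say about this topic today
        lines.append(heading)
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

The result would be saved as a text file and uploaded as the single source document, leaving it to the generated conversation to turn the bullet points into something that sounds like a morning show.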
Lastly, writing this entry forced me to make some Glowbug code changes so that it properly handles audio files when I upload them to the blog. Up to now it's been strictly for uploading images. While it's not ideal to have writing interrupted by the need to program, I do appreciate being able to build what I need and put it to use immediately.