An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

souzatharsis/podcastfy

Podcastfy.ai 🎙️🤖

An Open Source API alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Paper | Python Package | CLI | REST API | Web App | Feedback

Podcastfy is an open-source Python package that transforms multi-modal content (text, images) into engaging, multi-lingual audio conversations using GenAI. Supported inputs include websites, PDFs, images, YouTube videos, and user-provided topics.

Unlike closed-source, UI-based tools focused primarily on research synthesis (e.g., NotebookLM ❤️), Podcastfy focuses on open-source, programmatic, and bespoke generation of engaging, conversational content from a multitude of multi-modal sources, enabling customization and scale.

Audio Examples 🔊

This sample collection is also available at audio.com.

Images

| Image Set | Description | Audio |
|---|---|---|
| Senecio, 1922 (Paul Klee); Connection of Civilizations (2017) by Gheorghe Virtosu | Senecio, 1922 (Paul Klee) and Connection of Civilizations (2017) by Gheorghe Virtosu | 🔊 |
| The Great Wave off Kanagawa, 1831 (Hokusai); Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi) | The Great Wave off Kanagawa, 1831 (Hokusai) and Takiyasha the Witch and the Skeleton Spectre, c. 1844 (Kuniyoshi) | 🔊 |
| Taylor Swift; Mona Lisa | Pop culture icon Taylor Swift and Mona Lisa, 1503 (Leonardo da Vinci) | 🔊 |

Text

| Content Type | Description | Audio | Source |
|---|---|---|---|
| YouTube Video | YCombinator on LLMs | Audio | YouTube |
| PDF | Book: Networks, Crowds, and Markets | Audio | book pdf |
| Research Paper | Climate Change in France | Audio | PDF |
| Website | My Personal Website | Audio | Website |
| Website + YouTube | My Personal Website + YouTube Video on AI | Audio | Website, YouTube |

Multi-Lingual Text

| Language | Content Type | Description | Audio | Source |
|---|---|---|---|---|
| French | Website | Agroclimate research information | Audio | Website |
| Portuguese-BR | News Article | Election polls in São Paulo | Audio | Website |

Features ✨

  • Generate conversational content from multiple sources and formats (images, websites, YouTube, and PDFs).
  • Generate shorts (2-5 minutes) or longform (30+ minutes) podcasts.
  • Customize transcript and audio generation (e.g., style, language, structure).
  • Generate transcripts using 100+ LLMs (OpenAI, Anthropic, Google, etc.).
  • Leverage local LLMs for transcript generation for increased privacy and control.
  • Integrate with advanced text-to-speech models (OpenAI, Google, ElevenLabs, and Microsoft Edge).
  • Provide multi-language support for global content creation.
  • Integrate seamlessly with CLI and Python packages for automated workflows.

Built with Podcastfy 🚀

Updates 🚀

v0.3.6+ release

  • Generate shorts or longform podcasts!
  • Generate podcasts from input topic using real-time internet search
  • Integrate with 100+ LLMs (OpenAI, Anthropic, Google, etc.) for transcript generation
  • Integrate with Google's Multispeaker TTS model for high-quality audio generation

See CHANGELOG for more details.

Quickstart 💻

Prerequisites

  • Python 3.11 or higher
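  • ffmpeg for audio processing: $ pip install ffmpeg (if audio export still fails, the FFmpeg binary may also need to be installed through your system package manager)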

Setup

  1. Install from PyPI: $ pip install podcastfy

  2. Set up your API keys
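Which keys you need depends on the LLM and text-to-speech providers you choose. As a minimal sketch, the check below reports which environment variables are still unset before you call the package. The variable names (GEMINI_API_KEY, OPENAI_API_KEY) and the helper missing_keys are assumptions for illustration and are not confirmed by this README; consult the project's configuration documentation for the names it actually reads.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed variable names, for illustration only; adjust to your providers.
ASSUMED_KEYS = ("GEMINI_API_KEY", "OPENAI_API_KEY")


def missing_keys(required: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(ASSUMED_KEYS)
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All expected API keys are set.")
```

Passing a plain dictionary as env makes the check easy to exercise without touching the real environment.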

Python

from podcastfy.client import generate_podcast

audio_file = generate_podcast(urls=["<url1>", "<url2>"])

CLI

python -m podcastfy.client --url <url1> --url <url2>
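For automated workflows, the CLI call above can be assembled from a list of sources. The sketch below only builds the argument list, repeating --url once per source as in the command shown; the helper name build_podcastfy_command is hypothetical, and the command is printed rather than executed so the example runs without API keys.

```python
import sys
from typing import List, Sequence


def build_podcastfy_command(urls: Sequence[str]) -> List[str]:
    """Build the argument list for `python -m podcastfy.client`, one --url per source."""
    if not urls:
        raise ValueError("at least one URL is required")
    command = [sys.executable, "-m", "podcastfy.client"]
    for url in urls:
        command += ["--url", url]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so no API keys are needed here.
    print(" ".join(build_podcastfy_command(["https://example.com/article"])))
    # To execute it, pass the list to subprocess.run(..., check=True).
```

Building a list instead of a shell string avoids quoting problems when a URL contains characters such as & or ?.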

Usage 💻

Experience Podcastfy with our HuggingFace 🤗 Spaces app. (Note: This UI app is less extensively tested than the Python package.)

Customization 🔧

Podcastfy offers a range of customization options to tailor your AI-generated podcasts, including the style, language, and structure of the transcript and audio.
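One way to manage such options is to keep a set of defaults and apply per-podcast overrides on top. The sketch below is illustrative only: the option names (conversation_style, output_language, word_count), the helper build_config, and the idea of passing the result to generate_podcast through a conversation_config argument are all assumptions that this README does not confirm.

```python
from typing import Any, Dict

# Hypothetical defaults, for illustration; the real option names may differ.
DEFAULT_CONFIG: Dict[str, Any] = {
    "conversation_style": ["engaging", "informative"],
    "output_language": "English",
    "word_count": 2000,
}


def build_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the defaults with `overrides` applied, rejecting unknown keys."""
    unknown = sorted(set(overrides) - set(DEFAULT_CONFIG))
    if unknown:
        raise KeyError(f"unknown option(s): {', '.join(unknown)}")
    return {**DEFAULT_CONFIG, **overrides}


if __name__ == "__main__":
    config = build_config({"output_language": "French", "word_count": 800})
    print(config)
    # Assumed usage, not confirmed by this README:
    # generate_podcast(urls=["<url1>"], conversation_config=config)
```

Rejecting unknown keys catches typos early, before a long generation run silently ignores a misspelled option.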

License

This software is licensed under Apache 2.0. The repository provides a few instructions for using podcastfy in your own software.

Contributing 🤝

We welcome contributions! See Guidelines for more details.

Example Use Cases 🎧🎶

  • Content Creators can use Podcastfy to convert blog posts, articles, or multimedia content into podcast-style audio, enabling them to reach broader audiences. By transforming content into an audio format, creators can cater to users who prefer listening over reading.

  • Educators can transform lecture notes, presentations, and visual materials into audio conversations, making educational content more accessible to students with different learning preferences. This is particularly beneficial for students with visual impairments or those who have difficulty processing written information.

  • Researchers can convert research papers, visual data, and technical content into conversational audio. This makes it easier for a wider audience, including those with disabilities, to consume and understand complex scientific information. Researchers can also create audio summaries of their work to enhance accessibility.

  • Accessibility Advocates can use Podcastfy to promote digital accessibility by providing a tool that converts multimodal content into auditory formats. This helps individuals with visual impairments, dyslexia, or other disabilities that make it challenging to consume written or visual content.
