Today in Generative Media
AI divides SXSW; Suno is ChatGPT for music; YouTubers must label AI content
A Tale of Two SXSWs: An AI Divide So Wide You Could Drive a Film Industry Through It (IndieWire)
A ChatGPT for Music Is Here. Inside Suno, the Startup Changing Everything (Rolling Stone) (More details in this X thread, if you don’t wanna subscribe.)
Hey YouTube creators, it’s time to start labeling AI-generated content in your videos (CNN Business)
Apple Is in Talks to Let Google Gemini Power iPhone AI Features (Bloomberg)
Musk’s Grok AI goes open source (VentureBeat)
Story.com: Everyone Has A Story. What's Yours? Storytelling Meets AI
✨ For the last few months I have been reverse engineering Magnific AI's famous upscaler. It uses MultiDiffusion, ControlNet tiles and detail LoRAs. In true AI spirit, I am open sourcing it for everyone to use for free in your apps. (X) Code on GitHub. API on Replicate.
MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai (GitHub)
DragAnything: Motion Control for Anything using Entity Representation (project page)
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model (project page)
MusicHiFi: Fast High-Fidelity Stereo Vocoding (project page)
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis (project page)
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models (project page)
Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images (Stability blog)
Apple, a company famous for its secrecy, published a paper with a staggering amount of detail on their multimodal foundation model. Those who are supposed to be open are now wayyy less open than Apple. (X) Paper on arXiv.
Text -> Image -> 3D -> Retexturing with https://cube.csm.ai (X)
I developed a workflow that allows you to render ANY 3D scene in ANY style with AI! (X)
A timeless reminder from Antonioni: “A film that can be described in words is not really a film.” (X)