Today in Generative Media

AI divides SXSW; Suno is ChatGPT for music; YouTubers must label AI content

Mar 19, 2024

A Tale of Two SXSWs: An AI Divide So Wide You Could Drive a Film Industry Through It (IndieWire)
A ChatGPT for Music Is Here. Inside Suno, the Startup Changing Everything (Rolling Stone) (More details in this X thread, if you don’t wanna subscribe.)
Hey YouTube creators, it’s time to start labeling AI-generated content in your videos (CNN Business)
Apple Is in Talks to Let Google Gemini Power iPhone AI Features (Bloomberg)
Musk’s Grok AI goes open source (VentureBeat)
Story.com: Everyone Has A Story. What's Yours? Storytelling Meets AI
IP Composition Adapter: This adapter for Stable Diffusion 1.5 is designed to inject the general composition of an image into the model while mostly ignoring the style and content. Meaning a portrait of a person waving their left hand will result in an image of a completely different person waving with their left hand. (HuggingFace)
✨ For the last few months I have been reverse engineering Magnific AI's famous upscaler. It uses MultiDiffusion, ControlNet tiles and details LoRas. In true AI spirit, I am open sourcing it for everyone to use for free in your apps. (X) Code on GitHub. API on Replicate.
MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai (GitHub)
DragAnything: Motion Control for Anything using Entity Representation (project page)
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model (project page)
MusicHiFi: Fast High-Fidelity Stereo Vocoding (project page)
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis (project page)
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models (project page)
Introducing Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images (Stability blog)
Apple, a company famous for its secrecy, published a paper with a staggering amount of detail on their multimodal foundation model. Those who are supposed to be open are now wayyy less open than Apple. (X) Paper on arXiv.
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews (arXiv)
Text -> Image -> 3D -> Retexturing with https://cube.csm.ai (X)
I developed a workflow that allows you to render ANY 3D scene in ANY style with AI! (X)
AI dubbing is getting scary good (X)
“How many GPUs did Jensen promise you?” (X)
A timeless reminder from Antonioni: “A film that can be described in words is not really a film.” (X)
