Today in Generative Media
Meta AI to get celeb voices; Fake AI podcasters; Will AI bust? States vs. election AI
News
Fake AI “podcasters” are reviewing my book and it’s freaking me out (Ars Technica)
Will A.I. Be a Bust? A Wall Street Skeptic Rings the Alarm. (New York Times)
Half of U.S. states seek to crack down on AI in elections (Axios)
NC governor candidate cries AI fabrication as defense for racist porn forum posts (Ars Technica)
New Cloudflare Tools Let Sites Detect and Block AI Bots for Free (Wired)
AI can generate recipes that can be deadly. Food bloggers are not happy (NPR)
Civitai Gen-AI Makes Its Move (Forbes)
ByteDance unveils 2 new video-generation AI models to narrow gap with OpenAI’s Sora (South China Morning Post)
Software
PDF to Audio Converter: This code can be used to convert PDFs into audio podcasts, lectures, summaries, and more (GitHub). Demo on HuggingFace.
We've increased the size of our NVIDIA A100 fleet for paid users by around 2x, and for the last several days we've seen 100% success rate for users requesting A100s. (Google Colaboratory on X)
Diffusers Outpaint now allows for infinite zoom-out with a resize-input-size option and a "use as input" button (X)
Welcome to OpenMusic, a next-gen diffusion model designed to generate high-quality music audio from text descriptions! (HuggingFace)
Research
V3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians (SIGGRAPH Asia 2024, project page)
PortraitGen: Portrait Video Editing Empowered by Multimodal Generative Priors (SIGGRAPH Asia 2024, project page)
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting (SIGGRAPH Asia 2024, project page)
Colorful Diffuse Intrinsic Image Decomposition in the Wild (project page)
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors (project page)
Synergy and Synchrony in Couple Dances (project page)
Are Pose Estimators Ready for the Open World? STAGE: Synthetic Data Generation Toolkit for Auditing 3D Human Pose Estimators (project page)
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation (project page). Code on GitHub. Demo on HuggingFace.
SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality (project page)
MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting (arXiv)
Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects (arXiv)