This weekend in Generative Media

Penguin's copyright page forbids AI; Midjourney plans web AI editor; An AI feature film

Oct 21, 2024

News

Penguin Random House books now explicitly say ‘no’ to AI training (The Verge)
Midjourney plans to let anyone on the web edit images with AI (TechCrunch)
‘Where The Robots Grow’ Is AI’s First Feature (Forbes)
Ted Sarandos Says After “Lots Of Hype” About AI, Key Question Remains: “Can It Help Make Better Shows And Films?” (Deadline)
The AI opt-out models Meta, Musk's X, and the UK gov are proposing are simply not a good enough way for us to protect ourselves from data scraping (PC Gamer)
This Prompt Can Make an AI Chatbot Identify and Extract Personal Details From Your Chats (Wired)

Software

Research

Look Ma, no markers: Holistic performance capture without the hassle (SIGGRAPH Asia 2024, project page)
One-Step Diffusion via Shortcut Models (project page)
UniCon: A Simple Approach to Unifying Diffusion-based Conditional Generation (project page)
FlexGen: Flexible Multi-View Generation from Text and Image Inputs (project page)
ControlMM: Controllable Masked Motion Generation (project page)
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (project page)
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing (project page)
MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields (arXiv)
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices (arXiv)
Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting (arXiv)
GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting (arXiv)
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models (arXiv)

Misc

Tutorial: Rectified Flow (Qiang Liu, University of Texas at Austin)
gpt-4o-audio-preview audio input is currently 1,066 times more expensive than Google Gemini 1.5 Flash 8B audio input! (X)
ChatGPT can now create Mind Maps. Here’s how to do it for free in a few seconds (X)
I can confirm that with system prompt engineering and a high temperature, OpenAI's new gpt-4o-audio-preview model can be instructed to generate voices and any vocal style. (X)
Jailbroken, R-rated NotebookLM is definitely something different. But it's interesting to listen to. (X) [NSFW language]
