There's still a lot of work to be done. It's good at making short individual scenes, but when you start trying to string them together the wheels start to come off. This [0] pretty basic police-raid-leads-to-shootout video, for example, turns to mush pretty quickly: even in the initial car ride, the size and shape of the car's interior warp pretty drastically.
Feels like there's going to be a dichotomy where the individual visuals look pretty good taken by themselves, but the story told by those shots will still be mushy AI slop for a while. I've seen this kind of mushiness persist over the generations so far. It seems very difficult to remove because fixing it relies on more context than just previous images and text descriptions.
[0] https://www.reddit.com/r/ChatGPT/comments/1kru6jb/this_video...