> A creative director simply comes up with high level instructions (like the one above) THEN the ultra hard part is making that high level instruction a reality. That is HARD and as of now it looks like both skills are being replaced by AI.
This is the opposite of what I see in reality right now. The hard part is getting artists to do exactly what you want, down to the subtle but meaningful details. That happens because high-level instructions don't have enough capacity to carry the full meaning. The same holds for generative models: they are extremely hard to control through the "prompt engineering" gimmick, which is only fine when you are OK with random output. Besides, they are purely functional and lack the feedback mechanisms a creative director usually employs with artists.
That's why people are trying to make the techniques more complex: large animation houses are experimenting with training their own model architectures and software. This is how 3D CGI was born: simple at first, and yes, it drew plenty of doomposting early on ("we are getting replaced by computers!"), until it became clear that the field was turning extremely technical and complex, to the point that it now has dozens of specializations within it.
All entertainment is ultimately based on novelty, as the human brain is very good at distilling meaning from an ocean of information. If you start with little meaning (your example: "a cowboy with blue skin with a laser cannon riding a unicycle through the grand canyon with a dwarf chasing him"), people get bored, no matter how much randomized artsy-looking stuff you add around it.
If you think that AI can generate all the meaning that is relevant to people, I don't buy it: it doesn't have the same training material. It's trained on the end result and is forced to reverse engineer what moves people, and reverse engineering is a fundamentally more costly and opaque task.
Train the AI exclusively on well-lauded texts. Then you can get a generative AI that moves people.
The problem with "reverse engineering people" is that you don't need to: the things people like in any era follow predictable, generic patterns. These patterns can be encoded into an AI, provided we find enough curated training data.
> The hard part is to make the artists do exactly what you want, down to the subtle but meaningful details.
This is because the artist can't read your mind. It has nothing to do with your skill or the artist's skill level. The feedback loop you are alluding to is more a reflection of the lack of clarity in your instructions or your imagination.
You thought what you imagined looked good, but the artist, in following your directions, created something that showed you how flawed your imagination was.
> The problem with "reverse engineering people" is that you don't need to, the things they like in any era follows predictable and generic patterns. These patterns can be encoded into AI provided we find enough curated training data.
That only works to a certain degree. Take the infamous example: try making GPT output an exact number of asterisks, and you will get mixed results. You have no problem typing exactly 1589 asterisks because you run a stateful counting algorithm in your head. GPT has no idea about that algorithm; it has to reverse engineer it from text, and can only extract a vague correspondence between a number and a string of roughly that length. You don't give humans examples to reverse engineer, you teach them to count.
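To make the contrast concrete, here is a minimal Python sketch of the "stateful counting algorithm" the comment describes: producing exactly N characters is a one-line loop with an explicit counter, and verifying the count is a single pass. A model trained only on example outputs has to approximate this procedure from text statistics instead of executing it. (The function name and the figure 1589 are just illustrations taken from the comment, not anyone's actual code.)

```python
def asterisks(n: int) -> str:
    """Emit exactly n asterisks by running the counting algorithm directly."""
    out = []
    for _ in range(n):  # explicit state: the loop counter tracks progress
        out.append("*")
    return "".join(out)

# Exact by construction: the count is guaranteed by the loop, not learned
# from examples of strings that happened to be "about this long".
s = asterisks(1589)
assert len(s) == 1589
assert set(s) == {"*"}
```

The point is not that the task is hard, it's that the guarantee comes from executing a procedure with state, which is exactly what a model reverse engineering the outputs never gets to see.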
This is the simplest example, and the model might even learn to count eventually, as it's far more capable in certain other respects. But as the dimensionality of the task grows, the amount of resources and training data required to reverse engineer it grows much faster.
Sure, it can spot some patterns, and that can look good, but some things are simply invisible in the end result: you will have a hard time making it learn higher-level concepts, because those depend heavily on hardwired things like the dumber parts of human neural circuitry and biochemistry, which the model doesn't have.
It's like trying to take a photo in a dark room: no matter how much you improve the sensitivity of your camera, there might not be a single photon to capture.
> This is because the artist can't read your mind.
Yes, this is what I mean by the limited capacity of a simple textual description. It's a fundamental limitation: natural language is just poorly suited to detailed conceptualization. A sketch, a conceptual diagram, or other higher-order control methods have far more capacity to convey your intent, and that's the direction these models are moving in. At which point their usage is nothing like "type something simple and receive the result".
The asterisks thing is another issue. LLMs don't need to do this to replace directors.
>Yes, this is what I mean by the limited capacity of a simple textual description. It's a fundamental limitation - natural language is just poorly suited for the detailed conceptualization.
Except LLMs can accept sketches as input. The higher-order methods of communication are covered by encoders.