There are at least three "levels" you can consider with image generation: composition, facial likeness and style. Prompts are pretty weak at composition, which is exactly where controlnets are strongest - they do a great deal to make up for that weakness. But there are some compositions SD can't find even when given detailed controlnets.
Style generality is frequently lost in fine-tuned models. The original dreambooth tried to get around this by generating lots of images of the class and training on them alongside the subject images to retain generality, but it's time-intensive to generate all the extra images (and ideally do some QC on them) and then train on them too, so it's not often done.
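To make the class-image trick concrete, here's a rough sketch of how the two terms might be combined into one training loss. This is a toy illustration, not the actual dreambooth code: the function names and the `prior_weight` parameter are hypothetical, and plain lists of floats stand in for what would be model predictions and targets in real training.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(instance_pred, instance_target,
                            class_pred, class_target,
                            prior_weight=1.0):
    """Combine the subject loss with a loss on generated class images.

    instance_*: prediction/target for the subject images being learned.
    class_*:    prediction/target for the pre-generated class images.
    prior_weight (hypothetical name): how strongly the class images pull
    the model back toward its original, more general behaviour.
    """
    instance_loss = mse(instance_pred, instance_target)
    prior_loss = mse(class_pred, class_target)
    return instance_loss + prior_weight * prior_loss


# Toy numbers only, to show how the two terms add up.
total = prior_preservation_loss(
    instance_pred=[0.5, 0.0], instance_target=[0.0, 0.0],
    class_pred=[1.0, 1.0], class_target=[1.0, 0.0],
    prior_weight=1.0,
)
print(total)
```

Setting `prior_weight` to 0 gives ordinary fine-tuning on the subject alone, which is effectively what happens when the class images get skipped to save time.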
Prompts seem to be a new type of camera, lens or paintbrush.