The Hard Limits of visual AI - it's all in the nose
There's been much discussion of whether image-generating AIs, such as MidJourney or DALL-E 2, will render creativity obsolete. They generate visually-appealing images based on text prompts, and there's an obvious application in replacing stock photos or background art.
I've been experimenting with several of these AIs recently. What I've learned is that they can't initiate new ideas. As their detractors claim, they are merely recycling existing images and their output is what I'd describe as 'consensus art'.
But what is consensus art? Well, it's art that's the average, or the consensus, of all the images fed into the computer model to train it. So, if you show the AI thousands of pictures of dogs, then you can train it to respond to the text prompt 'dog' with something that looks recognisably like a dog. However, it's kinda the smoothed average of every dog the AI has ever seen.
Why is that a problem? Well, because the AI can't deal with edge cases, i.e. statistically-unlikely imagery. So, if the AI sees hundreds of poodles, and then you ask it for a pug, then it will tend to generate an awful lot of lanky pugs with puffy tails. So, if you actually want a pug, and not just a generic doggie thing, you're still going to need a human artist.
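To make the 'smoothed average' idea concrete, here's a toy Python sketch. It is only an illustration of the analogy above, not how MidJourney or DALL-E 2 actually work (real image generators are not literal averages of their training pictures). The three feature scores, the 95-to-5 split of poodles to pugs and the names POODLE, PUG and consensus are all made up for the example.

```python
from math import dist
from statistics import mean

# Hypothetical feature scores on a 0-1 scale, invented for this sketch:
# (leg_length, tail_puffiness, snout_length)
POODLE = (0.9, 0.9, 0.7)
PUG = (0.2, 0.1, 0.1)


def consensus(examples):
    """Average each feature across every training example."""
    return tuple(mean(values) for values in zip(*examples))


# An assumed training set that is heavily skewed towards poodles.
training_set = [POODLE] * 95 + [PUG] * 5
consensus_dog = consensus(training_set)

print("consensus dog:", tuple(round(v, 2) for v in consensus_dog))
print("distance to poodle:", round(dist(consensus_dog, POODLE), 2))
print("distance to pug:", round(dist(consensus_dog, PUG), 2))
```

In this toy setup the averaged 'dog' sits far closer to the poodle than to the pug, because the handful of pugs barely moves the average. That is the lanky, puffy-tailed pug problem in miniature.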
Now let me illustrate my real-life example of the 'consensus art' problem with MidJourney, starting with a bit of background about me.
My inward-facing eye
I have mild hyperphantasia, which is a neurological condition where you have an extremely vivid imagination. It's a trait apparently shared by about 2.5% of the population, and is associated with synesthesia (I have grapheme-color synesthesia) and some mental health conditions (which I don't have).
Hyperphantasia is a bit like having an inward-facing eye. I can see the real world through my 'front' or 'outward-facing' eyes, and I see my thoughts as images inside my head. When I write fiction, I see the story unfolding like a movie, and my job as a writer is to communicate some vague shadow of my mental imagery onto the page.
Yep, I had a teenage crush
Aged 14-16, I wrote a science-fantasy novel called Syer Roth: Fire Empress. I loved this novel and, reading it back, it's heavily inspired by films. I could literally see the events of the book like a movie, including my characters, especially my lead villain, Tirim Kraygor. As a teenager, I was crazily in love with this unrepentantly evil man, and kept drawing pin-ups that I could hang on my bedroom wall!!
Trouble is, I'm a TERRIBLE artist and I could never replicate on paper what I had inside my head.
Realising a teen dream
Skim forward nearly thirty years... I've decided to replot and rework Syer Roth: Fire Empress, and I recently read about Tor's AI cover art controversy. Then, I saw several indie genre authors posting AI pictures of their fictional characters that they'd generated in MidJourney.
So, I decided I was going to use AI to produce a photorealistic image of Tirim Kraygor. As this truly odious person has been hovering about my imagination for the last thirty years, I figured having a picture would finally put an end to my hand-wavery 'oh, he looks a bit like Tirim Kraygor'.
So, I joined a newbie group on the MidJourney Discord and entered a laundry list of facial features at the text prompt (e.g. long pointed nose, dark unkempt hair...). What I got from this looked remarkably like John Travolta in Pulp Fiction drawn in the style of Vincent van Gogh.
I realised visual AIs aren't great at working with written lists of facial features, and I'd need to use a visual prompt. I already had a LENSA AI image of my husband that 'looked a bit like Tirim Kraygor' (yep... whatever you're thinking... you're probably right).
I knew it was best to use photo-related keywords, such as 'taken with a CANON SLR' or 'in the style of' a famous photographer, to get photorealistic images. As my novel is set in a fantasy world with World War II-era technology, I decided I'd make Tirim Kraygor look like he'd been photographed using a 1930s camera.
So, I plugged the LENSA AI image into MidJourney with the prompt 'wearing a nehru-collar jacket in the style of Edward Malindine'.
Interestingly, MidJourney decided Tirim Kraygor was probably Japanese or Korean by ethnicity (he's Belorni). Initially, I thought this was something to do with the nehru-collared jacket, but... well, more on that later.
At this point, I started to realise MidJourney simply wasn't very good at giving me (or anyone else) something specific. It's great at generic images (e.g. give me a blond happy girl), but less so at depicting individuals - even ones where it already has a picture.
Not put off, I decided to quit MidJourney and stop using the LENSA AI image. Instead, I decided to turn my thirty-year-old pencil cartoon of Tirim Kraygor (see above) into a 'real boy'.
Enter yet ANOTHER visual AI... Artbreeder.
As per the video, pencil cartoons of people are at the limits of Artbreeder's capabilities. Once I imported the cartoon, it stopped looking like Tirim Kraygor (badly-drawn or otherwise) quite quickly.
However, as per the YouTube tutorial, I spawned some new faces from the pencil drawing. These were, again, interestingly, mostly of people who looked like they might be Japanese.
It was at this point I realised that AIs are blatantly racist: their reliance on training data means that they associate facial features with ethnic groups. Tirim Kraygor's appearance was, I realise, inspired a little by bishonen male anime characters - probably from the Nintendo games I had as a kid.
The AI had picked this up and was now insisting that Tirim Kraygor was Japanese.
Having realised the pencil image wasn't going to work, I decided to merge it with the LENSA AI image using the Artbreeder splicing tool. I don't have any pictures of the intermediate steps, but it was here that I started realising the extent of the ethnicity problems with Tirim Kraygor's facial features.
Artbreeder's portrait tool is controlled by sliders. These cover broad facial characteristics, such as amount of facial hair, gender, eye and hair colour, and... well, ethnicity.
The problem here is that, once the AI had decided Tirim Kraygor's facial features suggested East Asian ancestry, it was not giving up. Every time I tried narrowing his face, by tweaking his gender/facial expression, it would promptly shrink his nose and give him epicanthic folds.
I got very frustrated that I just couldn't stop the AI shrinking his nose. I tried everything. I tried to merge other portraits already on the website, but no one had his nose and everyone was conventionally beautiful. I even tried turning the 'white race' slider up (my goodness, this is getting very Nazi very quickly), but that just gave him stubble and a broad jaw like Brad Pitt (really? I mean WTF!).
Eventually, I pulled the Asian slider heavily into the negative, and then boosted the Middle Eastern slider right up. And that got his face broadly correct... except, of course, for his nose.
The trouble with a nose...
You'd think this wouldn't matter. However, the 'real' Tirim Kraygor (i.e. the picture in my head) has a very distinctive nose. It's long and pointed, and turns up at the end. I think his prominent nose originally came from the Mule in Second Foundation (I wear my influences proudly), but he eventually evolved a long upturned nose to stare arrogantly down at people.
What's interesting here isn't whether Tirim Kraygor, a fictional character, has a large nose or not. What's fascinating is that these two AIs can't deal with statistically-unlikely facial features. And, as such, they refuse to create anything that is distinctive or unusual.
Tirim Kraygor is a Belorni duke, so you wouldn't really expect him to have the facial features common to Earth ethnic groups. But, less flippantly, he's an example of a statistically-unlikely human.
He has a bone structure that may be statistically more common in pictures of people with Middle Eastern or East Asian ancestry. His nose, however, is not typical of these ethnic groups. In fact, after scanning some rhinoplasty websites, I found he has a 'ski-jump nose', which is apparently more common among people of European ancestry (I have been reading about racist beauty standards in nose shape for 30 minutes in the nail bar, and now just feel desperately sad :( ).
Although multiracial people are the norm in my home borough in south-east London, this is less the case elsewhere in the world.
Moreover, after perusing rhinoplasty websites, I'd actually imagined Tirim Kraygor with a 'low radix', which is where your nose starts below the line of your forehead - a feature customarily viewed as unattractive. Also, his nose has a slight snub at the end. He has also, on inspection of my pictures, a facial feature called ptosis - or droopy eyelids, often caused by congenital nerve issues in the upper eyelid.
In short, Tirim Kraygor is not a statistically-average man. He's not visually stereotypical of an Earth racial group and he's not conventionally attractive either. Nor was he ever intended to be - even in the original book, he's not regarded by other characters as especially handsome. He's certainly striking-looking, but that's rather different - it's about a person's unique character, not about the symmetry and conventional attractiveness of their face.
And that's the long and short of it really. The two visual AIs, Artbreeder and MidJourney, can generate 'images', but they can't generate 'specific' pictures. AIs like them can generate people who don't exist, but not a specific person who never existed. They can recycle fantasy images and real objects in endless permutations, but can't originate something that's within the mind of a creator unless it's been seen before - and, even then, not without a shedload of help.
And, for that reason, folks, I will be resorting to the Photoshop Liquefy function. I need a human artist (muggins here) to 'unfix' Tirim Kraygor's nose...