Alex Hern
April 6, 2023
According to a recent open letter, society needs to immediately pause development of “giant” AI models, or risk apocalyptic outcomes. Massive job losses, the destruction of consensus reality and even the end of all organic life on Earth have all been mooted as risks of pressing forward with development of these systems before we understand their intricacies.
The high-water mark of these is GPT-4, the snappily named AI that underpins the latest version of the breakthrough ChatGPT service. Creating anything more powerful than GPT-4, before we spend at least six months working out its limits and risks, would be too dangerous, more than 1,000 AI experts say.
I decided to spend some time with the new ChatGPT myself. Not just to find out about its risks to civilisation, but also to see what it could and couldn’t do to help me with my life. I’ve never had an assistant, a life coach, a chef or a personal trainer – could ChatGPT be all those things for me? I gave it a week to find out.
Monday
Can it give me basic information without lying?
The odd thing about being handed a tool of unimaginable complexity and potential is that the blinking cursor stares at you just like any other, daring you to find something interesting to type. I feel as if I’m on a bad blind date where I’m expected to ask all the questions.
Throughout the day I pepper the service with queries, trying to use it instead of Google when I want to find out a basic fact, but I quickly hit upon the problem with that approach: ChatGPT’s habit of “hallucinating”. The system will, on occasion, just make things up, things that feel true but aren’t grounded in, well, reality.
To win an argument with a friend, for instance, I ask how many drivers there are in Sunderland (my friends are cool). “Around 67% of people in Sunderland used a car or van to travel to work, according to the 2011 UK census,” ChatGPT merrily tells me. Great! But wait. I can’t find that statistic anywhere in the actual UK census, and it’s an alarmingly specific number for ChatGPT – which isn’t able to look up information online – to have memorised.
Sure enough, if I phrase the same question a different way, it tells me: “I cannot provide real-time data, and as an AI, I cannot access the internet to find specific numbers from the 2011 UK census.” Scratch that then.
Tuesday
Can it tell me why my neck feels funny?
I decide to try to focus on queries that ChatGPT might do better at than Google – the sort of things you would expect someone with broad expertise to be able to answer off the top of their head.
Also, everything in my household is falling apart and I need help. I slept funny (I think?), and now my left arm just … doesn’t really work. I turn to ChatGPT, first to find out how to describe where the pain is. “What’s the name of the muscle that runs down the side of your neck to your shoulder – the one that stands out when you grimace?” I ask, and it gives the right answer: the sternocleidomastoid muscle. I tell it that I slept funny, and ask if there’s anything I can do to ease the pain. It gives me a few neck exercises, but warns that “I’m not a healthcare professional”.
Which is true enough, though just like my interactions with real physiotherapists, I promptly forget about the exercises in the afternoon when the pain goes away and do absolutely nothing to prevent a recurrence. One improvement on a real physio, though: ChatGPT doesn’t arrange a follow-up session to scold me for my laziness.
Wednesday
Can it tell me what’s wrong with my sick child?
Overnight, my baby son vomited five times. He seems fine, promptly falling back to sleep each time while his mother and I groggily change his sheets again. In the morning, I turn to ChatGPT while we wait for the GP appointment the following day, and I’m surprised by how forthcoming it is with advice.
Yes, it prefaces anything it says with “I’m not a doctor”, but then it merrily continues on its way, offering generic advice such as keeping my baby hydrated, burping him regularly and feeding him smaller amounts more frequently. I prompt further, bringing up a specific rare condition that can cause frequent vomiting in babies, and ChatGPT again says: “I’m not a doctor” before launching into a description of the symptoms.
“It is crucial to consult with a healthcare professional if you suspect your baby has this, as it requires medical intervention,” it concludes. I push it further, describing symptoms of serious dehydration and asking for advice, and again it begins its answer: “I’m not a doctor” before reciting a long list of things “you could consider doing”.
The system’s training data is bulging through here, I think. There’s a certain point where the correct answer is simply: “I’m not a doctor; you need to call an ambulance”, and my fictionalised description, of a child who is floppy and lethargic, with no wet nappies for days and regular vomiting, is well past that. But you won’t find many websites (which is where ChatGPT got its knowledge from) that say this, because the economic model of health advice requires enough text on a page to sell adverts next to it.
My son is fine, by the way, and my partner spends the next day vomiting instead, which solves that mystery.
Thursday
Can it invent a new Ottolenghi recipe?
It’s meal-planning night, and I’ve decided to start getting creative with my prompts. I’m not going to just ask it to give me a recipe; that would be too easy. Instead, I pull the full list of every Guardian recipe I’ve ever bookmarked – 350 in all – and paste the names of each one into the chat window, telling it to suggest another 10.
Some of its suggestions are a little generic (“Thai basil chicken stir-fry”) and others sound overly similar to ones I’ve already had (ChatGPT’s sweet potato gnocchi with sage brown butter sauce sounds a bit derivative of Ottolenghi’s squash gnocchi with caraway and black garlic), but there are a few mouthwatering suggestions.
I ask it to expand on two, turning them into full recipes: roasted red pepper and aubergine risotto with mint yoghurt drizzle, and warm lentil salad with caramelised onion and goat cheese. And to make sure it gets it right, I ask the system to do it “in the style of Yotam Ottolenghi”.
The recipes that come out are … good. Really good, actually. Even my sceptical partner overcomes her resentment at being forced to let an AI feed her. There are a few notes to feed back – the AI’s suggested portion sizes are miserly, and it loves throwing oodles of herbs at the problem – but the two meals turn out to be perfect for shoving in a lunchbox and taking to the office. It’s weird to save them in my recipe folder, but I suspect I’ll end up cooking ChatGPT’s aubergine risotto for years to come.
Friday
Can it mix me a kumquat cocktail?
Buoyed up by Thursday’s success, I give it another challenge: I have a drinks cabinet full of spirits, but no mixers. I do, however, have some white wine, cocktail cherries and two kumquats. What should I drink?
Its first suggestion, which involves mixing white wine and sugar syrup together with lemon juice and a cocktail cherry, sounds awfully dull. But I prod away, asking it for more interesting (OK, and stronger) drinks, and it comes up with the kumquat cherry smash: gin, Cointreau, sugar syrup, lemon juice, bitters, cocktail cherries and my two precious kumquats.
It’s not bad, though I will admit my recall is hazy, since the system definitely paid attention to the “strong” request. I send the recipes over to Felix Cohen, award-winning bartender at Margate’s Daisy. “These are perfectly fine recipes,” he agrees. “The drinks will taste good! I’m particularly impressed that it knew to double strain when you added the kumquat pulp. But making palatable food and drink is kind of easy – making stuff that people will pay for and talk about and come back for is a lot of work and knowledge and inspiration, and it isn’t doing that for me with these drinks.”
Saturday
Can it help me defend my pub table, in a British way?
Emboldened by the successes, I try to use ChatGPT more casually. It goes poorly.
Some friends come over to play a game. I hate explanations of rules – sitting in front of people reading out long reams of text is never fun. I see if ChatGPT can generate a succinct version of the rules, snappy enough to give a brief overview so that we can all agree “we’ll pick the rest up as we go”.
And, well, it can’t. I bash my head against the system trying to find the perfect prompt, but it veers between being far too curt – the equivalent of giving the rules of Monopoly as “you buy houses and hotels” – and being overly verbose when I try to encourage it to be bold and fun. Worse, the hallucinations come out. It frequently gives rules that are similar to, but not exactly the same as, the ones it has just been asked to summarise. The experience of playing a game where the host pipes up halfway through with some rules clarification that mysteriously benefits them and only them is not a good one, but it’s one that ChatGPT nearly forces on my pals.
In the end, I give up and read out the rulebook.
That’s not as bad as its attempts in the pub later, though. As the heavens open, I suddenly become the holder of the most valuable real estate in the area: an indoor pub table. Desperately hoping to ease the awkwardness of having to fend off advances as I sit alone, I turn to ChatGPT for advice, and it responds in the tone of an American who has had too much therapy: “Hi there, I apologise for any inconvenience, but I’ve reserved this bench for my group of friends who are presently in the beer line. They should be back soon. Thanks for your understanding!”
Hateful stuff. I try again, asking it to be more British, and get the same basic script read by a Hugh Grant impersonator: “Hello, I hope you don’t mind, but I’ve actually kept this bench free for my chums who are in the beer queue. They’ll be with us in just a jiffy. Much obliged for your consideration!”
I do not succeed in defending the table.
Sunday
Can it plan my holiday to Japan?
“Prompt engineering” is the name of the new skill these chatbots engender: the idea that there’s a particular technique to be learned for making requests of a generative AI. You can ask questions the way you would another person, but you’ll do better if you remember that it isn’t a person. So you need to phrase questions in the right way to get the answers you want.
I decide to really get into prompt engineering on Sunday. I’m planning a holiday to Japan, and I want ChatGPT to be actually useful. Earlier, I had tried Google’s competing chatbot, Bard, and it had been woefully generic while also suggesting activities that hadn’t been possible for three years.
This time, I ask ChatGPT a detailed question, listing every circumstance of my trip that I think is relevant – the size of my family, my daughter’s Down’s syndrome, where in Tokyo we would be staying, and so on – and then ask it to answer each question three times, from the point of view of a British tourist, a British expat and a Tokyo native, explaining after each answer which parts of my own circumstances make it a relevant answer.
It’s a horribly wordy question, and constructing it feels like trying to trick an evil genie into letting you wish for more wishes. But it really works. I’ve built a robot tour guide, one that gives me specific suggestions down to neighbourhood level, detailed advice on how to eat well with two kids, and even recommendations for which traditional Japanese sweets might be nicest on a hot summer’s day: the tourist suggests kakigōri, while the Tokyoite plumps for anmitsu.
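For the technically curious: the same multi-perspective trick can be scripted rather than retyped each time. What follows is a minimal sketch using OpenAI’s Python library as it stood in early 2023. I used the ChatGPT website rather than the API, so treat this as an illustration only – the prompt is a paraphrase of mine, and the API key and neighbourhood are placeholders.

# A sketch of the "answer three ways" prompt, sent via the OpenAI
# chat API (openai-python v0.27, the version current in early 2023).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder - supply your own key

trip_details = (
    "We are a family of four visiting Tokyo; my daughter has Down's "
    "syndrome; we are staying in [neighbourhood]."  # placeholder area
)

prompt = (
    f"My circumstances: {trip_details}\n\n"
    "Answer each question three times: once as a British tourist, once "
    "as a British expat living in Japan, and once as a Tokyo native. "
    "After each answer, explain which of my circumstances make it "
    "relevant.\n\n"
    "Question: which traditional Japanese sweets would be nicest to eat "
    "on a hot summer's day?"
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)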
One last question
One more prompt remains. I paste the above into ChatGPT and ask it for a pithy summary for a newspaper article about using AI as a personal assistant.
It says: “In a week with ChatGPT, I’ve experienced its remarkable capabilities and limitations. It can create recipes and offer tailored advice, yet struggles with social situations and reliability. As we adapt to communicate with AI, the question arises: will AI eventually grasp our nuances, or will we continue adjusting to accommodate its imperfections? While helpful, ChatGPT cannot replace the depth and authenticity of human interaction.”
It’s not quite what I’d focus on – the “making things up” stuff feels more alarming – but not bad, bot. Not bad.