As we enter 2024, one technology looms large over many facets of society. Back in the 1960s, the idea that a machine could “think” for itself and give “intelligent” answers was the stuff of science fiction. (Indeed, the word “computer” originally referred to the people who operated calculating machines.)
Television shows like Star Trek and films like 2001: A Space Odyssey popularised the ever-present voice-controlled assistant that could be hailed, asked questions or given instructions. Most of these were benevolent (2001’s HAL being a notable exception).
Fast forward nearly 60 years, and we now have voice assistants from major technology vendors like Amazon (Alexa), Apple (Siri) and Google (“OK Google”). Microsoft tried to jump in on this too with Cortana in Windows 10, since removed. Alexa and Siri are allegedly bleeding money for their parent companies as the novelty wears off… and so these technology firms are starting to look at what’s next.
The latest gold rush seems to be generative AI. This has been brewing for some time.
Many moons ago, I recall mucking around with a Markov chain plug-in that was embedded in Perlbot on IRC (no_body on the old Freenode network). Very crude, but it sometimes did generate somewhat coherent sentences. It was done for fun, running on the scrap CPU cycles of an old PIII 550MHz server that also hosted this blog. Nothing huge by any stretch of the imagination, and no GPU in sight.
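For the curious, the core trick behind such a plug-in is simple enough to sketch in a few lines. Here is a minimal word-level Markov chain generator in Python (the original plug-in was Perl; this is purely an illustrative reconstruction, not its actual code):

```python
import random
from collections import defaultdict

def train(chain, text):
    """Record, for each word seen, the words observed to follow it."""
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)

def babble(chain, length=12):
    """Random-walk the chain: start anywhere, then pick each next word
    at random from those observed to follow the current one."""
    word = random.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was only ever seen last
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

chain = defaultdict(list)
train(chain, "the cat sat on the mat and the dog sat on the cat")
print(babble(chain))  # e.g. "the dog sat on the mat and the cat sat on"
```

Feed it enough IRC chatter and it occasionally stumbles into something coherent, which is about all ours ever managed.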
A few years ago, we started seeing articles about AI systems that could generate imagery: forerunners of the likes of DALL-E. Ask one to generate a beach scene and you’d get a weird psychedelic image that vaguely looked like a beach if you squinted right, but with odd things merged together, like a seagull blended into a railing or building. Faces were badly distorted; nothing looked “right”.
Unfortunately, I cannot recall where I saw the image I’m thinking of, or what keywords would summon it, otherwise I’d show an actual example. (I think it was either on The Register or Ars Technica… most likely pre-pandemic.)
Fast forward to 2021, and yes, these systems could generate a vaguely believable image, but they still struggled with human anatomy. A good example of this is the faked Donald Trump arrest photo that was doing the rounds: a big improvement on what came a few years before it, but still riddled with visual defects.
This time last year, ChatGPT (then built on GPT-3.5) was available to the general public, and it could passably converse with people. For a statistical model, it did a remarkable job of appearing “intelligent”, but ask it to perform some basic tasks and it soon fell apart. Yes, it could generate code, but you’d constantly have to massage the prompt to get code that even compiled, let alone functioned the way required.
The big rub with all of this is the extreme amount of computation required to render the result of a simple prompt. Whether the output be text, an image, audio or video… generative AI is often highly computationally expensive, requiring vast data centres crammed full of GPUs and special-purpose ASICs, much like the cryptocurrency rigs of a few years ago. There are some small models that can run on your local computer: a top-of-the-line Raspberry Pi can just cram in some AI models, with some trade-offs in accuracy. However, you cannot train an AI model on such modest hardware.
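To give a feel for the “small model locally” end of the spectrum, here is a minimal sketch using the Hugging Face transformers library and the small distilgpt2 model (both arbitrary choices on my part for illustration) to run inference on a CPU-only machine:

```python
# Minimal local-inference sketch. Assumes `pip install transformers torch`.
# distilgpt2 is an illustrative choice only: a small (~82M parameter)
# model whose weights fit comfortably in a few hundred MB of RAM.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2", device=-1)  # -1 = CPU only
result = generator("The beach this morning was", max_new_tokens=30)
print(result[0]["generated_text"])
```

Inference like this is slow but feasible on modest hardware; training (or even fine-tuning) the same model is a different beast entirely, which is where the data-centre GPUs come in.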
Generating the models is the real sticking point: it requires vast compute resources and, in addition, lots of data. It’s Johnny 5 on steroids! Where is that data sourced from? More often than not, it is scraped from websites without the authors’ consent. While some content is public domain, there are examples where copyrighted material was used.
Yes, we can point and laugh when an AI hallucinates a watermark, but for the copyright holder or would-be user, this is really no laughing matter. Microsoft is already facing a lawsuit from The New York Times over Bing Chat (now Copilot) spitting out big chunks of copyrighted articles.
A human usually has a vague idea of where they learned something, even if they can’t find it later… and based on that, they might have some idea of whether such content can be legally used in a given context, or can at least ask. AIs typically do not tell you what source material was used in constructing their output, nor do they give any consideration to whether you can legally use that material.
Some vendors try to make that your problem. MailChimp recently added an AI feature to its mailing-list offering, but made the user responsible for checking whether the content it generated was appropriately licensed… and decided that your user-generated content was appropriate to feed into the training of said AI engine.
Various courts have ruled that, as purely AI-generated content is not “human generated”, it is not eligible for copyright protection. (This ruling is why I was able to include the “Trump arrest” image above despite it not being “my work”.)
This is not the last we’ll see of this technology. AI is actually a very old term, dating back to the very early days of programmable electronic computers, from ELIZA (which was really a testing ground for pattern matching, not AI at all!) to PARRY (the same idea expanded a little). It also includes tools like expert systems; anyone who has dealt with open-source software will have seen one very famous expert system: make.
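To see why make earns that label: you declare rules (which targets depend on which prerequisites, and how to produce them), and make infers a build plan by backward-chaining from the goal. A toy sketch of that inference in Python (illustrative only; real make also compares file timestamps, among much else):

```python
# Toy backward-chaining resolver in the spirit of make.
# Each rule maps a target to its prerequisites and a recipe.
rules = {
    "app":    (["main.o", "util.o"], "cc -o app main.o util.o"),
    "main.o": (["main.c"], "cc -c main.c"),
    "util.o": (["util.c"], "cc -c util.c"),
}

def build(target, done=None):
    """Satisfy a target's prerequisites first, then fire its recipe.
    Targets with no rule (e.g. source files) are assumed to exist."""
    done = done if done is not None else set()
    if target in done or target not in rules:
        return
    prereqs, recipe = rules[target]
    for prereq in prereqs:
        build(prereq, done)
    print(recipe)  # real make would run this; we just print the plan
    done.add(target)

build("app")
# Prints the recipes in dependency order:
#   cc -c main.c
#   cc -c util.c
#   cc -o app main.o util.o
```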
Having a system that can inspect a photo, describe what is in the image and read out whatever text might be important would be a game changer for the visually impaired. In this case, it’s simply describing what is there.
Having a text-to-speech tool that could be trained on recordings of someone who has lost their ability to speak (e.g. through motor neurone disease), which that person could then use to communicate, would be a very noble use of generative AI.
The surviving members of The Beatles recently did something similar with the song “Now and Then”, taking old recordings of John Lennon’s demos and basically doing some sophisticated signal processing to separate out the components so that a studio-grade recording could be produced.
The technology does have good uses. In both of the latter cases, we’re not “putting words into the mouths” of these people: they are their own words; they chose them.
However, I think this year we’ll likely see its dark side, if we haven’t already. Stephen Fry got a rude shock when he came across an audiobook apparently “read” by him, except it was a book he had never actually narrated: it was the product of generative AI. Someone had trained a text-to-speech model on his voice, then fed the book into it.
Imagine someone using tools like that to dupe a work colleague into resetting a password and enrolling a new 2FA token over the telephone. Depending on where you work, that could have disastrous consequences.
For this reason, I’m particularly leery of systems that take audio or video as input. My workplace originally used Atlassian’s HipChat as a communication tool, and when that shut down, we migrated to Slack. At the moment, Slack’s privacy policy and terms of service make no mention of the use of such tools. Zoom was forced to back down on AI use after a biiig user backlash. Microsoft won’t say how it is training its models, but seems hell-bent on jamming Copilot everywhere it can cram it; they’re even talking of a new keyboard button dedicated to it.
For this reason, I flatly refuse to touch Microsoft Teams. The last time I used it (in my browser) was for one particular meeting a couple of years ago… it picked up that I had a headset and used that for speaker audio, but when it came to the microphone, did it use the same device? Noooo… the line-in socket connected to an old Sony ST-2950F stereo tuner was more interesting!
Since then, it too has gotten the AI treatment, with little transparency on what that AI is trained on, what its functions are, and what the resulting data sets are used for. Furthermore, we’re to trust them to store such training data responsibly? The same mob that wrote code which accepted an expired and incorrectly signed digital certificate as an access-all-areas pass?
That said, the snake-oil salesmen are out in force, and the investors are going wild. We’re seeing ChatGPT-powered sales and service bots appear on all kinds of websites (until they’re caught out), and there are lots of sites with AI-generated screed polluting search engine results. It’ll likely play a big part in the upcoming 2024 US federal election. We’re in for a wild year, I think.
I, for one, do not use ChatGPT or its ilk in my day-to-day work, and refuse to do so. My position on AI-infused tools like Microsoft Teams remains the same until such time as the AI feature is removed or its role better clarified.
I still have code up on GitHub, as it was there prior to Microsoft’s purchase of that service. I don’t like that my code may be used in this manner, but the worst-case scenario there is copyright infringement, and removing my code from GitHub does not prevent it. I regard video and audio differently: as these can be used for impersonation, I am not going to willingly supply such a feed directly to a tool that may be training itself on it for purposes unknown to me.
Right now, LLMs (large language models) are approaching the “peak of inflated expectations” phase of the Gartner hype cycle. I figure the hype will die down before their actual utility comes to the fore. They may improve the accuracy of machine translation, specialised ones might be able to give domain-specific advice on a topic (much like a fancy expert system), and they may be able to fill in the gaps where a human can’t be there 100% of the time.
They won’t be replacing artists, journalists, programmers, etc. long-term. Some of us will possibly lose jobs temporarily, but once the limitations are realised, I have a feeling those laid off will soon be fielding enquiries from those wishing to slay the monster they just created. It’ll just be a matter of time.