GPT-4o Will be Omnipresent

On May 13, 2024, OpenAI presented GPT-4o. The company describes it on its website: „GPT-4o (‚o‘ for ‚omni‘) is a step towards much more natural human-computer interaction – it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.“ (Website OpenAI) The company is addressing the weaknesses of the previous model: GPT-4 „loses a lot of information – it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.“ (Website OpenAI) It is precisely these weaknesses that the new model can overcome. „With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.“ (Website OpenAI) A video shows how the language model processes video data. With this function, the Be My Eyes app could take another leap forward in its development. In the coming weeks and months, many interesting applications will certainly emerge. GPT-4o will be omnipresent.

Fig.: GPT-4o processes video data (Image: OpenAI)

 

Grok Wants to be the 21st Century Hitchhiker’s Guide to the Galaxy

xAI’s new language model Grok was unveiled to a select group of users on November 4, 2023. Elon Musk’s startup released a statement the same day: „Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask!“ (Website xAI) The Hitchhiker’s Guide to the Galaxy is a fictional guidebook for hitchhikers in the universe, written in the form of an encyclopedia, and the namesake of the famous novel by Douglas Adams. „Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use it if you hate humor!“ (Website xAI) These capabilities are reminiscent of DeppGPT, the chatbot of the German satirical magazine Postillon. „A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the 𝕏 platform. It will also answer spicy questions that are rejected by most other AI systems.“ (Website xAI) This statement seems particularly important. It makes sense for chatbots of this type to follow moral rules, i.e., to be shaped by machine ethics. However, chatbots like ChatGPT refuse too many requests and are therefore of limited use. Not only that, but they also refuse to acknowledge essential areas of human existence, such as sexuality. The word „Grok“ itself comes from another famous science fiction novel, „Stranger in a Strange Land“ (1961) by Robert A. Heinlein.

Fig.: Fictional cover of „The Hitchhiker’s Guide to the Galaxy“ (Image: Ideogram)

Speaking with Harmony

There is great media interest in the new book „Maschinenliebe“ (ed. Oliver Bendel), which was published in October 2020. Several review copies were sent out. The title means „Machine Love“, „Machines for Love“, or „Machines of Love“. Three contributions are in English. One of them – „Speaking with Harmony: Finding the right thing to do or say … while in bed (or anywhere else)“ – is by Kino Coursey (Realbotix). From the abstract: „Doing or saying the right thing in response to circumstances is a constant problem, especially for embodied personal companions like Realbotix’s Harmony. In this paper we will describe the Harmony system, how it finds the right thing to say or do, and how recent advances in neural network-based natural language processing and generation will be integrated into next-generation systems. These advances will allow the transition from pattern-oriented responses to dynamic narrative-oriented response generation. Future systems will be able adapt to their situation much more flexibly, and allow a wider range of role-playing and interaction.“ The focus here is on the use of large language models such as GPT-2. More information via www.springer.com/de/book/9783658298630.

Fig.: Speaking with Harmony