Attention is the Beginning

For those short on time:

I get it. AI is a potential threat, another thing to learn, spend money on, pay attention to, and think about, in addition to everything else you are doing, juggling, and thinking about.

So, if you feel that way, you probably missed what happened this week. You can thank Sam Altman for making you pay attention.

OpenAI introduced GPT-4o, a faster, more capable, and more efficient model that will be free to all users and will cut API costs for developers by 50%.

This new model has vision, voice, and intelligence capabilities that come close to what Jarvis does in Iron Man.

Sam Altman teased magic. And it is magic.

The announcement has been criticized for the rollout's confusion and lack of clarity. Additionally, some say they are disappointed or not impressed with the features.

I disagree. Giving everyone GPT-4 level intelligence for free and providing all these modalities is incredibly impressive. OpenAI wants to stay the leader in this race, and you can benefit from it.

You now have a choice. Stay ignorant, stay idle, improve your life and work, or redesign how you live and work.

 


 

Congratulations, You Are Now Smarter.

 

I believe Monday’s announcement from OpenAI was nearly as significant as the original launch of ChatGPT or GPT-4.

You wouldn't get that impression from news articles, reactions on social, search volume, and hype.

Whether you are unaware of what was announced or don’t fully grasp the impact on life and business, I will try to catch you up in this blog.

I will explain the impact, detailing what was introduced and demonstrated and its implications.

Mira Murati, OpenAI's CTO, walked viewers through the updates in a wide-ranging demo showcasing what they are offering (I include short clips of the demo in this blog):

  1. GPT-4o, the most intelligent model across voice, text, video, and image
  2. Removing friction in the availability of the model
  3. A laser focus on user experience

 

My initial thoughts were, “Is that it?” However, after rewatching the announcement, thinking about it deeply, and hearing others' opinions, I realize these updates are far more impactful than they have been portrayed, and much of their impact has been overlooked.

As Cassie Kozyrkov, former Chief Decision Scientist at Google, said (I put this in my last blog, which I would suggest reading), "The AI revolution is a UX revolution."

 

 

OpenAI is changing the world with, yes, more intelligent models but, more importantly, with how we interact with those models.

It all started with a paper from Google researchers titled "Attention Is All You Need." I say, "Attention is just the beginning." We're at a point where ignorance and staying idle won't cut it. You need to take action. (For more technical readers, I know that "attention" means a mechanism in the transformer.)

 

 

 

The Capabilities of GPT-4o

The launch of GPT-4o is a big step forward in AI technology.

This model is faster, more efficient, and better at simultaneously handling tasks involving text, images, and audio.

Unlike earlier models that used separate systems for speech, text, and images, GPT-4o combines all these modalities, making it quicker and more accurate.

Using DALL·E within ChatGPT led to outputs that didn't make sense and a bit of lag time. Using Whisper for voice had latency that did not feel "immersive," which was the term Mira used in the demo.

This improvement means users can use GPT-4o to generate higher-quality outputs, have a better user and learning experience, and open many possibilities.

By making AI more powerful and easier to use, GPT-4o offers users more than a tool. It is a partner in completing almost any task.

 

Voice Function

I am most excited about the upcoming launch of GPT-4o's voice-to-voice functionality. This feature lets you chat in real time without the "hassle" of voice-to-text conversion, giving you a truly hands-free experience.

The funny thing is that it wasn't a hassle. I have been using the voice-to-text-to-voice functionality in the ChatGPT app for months, and it has completely changed my work. If you or your team spend any part of your day behind a windshield or walking, your life is about to change.

Imagine the possibilities: commuting, driving to meetings or sales appointments, doing household chores like laundry or dishes, or simply taking a break from sitting at your desk. These are all opportunities to leverage voice to work or strategize with whatever expert you need ChatGPT to personify.

This is how AI can give us time back. The key is how we choose to use that time.

ChatGPT's current voice functionality operates on a voice-to-text-to-voice basis. That adds a slight delay and gets in the way of a completely hands-free experience. You can't interrupt it by talking. You have to tap the app.
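
For the more technical readers, here is a minimal sketch of why those hand-offs matter. It models the older voice mode as three stages that run back to back and compares it with a single voice-to-voice model. The stage names and the millisecond figures are made-up assumptions for illustration, not measurements from OpenAI.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: int  # hypothetical figure for illustration, not a measurement


def total_latency_ms(stages: list[Stage]) -> int:
    """Stages run one after another, so their delays add up."""
    return sum(stage.latency_ms for stage in stages)


# Older voice mode: voice -> text -> voice, three separate hand-offs.
cascaded_pipeline = [
    Stage("speech-to-text", 800),
    Stage("text model", 1500),
    Stage("text-to-speech", 700),
]

# Single multimodal model: audio in, audio out, one hop.
unified_pipeline = [Stage("voice-to-voice model", 400)]

cascaded_ms = total_latency_ms(cascaded_pipeline)
unified_ms = total_latency_ms(unified_pipeline)

print(f"Cascaded pipeline: {cascaded_ms} ms")
print(f"Unified model:     {unified_ms} ms")
```

The exact numbers don't matter. The point is that every extra hand-off adds its own delay, so removing the hand-offs is what makes a conversation feel natural.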

Imagine being able to do work while driving home from a meeting. Before you get on the road, connect to Bluetooth or CarPlay and click the conversation button on ChatGPT. You can draft follow-up emails, create to-do lists, update your CRM, and strategize on the next steps from your meetings.

When you get home, you have more time to play with your kids, spend quality time with your spouse, work on a side hustle, or enjoy your hobbies. You get your time back.

This would have changed my life when I was selling on the road. I would not have to update my notes when I went home or spend time deciphering my voice memos and handwritten notes. I would have acted on the information while it was fresh.

In the past two weeks alone, using ChatGPT by voice, primarily while walking, I have accomplished the following:

  • Developed the outlines and first drafts of five blog posts
  • Revamped my website content, leading to something I am finally happy with
  • Built the idea and initial framework of a virtual focus group GPT to evaluate all my sales and marketing. I have 10 target personas evaluating everything I do.
  • Drafted email frameworks for my colder outreach
  • Created a draft of a strategy for a client
  • Discussed and documented the impacts of GPT-4o after listening to the spring update (that became this blog).

 

I am about to test how much work I can do while playing golf... :)

All of that doesn't even consider that it can detect emotion and tone in your voice or mimic other voices to talk to you.

The utility of this technology is endless, from mental health to being a companion for those isolated because of a physical ailment. The healthcare industry needs to be all over it.

 

 

Vision Capabilities

Computer vision is wild to me. The advancements in this area are groundbreaking and keep moving forward.

One of my most popular posts showed what ChatGPT could do with a picture of my grocery receipt. With GPT-4, I could do budgeting and get recommendations and recipes from a photo of my receipt. Now, this tech is going to be even better.

GPT-4o takes this further. It can see and recognize emotions and understand your surroundings while you talk to it.

You could troubleshoot a piece of equipment, document inventory, or get visual assistance with daily tasks.

Imagine walking through your warehouse, and GPT-4o can automatically recognize and catalog items, making inventory management more efficient and accurate. Attach this to a drone, and wow...

This can save countless hours of manual data entry and reduce the risk of errors.

As I said, if you pair vision with voice in healthcare, you can truly save and improve lives.

 

Real-Time Translation

"Imagine getting into an Uber in Paris and being excited to attend an Olympic event. The streets are bustling, and as your car inches forward, you realize a traffic issue might make you late. Your driver speaks only French, and you don’t. This language barrier could have led to confusion and frustration in the past.

But now, with GPT-4o, you pull out your phone and have a translator in your pocket. You speak into your phone in English: “Can we take a different route to avoid the traffic?” Instantly, the app translates your words into French and speaks them aloud to the driver. The driver nods, understanding perfectly, and reassures you with a plan to navigate the congested streets.

This seamless interaction reduces stress and enhances your travel experience, demonstrating the transformative power of real-time translation. With GPT-4o, the world becomes smaller, and communication barriers dissolve, making international travel smoother and more enjoyable."

What is in the quotes above was written by GPT-4o from a single prompt, with no editing. I gave it context and said to write a story showcasing the new translation feature.

Do we all need to stop learning languages? No, but imagine a world where everyone can communicate and basic understanding is no longer a barrier. Cultural differences, context, and nuance still exist, but getting the general sense across is possible.

This is big time if you are a business that needs translation or works across multiple languages.

 

Coding Assistance

ChatGPT’s coding assistance capabilities are noteworthy. Coding is not my area of expertise.

My area of expertise is using data analysis features and writing "code" to get the most out of other applications, such as Excel and other low-code apps. You are missing the point if you aren't using ChatGPT or other generative AI to help you do more with your data or perform better in your technical conversations.

Again, I said this is not my area of expertise, so here is what Perplexity says about it.

[Image: Perplexity's summary of ChatGPT's coding assistance capabilities]

 

Availability to Everyone

Most users have only experienced the free version of ChatGPT. Many don't know there is a more intelligent version; if they do, they don't understand why they should pay for it.

ChatGPT 3.5, the free version, is still very impressive and is approaching its second launch anniversary.

Many stop using 3.5 because they don't know how to prompt it to get what they are looking for or try to use it as a search engine... which is a big problem.

So, the model with all the capabilities mentioned above is free. Yes, free. OpenAI decided to launch its flagship model, GPT-4o, for free. Yes, the most intelligent model in the world, which costs millions of dollars a day to run, is free. It is stunning; that is why I said free so many times.

This isn’t just about a new model; it’s about making advanced AI accessible to everyone. They are running the largest and most successful product-led growth strategy, and now everyone gets, for free, something that could easily cost hundreds of dollars per month per user.

Stop and think about what that means. With an internet connection, you have so much power at your fingertips.

Now, that does bring up risks because employees are more likely to use these tools if they are free. It also raises the need for responsible AI use, policies, and training.

 

Pricing for the API

GPT-4o is a game changer for developers. It is 50% cheaper and offers five times higher rate limits than GPT-4 Turbo, making it more cost-effective for building and deploying AI applications.

Technology tends to get cheaper over time due to efficiencies and advancements. AI models that need less energy and perform tasks more efficiently are a great step.

Some businesses woke up to having a significant operating expense cut in half while receiving an upgrade.
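
To make that concrete, here is a quick back-of-the-envelope sketch of what a 50% price cut does to a monthly API bill. Only the 50% figure comes from the announcement. The baseline price and the monthly token volume are hypothetical numbers chosen for easy math, not OpenAI's price list, so plug in your own.

```python
def monthly_api_cost(tokens_per_month: int, price_per_million_tokens: float) -> float:
    """Return the monthly spend in dollars for a given token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# Hypothetical inputs for illustration only.
OLD_PRICE_PER_MILLION = 10.00      # assumed baseline price per 1M tokens
TOKENS_PER_MONTH = 200_000_000     # assumed monthly usage

# The 50% reduction is the one figure taken from the announcement.
NEW_PRICE_PER_MILLION = OLD_PRICE_PER_MILLION * 0.5

old_cost = monthly_api_cost(TOKENS_PER_MONTH, OLD_PRICE_PER_MILLION)
new_cost = monthly_api_cost(TOKENS_PER_MONTH, NEW_PRICE_PER_MILLION)
savings = old_cost - new_cost

print(f"Before: ${old_cost:,.2f} per month")
print(f"After:  ${new_cost:,.2f} per month")
print(f"Saved:  ${savings:,.2f} per month")
```

Same usage, half the bill. Real pricing usually differs for input and output tokens, so treat this as a rough estimate rather than an invoice.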

As AI technology evolves, operational costs will continue to drop, making AI increasingly attractive to those seeking to take advantage of it.

 

A Focus On The User

Speed

Why does responding faster matter? People already complain about the pace at which AI responds. This meme I created says it all.

You need it to feel like magic for interest and adoption to occur. Increasing the speed does this, providing a quick win.

The "pretty good and fast" reaction encourages people to keep trying and testing what is possible, increasing adoption rates.

 

[Image: the author's meme about AI response speed]

 

The Desktop App

Taking accessibility a step further, they also announced a desktop application, so ChatGPT can work alongside everything you do without you having to open a browser.

The emphasis on reducing friction for users to access ChatGPT is a significant step. The announcement of a desktop version makes it even easier to integrate ChatGPT into daily workflows.

 

No Login Access

You can access the most intelligent model without creating login credentials. They made this update before this release, but pairing this with free access to GPT-4o is crazy.

Just like using Google or checking the weather, you can access some of the most powerful technology in the world.


 

UI Refresh

The new UI is good. Usually, UI refreshes are lacking. The way the chat now resembles iMessage is simply brilliant. The feature allowing you to hide previous chats is so nice, especially when you need to focus without distractions.

Additionally, getting outputs from all three models for comparison and contrast is like the cherry on top, or as they say, "the chef's kiss."

 

GPTs

The GPT store, which isn't a store, has fallen flat. GPTs are too easy to build; without proprietary or specific data, they are just less effective versions of ChatGPT.

The worst part is that you couldn't share them with non-paying users. This is changing. Now, all users can access GPTs, which should lead to higher-quality, more valuable GPTs.

It also allows people to build GPTs for others and sell them access without listing them on the marketplace. However, please be careful when buying GPTs.


 

Criticism of The Announcement

I have seen many responses calling this announcement unimpressive, unhelpful, confusing, or other such adjectives.

Additionally, because of this lack of hype, fewer people know what OpenAI announced. It has not hit mainstream media like it should with this type of announcement. LinkedIn News, CNN, Fox News, MSNBC, and their affiliated business channels had articles, but they were very subdued.

Ethan Mollick, who rarely criticizes anything in the AI space and seems always to give the benefit of the doubt, posted this:

The entire rollout of GPT-4o features are confusing down to the terrible name (why, OpenAI?). Right now image output uses DALL-E but it won’t soon - which will allow really impressive image control. Image input apparently is GPT-4o multimodal already but you wouldn’t know, as the voice input is still whisper. Conversation mode only uses the old voice system, but will upgrade at some point to the new one. And GPTs seem to run on GPT-4?

 

Is the criticism unfair? It is a bit. OpenAI is doing things with a smaller team that Google, Amazon, Apple, and many others couldn't have dreamed of doing. They are playing with a cheat code. They are hitting their projects out of the park. Sam Altman was dubbed marketer of the year by the Marketing Against the Grain podcast, and they are doing all of this at scale and fast.

If the rollout isn't as clean, I give them a pass.

 

Here is what OpenAI has on their website about the availability:

[Image: OpenAI's website statement on GPT-4o availability]

 

The Future

I have written before about the fallacy of predictions. We are all bad at predictions. While some are a bit better than others, we all still struggle to make accurate predictions.

So, I will not make predictions here, but I will make more educated guesses about where this is going. I probably will be incorrect, but this is my feeling.

First guess: OpenAI will make a deal with Apple and replace Siri with the most helpful and intelligent assistant in history. The promise of Siri and Alex

As The Marketing Against the Grain guys said, the Siri team should be fired. They had all the resources, and they got beat. Now, Apple needs to make a deal with OpenAI or risk another phone using this tech.

Second guess: OpenAI will release "GPT5" or whatever they call it before Thanksgiving. OpenAI is pushing the frontier of what is possible. It has been a while since their GPT-4 model was released, and everyone is working to match it or incrementally surpass it. This announcement of GPT-4o is the beginning of a major year in advancements.

Third guess: The rollout of GPT-4o will cause crashing and bandwidth issues. This has been an issue before, which is why many people originally upgraded to the paid version. Once "what can you do with GPT-4o?" goes viral, you will see more and more adoption, causing the service to crash and time out.

Fourth guess: Custom GPTs will be the basis of future AI agents. AI agents or AI working on your behalf are the most likely future. However, the basis for that will be the mini apps with the most utility. Today, you can string together mini apps to function like an agent. If you want to get ahead, start building GPTs.

Fifth guess: This technology will mesh well with augmented reality. Imagine a future with GPT-4o capabilities, speech, and vision, with glasses that allow you to interact with screens connected to your phone, watch, and other devices. It's scary, exciting, and sci-fi, but you can see where this comes from if you connect the dots. You can do your entire job from the golf course, a boat, or walking in a park.

 

Conclusion

I genuinely believe what I said at the beginning. Attention is just the beginning. You have had almost two years to pay attention. The thing is, most haven't. Why?

  • Too much going on
  • Lack of interest
  • Fear and uncertainty
  • You tried it and didn't see the magic
  • You don't know where to start
  • It feels daunting
  • You dislike AI

 

Whatever the reason, now is the time to start paying attention and move into action.

How? Listen to some podcasts, watch a couple of YouTube videos, start following some people talking about this, get training, bring it up in discussions, and most importantly, play with the tools.

If you fear being replaced by AI, the worst thing you can do is ignore it. Like anything else, it is too late once it is too big to ignore.