Welcome back to Jab’s Lab. Today’s post is a follow-up to Part I, continuing the conversation about why ChatGPT feels incredibly powerful yet still leaves us yearning for more.
I believe Part II comes at a very interesting time. With their announcement of Deep Research this week, OpenAI is trying to make ChatGPT ubiquitous and synonymous with “doing work.” However, in my mind, the value they are trying to sell is the ability to continuously outsource your tasks to AI instead of leveraging it as a thought partner and collaborator.
Instead of simply outsourcing our work to AI, we should be using AI to make our lives easier. Last week, I argued that ChatGPT should not be the main way we interact with AI in the future. This week, I go over some ways AI should be used instead.
Let’s dive in.
AI as a technology should make your life easier
I’ve often heard technology defined as the practical use of knowledge for solving problems. A tool we can use to get things done in a sustainable way. From ancient technology like wheels, pulleys, and hammers to modern technology like cars, computers, and smartphones, these technologies provide us tools to get things done.
People have not changed much in thousands of years. We want technology that will make our lives easier, whether it’s carrying a bison back to the village more efficiently in 10,000 B.C. or ordering a Starbucks coffee as we grumble getting out of bed in 2025 A.D.1.
Since the early days of the internet, designers and developers have been optimizing interfaces for human interaction, making it as easy as possible to do things on the web. Countless development hours, product meetings, and funding cycles later, we’ve arrived at the pinnacle of technology: only half the tools work, and they still feel like they get in our way more than they help.
Why do modern tools keep getting in our way, and how can we use AI to fix them?
Why our current tools keep getting in our way
The tech built in the last decade has made it possible to do almost anything online. But it’s often incredibly tedious. With great power comes lots of clicks.
Let’s say you want to reorder the Starbucks coffee you had yesterday. You pull out your phone. You know exactly what you want to do: order coffee. So you do the following:
1. Find the Starbucks app on your home page, and click the app icon.
2. Click the “Order” tab.
3. Click your local store, then click “Order here”.
4. Go to the “Previously ordered” tab and scroll until you find “Americano” (or god forbid you have to search the menu for it).
5. Click “Add to cart.”
6. Click the “Cart” icon.
7. Place your order.
The Starbucks app, specifically designed and built to help you order a coffee, often gets in the way of you ordering coffee.
Another example of knowing what you want to do, but technology getting in the way: “I want to book a table at a restaurant for dinner tomorrow.”
Yes, we have OpenTable. But have you tried using OpenTable lately? Even if you know the exact restaurant, the exact time, and number of people, the app will still kindly ask you to click at least 8 buttons to see the availability. Only to realize that you clicked on Friday night instead of Saturday night. My mom would probably tell me to call the restaurant, but all they are going to do is look at OpenTable on their side and tell me that there are no tables available. What a fun system.
I use Google Flights pretty religiously when trying to find a flight, because it’s way easier to type “flights Chicago to NYC on February 4th” than to go through the painstaking process of filling out one of these forms (sorry Kayak). The 5 clicks to select a destination and date are way too tedious when I already know what I’m intending to do.

Now, imagine all of these tedious interfaces being automated away. This is my hope for the future of AI. You have full agency. You are in the driver’s seat. And you will be able to do anything you want online, easily.
How AI will fix our tools today
How would I rather order coffee? “Siri, order me an Americano from Starbucks.” Then I place my order. How would I rather find a table? “Find me a good Mediterranean place to eat on Friday at 6:30pm with 4 people”. Then I book my table. I know exactly what I want to use my phone for. Now I just need the tech to help me do it.
These are exactly the types of interactions AI can unlock: taking my speech, transforming it into an “intent,” and becoming the launching point for getting things done once again. By letting AI transform plain English into “intents” and drop us into a pre-filled experience, we can remove the tedious clicks we have put up with for the last two decades.
I’m pretty confident we will look back at the web tools and apps of this early age of the internet and laugh, thinking “why did I ever spend an hour shopping for groceries one by one online?”
As humans, we communicate with language, and this has been optimized for far longer than web-based interfaces have been, on the order of tens to hundreds of thousands of years2. Most of the futuristic technology in movies depicts people like Tony Stark talking to AI like J.A.R.V.I.S.3, typically telling it to do something for us. I think the future is pretty similar, powered by AI tools.
So how do we go from where we are today to a more J.A.R.V.I.S.-like technology? There are two components to this. The first step is getting the flow right for specific intents. The second step is having a concierge layer that understands your intent and directs it into a specific tool.
AI Intents and Tools
Our future: Intent → AI Input → Visual Confirmation → Done ✅
Let’s break this flow down into its parts:
Intent: I want to do something on the internet. I know exactly what I want to get done, but I may not know the best or fastest way to do it.
AI Input: Speaking or typing your intent into existence, such as “Find me flights on Feb 20 to Dallas”. In this way, we are outsourcing some work to the AI. It has to infer our intent and leverage its knowledge of available tools to route the request to the right one. It may also have some existing context about us, like our current location, and fill in the rest to do all of the clicking we would normally do on Kayak.
Visual Confirmation: We are visual creatures. We want to verify what’s being done. We want agency. And while we may not trust an AI to book us a flight yet, we can still use its ability to find the exact flight we should book. Or rather, to find a suggested flight and show us all of the other options it considered along the way. If we can see what the AI did, verify it was correct, and intervene if we need to, that will be sufficient for 99% of all use cases. The agency remains with us to confirm our action.
Done: That’s it. Simple as can be4.
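As a rough sketch, the four steps might look like the following in code. Everything here is hypothetical: the dataclasses, the tool name, the parameters, and the keyword check are all stand-ins for what a real language model would actually do.

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    """Step 1: what the user wants, stated in plain language."""
    text: str

@dataclass
class ProposedAction:
    """The AI's interpretation of an intent, ready for human review."""
    tool: str
    params: dict
    alternatives: list = field(default_factory=list)

def parse_intent(intent: Intent) -> ProposedAction:
    """Step 2: map language to a tool plus pre-filled parameters.
    A real system would call a language model; this keyword check
    is just a placeholder."""
    if "flight" in intent.text.lower():
        return ProposedAction(
            tool="flight_search",
            params={"destination": "Dallas", "date": "Feb 20"},
            alternatives=["Feb 19 departure", "Feb 21 departure"],
        )
    raise ValueError("No tool matched this intent")

def confirm(action: ProposedAction) -> bool:
    """Step 3: show the user exactly what will happen, plus the
    alternatives considered. A real UI would render a pre-filled
    screen; here we just print and approve."""
    print(f"About to run {action.tool} with {action.params}")
    print(f"Other options considered: {action.alternatives}")
    return True

action = parse_intent(Intent("Find me flights on Feb 20 to Dallas"))
if confirm(action):
    print("Done")  # Step 4
```

The key design point is that `parse_intent` never executes anything; it only proposes, and the human stays in the loop at `confirm`.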
Why do I believe that AI apps and tools should use the flow described above? Because it’s what we already do in the real world, except we do Step 2 manually. In a grocery store, I know what I want to shop for, typically on a list I’ve already compiled (Step 1: Intent). I walk through an aisle scanning rapidly through the shelves, knowing generally what I’m going to find. We are very good at parsing a lot of visual information rapidly, just as AI is very good at parsing a lot of text and data. I see the green beans I need, add them to my cart. I continue through the rest of the aisles, and one by one add the items to my cart (Step 2: Input). Before checkout, I have my full cart and check my list one more time to make sure I didn’t forget anything5 (Step 3: Visual Confirmation). And then I pay and move on with my day (Step 4: Done).
So why does shopping on Instacart feel so tedious? We are not as good at Step 2 when the screen is a few inches wide and we have to scroll through groceries in a two-by-two list. But AI is great at this step in the digital age. AI can weave in and out of the digital aisles in milliseconds instead of minutes, and build your cart for you easily.
Instead of searching one-by-one for grocery items on your phone, imagine you give your AI agent a list of groceries you want, along with a profile that says “I prefer organic vegetables, McCormick spices, and meat from the butcher’s counter instead of pre-packaged,” and it builds your cart for you. You still have your Step 1 intent, and you still do Steps 3 and 4, but you offload the tedious Step 2 to AI.
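Here is a minimal sketch of that offloaded Step 2. The catalog, profile fields, and scoring rule are all made up for illustration; a real agent would search the store’s actual inventory and use a model rather than a two-line score.

```python
# A toy catalog; a real agent would query the store's inventory.
CATALOG = {
    "green beans": [
        {"name": "Organic Green Beans", "organic": True, "price": 3.49},
        {"name": "Green Beans", "organic": False, "price": 2.29},
    ],
    "paprika": [
        {"name": "McCormick Paprika", "brand": "McCormick", "price": 4.99},
        {"name": "Store Brand Paprika", "brand": "Store", "price": 2.99},
    ],
}

PROFILE = {"prefer_organic": True, "preferred_brands": {"McCormick"}}

def pick(options, profile):
    """Choose the option that best matches the shopper's stated preferences."""
    def score(item):
        s = 0
        if profile["prefer_organic"] and item.get("organic"):
            s += 1
        if item.get("brand") in profile["preferred_brands"]:
            s += 1
        return s
    return max(options, key=score)

def build_cart(grocery_list, profile):
    """Step 2, automated: one pass over the list instead of
    aisle-by-aisle scrolling. The finished cart still goes back
    to the shopper for Step 3, visual confirmation."""
    return [pick(CATALOG[item], profile) for item in grocery_list]

cart = build_cart(["green beans", "paprika"], PROFILE)
```

The cart that comes back is exactly the pre-filled screen described above: every choice is visible and swappable before checkout.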
For these types of interfaces to work, Step 2 has to be transparent: you have to know what the AI is doing, and you have to maintain agency. Harkening back to last week’s post, the AI has to take you on the journey with it, instead of going on the journey alone and reporting the results back to you.
One example of this process done well in early AI applications is Cursor, the code editor a lot of developers now use. It is amazing because it follows the pattern I described above: Intent → AI Input → Visual Confirmation → Done. I trust it to write a lot of code not only because it’s very good at it, but also because I can verify its work before committing to it. I think this pattern will become ubiquitous in AI-powered applications.
AI Tools and Concierges
As I mentioned, there are two components to getting to J.A.R.V.I.S. First, the tools I described above will have to be built in a way that will better leverage AI to remove the tedium of interacting with the technology. Second, we will need a concierge layer that understands your intent and directs it into a specific tool.
Just as a hotel concierge searches their vast knowledge of the city and directs you to the best restaurant given your search, an AI concierge will do the same for apps and tools.
Already, some developers are building with Apple’s new App Intents API for iOS. This will enable developers to leverage Siri as the concierge for their applications. You already know what you want to do (your intent); Siri will make it easier to do. While I have yet to see an iOS app that leverages this framework well, the pattern is clearly there and can be extremely powerful if done right.
Some AI companies like Anthropic are also starting to build toward the concierge with the Model Context Protocol (MCP). It allows developers to build their individual tools and tell an AI concierge how it can use them, similar to how a restaurant might call a concierge and ask to be included in their recommendations. Each individual tool or app can be great, and having ChatGPT6 or Claude act as the concierge will be an amazing way to link them all together, using language as our natural entry point to the system.
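To make the concierge idea concrete, here is a toy version of it. To be clear, this is a conceptual sketch, not the actual MCP protocol or SDK: the registry, keyword sets, and routing rule are invented for illustration, and a real concierge would use a language model rather than keyword overlap.

```python
# Tools register a description of what they can handle, and the
# concierge routes each request to the best match.
TOOLS = {}

def register(name, keywords):
    """A tool tells the concierge what requests it can handle,
    like a restaurant asking to be included in recommendations."""
    def wrap(fn):
        TOOLS[name] = {"keywords": keywords, "fn": fn}
        return fn
    return wrap

@register("coffee_order", keywords={"coffee", "americano", "starbucks"})
def order_coffee(request):
    return f"Pre-filled coffee order for: {request}"

@register("table_booking", keywords={"table", "restaurant", "dinner"})
def book_table(request):
    return f"Pre-filled reservation for: {request}"

def concierge(request):
    """Route the request to whichever tool's keywords overlap most
    with the words in the request, then return that tool's
    pre-filled screen for the user to confirm."""
    words = set(request.lower().split())
    best = max(TOOLS, key=lambda n: len(TOOLS[n]["keywords"] & words))
    return TOOLS[best]["fn"](request)
```

Note that the concierge never completes the action itself; each tool hands back a pre-filled experience, and the confirmation stays with the user.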
The AI concierge is not a replacement for existing apps, tools, or websites. Rather, it is the new entry point. Text and voice will be the gateway for your intent, and the AI concierge will transport you to optimized screens that reflect the work that the AI did in the background. Whether it’s having AI populate a Starbucks order, or an Instacart cart, we should be able to see and verify this output using interfaces that are familiar to us.
Over time, everything you input on the web may become a text or voice-based prompt, with a visual confirmation of the work that it did in the background. It will bring you on the journey with it, and then get out of your way to confirm the action.
Let’s build better with AI
If we leverage AI properly, it will be magical. Finally, we will have technology that gets out of your way and just does the thing you want it to do.
If you are building tech, and you are not yet using AI to build better products, I would encourage you to think about your customer’s intents when they are using your app. If you want to talk about how to incorporate AI into your product, feel free to schedule some time with me and we can talk about how we can use AI to make your product better for your customers.
I’ll leave you with a question that I’m curious to hear your answers to: How else can we use AI to make our lives better?
Let’s go build an exciting future.
-Cory
It’s a real tough life we live in this modern day and age.
ChatGPT told me that language began tens of thousands of years ago. Google sent me to research papers and then a fascinating deep dive on Reddit that made it clear there is no one answer. But still likely 200,000 years ago or so. This exercise raises the question, what happens when we accept the “one answer that AI gives us” without being able to validate it on our own through multiple sources? Do we cease to have agency? Do we blindly accept the answer or viewpoint? I think that having a variety of models for this case is of the utmost importance to ensure that no one model has information dominance. However, because this post is more about the future of how we interact with AI rather than the broad complex political debates surrounding AI, I digress.
I learned today that J.A.R.V.I.S. stands for "Just a Rather Very Intelligent System".
You may wonder how what I am describing differs from “AI agents”. The flow I described above is exactly what an “AI agent” does. I personally dislike the term “AI agents”. It has an unnecessarily ominous connotation, making it feel like Hugo Weaving is going to replicate into infinity and we have to fight him using kung fu. This is way more scary than what an AI agent actually should be: a tedium automator. I hope the term “AI agent” doesn’t stick long-term.
As kids, were we all traumatized when our mom left us in the checkout line to grab one more item saying “I’ll be right back,” or was that just me and my siblings?
ChatGPT plugins were incredibly exciting for me back in 2023, since you could select a list of “apps” you could have the AI use inside of ChatGPT. This was along the right lines, but ultimately not the right interface, and they discontinued them in favor of people integrating the AI into their apps. The Instacart ChatGPT plugin was incredibly cool, but I think that Instacart can probably leverage AI better in its own native app than it could through ChatGPT, i.e. it can build a better tool using AI.