Rene Ritchie: I’m Rene Ritchie and this is Vector. Vector is brought to you today by thrifter.com, fussily, carefully, considerately selected all the best deals from across the Internet, from Best Buy or from Amazon, from everyone, all day, every day. If you’re looking for something, just go to thrifter.com and check it out. Thanks, Thrifter.
Brian Roemmele, welcome to the show.
Brian Roemmele: Great to be here, Rene. Thank you so much.
Rene: I’ve really enjoyed chatting with you on Twitter. Just now that I have Vector going again, I really wanted to actually chat with you in person because it’s so much more fun.
Brian: Thank you. I appreciate it. I’m a big fan of your work, so excited to be here.
Rene: Likewise. When we first started chatting, it was mostly about Apple Pay and the advent of contactless and e-payments, and now we talk a lot about voice first. Would you tell us a little bit of your background and what you’re into and how you got to be into it?
Brian: I’ll try to give it as short as I can.
Rene: Sure. [laughs]
Brian: I grew up in central New Jersey, the Princeton area. I grew up in an era when Bell Laboratories was the most innovative place on the planet. Of course, Bell Laboratories was doing very early voice recognition and even some early AI research, but not really. Mostly voice recognition, a little bit of intent extraction.
As a young kid whose friends’ parents worked at Bell Laboratories, we got to go in there and see the work. It just captivated my imagination, and I said, “You know, humans are primarily built around speaking.”
In fact, when you look at the phonological loop and Broca’s area, and Wernicke’s area, and all the different parts of the brain, there’s so much brain power and energy dedicated to communication via voice.
I said to myself, and this is back in the ’80s…I said, “You know, we had to adopt an arcane method to try to communicate with computers using syntax, using programming, using punch cards, keyboards, all of this stuff for one primary reason. The computer couldn’t understand us.”
I did an Einsteinian thought experiment, being in Princeton. I looked at the future going backwards. I imagined a point in the future and I said, “Would there ever be a point in the future where the computer understands our intent and our context deeply?” The answer was, of course, yes.
To the arc of time, I don’t know how many decades it would have been, but I always thought it would be about 2030 to 2050. I was off by a little bit.
What I imagined was that AI would be strong enough for us to be able to extract the intent of our words, not just speech-to-text, but the actual intent of those words. I knew enough about AI even in those early days, and I later learned a heck of a lot more, to know that with machine-learning AI, over time, we will solve the context problem.
Context is what you really need to solve with humans, not so much being able to answer any question, that the Turing test is an example of a fallacy. One does not need a Turing test in the world, because we’re not trying to trick a human that they’re speaking with another human. What we’re trying to do is extract context of what the human wants to do.
All we are, are tool-builders. That’s all humans have ever been, and we use tools to make machines to try to work a lever to get work done. That work that we get done today is, when you distill what we do at a computer, is we try to find basic information. Not even facts, we want general information.
Like is the population of Portugal greater than 12 million or less than 12 million?
Rene: Where can I get a good steak tonight? [laughs]
Brian: Exactly. That intrigued me, so that started. I had an early background in programming. I thought I was going to be a physicist when I was living in Princeton. In high school, we had access to the university as a high school student, so I was in a program where I was taking university-grade physics classes.
I got into programming. I programmed a point-of-sales system, which to me was just a database. Turned out the company that asked me to do that was very interested in having credit card acceptance in there. I got enamored over the idea of electronic payments. That became one of my lifelong themes is, payments took over a bit of my last three decades on and off.
I had to wait for my dream of AI and machine learning to get good enough to where it’s useful. That date was about the birth of Siri out of SRI International. This was about two years before it was released and about three years before Apple acquired the company, I got to see it very early days.
Some of the early Bell Labs researchers I knew actually went to SRI after Bell Labs was basically disintegrated in the divestiture anti-trust action. They invited me in, and my mind was blown. I said, “We’re here. We’re here, and it’s early 2000s. This is great.” As we all know, as Apple fans, the last acquisition of Steve Jobs as CEO was to acquire Siri. I can tell you that he saw Siri as the most important future for Apple.
To some people, he confided that it was more important than the iPhone, the iPad, and the Mac combined. That’s how big he thought voice was going to become. Again, it’s not just voice recognition because that went on in the ’80s and nobody liked it. I’m not talking IVR, the annoying aspect that we all know about phone trees.
What I’m talking about is voice mediated AI. That’s being able to say to a computer, “Go and book a restaurant,” or “Go and get an Uber.” Those are the easy things. What’s the weather like? How’s the traffic? You start working it up Maslow’s pyramid to the things that really, we want to get done throughout the day.
As the context got better and it knows us more, meaning that we’re giving up a lot more information than we ever have to make this thing work…Maybe we’ll talk about the privacy issues that really concern me about this, but it’s inevitable. Steve saw that. I think Steve saw that and he said, “People don’t need to be in front of screens all the time.”
That was a detour. We shouldn’t be pounding away our thumbs on a screen. That was a detour. We should be able to tell our systems what work we want to get done and it brings back the pictures that we want, or the videos that we want, or the interactions that we want. Now, is it voice only? No. I call it Voice First.
That means that we’re still going to type. We’re just going to do it less. We’re still going to gesture. We’re just going to do it less. In the AR world, or the VR world, you’re not going to be waving your hands around, especially walking down the street. I mean, it’s already bad enough you got these big goggles on your head with your hands flailing around.
Brian: I think it’s going to make sure that there’s no reproduction ever to take place in human history after enough guys like us walk around with those things, you know? Anyway…
Rene: It’s a tangent, but I’ll put a link to one of the previous shows we had, with a former Apple Siri design user experience lead, talking about how they had to adjust context depending on how much screen you had in front of you, everything from an iPhone you are looking at to a car, to a television, and just how much more or less verbose they had to make the voice part of that, just to adjust for the context.
Brian: That comes from a philosophy. We’ll cover my divergence in the philosophy Apple has had versus Alexa and Google. There’s a big divergence and it’s becoming immensely obvious, post CES 2018. To put a little end cap on my little subterfuge here about my interest in voice, it started at a very young age on the Commodore 64 VIC-20.
I made the very first sound card for voice. It had a voice synthesizer. We built that out of my garage and it’s all a haze, how many we sold. I was young and we were soldering into the night and that’s when we didn’t know solder was probably not a good thing to be breathing.
Brian: That’s my early hardware and software experience. I got into payments, merchant processing, banking, electronic payments, online payments, tablet based payments. I became an advisor to a lot of companies that you may be familiar with in payments, and always found it interesting. My background is in commerce. My background’s in technology.
What I call the Voice First revolution, the technology that’s going to be really making this pay for itself isn’t Pay Per Click ads, it’s going to be voice commerce. It kind of falls right in line with my background in how payments will become almost invisible to the experience. One might call it an uber experience where you really don’t feel the payment aspect.
In an Apple Pay experience…As we know, I’m a big fan of Apple Pay and I’m not a fan of how it was promoted, but I’m a fan of the idea. That’s how I arrived at this point. It was when Alexa finally hit the market, 2014. Excuse me, Alexa, stop.
Rene: You just ordered a dollhouse. [laughs]
Brian: Yeah, I think so. I don’t know what I ordered, but it’s big. I said to myself, this is it. I had some early notice about the talking Kindle book. I knew about it because I was flying around people, going to meet-ups and seminars of AI researchers and voice researchers. There was a rumor. That’s all I can say at this point.
There was a rumor that they were working on the talking Kindle. I’d already been down the road of the talking Kindle. I said, “This is amazing. This is great, if only they had voice command on it.”
Of course, when Alexa came out, we actually had it within a couple of weeks of its announcement. We were one of the first families to get it. It has occupied the same place in our kitchen since. My kids grew up around it. I watched how they became so acclimated to having a voice in the room that it confirmed my early suspicions about how voice would permeate our life.
I dusted off what I called my “Voice Manifesto,” which I wrote. I think the last type-written pages was in ’89. I had created a lot of work product over the years but I didn’t link it. It was just I didn’t want to go back to the pages. I purposely typed it for a lot of psychological reasons. It’s over 900 pages.
I started saying, “This is time to start thinking this.” Since then, I’ve just said it’s time that I decloak about my views on this and hopefully add whatever I can to build an ecosystem around it. I think it was Malcolm Gladwell.
I don’t know if I’d buy into it, but after so many hundred thousand, or ten thousand hours…I mean, I’ve been thinking about this stuff since the 1980s, really kind of consistently. I’ve been down every one of the paths.
When it came time to start advising folks about what voice is going to represent to their company, to their startup, their brand, their legacy brand, it was really second nature to me, especially the commerce background.
To be able to say, “What does your brand look like when your logo is no longer present? What does your brand look like when they, say, order paper towels, or we order paper towels?” They’re not brand specifying, you know, these types of quagmires.
Finally, Google said uncle. About a year ago, the chief for Google Pay Per Click VP said, “The days for the Pay Per Click ads are over in the voice first world. We need, as a company, to shift into something else and that something else is commerce.” That’s the end cap of my commerce and voice intertwining.
Rene: It’s interesting that both those technologies matured at almost the same time. The big Apple Pay and Google Pay, Siri, Google Assistant and Alexa, they all seem to be coming into fruition at the same moment.
Brian: And Amazon Pay, right? Amazon Pay is huge now. History’s going to be very strange when it looks at these convergences. It almost looks like everything fell into place at all the right moments because prior to that, the way we were doing payments was just bizarre. I mean, it was ancient.
You had to put in a CVV2 number and there was no trust. You had to go and jump through all these hoops. Guess who changed that? The One-Click System. A guy named Jeff Bezos filed a patent a decade ago. It’s already expired. His name is on a patent.
Here’s the same guy reinventing what I call voice commerce. He’s got 12,000 people in his army just working on Alexa. That’s more than Google, Apple, Microsoft, everybody is working. That’s maybe 3X more than all those folks are working.
Rene: You heard this. People were saying. They would talk about what it took to make an iPhone or an Android phone. You had to have the advent of mobile data getting much, much faster, the microprocessors getting smaller, and the chipsets had to be of a certain kind.
It all came together and suddenly, we have iPhone and Android phones. This always felt similar. You had to have all the ingredients. By themselves they weren’t enough; they had to fall into that primordial stew at the right time to spark life and whatever comes next.
Brian: It’s amazing because when those conditions are right, it explodes. We can see the explosion pattern of the adoption of what I call voice first devices, what we might call Alexa or Google Assistant.
Rene: Let’s go back for a second because I’m so excited. Let’s go back for a second. Siri was an app and then Apple bought them. They integrated it into what became the iPhone 4S. The two big breakthroughs, at least back then, that people talked about that were interesting with Siri was what you mentioned, context awareness.
You could say words and it would sort of try to divine what you meant, and also sequential inference so that you could talk to it more like you talk to a human who, if you asked for something, it remembered what you asked for. You could ask for the next thing without having to go back and sort of redo the chain the whole time.
What did you think when you first saw that? You’ve been so interested for such a long time and then here it was in sort of a mainstream product.
Brian: Wow, Rene. That’s a great question. It was revolutionary to me. It felt like the same moment when I first touched the iPhone 1. I mean, the little hairs went up on my back and I said, “I’m interacting with something that is historic.” I remember just testing it out. Again, I saw it before it was an Apple product.
In some ways, Siri was more powerful as a standalone system than it was when Apple integrated it.
Rene: Far more integrations, right?
Brian: Yeah. You were able to order a table at a restaurant, book a flower order.
Rene: …get a taxi, [laughs]
Brian: Yeah, taxis.
Rene: …all the stuff it took Apple five years to give us back. [laughs]
Brian: Yeah, and we all had great anticipation at the moment it was being acquired. Again, we didn’t know Steve was not going to be around when it was acquired but there was rumor that Steve took this more serious than anything in his entire career. I can tell you, from insiders, that was what was transmitted to make this acquisition happen.
They did not need to sell it. SRI International spun this off. It’s a military contracting company, primarily. This was the result of a decade of military contracts. It was like a NASA. This is like a NASA project. SRI said, “We’ll help get you funded for a decade to make this work. This is great technology.”
There was a lot of behind the scenes promises made to those folks that built Siri, that they’re going to take it serious, that it’s going to be its own platform. It will not be an appendage. Now, this is an important thing. Platform versus OS appendage, it’s a philosophical construct that’s really hurt Apple at this point.
When I first saw it, I just said, “This is the future.” Obviously, Amazon was not even close to doing anything. Siri owned the world. They had at least a five-year head start. Then, we went through the Dark Ages.
Rene: Before we get into the Dark Ages, what made Siri miraculous to me is that back then, my god kids were really, really young. They were like three and five, or three and six. They could read or write very basically but they could never use iMessage with a keyboard or anything like that.
I walked in on them and they had iPod touches back then and they were sending and receiving iMessages with their mom entirely through Siri. They were just dictating their messages, having Siri read the messages to them, and having these conversations.
If you look at Apple’s history from mainstreaming computers, making them increasingly accessible and easy to use, that, to me, was just the golden moment. You made computing accessible to people who would never otherwise be able to use them.
Brian: Oh, my god. This is exactly what I saw and felt in my life. I said, “This is a paramount moment for Apple.” Man, if they just take this and run with it, they have created the ultimate lever.
All humans are tool builders and we’re just trying to make the bigger and bigger lever to try to move bigger and bigger work, if you will.
This idea of always having to use our thumbs, when you think about it, we think in a voice in our head. Anybody that’s trying to type something, they have to first put it into a voice in their head and then they type. It isn’t until somebody tells you to actually examine that that you realize, “Holy cow. I’m actually transcribing my inner voice.”
Rene: And almost translating it because you have to go through a process to make it into words that’s not necessary when you’re just speaking.
Brian: It’s a throughput process. You have to mechanically try to find each letter, and of course, there’s “muscle memory” but that’s still a cognitive load to try to type it up.
Rene: A formalization that you have to elaborate that you don’t just have when you speak which is much quicker often.
Brian: It’s more nuanced. Our conversation is a lot more interesting, I hope…
Brian: …when you hear it than when you read the transcription…The transcription is great to breeze through, but humans are so adept. Evolution has given us this power to use our brain. The phonological loop is a big part of our brain. Our prefrontal cortex, all of our creativity drops right into the phonological loop.
If I was to take Broca’s area out of your brain which is the voice that you hear when you’re reading and typing, you could never type anything. Literally, you could never really type anything. You might be able to read things because Wernicke’s area is still in there but you wouldn’t really be able to understand what those words are. Our brains have developed this power.
The computer, for the last 56 years, was not smart enough to understand us, so we had to take the sidestep. That’s what Steve knew, and what a lot of deep researchers know who have really looked at this from a practical standpoint, not a sci-fi one. I don’t come at this because of Star Trek although it’s interesting.
Rene: Yes. [laughs]
Brian: I don’t come at it from a nerd point of view that, “Oh, it’s just cool to sit in my chair and bark out commands.” Although, that’s cool, too. I come at it from a humanistic point of view, that that’s what we’re designed for.
We have only been typing for about 200 years and we’ve only been typing using our thumbs, primarily, for about eight, nine years. There’s power in the ability to say something. We don’t know that. The important things we want to say to somebody that’s important to us. Hopefully, you don’t want to text it to somebody.
Brian: The youth cohort — everybody says millennials, I just say younger folks — they’re actually doing what you saw taking place on the iPad. They are actually saying what they want to say into Siri, translating it into an Apple message, and then they’re reading it back.
I think Apple may have released this officially, I hope they have. In that cohort, it’s over 60 percent of text messages are composed that way and this is between the ages of 8 and 16, 17 years old.
Rene: I know we’ll get into it more but I almost always use Siri for everything. I only don’t use Siri when I have to not use her. [laughs] It’s just so much easier, that way of interacting.
Brian: This will tie into another thing we have to cover, hopefully, and that is what I call peak app, the idea that voice is going to be the end of apps. Apps have already reached a sort of peak. The concept of an app, and voice is going to pretty much make sure that it ends and something else comes along.
Rene: It allows you, and again, we’re going off on a cliff tangent, but the way the web became unbundled into HTTP services. You don’t have to use websites anymore, you can use APIs.
Rene: Voice enables you not to use apps anymore, you can just use features and functionality regardless of the app bundle.
Brian: That’s why I was so excited when Apple acquired Workflow because Workflow is the ultimate real-time construction system for AI.
If your voice AI, or Siri, doesn’t know how to do something, it would find, through metadata, through taxonomies and ontologies that would be built into the modern new apps, that it just needs to download or let’s call them cloud apps, if you will, to access different aspects.
You might say, “Book me a ride on Uber, I’d like to order flowers on the way there, and book restaurant at eight o’clock with Luigi’s.” You don’t have any of that on your phone and the workflow type of system, and Workflow can do this right now, it will find those apps, pipe into those data points, and make those things happen in real-time on an OS level.
Then, there are apps, but they really aren’t apps, they’re ontologies and taxonomies that the voice-mediated AI is accessing. That becomes an entirely different developer community, what I think is a much richer developer community, both in the ability to perform work and financially. I think it’s going too far…
Rene: We have Extensibility in place which lets all these apps surface functionalities, regardless of the app itself anyway…
Brian: Exactly, because we don’t even know the functionality of most apps because we don’t even get down that low into the app’s architecture. It’s an opportunity, but that’s the problem inside of Apple.
Rene: Let’s go back to that. You saw Siri and then what happened between Siri and the first time you saw Alexa?
Brian: I cried. My heart was broken.
Brian: I saw Siri die on the vine, and I saw some of its best minds leave that company, and I said, “What in the heck is going on with my Apple? My Apple that I love.” I love these guys. Anybody who’s reading my stuff knows that I’m not an anti-Apple. I’m a pro-Apple to a fault. I still own 1980s and 1990s Apples in my museum. Even during the bad Quadra years…
Brian: …I still have the Quadras sitting around. I believe in rainbow but I also am a realist.
Rene: Like Greg Clausen left and some of the Siri program managers left and…
Brian: Dag and the main Siri people left and they started Viv. Apple had the opportunity to buy Viv, and I’m going to be nice, some idiot out in the executive level decided that Viv was not of any value and gave it to Samsung.
What in the heck were they thinking? Their chief competitor. The most powerful AI tool that I’ve seen in my life is in Viv, and they had the ability to buy that.
I don’t know what kind of thinking was going on other than a philosophical divide within a company that’s aging, and I hope is always innovative, but everything gets old, everything ages, and you have to reinvent yourself. I don’t know how you do that in a post-Steve Jobs world.
Rene: Is that what you mentioned earlier? Is that seeing Siri as an appendage and not a platform?
Brian: Yeah. It’s a philosophical problem inside Apple. The Apple apologists, I don’t mean to hurt anybody’s feelings, they will go out there and they will parrot out, “Oh, Siri’s no big deal. Nobody’s really using it.”
“Oh, yeah, Alexa, it’s exploding. It’s the fastest growing platform in the history of humanity.” “Oh, but that’s not a big deal. It’s all going to end.” “Oh, but, hold it. Jeff Bezos cannot be that crazy. He’s got 12,000 people just working on Alexa.”
“Oh, but Apple is going to…And do an end run with HomePod.” “Oh, HomePod is not coming out.”
[Alexa speaks in the background]
Brian: I know, Alexa, you don’t have that.
Brian: Alexa’s answering that.
What happened? What happened is you drink a little too much of your own Kool-Aid and you start believing that the future is always going to look like the past.
You think that surfaces and something you carry around in your pocket, that you’ve gotten very used to, and you’ve gotten very rich and maybe really fat on — this is where your protein source has been coming from — you don’t want it to go away. It’s classic Clayton Christensen.
Even though we know we’ve reached peak app, and nobody wants to say that because it is, in a sense, another shot over the bow of Apple, you can’t redesign the App Store enough, you can’t pull out “junk apps” enough. The average person downloaded fewer than three apps last year. That’s peak app.
Whereas in the early days, people were downloading 20, 30 apps. Were they using them all? No, but there was exposure.
Rene: There was excitement?
Brian: Yeah, there was excitement. Discovery is broken for apps, it’s broken miserably. I don’t believe that the new App Store has really improved Discovery all that much. Developer ecosystem is constraining. People are siloed inside their social media and the social media silos are becoming their own ecosystems, very much like what we see in Asia.
Brian: Yeah, and it’s happening in the US within Facebook, Instagram. Now we know what’s going on with Snap, it’s not looking so well with the cloning of Snap into Instagram.
Now, what happens? If you are Apple, and your vision is thinner, faster, more feature-rich devices, and somebody wakes you up one day and says the device is going to disappear and most of your work is going to be done via your voice, then the advantage you had by your OS being beautiful, looking beautiful, acting functionally beautiful in comparison to Android, no doubt.
Having a device that’s functionally more beautiful, thinner, just more seductive to play with, with the ability to read your facial expressions and all that kind of stuff, all of a sudden you start saying, “No. I don’t want that world. We need a device. Yeah, voice is interesting, but people are going to type because that’s what they did in the past.”
The reality is, that’s not how history has ever worked out. Some people say humans are lazy. I don’t know if I want to use that definition. I say humans are always tool builders and they’re trying to make their life more productive, even though we might analyze it and say wasting time on social media…
Brian: …is maybe not productive, but let’s assume that most of the stuff that we’re doing, we’re trying to get to an answer.
Rene: You want to tweet as efficiently as possible regardless of whether you think tweeting is productive or not. [laughs]
Brian: Exactly. When you really analyze work to be done — that’s how I see this, through the lens of how humans will do the work of accessing a computer — it’s that we’ve become the machine at the end of a nine million result Google search.
We sit here and we go, “Oh, man. We’re so modern. We have this instant access. We have all the information in the world. Look, Google just gave us nine million results. What are those really sketchy three results at the top that says ad next to it?”
Brian: Then, you start having to say, “Hold it. I just spent an hour sifting through this powerful nine million search result. Have I gotten really that far? But Google’s algorithm gets better all the time.”
No, it really doesn’t. Even though it knows what is in your Gmail, even though it knows a whole lot about your contacts which you would freak out if you knew it knew, it still isn’t good enough because it’s not deeply contextual to you in the way that a personal assistant would be.
That’s what we are ultimately going towards is the personal assistant, and none exist today, in the modern incarnations of Siri, Alexa, Cortana, and Google Assistant. They are not personal assistants. They are voice front-ends to AI. It’s what they are right now.
Rene: I want to get into that but I want to ask you first, what was the difference when you saw Alexa compared to Siri? What did Amazon get right?
Brian: Do you mean what made Alexa become what it is today, in a sense?
Rene: Yeah. People who are not predisposed to Amazon would just say, “Amazon is like the Google of assistants, or like the Android of assistants.” It is a commodity system that anybody can license and embed and you’ll always have a market for free.
Other people might say, “No. It’s functionally superior,” or, “They were smart enough to add integrations,” or, “Yes to all of those things.” [laughs]
Brian: Rene, I’ve lived through the PC versus Mac era. I’ve lived through UNIX versus PC.
Brian: I’ve lived through iOS versus Android. We are in a new world where these analogies actually don’t even fit anymore. I think that’s why a lot of the very, very smart people on the Apple side of the fence, who think that Alexa is just a waste of time and a little toy, scratch their heads every year and wonder why it keeps getting bigger and why Apple keeps falling further behind.
Especially after CES, a lot of very notable analysts are starting to come around and say, “Apple is glaringly behind. They made maybe a very, very bad mistake not taking Siri as a platform seriously.”
Why is it not the same analogy? It’s because basically they’re a different way to access a computer than what we’ve ever known before. In a sense, what we’re doing is we’re cherry picking the easy things.
When I first got my computer, and I’m looking at it right now, it was a Sinclair ZX80. I soldered it together and I had to get a magazine to get programs. I could program something on my own, but my very first “Space Invaders” game was in a British magazine that I got for $25. I’d go, “A magazine for $25?” All the import duties, whatever.
I literally hand-coded it, because I didn’t have my tape drive yet, every time I wanted to play that game. It was in BASIC. We’re not even in that phase of the Voice First revolution.
We are literally setting timers, we’re playing music, we’re doing very rudimentary things. The context that these systems have, for better or for worse, is so light, yet it’s still serving functionality in people’s lives.
Obviously, you can’t argue with the growth in the numbers. People are not only buying new things. They’re buying more of them. The average person now has 2.3 Amazon Echo devices in their home. That doesn’t mean they’re not using them.
The people that are sitting there as a [inaudible 32:27] on the wall, never using the devices themselves, saying, “Oh, yeah. They buy them but they don’t use them. Or they’re just listening to music.” They’re not living in the real world. They’re not actually doing the research. They’re just sitting there, I don’t know, drinking Kool-Aid.
The bottom line is people are using them. They’re buying more of them. The fastest-growing sector within Amazon’s sales outside the single Echo Dot was people buying them by the half dozen. They sold a lot of kits by the half dozen.
That means people are sticking them in basically every room in their home. That does not belie a reality where people buy them and they don’t use them. Or they just want a speaker that they can listen to while they’re in the bathroom or in the kitchen.
It’s not only that. It’s also a social network. It’s a communication tool. There’s a lot more into this. Again, that’s what the computer became, too. When Steve first started — Steve in the garage — what would they tell the world?
This will be on everybody’s kitchen table. Why? The reason was very simple — to manage your checkbook and to manage your recipes. You can actually go back and look at Steve giving seminars in early Apple events where he’s saying, “Yeah, everybody’s going to have it to balance their checkbook and do recipes.”
I argue that almost nobody bought these computers — Apple II and the first Macs — to do that. That’s what people are saying they’re buying voice-first devices for — to listen to music and to set timers.
A few people are doing that, but they’re actually getting things done. Once you start talking to folks that really use them and they tend to be outside the tech sector, this is like the average person saw the adoption pattern before the tech world did, which is funny.
It’s the first time this has really ever happened. That’s why it’s sandbagged a lot of people. That’s why some get arrogant about it.
Rene: I think it wasn’t intuitive, too. You would expect this from Google, for example, because they’re big on AI. Amazon didn’t have the systems and services that Apple or Google or Microsoft had.
They didn’t have their own email, their own messaging, their own operating system. I think that’s part of what surprised people is that the expectation was that Google would be where Amazon is.
Brian: That’s a good point, Rene. I’ll tell you why I think this happened. It was built by a merchant. It was not built by an engineer. It was built by somebody who sells things to people and has to satisfy people in real time.
When you’re a merchant…I learned this from 30 years. I’ve been schooled with the PhD of merchants. If they don’t sell stuff, they’re out of business. They wake up at four o’clock in the morning and make our donuts and our bagels. If they don’t do it the right way, a couple of weeks, they ain’t there anymore.
They don’t have the luxury of sitting there with somebody massaging their back and coding and saying, “I’ll try this out.” There is a rationality to this, and that’s what drove Steve. Steve was a merchant.
When Steve got on the stage, he was doing a sales seminar. He was doing a classic circus-comes-to-town, carnival barker sales seminar. It was beautiful and people loved it. We don’t have that.
Jeff Bezos is about as close as we get to that sort of idea, because there’s a rationalism. People have to prove it with their wallet. Steve was always number two also. He was always fighting a bigger company, so he had to make sure that he was satisfying people and delighting people to a level that was beyond their expectation. We forget that.
On the other side, you couldn’t even get a job at Google unless you answer some asinine test of how many tennis balls would fit in a car on a hot day going down a hill in San Francisco.
It’s like you built a company you deserve. If you, in fact, believe that the defining thing that’s going to make your future as an organization is engineering-only talent, good luck with that.
Yes, you’re going to get caught surprised. You’re going to make Google Glass. You’re going to sell the best robotic company on the planet, Boston Dynamics, and not realize that you made one of the biggest mistakes.
I love Google, by the way, but I also realized what Steve realized. What a lot of other folks who follow Apple realized is that if you look at the world purely through an engineering-only lens (I’m an engineer. I could say this and I’m not putting down engineers), you need to have the balance of the real world.
The reason why Steve did so well with walking into the Xerox Palo Alto Research Center is because of one reason. He walked into an engineering-only operation. That computer was done. The Alto was done. It was ready to go but the engineers wouldn’t let go of it.
Steve goes, “I only saw 3 things and I should have saw 10. Those three things gave me the Mac.” He said it wasn’t ready and he’s saying, “What the heck are you talking about? I’m going to slap them together and put them out. It’s ready.”
You need somebody that transcends engineering. They understand it. Maybe Steve wasn’t an engineer. Maybe he was. I happen to think he was in a very practical sense. He said, “Let’s go with it. Let’s ship it. It ain’t perfect but it’s better than what’s out there.”
Where is Palo Alto Research Center now? Where is Xerox? What happened? If you live and breathe by the engineering culture, you have a problem. There is where Google is.
Google is sitting there saying, “Boss, I don’t want to give it a name. If we give it a name, we’re going to have to give it a gender. We have to give it a company of origin. Us engineers engineered around this idea. We don’t want to do anything wrong to upset folks, so let’s just call it Google. Ah, it sounds good.”
Rene: …too. Going back to my experience watching others with Siri and now with Amazon is they treat it almost like a Pixar character. They seem like they have a relationship with it, and that’s part of the bond. You don’t have that when you’re talking to a computer.
Brian: That is so astute and that is why the future graphic artists…Steve liberated the graphic artist into the computer. It was heresy. I remember being at Comdex. They would say, “How dare you take my CPU cycles and run around pretty pictures on the screen? Give me a command line. These pretty pictures will never beat the command line.”
Does that sound familiar?
Brian: Yeah, it sounds like what the voice thing is today. I have the same arguments with people. Give me my thumbs. I’ll book my thing and I’ll do this and I go, “I can do that in three seconds just by doing a voice command.”
Who are the graphic artists of the future? I tell you who they are. They’re the storytellers. They’re the writers. They’re the psychologists, psychoanalysts. They’re the philosophers. Those are the people that are going to shape the future of this interactivity.
If Steve was around today, he would have a division within Apple that is full of all these beatnik poets and crazy people that you picked up out of Berkeley. It would look like Apple in the 1970s. That’s what his vision was.
Now, that’s obviously not what’s going on. I’m not putting the blame on Tim Cook or anybody. I’m just saying that when you’re getting disrupted by an interface that does not allow you to showcase the greatness of your company, you don’t want to accept that reality.
You don’t want to think that everything you do is going to be a disembodied voice. I’m not saying everything, but that’s what some people are starting to get scared by and then saying, “If all it is going to be is a disembodied voice, then what’s the struggle going to be?”
It’s not going to be the Android versus iOS struggle. It’s not going to be the PC versus Mac. I’ll tell you what it’s going to be. The personal assistant that bonds with us better, the personal assistant that understands us better, the personal assistant that we trust more.
It has locked down our privacy in such a way that we have no doubt in our mind that it’s not sitting up in the cloud and being harvested so that somebody can sell us a new toaster when we’re least expecting it.
Who’s in a better position to do that? I can tell you who that company is, and that’s Apple. Apple just doesn’t know it yet, because there’s nobody galvanizing this experience in that side of Apple.
You have layers of divisions and you have apologists outside of Apple saying, “Atta-boy, Apple. Siri’s no big deal. Don’t let that Amazon thing get you down. Keep going. It is an aberration.”
Those folks are doing Apple a disservice like they did in the 1970s, ’80s, and even the ’90s. They did a disservice because they’re trying to say the world is always going to look like a Quattro 477 computer or something like that.
The company needs a reset. It needs to look at voice, which is their natural province to own. I’m not saying it’s all over for Apple. I’m saying that if leadership rises up through this quagmire they’re in and says, “This is its own platform,” it’s going to mediate everything Apple does but it needs to have Siri OS.
It needs to have an entire development team and I better get a lot of these people out of the market before Amazon sucks them all up. There are not enough experts left in the market, and we’re not going to be able to produce them.
Amazon employs most of them and people who have what I called…Let’s call it expert. I don’t like the word expert. I see myself as a student, but there are probably about 25 Voice First experts on the planet, and most of them are gravitating to Amazon.
You’re not going to organically make these folks. These are people that have disciplines of psychology background, philosophy. They know Maslow’s hierarchy. They know [inaudible 42:21] and archetypes.
They know all these different things that you need to make these things work. They need to control the AI scientists. They’re trying to prove to the world that they’re going to invent general AI, or the Turing test is going to be proven.
I don’t give a crap about the Turing test. I’m not trying to make people believe they’re talking to another human. I want to see people be able to have their context extracted so that they can basically make a command and have a lot of work be done with that simple command. That’s the future.
Rene: I want to get into the future because I think that’ll be a good place for us to cap it. What’s the state of the market? How do you feel the state of the market is right now when you compare Siri to Amazon’s Alexa, to Microsoft’s Cortana, to Samsung’s Viv, to Google’s Assistant? Where do you see them now in the market?
Brian: That’s a great question. Now, there’s three ways to look at this. One is the functional electronics, the other is the actual speech recognition, and then finally intent extraction, or what’s otherwise known as the AI machine learning aspect.
Functional electronics. Apple is in the worst possible sense because none of their functional electronics is far-field voice recognition. If you look at the ring around an Amazon device, you notice that there’s eight microphones on a radial circle and one in the center.
This is all echolocation, it’s noise cancellation, and it is incredible technology. It’s designed…I don’t know if you’ve ever done this but I challenge anybody to lower the volume on a Ramones song — that’s how I test my AI devices — as loud as it can go and to lower the volume. It does. It hears my voice through.
What some people would say, “I want a piece of bacon.” [laughs] [inaudible 44:08] . The thing is optimized for the far field. Now, try that with Siri. It’s got maybe two microphones in a more modern device. It’s more designed to get your voice to go over a cellular network, so it sounds good to another human ear. That is exactly what you don’t need for intent extraction and natural language recognition.
Rene: I don’t think it was public, but Craig did a HomePod demo with music blaring, speaking in a whisper. You couldn’t hear him next to you, but the HomePod heard him. That’s what you will hear.
Brian: HomePod is the beginnings of Apple to show the world from a hardware perspective that they’ve gotten the science down for that, but the fact is it may not be enough. That’s not the experience that people are going to…
Whispering in and of itself is another technology, and Apple’s got three patents that are related to actually whispering to communicate to these devices. It’s another modality of communication. It’s between typing and barking out your commands in public, which people think.
Everybody’s going to sound like they have Tourette’s in public, and it’s not like that. That’s not what I’m talking about. I never said voice-only. You hear the text when it’s appropriate but you’re going to be texting a lot less and gesturing a lot less because you get more work done with a few words.
Now, we get into natural language recognition. I would say…
Rene: I’m sorry. Where are the others with the hardware side?
Brian: What’s that?
Rene: Where are the other competitors with the hardware side?
Brian: I would say Amazon is by far, what’s in the market today, the best. I did test HomePod and I loved what I saw under the test conditions but I can’t honestly say that is the best at this point. It felt like it did. It felt like it was the best.
Then I got sandbagged when some idiot decided to take a device that has a processor equivalent to an iPhone 7 and make it functional-less unless you have an iPhone around.
That’s what the HomePod was being advertised to be. It had no intelligence unless your iPhone was around. It had basic intelligence. That to me said, “Somebody who has no clue about what the future looks like won the argument inside of Apple and said, ‘This is just an appendage to an iPhone folks. Nothing to see here.
We’re going to dumb down this processor even though it could literally operate circles around what’s in the market, because it’s a powerful processor. We’re just going to dumb it down, because it won’t work unless you got your iPhone tethered to it.'” What the heck. What is it thinking? Anyway, I had to get that off of my chest.
Rene: Sure. [laughs]
Brian: I’m sorry if you’re the idiot at Apple that’s listening to it. Take a shower, wake up, you made a bad decision. Move forward because history is not on your side on that decision.
I don’t think it will hit market that way, by the way. I think it was just getting folks. It does everything we wanted it to do without a phone. If it does not do that, it will fail miserably in the market. If it does have its own power, it will do pretty well.
Hardware-wise, Google is doing all right, but they did not commit to the microphone technology to the level that Amazon did. There’s some patents that Amazon has that Google couldn’t get around.
Google’s best device I think has four microphones. I think Amazon’s best device now has 10 microphones. I’m losing track with some of the newer devices that recently came out.
Rene: They just keep coming. [laughs]
Brian: Does the microphone technology matter? Yeah, because it’s got to hear your voice. That is the resolution technology, if you will, or the keyboard technology because it’s an input technology.
Then we have the mechanics of the AI of speech to text. I would say that Google probably has the best in that regard, but the problem is we don’t really get to experience it very much.
They don’t showcase it, because again they’re living within an engineering culture where they’re in fear of being able to use the power that they have in their hands. Again, I’m an engineer. You have engineering. I have lots of engineers listening to this podcast. We’re going to be too careful.
In this use case, it might break. You need a leader to say, “I don’t care. We’ve made something beautiful. We’re shipping it. We’ll fix it later.” Every product finally needs a leader to say, “We’re shipping it. It’s never going to be perfect. This is good enough. Every Apple product, we’re shipping it. We’re done.” Sometimes, they made a good decision. Sometimes, they didn’t — Apple Maps.
Rene: Every artist needs somebody to pull the paper away from them and say, “You’re done.”
Brian: I come from a songwriting background. I would tell artists all the time, “All right. No more words. Now, we have to throw away words, because you have too many. No more chords, no more lead guitars, no more drum fills.”
Second to that is Siri. Siri could have been number one. The only reason they aren’t is because they lived on a technology which was not really their own. They’re borrowing technologies from other companies and internally.
I won’t get into all the companies they were borrowing technologies from, but let’s just say it’s all over. It was that company, one of them, that stymied the entire Voice First revolution, because they owned all the patents and they invented IVR.
Those folks are the people you want to get mad at when you think of press one for this and hearing these really verbose responses where there’s no psychology being used, where there’s no poetry. I’m not saying…
Rene: There’s no nuance, ha-ha. [laughs]
Brian: There’s no nuances. They decoupled from them, but the Siri teams would’ve told them in a heartbeat, “Hey, we need to get rid of these folks. Let’s start hiring. Let’s build it ourselves. By the way, the platform we made was a temporary platform. We need to rebuild it from the ground up. It needs to be able to be self-programming.”
The Siri team said to the Apple folks, “This is just a demo platform. We need to make a self-programming platform.” What does that mean? The AI starts writing its own code. That’s what we’re really talking about. This entire conversation is really about self-coding AI, and we’re just using our voice to mediate that.
Workflow was a beginning concept of that. People say, “Well, that sounds sci-fi.” It’s already being done. It’s the future, it’s the right now, and it’s where Viv is heading.
Rene: …I keep going off tangent, but it’s just funny. When I was talking to the machine learning people about programming Face ID, of all things, the language they used did not sound like coding a machine. It sounded like training your pets.
After a while, I was like, “Yeah, the Batman machine that’s defending you and the Joker machine that’s trained to not be fooled, we don’t know what they’re doing anymore. [laughs] They’re basically just working on their own.”
Brian: This is exactly where all of this is going. The whole idea of coding and apps is radically going to change. We’re not going to be coding. I started coding in hexadecimal. When I started doing higher order language like Forth, Forth fried my brain, with its reverse Polish notation. It was fun though.
Then I started going to higher C and BASIC, obviously, and all the stuff. I said, “This is twice.” I was into the machine level. I could control the processor. The people that are coding iOS apps today are going to freak out when they realize that basically an iOS app that they coded can basically be built in real time, as somebody speaks.
It literally is like the train laying the tracks in front of it. That’s not future. That’s doing right now. That’s what Viv is already doing. This is building its own ontologies and taxonomies. It’s the same, it’s not actually building code.
It’s like once you have a routine, you just pop in whatever the operation is for that routine, and then this operates on it. That’s where this is all heading to. Again, that’s a functional problem. Philosophically inside of a company that built the iOS Store and the whole app ecosystem, what if building an app is your kid talking to it and they build it in real time, then what do developers work on? What does your future look like?
These are all existential problems that I know where they’re heading. I mean I see where they are heading and they’re solvable. All I am saying is nobody has job security in the future. Let’s put it that way. It used to be learn to code, you have a job forever. I know. Now, you’re going to be coding something else.
Yes, ultimately AI is like teaching a child. The payoff is like having a child. It learns. You nurture it. It gets bigger. It gets stronger. It gets better, and it learns more about you. You start asking the question, “What about my privacy? How is it going to be safe?”
That’s the secret that Apple has. They can literally dominate this by running the line of privacy very clearly around all of these data and letting people feel more secure about getting closer and letting this AI get closer to them, because that data is not going to be used in a way that one could not imagine.
Rene: That’s a great bridge. We chatted briefly about this on Twitter. There’s three or four areas where I feel like there’s still huge opportunities and huge leaps that need to be made. One of those is the actual learning. Right now, it learns the natural language syntax to better understand me, but it doesn’t learn what I’m doing in my behavior, so it can’t predict me.
It’s all very reactionary. The second one to me is multi-personal, where if you and I were roommates, being able to really make sure that if I say, “Messages,” it gives me mine and not yours, the base level security layer.
The third is exactly what you’re talking about, and that is to be able to ingest enough information about me. There are concerns, like Google Assistant always says, “Can I track your Web, and can I track your apps?”
I say, “No,” and it says, “Well, then you can’t use me.” Apple, I wouldn’t have that. I would have certain qualms, because if you duplicate my data, that means there’s two places it can be stolen from. I’d get over that quickly. If it doesn’t work…
Brian: You should be running an Apple division right now. You just logic’ed out the most important aspects of Apple right there. It’s very clear, and any of us Apple fans see this. In fact, you want to know something? This is where people misunderstand me. There is near-field and far-field Voice First.
Apple owns the near-field Voice First. They owned it with AirPods. Phenomenal device, powerful device, and they hobbled Siri on it. They made it into, again, an appendage that barely did anything. There are certain things you don’t want barked into a room for everybody to hear.
If Apple knows you have an AirPod in one ear, it will whisper in your ear, essentially, in saying, “Oh, yeah, you know, y-, y-, y-, yeah, you know, that stock you wanted to buy, or that you’re…”
“Yeah, you’re, you’re gonna bounce a check,” or whatever stuff you don’t want anybody to hear in a room. A lot of folks think that this is limited because how do you want everybody to hear everything in a room. It’s echoing around.
No, it’s going to be in your ear, and Apple, again, owned this by almost a year, and because they flubbed it, and they didn’t give the Siri teams and the VocalIQ teams…
Apple acquired VocalIQ. We talked about self-programming. The VocalIQ team in Cambridge, go and search. Go and look at what the CEO was demonstrating four years ago before Apple acquired them. He was onstage programming in real time by talking.
It wasn’t equivalent to Viv, it was a different tack on the way they did it, but it was real-time, contextual programming. Let’s call it a tokenization of ontologies on taxonomies in real time. It was powerful. I sat, and I said, “Oh, yes! Finally, they got VocalIQ.” These guys are geniuses.
I flew out just to see one of those seminars, and I was floored. This was way before Apple acquired them. I said to my friends at Apple, “Boy, you ought to acquire them and Viv and you would own the market.”
They took one part. Now, what? We don’t see the results of that. The Cambridge Group, by the way, where VocalIQ is…
Automated Voice: [off-mic comment]
Brian: There’s another voice system in the background.
The Cambridge Group is across the street. VocalIQ Group is across the street from Amazon. They have a building that’s about a hundred times larger and it just looms.
Every day, these people kind of walk across the street and there’s a blaring sign that says, “You want to make 3X, 4X what you’re making over at Apple? Come across the street and work at the 12,000-person army, building the Alexa tools.” How long does it take, Rene? How many years does it take for you to get depressed and saying, “All the fun is across the street?”
I would say to anybody that listens to you that’s an Apple fan, “Open your eyes. Look around you. Be honest and say, ‘did Apple make a mistake?’ and if they did, be honest about it and help them. Write about it. Talk about it. Stop apologizing for it. Stop saying Siri’s an appendage to an OS and let Siri have its rightful place as its own platform.”
Let it grow and do whatever it’s supposed to do in the world. If, so be it, it ends the iPhone, well then, it was supposed to end. Does it work on the iPhone? Yes, but it works disembodied through anything. We have this rich and vital developer ecosystem. Apple, give me 10 minutes. I’ll fix this for you.
Developers right now, they’re coming to me. I mean, I’m a lightning rod for Voice First. They go, “I love Apple but there’s only five or six taxonomies and ontologies that I can work under.”
I go, “Yeah, and it doesn’t look good. It doesn’t look like this next WWDC, they’re going to open up maybe another 10. It’s wide open for all the other platforms. You’re a developer. You believe in Voice. Who are you going to develop for?”
You know, Ben Bajarin, a great researcher of strategic…
Rene: Creative Strategies, yeah.
Brian: He wrote what I think is the definitive turning point. He walked out of CES 2018 and he said, “The new works with iOS is Alexa ready, or Alexa enabled.”
Rene: The way I try to look at this is, I try to figure what’s going to come next. Phones, they’ve been the defining thing of our era. If you fast-forward, it seems to me that, before we get to things like implants, [laughs] eventually, we’ll all be cyborgs. [laughs]
Brian: [laughs] That’s another thing entirely. I want to go down that one.
Rene: Before we get to that, eventually, we’re going to just need a little marble or a little box that, all it does is authenticate that we are who we are and establishes a connection with the world around us. That is going to need to be controlled.
Yes, there will be some aspect of AR where when you need physical interactions, you can have them. It’s going to need to be controlled by what we say before it can be controlled by what we think. How, within your company, are you going to get to be making that device, be successful when that device is the norm?
Brian: Exactly. We are going to have images. I’m not saying this world doesn’t have images anymore. They’re going to be contextual, situational, and ephemeral. The images will appear in front of you when you need to see them, and it will disappear when you don’t.
Rene: We’ve talked about tactile interfaces. There will be all sorts of things but they’re not going to be primary anymore.
Brian: They’re not going to be primary because you’re not going to be waving your arms. You don’t need a surface. Your voice is a much more powerful tool than your fingers ever will be. That’s just the reality of life. That’s what evolution has given us. As much as we want to pray for the singularity, it ain’t going to happen.
Rene: They’re also multifunctional. It’s why I love audiobooks. I can do something else while listening and I can’t do something else as easily while reading.
I used to read all the time because I can be driving, have an idea for an article, and just start dictating it. Otherwise, I’d have to stop, get out a device, not be able to do what I’m doing. This lets me be a multifunctional person.
Brian: This is exactly it. What a critical point in time. You have shareholders in the company of Apple saying, “Apple, we have screen addiction problems, not with just the youth, but everybody. It is literally a screen addiction problem. How do we fix that?” I’ll tell you, I’ve seen it with my own children.
When they get voice enabled, when they start being able to talk to their devices, they expect all the devices. Children expect. I’m going to tell you two things that children are going to expect that grew up with iOS devices. This is a big problem for Apple. They’re going to expect every screen to allow your fingers to manipulate it.
This philosophical bullshit that Apple has that you can’t touch a laptop screen is solved by my then 12-year-old child. My 12-year-old child at the time said, “Dad if the iPad came before the laptop, there would be no debate about the laptop screen having a touch capability.” End of story.
Now, all of them apologists for Apple need to see the world through the eyes of a child. They don’t know the philosophy of, “Well, my fingers at a weird angle. It smudges the screen.”
They don’t want this philosophical debate. They want to be able to go up to a laptop screen and move something. Now, if Microsoft did it first, bite the bullet and do whatever you need to do to get it done, but you fix that.
The next thing is they expect every computer to not only hear them, but to understand them, and to talk back to them. Every device, in real time, and it doesn’t need to press a button, and it doesn’t need to open files.
The failure of the very first voice interface was this stupidity that we believed, and I was one of them, that we needed to manipulate the computer through our voice. Nobody wants to do that. “Open file this.” “Move file there.” That’s what some people debate.
When they use the straw man debate with me, saying, “Brian, do you think people are going to be moving things around the screen that way?” I go, “No. I never said that.” “But that’s what it means.” I go, “No. You’re not going to move anything around the screen. It’s going to present to you what you want.”
Rene: I know some people don’t like it. I use Siri on the Mac all the time because I can keep typing while I say, “Convert this between decimal and imperial,” or…
Brian: [laughs] I love it.
Rene: “…What is the thing that…” I just do research. Otherwise, I’d have to change. Go to a web browser. Humans are terrible at context switching. I’d forget what I was typing. I’d just ask it for information and then I keep writing while it gives this to me.
Brian: When I go in a frenzy of writing, I’m using Siri, I’m using Cortana, Alexa, I’m using anything around me to help me, “What about this? Look this up.”
Rene: See, I should say voice, instead of Siri. I just mean voice in general.
Brian: Yeah, it’s all around me. People that see me the first time doing this, they’re like, “I didn’t know you can do that.”
By the way, I’m writing something else. I even transcribe while I’m typing my other thoughts. I might have side notes on typing the main story, and I’ll start transcribing my side notes.
Now, are we really multitasking? No. There’s no such thing in human…We’re task switching. Is it perfect? No, but I’ll tell you what it does. It does increase your productivity if you use it the right way.
Rene: Yeah, absolutely.
Brian: That’s what I think is missing from the arguments. I want to see Apple succeed. I want to see Siri succeed.
I think if you’re an executive at Apple, or you’re a fan of Apple, and you look at what just took place at the largest consumer electronics show, and then you look at what’s going on in the world, and in China.
You look at developing countries, there are developing countries where people are not really ever going to touch their phone, they’re just going to talk to them.
Rene: The same way they never had copper cables.
Brian: Exactly. Did I make this world? No. Am I relishing in the future? Yes, because that’s what you do as a scientist. You let the empiricism of what the world is, the natural gravity of events, and you go in that direction.
You become an observer, and then if you have any ability to see the future by looking in the past, you see that there’s a way of things, and that is humans want to simplify their lives.
Now, what are they going to do with this extra time that they get? I don’t know, but you are going to be looking at screens less, ultimately, because you’re going to be searching for the right answer, not nine million results.
The big quagmire is we don’t realize that we’ve become the sifting and sorting system for Google Search. 90 percent of what I see people do, and I’ve done this research for an AI. I’d sit there as a scientist and I go, “What are you doing today? Let me follow you around.”
When you distill it down, 90 percent is sifting and sorting junk. Your personal assistant, who has high context about you, would say, “Is this what you wanted?” “Yeah, that’s it.” Now, what’s that? That’s an hour and a half or two hours of sifting and sorting.
It sounds like it’s a natural thing for Google to take, but they don’t see it that way. They still see this as an appendage of the search arm. See, Google’s got their own problem. Apple sees it as an appendage to the OS, and Google sees it as an appendage of Search.
Rene: Everything is a nail, right? They all have hammers and everything is a nail.
Brian: Yeah, and Amazon is saying, “I don’t care. I just hope that people buy more paper towels and other things.”
Rene: My thing is still this. They all get better and better at understanding when I say I want a Coke, but they don’t get better at learning that I want Coke instead of Pepsi.
Brian: That’s exactly it. That’s why it’s an interesting time. In fact, I think this will be seen as the most exciting time in technology and here’s why. The future is open to the entrepreneur in a way that it never has been before. This is where a lot of AI researchers get really mad at me.
The work that they do is going to become the electricity. Nobody knew what electricity was going to be used for beyond lights. Most of it’s used to operate computers and other technology, and mine Bitcoin.
Let’s look at it from this point of view. All of the hard AI, machine learning, will become one chip at some point. Then, the question is, what is the abstraction layer that you and I build on top of that? Those abstraction layers that Steve built on top of the phone system, could we have predicted…
Everybody said, “Steve, you need to buy a cellphone company.” He had the wisdom to say no. “I’m going to build abstraction layers on their dumb pipes.”
The dumb pipes of AI are going to be natural language recognition, general to medium intent extraction and all the other stuff. The entrepreneur, the creative technologists, they’re going to look at it and say, “My God, I can build an abstraction layer on here that just fuses together all these different ideas.”
I happen to think it’s like neuron building, what we’re going to be building in the future. These ideas that apps are going to be replaced by neurons and memories and interactions and you’re going to connect to other people’s interactions and neurons. That’s going to be the next social media, the next social networks.
There are upsides and downsides to all of this, Rene, and we probably can’t really dive into privacy that deep here, other than the fact that, yeah, you better believe I’m worried about it. I talk about the great things but everything I’m talking about…
Have it clear in your mind, I understand what we’re doing. We’re putting an open microphone and an open video camera in front of everybody, 24/7. That’s what this means.
The AI is going to be looking at your emotions. That’s why Apple acquired Emotient. In fact, a lot of people don’t realize, Animoji is just rebroadcasting emotional intents that have been extracted from your view.
They’re not mirroring your image. They’re saying, “Oh, that’s a smile. Generate a smile inside that pig.” That’s all it’s done.
Rene: The big thing about ARKit, a lot of people say that they don’t really care about ARKit because they don’t want to put a troll in their living room.
The big deal to me is the ingestion of the world so that the computer understands it.
Brian: Exactly. I think with the next generations that are coming up, who have lived with voice around them all day, all the time, their view of how this mediates their life, and its value, is going to be seen in their work.
It’s interesting that the two cohorts using voice to the highest degree are the youngest and the oldest people in the United States, and probably around the world, but I have more US data. Older folks, they don’t touch apps anymore. They just say, “Open this up.” They get to what they want.
Maybe they’re visually challenged. Maybe they’re mechanically challenged. They just don’t want to bother with it. They’re like, “I don’t care if I see the app open up and make a nice, little, fancy thing on the screen. I just want to get to my news. I want to get to my browser.”
Rene: I just want to send this message. I don’t want to necessarily navigate through apps to do it.
Brian: That’s right. When you really cognize what that means as an entrepreneur, as a VC, as a technologist, as an executive running Apple, take wisdom from this. There’s something being told to you about what the world’s going to look like.
If you’re an Apple fan person and a Voice First denier, deal with the realities. I didn’t make this world. Don’t debate me about it. Just look at it. I think it’s self-apparent.
Rene: If you think about just the chain, like if I just say, “Text Brian,” it’s a very simple chain. If I don’t do that, I have to pick up the phone. I know I want to talk to you but I first have to find an app that can do it.
I have to open the text app, then I have to remember that you’re the person I wanted to contact in that app because we context shifted again. I have to find our conversation or I have to type out your name to start a new conversation. Only then can I get to the message. That is laborious compared to saying, “Text Brian.”
Brian: Cognitive and mechanical load, I would tell you that the mechanical load alone is probably about three and a half minutes, the mechanical load. The cognitive load is equivalent to like 15 minutes of brain work. People say, “Oh, what’s the problem?” You just articulated it.
When you start doing that enough throughout your day, and it works…I’m not talking about it working half the time. If it only works half the time, you’re not going to use it. You got to make sure it works. That’s a word to Apple about getting better microphones for Siri in a far field situation. It works great on AirPods but not everybody’s going to have one.
Once you have that power, you’re on to doing other things. Those other things are going to be in those abstraction layers I’m talking about. That’s the largest opportunity I think we’ll ever see, or we have ever seen in technology. I think it’s going to create new Google-sized and Apple-sized companies that start from nothing.
People, we don’t even know their names today, are going to come up through this system and be the new Zuckerbergs, and the new Jobses, and the new Wozniaks.
Rene: I know this is really small compared to what you’re talking about, but just basic things was…I love that I can say, “Remember this,” and it’ll use the continuity features to basically bookmark anything on a phone, but I want to be able to say, “Copy this. Read…” just give Voice the ability to understand “this” and then operate on “this,” “this” being whatever I’m currently working on at the time.
I think those are sort of building blocks we need to get to.
Brian: Exactly. I think if you really start using this to any degree, if it’s taken away from you, you realize that it’s something that you really are missing. You have to have it back. I’ve watched people who have had their Echo devices taken away for about two weeks. They get angry. They get ornery. Some things…
Rene: I’m moving and I took my stuff down. I’m building out a lot of HomeKit stuff and I had to pack up to move. Everything just went offline and I had to figure out how to turn off my lights again. [laughs]
It sounds dumb but I’m so used to talking to them.
Brian: I’ll bring up Ben again, Ben Bajarin. He said, “The whole thinking, HomeKit versus works with Siri.” Brilliant. Brilliant. That articulates philosophy right there. Nobody really understands what HomeKit is, but they will understand that you could tell Siri to turn on a light. Amazon is dominating that space and that space is getting larger and larger after CES.
Every appliance is ultimately going to just take a command from you. I don’t want to sit there in front of my washer and dryer and figure out a new menu structure. I don’t want to figure out some interface that I don’t want to deal with. I don’t want to download an app to try to get to it.
It sounds like a more Apple solution to it but I’m sick of downloading apps to try to get something done. I just want to say, “I got dirty, white socks in here. Make them clean,” and then walk away.
Rene: Yeah. Figure it out. [laughs]
Brian: That’s what Viv is working on. People are saying, “Oh, everybody’s going to be talking into a device.” Damn straight. In fact, if you start looking at the medical equipment that Samsung produces, sometimes they have to go through menu structures that are 39 levels deep in some of these MRI machines.
I saw a voice interface using a Viv type system where they can just say the command. Of course, it’s confirmed and it’s not going to go and burn somebody. Everybody’s saying…
Of course, it’s very authenticated. Let’s get that out of the way. They say the command and they can literally set up an MRI system in 2 minutes that used to take 20 minutes.
Once you see that as a manager running a hospital, and you know that you need to get more patients through the MRI, you don’t sit there and play with philosophy. You don’t sit there and say, “Is this philosophically the direction we need to go in?” You just go and do it. That’s why Viv is dominating that.
Rene: For the last question I wanted to ask you, let’s say you got to write the script for Siri at WWDC 2018. What would you like to see?
Brian: I would do it as a notable executive at Apple, which I would gladly do. I would literally pay them to do it. Now, I need the money, but anyway…
I would say this. I would get teams across Apple together internally and say, “We now have Siri OS. It’s its own platform. It’s going to live and die on its own, but it’s going to touch everything that we do.
I’m going to pipe all of the teams inside of Apple together on an AI blood system, if you will. AI’s going to mediate everything that we do from now into the future.” What Siri OS is about, it is an AI mediated OS. It connects all of these different ontologies and taxonomies that we’re building.
macOS is going to tap into it. iOS is going to tap into it, but primarily, our voice is going to mediate it. It sounds like a contradiction but there’s not enough time to go into the details. Trust me. I know where this is going.
The next level would be, we need to open this up to a developer community to a level that no other system has ever been opened up, a voice-based system. We need to be able to allow developers to, in real-time, build what Workflow promises. This real-time ability to build solutions based upon what the intents are from the user.
To be able to, in real-time, pull from the cloud, I ultimately think that all the apps are going to be in a cloud anyway, whatever that means. I’m not saying iCloud. That’s another thorn in Apple’s side. I think the idea of downloading an app and invoking an app is ultimately not going to last on the arc of three to five years.
Definitely, by five years, the idea of downloading an app is going to be so antiquated. It would be like buying music was, right? As we dive into these different ontologies that these apps “in a cloud” represent, we need to be able to have the glue in our OS to carry these across into a cohesive context and continuity.
The OS creates the context and the continuity. What did the person just ask me? Is this in the same context of what they just asked me? Is it a continuity of what I just did? That’s where the low-level OS really functions. Now, a lot of people in AI don’t work from this standpoint. They don’t see it through this standpoint.
The beauty of what VocalIQ is doing and what Viv was doing, and it’s definitely not what Amazon’s doing…It’s absolutely not what Google is doing. Google is doing continuity but not in the way I’m saying and certainly, Siri isn’t. You’re essentially carrying along the conversation wherever it goes.
This does not mean it’s general AI. It does not mean that it knows everything you’re saying. It just knows that the tracks laid in front of it are leading in a direction. If you keep laying those tracks, it keeps following you along, threading the context of the ontologies that you need and solving the work or the problem that you need.
That means that once that neuron…Let’s call that a neuron. It’s the steps of how context is built. It is now yours and you don’t have to build it again. It now knows that if you invoke it through the same set of commands, the same contexts, or the same dialogues, however you want to word this, it will already be there and it doesn’t have to build it again.
It grows over time because developers add to these neurons. It has new abilities and it tells you this, so it becomes very organic. We can do that by WWDC 2018. We can start building the tools where developers can literally make anything, not in a silo.
OK, you can only do payments, or you can only do buying flowers, or you can only do this ontology. Come on. That’s ridiculous. Let me tell you the fallacy of Amazon. The idea of using skills and keywords is a dead end. Right? Let’s look at the domain system of the web. After all the great domain names were taken up, people got on and got depressed.
Then, we said, “Well, there’s a .net, .org.” Then, they started inventing all these other domains. Now, there’s a confusion because who owns the right domain? Is it a .io domain, or is it a .ai domain, or is it .com? There’s only one weather domain on Alexa. There’s only one flower domain, or Uber domain. That’s a brand, but let’s say, taxi. Let’s say pizza.
Alright. Who owns a pizza domain? The first person that wrote the pizza app. Is that the best app? No, but they were there first. Should that dictate what should own the domain, pizza? No.
OK, then we’re going to take it away from the developer who worked their ass off to get that pizza app, which was maybe the best that they can do, and what? Sell it and give it to Pizza Hut or Domino’s? Is that fair?
The idea of domains, this type of system…I’m talking about a different domain system so I don’t want to get confused. A domain is a physical aspect of AI ontology taxonomies of how you build these ideas, structures, and intents.
The domain of a skill is the actual word, or the invocation word is really the proper thing of what Alexa calls it. We know it’s a dead end, so how do you deal with that? The only way you can deal with it is, you’ve got to walk down that one-way road backwards and say, “Oops. We shouldn’t have gone down here. It’s a dead end and we have to redo the whole thing.”
Apple’s got the advantage today to do it the right way. I don’t think that they have the people inside of the company telling them that this is a problem. I, unfortunately, think that the debate is still whether it’s a platform or not.
If you’re at this kindergarten, preschool, actually, debate of whether Siri is a platform, then I don’t think you’re going to get to the idea of how neurons need to be built in a reasonable amount of time before the market just swishes around you and other people get it.
I would use it as a motivational tool. I would say, “Listen. Look at the folly of building these exclusive domains.” How do you solve it? You’re going to have to pay me a lot of money to solve it, by the way, but I’ve solved it.
There’s three different ways to solve it. There may be more. I’ll challenge any AI researcher to come up with ideas. I’ve worked in this industry for a very long time. It ain’t easy and it’s not the usual suspects. Let’s just say this. Pizza, to me, is something radically different than pizza for you, right? That’s where you start.
You always start with high context. When you hear the debates that AI is about big data, you’re talking to somebody who’s got a 1990s mindset. AI is about small data, the smallest data possible, your data, your highly contextual data. What does pizza mean to you?
I will learn over time and then in the future, your pizza is your pizza. It’s not my pizza. What does flowers mean to you? What does Cindy mean to you? Maybe it’s the name of your wife, your girlfriend, your sister.
All, over time, this context becomes quite aware of you and then you’ll realize the power. The power is, this is a glove that fits your life. It’s not a universal Swiss Army knife. We’re not building Swiss Army knives, we’re building something to solve work for you and tools for you. You might use this AI in a different way.
I would say to Apple teams, this is what we’re building. We’re building the future of how people are going to interact with the computers and there are going to be some things that stand and fall. Visuals are going to be there, of course. Keyboards are going to be there. How many people use mice anymore? The mouse is still there but it goes away.
I’m saying this to Apple, to my teams at Apple, if we open this up as wide as possible to developers, then we don’t have this problem that, “Oh, we have to be really safe about people and what they learn.” No. As long as you protect all the data, encrypt that data so nobody can ever get access to it, then open it up as wide as possible.
Let people define what is important, let people define what is needed inside of their life, and then it becomes the tool that you and I have always dreamed of when we were little boys growing up and little girls. It’s like, “Oh, I can’t wait until this computer can do the things that I want it to do.”
It will make what has come before look like toys. We set our screen color, we set our fonts, we set Night Mode and all that. No, man, this is something entirely on a different level and a level that we can just utter. Like we do with our significant others, we can say two or three words to our family and they’ll know exactly what we mean, and we’ll be able to say that to our AI, our computers.
Rene: The assistants will actually be personal. [laughs]
Brian: Yeah, it will finally be personal.
Now, what’s the future from there? You and I and everybody listening is going to invent it. What I’m telling you, it ain’t going to be something we carry around and stare at all the time because if that is the future of humanity, we’re just going to pump all of these screens into our retina, or worse, into our brain, you can have that future. I don’t want it.
That’s not what I started working on the computer for. I wanted to get things done. I think that’s where we need to start as a society. I get a little philosophical on this end, but as a society we need to grow in this direction and mature, and see that these are new baubles and toys that we got enamored with.
If Steve were around, I know he would have seen this. He wouldn’t have his own children on Surfaces when they were younger. He saw the addictive power. How appropriate that, right now, at this particular moment in time, you have some of the more powerful people inside Apple saying, “Hey, we need to do something about this.”
It’s not so much to try to virtue signal that, hey, this is the right thing to do. It’s a real societal thing because work is not getting done. That doesn’t mean like people aren’t getting work done meaning practical work at work.
I’m saying that we’re no longer solving things. We’re actually going out there and just burning our time. Is that really what we want the precious few years that we exist on this planet being used for? I don’t know. We’ll see. History has its way of dealing with humanity making bad decisions. We get resets. [laughs]
Rene: If people are interested in following your work, reading your work, following you on social, where can they go?
Brian: My first and last name, basically, on any social platform, B-R-I-A-N, Brian. My last name’s Roemmele, R-O-E-M-M-E-L-E.
If you’re a brand or a company and you resonate with any of this, and you’re freaking out, you don’t know what voice represents in your brand…
…go to voicefirst.expert. Talk about domains. If I can’t help you, I’ll find somebody that can.
If you have a company, you have a brand, and you don’t have a voice strategy right at this moment, you better get one, because you’re going to become generic very quickly. This includes the smallest merchant to the largest international brand.
When somebody says, “Reorder paper towels,” and you’re Scott Paper Towels and you don’t have a strategy, there’s about 25 people on this planet that can help you right at this moment, and most of them are working for a company that may not be in your best interest. Let’s put it that way.
Rene: [laughs] Well put.
Brian: I’m here to help people understand this. I can do it to the best of my ability, I’m only one person, but I would encourage anybody who resonates with any of this to learn this stuff. Learn the psychology behind it. Learn philosophy. Learn the Jungian archetypes. Learn Myers-Briggs.
All this stuff is going to be the future graphic artist of the Voice First revolution and it ain’t going away. It’s just going to get bigger.
Rene: Awesome. Brian, thank you so much for spending the time with me. We’ll have to do a follow-up…
Brian: Rene, thank you.
Rene: …on the privacy and security aspects.
Brian: I’m here anytime and it’s been an absolute honor and pleasure, sir.
Rene: Same, likewise, thank you.
You can find me @reneritchie. You can email me firstname.lastname@example.org. I want to thank you all for listening. I’m still moving [laughs] so I stole the Tortured House Podcast Studio today to record this. I’m going to keep going through next week. Have a great one, folks. That’s the show. I’m out.