Hacking SiriKit

A Little Bit Of History - Brewbot (01:07)

When SiriKit was announced, I was working for a company called Brewbot. Brewbot made a machine that helped you brew beer and control the whole process from your phone.

We wanted to use Siri so that the brewing process could happen without holding the phone or placing it on top of the Brewbot. When the Brewbot started stirring the mash, people reported that the phone would fall off and crack.

Limitations (03:44)

Apple limited what SiriKit would be able to do: it could only interface with specific kinds of apps.

We were devastated, but there were two things we could do about it: 1) give up, or 2) hack around SiriKit.

To pursue the second option, we had to assess the hackability of SiriKit by understanding how it works.

How Does SiriKit Work? (04:15)

First, set up SiriKit by enabling the Siri entitlement in Xcode, then use app extensions to allow your app to communicate with Siri.

App extensions

App extensions are like small applications embedded into your app’s binary. When your app gets installed on a device, it installs those extensions along with it (like Today widgets, or how Apple Watch apps work). They’re sandboxed, so if you want to share data between your widgets and extensions and your main application, you need to use shared containers.
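As a minimal sketch of what sharing through a container can look like (the App Group identifier and the key names here are hypothetical, not from the talk), the main app and the extension can both open a `UserDefaults` suite backed by the shared container:

```swift
import Foundation

// Hypothetical App Group identifier; it must be enabled in the
// entitlements of BOTH the main app and the extension targets.
let appGroupID = "group.com.example.sirikit-hack"

// In the main app: write the data the extension will need.
if let shared = UserDefaults(suiteName: appGroupID) {
    shared.set(["RxViewModel": "passed"], forKey: "repoStatuses")
}

// In the Intents extension: read it back.
if let shared = UserDefaults(suiteName: appGroupID),
   let statuses = shared.dictionary(forKey: "repoStatuses") as? [String: String] {
    print(statuses["RxViewModel"] ?? "unknown")
}
```

For anything more structured than key-value pairs, a database file placed inside the shared container URL works the same way.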

Intents and UI intents

SiriKit doesn’t need both an Intents extension and an Intents UI extension to work; only the Intents extension is required.

An intent is an extension that will get called when you execute a Siri command that should match your application. It’s the way to represent the user’s intent of trying to command Siri to do something.

The UI intent gives special feedback to the user (for example, “Siri, bring me an Uber,” and you get a map). It can be as complex as a map, or as simple as a regular view with a cat smiling.
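As a sketch of what an Intents UI extension looks like (the rendering itself is omitted; this mirrors Apple’s iOS 10 extension template rather than code from the talk), its view controller adopts `INUIHostedViewControlling` and configures itself for the incoming interaction:

```swift
import UIKit
import IntentsUI

// Minimal Intents UI extension view controller (iOS 10 API).
class IntentViewController: UIViewController, INUIHostedViewControlling {

    func configure(with interaction: INInteraction,
                   context: INUIHostedViewContext,
                   completion: @escaping (CGSize) -> Void) {
        // Inspect interaction.intent here and build any custom UI
        // (a map, or a regular view with a smiling cat).
        completion(self.extensionContext?.hostedViewMaximumAllowedSize ?? .zero)
    }
}
```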

For a sample app, I want to hack Siri to communicate with Travis so that it can trigger builds without involving a website.

First, I want to query Travis for a given user and retrieve the list of repos and the state of their last build. Second, if there’s a failed job, I want to relaunch it using Siri.


func resolveRecipients(forSendMessage intent: INSendMessageIntent, with completion: @escaping ([INPersonResolutionResult]) -> Void) {
    var resolutionResults = [INPersonResolutionResult]()
    if let recipients = intent.recipients, !recipients.isEmpty {
        for recipient in recipients {
            // App-specific lookup of contacts matching this recipient (helper not shown in the talk)
            let matchingContacts = contacts(matching: recipient)

            switch matchingContacts.count {
            case 2 ... Int.max:
                // We need Siri's help to ask the user to pick one of the matches.
                resolutionResults += [INPersonResolutionResult.disambiguation(with: matchingContacts)]

            case 1:
                // We have exactly one matching contact.
                resolutionResults += [INPersonResolutionResult.success(with: recipient)]

            case 0:
                // We have no contacts matching the description provided.
                resolutionResults += [INPersonResolutionResult.unsupported()]

            default:
                break
            }
        }
    } else {
        resolutionResults = [INPersonResolutionResult.needsValue()]
    }

    completion(resolutionResults)
}

The first scenario we check is what happens when there are multiple matches for a contact: Siri picks this up and asks which one I want.

What I wanted to do was create a table view, so that you could say, “Siri, send Travis a message with a status.” It would retrieve the list of repos and show the state of each build with an emoji.

But with SiriKit you need to register the intent that is going to be used in the UI. If you try to execute a UI intent for a response, or for searching unread messages, it will fail. We needed to figure out a way to circumvent the requirement to add a UI in order to call Travis.

Calling Travis (11:57)

For the first part, I thought of having a predefined contact called Travis and then sending a command like “fetch” or “status.” Siri does all the wiring of the intents; it calls your code, and you can check whether Travis is a valid recipient and whether the action is valid.
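A minimal sketch of that recipient check (the helper name is my own, not from the talk) might look like this, treating only the predefined “Travis” contact as a valid recipient:

```swift
import Intents

// Hypothetical helper: only the predefined "Travis" contact is a valid recipient.
func isValidRecipient(_ person: INPerson) -> Bool {
    let name = (person.customIdentifier ?? person.displayName).lowercased()
    return name == "travis"
}
```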

Confirming the Command (12:26)


// Handle the completed intent (required).
func handle(sendMessage intent: INSendMessageIntent, completion: @escaping (INSendMessageIntentResponse) -> Void) {
    let userActivity = NSUserActivity(activityType: NSStringFromClass(INSendMessageIntent.self))
    let response = INSendMessageIntentResponse(code: .success, userActivity: userActivity)

    // Here we should be hitting Travis' API

    completion(response)
}

In order for you to implement SiriKit, you need to adhere to a couple of predefined protocols, covering things such as sending a message, searching for unread messages, and confirming a message. Confirming the command is where our app executes the Travis API call: we ignore the user activity on the response and call the API.

Verifying the Commands (13:18)

In order to resolve our commands, we need to verify that they’re valid. I’ve created a simple enum with a raw String value: execute or status.


private enum Actions: String {
    case execute
    case status
}

func resolveContent(forSendMessage intent: INSendMessageIntent, with completion: @escaping (INStringResolutionResult) -> Void) {
    // Extract the 1st recipient (only expect 1 here)
    if let recipients = intent.recipients, let recipient = recipients.first {
      let key = (recipient.customIdentifier ?? recipient.displayName).lowercased()
      if !self.repos.contains(key) { // Repos should be read from a shared container DB
        completion(INStringResolutionResult.unsupported())
        return
      }
    }
    }

    // Extract the command
    if let text = intent.content, !text.isEmpty {
      // Extract the actual action
      if let _ = Actions(rawValue: text.lowercased()) {
        completion(INStringResolutionResult.success(with: text))
      } else {
        completion(INStringResolutionResult.unsupported())
      }
    } else {
      completion(INStringResolutionResult.needsValue())
    }
}

I extract the actual action by calling the enum’s initializer with the raw value. If I say “execute,” it resolves into an instance and I can call the completion.
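As a quick standalone illustration of how the failable raw-value initializer behaves (mirroring the enum above):

```swift
private enum Actions: String {
    case execute
    case status
}

// A recognized command resolves to a case; anything else yields nil.
assert(Actions(rawValue: "execute") == .execute)
assert(Actions(rawValue: "status") == .status)
assert(Actions(rawValue: "deploy") == nil)
```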

If the user sent an unsupported message, I can return that to the user as feedback; or, if the user doesn’t provide an action, I can ask Siri to request one.

The message Siri gives back can be misleading, because you’re telling it, “Send a message to Miguel using Commander.” You can see that I’m checking for Travis while trying to send a message to Miguel; it’s not a recognized recipient, and that’s why it fails.

Putting All the Parts Together (15:36)

We have everything ready to start hacking Siri. We need to hit the Travis API, then store the repos with their statuses; they will be our contacts.
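As a rough sketch of that first step (the endpoint and token handling are assumptions based on the public Travis CI v3 API, not code from the talk), fetching a user’s repositories could look like this:

```swift
import Foundation

// Hypothetical fetch of a user's repositories from the Travis CI v3 API.
func fetchRepos(owner: String, token: String,
                completion: @escaping ([String: Any]?) -> Void) {
    var request = URLRequest(url: URL(string: "https://api.travis-ci.org/owner/\(owner)/repos")!)
    request.setValue("3", forHTTPHeaderField: "Travis-API-Version")
    request.setValue("token \(token)", forHTTPHeaderField: "Authorization")

    URLSession.shared.dataTask(with: request) { data, _, _ in
        guard let data = data,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any] else {
            completion(nil)
            return
        }
        completion(json)
    }.resume()
}
```

The parsed repos and their last build states would then be written into the shared container so the Intents extension can read them.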

The next time the user wants to send a message, we’re going to fake that there are multiple matches for the recipient and then show them in a UI. When the user next interacts with the application, we’ll hijack the request to show it to them.


// If no recipients were provided we'll need to prompt for a value.
if recipients.isEmpty {
    // TODO: Read the list of repos and statuses from the shared container DB
    let person = INPerson(personHandle: INPersonHandle(value: "RxViewModel", type: .unknown),
                          nameComponents: nil,
                          displayName: "✅\tRxViewModel",
                          image: nil,
                          contactIdentifier: nil,
                          customIdentifier: "RxViewModel")

    // Present the repos as fake "matching contacts" so Siri shows them as a disambiguation list.
    completion([INPersonResolutionResult.disambiguation(with: [person])])

    return
}

This is where you are resolving the intent that the user is trying to pass from Siri.

We should be reading the list of repos and their statuses from the database store, then showing the icon and name of each repo.

Shortcomings (18:44)

We have proven that Siri is powerful, that it can be abused, and that we should be able to do more with it. Some issues involve the languages used: English and Spanish work fine, but German requires the speaker to be very specific with their words.

The feedback concepts are confusing, as we are using the messages app to accomplish this.

Takeaways (20:05)

Siri is powerful, but we are still far from a full-blown system of interfacing with apps using only voice commands.

You won’t be able to submit something like this to Apple, because you are not a messaging app.

I’ve uploaded the code sample that will connect to the Travis API, and you will be able to execute Travis queries and jobs from Siri.


Esteban Torres

An iOS Developer for over 5 years, Esteban is a big proponent of OSS and is the head organizer of CocoaHeads Costa Rica.

Transcribed by Sandra Sanchez-Roige