Integrating Transformers into iOS Applications

11/30/22

Why develop models for mobile devices

  • Develop mechanical sympathy and understand the real use cases

  • Crowd-source training data, e.g. image recognition

How to integrate transformers into mobile devices

Depending on where the model runs:

  • server-based: the model is hosted on an enterprise server

  • client-based: the model is bundled into the user’s device

Demo: server-based GPT-3 chatbot

Making API requests

Python API Server

server.py
import openai
from fastapi import FastAPI

app = FastAPI()

def reply(prompt: str) -> str:
    # Ask GPT-3 to complete the conversation; stop generating at the next user turn
    response = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        temperature=0.3,
        max_tokens=60,
        top_p=1.0,
        frequency_penalty=0.5,
        presence_penalty=0.0,
        stop=["You:"],
    )
    return response["choices"][0]["text"].strip()

@app.post("/chat")
def gpt3_reply(prompt: str):
    # Join the stored conversation history plus the new prompt into one completion prompt
    all_messages = get_all_messages() + [prompt]
    full_prompt = "\n".join(all_messages)
    return {"text": reply(full_prompt)}

iOS app

client.swift
import Alamofire

struct ChatReply: Decodable { let text: String }

// POST the user's prompt as a query parameter and decode the server's {"text": ...} reply
AF.request("http://localhost:8000/chat", method: .post,
           parameters: ["prompt": prompt], encoding: URLEncoding.queryString)
    .responseDecodable(of: ChatReply.self) { response in
        if let reply = response.value { add_message(reply.text) }
    }
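
A nice property of this split: the client needs only an HTTP stack, and the OpenAI API key never leaves the server.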

Pros & Cons of the client-server approach

Pros

  • Easy to implement and reason about; built-in support on many platforms (e.g. Hugging Face, Replicate)

  • Works with any language, framework, and model

Cons

  • Limited usability in areas with insufficient network bandwidth

  • Additional codebase, infrastructure, and maintenance

Alternative: building the model into the client

Apple provides the Core ML framework for converting common machine learning models into the .mlmodel format, which can be built directly into an iOS app’s bundle.

Model card: distilbert-base-uncased-distilled-squad

Converting DistilBERT to .mlmodel

transform-bert.py
import coremltools as ct
import tensorflow as tf
from transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering

# Load the SQuAD-fine-tuned DistilBERT with its question-answering head; the tokenizer
# is not needed for conversion, but the app must use the same one at inference time
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
distilbert_model = TFDistilBertForQuestionAnswering.from_pretrained(
    "distilbert-base-uncased-distilled-squad"
)

# Core ML requires a fixed input shape: here, one sequence of 20 token IDs
input_shape = (1, 20)
input_layer = tf.keras.layers.Input(shape=input_shape[1:], dtype=tf.int32, name="input")
prediction_model = distilbert_model(input_layer)
tf_model = tf.keras.models.Model(inputs=input_layer, outputs=prediction_model)

# Convert the Keras model to Core ML and save it for bundling into the app
mlmodel = ct.convert(tf_model)
mlmodel.save("distilbert.mlmodel")
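
Once converted, the model can be sanity-checked from Python before it ever touches Xcode. A minimal sketch, assuming macOS (Core ML prediction runs only there); the file name verify-distilbert.py is hypothetical, and the output key names assigned by the converter vary by version, so inspect them before indexing:

verify-distilbert.py
import numpy as np
import coremltools as ct
from transformers import DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-distilled-squad")
mlmodel = ct.models.MLModel("distilbert.mlmodel")

# Encode a question/context pair, padded to the fixed length used at conversion time
ids = tokenizer.encode(
    "Who provides Core ML?",
    "Apple provides the Core ML framework.",
    max_length=20,
    padding="max_length",
    truncation=True,
)
outputs = mlmodel.predict({"input": np.array([ids], dtype=np.int32)})
print(outputs.keys())  # start/end logit tensors; names depend on the converter version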

Demo: client-based Q&A system using BERT
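
The decoding step behind the demo is the standard extractive-QA recipe: the question-answering head emits start and end logits over the token sequence, and the answer is the span between the two argmax positions. A hypothetical sketch (file and function names assumed):

decode.py
import numpy as np

def decode_answer(ids, start_logits, end_logits, tokenizer):
    # The most likely start/end token positions bound the answer span
    start = int(np.argmax(start_logits))
    end = int(np.argmax(end_logits))
    return tokenizer.decode(ids[start : end + 1])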

Critical analysis

Future directions

  • Online methods: model updates on the fly

  • A more streamlined process of model transformation