Apple Intelligence Finally Meets Apple Fans

    All in AI Tools | 2024-07-24 07:11:21

    It's Here! Apple Intelligence Finally Meets Apple Fans!

    With the launch of the iOS 18.1 Beta, registered developers can now experience some of the features of Apple Intelligence.

    The most obvious change is the complete revamp of Siri, which has transformed into Apple Intelligence & Siri.

    Another major update is the writing feature. It can help polish Twitter comments, effortlessly rewriting them into more refined expressions. Even profanity can be elegantly rephrased in no time.

    Once Apple Intelligence is activated, Apple’s self-developed on-device large model will be downloaded to the device.

    According to feedback from users who tried it right away, it doesn't refuse requests as often as some other AI tools.

    Meanwhile, Apple has also released its own large model report, revealing many technical details.

    The report shows that Apple's cloud large model outperforms GPT-4 in tasks such as instruction following and text summarization.

    Ruoming Pang, the head of Apple's foundational large model team, also stated that their model is competitive with some of the best models of the same kind.

    Ruoming Pang holds a Ph.D. in computer science from Princeton, a bachelor's degree from Shanghai Jiao Tong University, and a master's degree from the University of Southern California. He joined Apple in 2021 after 15 years as an engineer at Google.

    The main dialogue function of Apple Intelligence is supported by the model developed by his team.

    He emphasized that these foundational models are "not chatbots" but support a wide range of functions, including summarization, writing assistance, tool usage, and coding.

    Apple also developed a number of in-house algorithms to improve model performance, with details disclosed in the report.

    Attentive users also noticed an interesting point—the Apple large model was trained using Google's TPU clusters, with zero NVIDIA components.

    Siri Upgrade, But No ChatGPT Integration Yet

    To experience Apple Intelligence, several conditions need to be met.

    First, the iOS 18.1 Beta that carries it is currently limited to registered developers, whose program costs $99 a year, so regular users will have to wait.

    Additionally, it only supports M-series and A17 Pro chips, which on the iPhone side means only the iPhone 15 Pro and 15 Pro Max can use it.

    Besides the hardware and developer-account requirements, the device region must be set to the United States, and both the device and Siri languages must be set to English.

    After meeting all these requirements, you can join the waiting queue.

    The Apple Intelligence features launched this time mainly cover text generation, Siri, and Photos.

    Firstly, text generation, as a significant part of Apple AI, is not limited to official Apple apps.

    As long as an app uses the standard text input system, the feature can be used in third-party applications for text summarization, proofreading, and rewriting.

    Combined with the audio transcription feature already available in Voice Memos on iOS 18 Beta, the text generation system can also create summaries for recordings.

    The second significant update is Siri.

    In terms of interface, the new Siri is no longer a circular icon; instead, colorful light glows around the edges of the screen while it is working.

    It also provides a text interaction mode for users who don't want to talk, where they can bring up the keyboard by double-tapping the bottom of the screen to type with Siri.

    In terms of content, the new Siri can answer questions related to Apple products and help users troubleshoot issues.

    Moreover, the new Siri can understand the context from one query to the next, such as creating a calendar event and then requesting a reminder without restating the topic.

    However, the previously introduced screen recognition feature is not included in this Siri update.

    The Photos update allows users to search for specific photos using natural language, even pinpointing specific moments in videos.

    These are the main AI-related features in this developer beta. It's worth noting that this is only part of what was shown at the earlier launch event; many features have not yet been rolled out.

    Notably, the previously mentioned ChatGPT integration is not included in this update.

    Decoding Apple's Large Model

    Apple has said that ChatGPT is not a mandatory component of Apple AI; the major functions are driven by Apple's own large model.

    Regarding this model, Apple also released a comprehensive technical report along with the launch.

    The model is straightforwardly named Apple Foundation Model (AFM), with on-device and server versions.

    The on-device model has around 3B parameters. The server version's exact size is not disclosed, beyond being larger than the on-device model; both have a 32k context window.

    Training Process With 0% NVIDIA Content

    The model was trained using Apple's own JAX-based AXLearn framework, employing tensor parallelism and pipeline parallelism strategies.

    The hardware used was Google TPUs: 8192 TPUv4 chips for the cloud version and 2048 TPUv5p chips for the on-device version, with no NVIDIA hardware in the mix.
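
    As a concrete illustration of the tensor-parallel side of that setup, here is a minimal sketch in JAX (the framework family AXLearn is built on) of sharding a projection weight across a named device mesh. This is our own example, not code from AXLearn.

```python
import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a 1-D device mesh and name its axis "model" (the tensor-parallel axis).
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("model",))

# Shard a projection weight column-wise across the "model" axis.
w = jax.device_put(jnp.zeros((4096, 4096)),
                   NamedSharding(mesh, PartitionSpec(None, "model")))

@jax.jit
def project(x, w):
    # Each device holds only its slice of w; XLA inserts the needed collectives.
    return x @ w

y = project(jnp.ones((8, 4096)), w)
print(y.shape)  # (8, 4096)
```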

    The data mainly comes from web pages crawled by Applebot, and publicly licensed code and math datasets.

    Notably, none of the datasets used GPL licenses; all use more permissive licenses such as MIT, Apache, and CC0.

    The pre-training process of AFM is divided into three stages—core training, continued training, and context extension.

    During the core training stage, the cloud version’s data volume was 6.3T tokens with a window length of 4096, and the on-device version was distilled from this.

    During continued training, the weight of low-quality data was reduced, and high-quality math, code, and licensed data was added to enhance the model's capabilities.

    This process used 1T tokens of data, with the window length increasing from 4096 to 8192.

    In the next stage, the window length was further expanded to 32k, involving long-sequence text and synthetic data, totaling 100B tokens.
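
    Putting the three stages side by side, the schedule described above can be summarized in a few lines of Python; the figures are the ones quoted in the report, while the structure of the snippet is ours.

```python
# Summary of the AFM-server pre-training schedule quoted above
# (figures from the report; the data structure is just an illustration).
PRETRAINING_STAGES = [
    {"stage": "core",              "tokens": 6.3e12, "context_window": 4096},
    {"stage": "continued",         "tokens": 1.0e12, "context_window": 8192},
    {"stage": "context_extension", "tokens": 1.0e11, "context_window": 32768},
]

total = sum(s["tokens"] for s in PRETRAINING_STAGES)
print(f"total pre-training tokens: {total / 1e12:.1f}T")  # 7.4T
```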

    In-House Reinforcement Learning Algorithms

    The AFM post-training includes supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF).

    In the SFT phase, synthetic data and human-labeled data were used, mainly concerning math, tool usage, and code.

    During the RLHF phase, Apple created two reinforcement learning algorithms, iTeC and MDLOO.

    iTeC (Iterative Teaching Committee) is an algorithm for reinforcement learning post-training, aimed at optimizing model performance through multiple iterations.

    Its core idea is to combine different preference optimization algorithms, including rejection sampling and direct preference optimization (DPO), allowing the model to benefit from various optimization strategies to improve its adaptability and performance for specific tasks.

    In each iteration, iTeC selects a group of the best-performing models from the latest round of training to form a "model committee". These models are obtained through different training methods such as SFT, rejection sampling (RS), DPO/IPO, and RL.

    Each time a batch of human preference feedback on model responses is collected, iTeC refreshes its reward model, uses it to train a new set of models, and repeats the cycle to gradually improve performance.
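
    A schematic sketch of one such iteration might look like the following; every function and object name here is a hypothetical placeholder rather than Apple's actual implementation.

```python
# Schematic sketch of one iTeC iteration as described above. The methods
# generate/update/score and the training-method callables are illustrative
# placeholders, not Apple's code.
def itec_round(committee, reward_model, prompts, collect_preferences,
               train_methods, committee_size=4):
    """Collect preferences, refresh the reward model, train new candidates,
    and keep the best performers as the next committee."""
    # 1. Sample responses from every committee member for the current prompts.
    responses = {name: [model.generate(p) for p in prompts]
                 for name, model in committee.items()}

    # 2. Gather human preference feedback on those responses.
    preferences = collect_preferences(responses)

    # 3. Refresh the reward model with the newly collected preference data.
    reward_model = reward_model.update(preferences)

    # 4. Train new candidate models with different methods
    #    (e.g. SFT, rejection sampling, DPO/IPO, online RL).
    candidates = dict(committee)
    for train in train_methods:
        candidates.update(train(committee, reward_model, preferences))

    # 5. Keep only the best-performing models as the next "model committee".
    ranked = sorted(candidates.items(),
                    key=lambda item: reward_model.score(item[1], prompts),
                    reverse=True)
    return dict(ranked[:committee_size]), reward_model
```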

    MDLOO is an online reinforcement learning algorithm designed to optimize the quality of model responses.

    As an online algorithm, it can decode responses in real-time during model training and apply RL algorithms to maximize rewards.

    This method allows the model to continuously learn and adjust its strategy during training to generate responses that better align with human preferences.

    In implementation, it combines the Leave-One-Out (LOO) advantage estimator and Mirror Descent Policy Optimization (MDPO) to achieve more stable and effective policy updates.
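
    As an illustration, a leave-one-out advantage estimator of the kind MDLOO builds on can be written in a few lines; this reflects the general LOO idea rather than Apple's exact formulation.

```python
import numpy as np

def loo_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (k,) rewards for k responses sampled from one prompt.
    Each response is baselined against the mean reward of the other k-1."""
    k = rewards.shape[0]
    total = rewards.sum()
    baselines = (total - rewards) / (k - 1)
    return rewards - baselines

print(loo_advantages(np.array([1.0, 0.5, 0.0, 0.5])))
# approx [ 0.667  0.0  -0.667  0.0 ]
```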

    On-Device Mixed Precision Quantization

    To make the on-device model run more efficiently while avoiding excessive memory usage, Apple performed quantization on the on-device version of AFM.

    Specifically, Apple adopted a mixed precision quantization approach, applying different quantization precisions for different components.

    The approach, called "palette" quantization, groups weights together so that all weights within a group share the same quantization constants.

    For projection weights, every 16 columns/rows share the same quantization constant and use the K-means algorithm for 4-bit quantization.

    For the embedding layer, which is shared between input and output, 8-bit per-channel integer quantization is used, and some less critical layers are further compressed to 2-bit quantization.
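
    The grouped K-means scheme described above can be sketched roughly as follows, using scikit-learn's KMeans for brevity; it is an illustration of the idea, not Apple's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def palette_quantize(w: np.ndarray, group_cols: int = 16, n_levels: int = 16):
    """Quantize a 2-D weight matrix group by group: every block of 16 columns
    shares one 16-entry (4-bit) palette of weight values."""
    indices = np.empty(w.shape, dtype=np.uint8)
    palettes = []
    for start in range(0, w.shape[1], group_cols):
        group = w[:, start:start + group_cols]
        km = KMeans(n_clusters=n_levels, n_init=4).fit(group.reshape(-1, 1))
        palettes.append(km.cluster_centers_.ravel())            # the 16-value palette
        indices[:, start:start + group_cols] = km.labels_.reshape(group.shape)
    return indices, palettes                                     # 4-bit indices + palettes

w = np.random.randn(64, 64).astype(np.float32)
idx, pals = palette_quantize(w)
w_hat = np.concatenate([pals[g][idx[:, g * 16:(g + 1) * 16]]
                        for g in range(len(pals))], axis=1)
print("mean reconstruction error:", np.abs(w - w_hat).mean())
```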

    To recover the performance lost due to quantization and maintain model output quality and accuracy, Apple introduced Accuracy-Recovery Adapters.

    These adapters are small neural network modules that can be inserted into specific layers of the pre-trained model, training on the quantized model to learn how to compensate for the impact of quantization.
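
    Conceptually, such an adapter can be pictured as a small trainable low-rank correction added on top of a frozen quantized layer. The PyTorch sketch below is our illustration of that idea; the class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class AccuracyRecoveryLinear(nn.Module):
    """A frozen quantized weight plus a small trainable low-rank correction."""
    def __init__(self, quantized_weight: torch.Tensor, rank: int = 16):
        super().__init__()
        out_f, in_f = quantized_weight.shape
        # Frozen, already-quantized base weight.
        self.register_buffer("w_q", quantized_weight)
        # Trainable low-rank correction, initialized so it starts as a no-op.
        self.down = nn.Linear(in_f, rank, bias=False)
        self.up = nn.Linear(rank, out_f, bias=False)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return x @ self.w_q.t() + self.up(self.down(x))

layer = AccuracyRecoveryLinear(torch.randn(8, 16))
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 8])
```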

    Surpassing GPT-4 in Some Tasks

    After applying a series of optimization techniques, it was time to evaluate model performance.

    During this process, Apple combined human evaluation with automated evaluation.

    For human evaluation, assessors designed a variety of questions covering topics such as analytical reasoning, brainstorming, and open-ended chat, and had the models generate responses to them.

    The questions were also posed to other models for comparison, with assessors judging which model's output was better.

    Results showed that both the cloud and on-device models won or tied at least 60% of the time against models like Llama 3 and GPT-4.
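
    Read plainly, that "not losing" figure is a win-or-tie rate over head-to-head comparisons; the snippet below shows the arithmetic with made-up counts purely for illustration.

```python
# Hypothetical head-to-head counts (not Apple's numbers), just to show the metric.
wins, ties, losses = 45, 20, 35
not_losing_rate = (wins + ties) / (wins + ties + losses)
print(f"win-or-tie rate: {not_losing_rate:.0%}")  # 65%
```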

    The remaining tests were mainly conducted using datasets.

    In instruction-following capability, Apple conducted the IFEval test, where the cloud AFM surpassed GPT-4 at both instruction and prompt levels, setting a new SOTA.

    The on-device model also outperformed similarly sized models like Llama 3-8B and Mistral-7B.

    In the AlpacaEval, both on-device and cloud AFM achieved second place.

    Looking at specific task performance, AFM achieved SOTA in summarization tasks within writing benchmarks and was close to first place in writing tasks.

    In mathematics, Apple used the GSM8K and MATH datasets for evaluation.

    The on-device model lagged behind Llama 3-8B and Microsoft's Phi-3 mini on GSM8K, while the cloud version was surpassed by GPT-4 and Llama 3-70B but outperformed GPT-3.5.

    The MATH results were relatively better, with the on-device version leading similar-sized models, and the cloud version surpassing Llama 3-70B.

    Beyond performance, safety is also crucial. Apple assessed AFM's resistance to adversarial attacks through human evaluation.

    Results indicated that AFM's violation rate was significantly lower than other open-source and commercial models when faced with adversarial prompts.

    These are some noteworthy contents from Apple's large model technical report, with more details available in the report.

    One More Thing

    Although Apple Intelligence has been provided for developer testing, Bloomberg reported that the official version might be delayed.

    Indeed, according to Apple's previous version release patterns, the 18.1 version number implies that these features won’t be launched with the new devices in September.

    Analyst Gene Munster suggested that Apple should consider delaying the iPhone 16 release date to align it with Apple Intelligence, which he views as a significant step toward an all-in-AI future.

    Whether Tim Cook will consider this suggestion remains to be seen.

    Report: "Apple Intelligence Foundation Language Models"
