Apple releases its family of OpenELM minis
Apple has released a family of small models called OpenELM. What's notable is that the models ship with code to convert them for the MLX framework, enabling inference and fine-tuning on Apple devices.

Apple introduced OpenELM, a family of open-source small language models that can run on device, without any connection to cloud servers.
"We're introducing OpenELM, a family of powerful, open-source language models. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, resulting in improved accuracy."
About the OpenELM family
In the paper describing the model family, Apple states that development of OpenELM was led by Sachin Mehta, with additional contributions from Mohammad Rastegari and Peter Zatloukal.
The family consists of eight small models designed for text generation. OpenELM models range from 270 million to 3 billion parameters and were trained on roughly 1.8 trillion tokens drawn from public datasets such as Reddit, Wikipedia, and arXiv.org.
OpenELM is suitable for running on ordinary laptops and smartphones. All models in the new family use a layer-wise scaling strategy to efficiently distribute parameters across the layers of the transformer model. According to Apple, this allows them to achieve higher accuracy while maintaining computational efficiency.
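The core idea of layer-wise scaling is that, instead of giving every transformer layer an identical width, the attention-head count and feed-forward width grow with depth. A minimal sketch of that idea is below; the interpolation scheme and all numeric values are illustrative assumptions, not Apple's actual configuration.

```python
# Illustrative sketch of layer-wise scaling: parameters are distributed
# non-uniformly across transformer layers by interpolating the number of
# attention heads and the FFN width with depth. All values here are
# hypothetical examples, not OpenELM's real hyperparameters.

def layer_wise_scaling(num_layers, min_heads, max_heads,
                       min_ffn_mult, max_ffn_mult, d_model=1280):
    """Return a per-layer config with linearly interpolated widths."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({
            "layer": i,
            "num_heads": heads,
            "ffn_hidden": round(ffn_mult * d_model),
        })
    return configs

# Early layers stay narrow; later layers get more heads and a wider FFN,
# so the same total parameter budget is spent where it helps accuracy most.
for cfg in layer_wise_scaling(4, min_heads=4, max_heads=8,
                              min_ffn_mult=0.5, max_ffn_mult=4.0):
    print(cfg)
```

Under this sketch, a 4-layer model would go from 4 heads and a 640-wide FFN in the first layer to 8 heads and a 5120-wide FFN in the last, while a uniform model would spend the same budget equally on every layer.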
In terms of performance, the OpenELM results Apple published on Hugging Face show that the models perform quite well, especially the 450-million-parameter instruction-tuned variant.
On the ARC-C benchmark, designed to test knowledge and reasoning skills, the pre-trained version of OpenELM-3B achieved an accuracy of 42.24%. It scored 26.76% on MMLU and 73.28% on HellaSwag.
Why do we need mini-models from Apple?
Small language models may lack the broad knowledge base and conversational capabilities of ChatGPT or Gemini, but they are effective at solving specific, narrow tasks and are generally less error-prone within that scope.
While Apple didn't name specific use cases for the models, it did release their weights. The weights are available under a license that permits both research and commercial use.
Notably, the company also acquired the French startup Datakalab, which works on computer vision models.