IBM just announced a new collection of AI models, the third generation of Granite LLMs. The foundation models of the new collection are the Granite 3.0 2B Instruct and Granite 3.0 8B Instruct models ("Instruct" meaning that these models have been tuned to understand and execute instructions more accurately). The models were trained on over 12 trillion tokens spanning 12 human languages and 116 programming languages. All of these models come with an open source Apache 2.0 license. It is also important to note that IBM indemnifies Granite models against legal claims over training data when they are used on the IBM watsonx AI platform.
Enterprise uses for smaller Granite models
IBM designed the new 2B and 8B Granite models to handle a wide range of common enterprise tasks. Think of these models as convenient tools for everyday language tasks like summarizing articles, finding important information, writing code, and creating explanatory documents. The models also perform well in common language tasks such as entity extraction and retrieval-augmented generation (RAG), which improves the accuracy of generated text. According to IBM, by the end of 2024, Granite 3.0 models will be able to understand documents, interpret charts and answer questions about a GUI or product screen.
AI agents are rapidly becoming more important, and support for agentic use cases is a new capability in Granite 3.0 that was not previously available in IBM language models. Agentic applications can proactively identify needs, use tools, and initiate actions within predefined parameters without human intervention. Typical use cases for agents are virtual assistants, customer service, decision support and recommendations, and a variety of other complex tasks.
Speculative decoders are also a new offering from IBM. A speculative decoder speeds up an LLM’s text generation by guessing several future tokens ahead of time, which the main model then verifies. IBM’s speculative decoder, called Granite 3.0 8B Accelerator, can speed up text generation by up to 2x during inference.
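The idea behind speculative decoding can be illustrated with a minimal sketch. This is not IBM’s implementation of the Granite 3.0 8B Accelerator, whose internals are not described here; it is a generic greedy variant in which a cheap draft model guesses a few tokens ahead and the larger target model verifies them. The two "models" (`target_next`, `draft_next`) are hypothetical toy functions, and `k` (the number of tokens drafted per round) is an assumed parameter.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic toy next-token rule."""
    return (tokens[-1] * 3 + len(tokens)) % 11


def draft_next(tokens: List[int]) -> int:
    """Hypothetical 'small' draft model: usually agrees with the target."""
    guess = target_next(tokens)
    # Deliberately wrong whenever the context length is a multiple of 4.
    return guess if len(tokens) % 4 else (guess + 1) % 11


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one call to the model per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks the proposal. A real system scores all
        #    drafted positions in one batched forward pass; this toy version
        #    simply counts the round once.
        rounds += 1
        for guess in proposal:
            expected = target(tokens)
            tokens.append(expected)  # always keep the target's own token
            if expected != guess:
                break  # discard the rest of the draft after a mismatch
    return tokens, rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast, rounds = speculative_decode(target_next, draft_next, prompt, n_new=12)
    slow = greedy_decode(target_next, prompt, n_new=12)
    print("same output:", fast == slow, "| verification rounds:", rounds)
```

Because every kept token is the one the target model would have chosen anyway, the output matches plain greedy decoding. The speedup comes from needing fewer expensive target-model rounds whenever the draft guesses correctly.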
Granite 3.0 models will get another update in a few weeks. IBM will increase their context length from 4,000 to 128,000 tokens, which is a key enabler for longer conversations as well as the RAG tasks and agentic use cases mentioned above. By the end of the year, IBM plans to add vision input to the models, which will increase their versatility and allow them to be used in more applications.
Benchmarks for performance and cybersecurity
Hugging Face’s Open LLM Leaderboard rates and ranks open source LLMs and chatbots by benchmark performance. The chart above shows how the IBM Granite 3.0 8B Instruct model compares to Llama 3.1 8B Instruct and Mistral 7B Instruct. The Granite 3.0 2B Instruct model holds up similarly well against other top models.
IBM Research’s cybersecurity team helped identify high-quality data sources that were used to train the new Granite 3.0 models. IBM Research also helped develop the public and proprietary benchmarks needed to measure the models’ cybersecurity performance. As shown in the graph, the IBM Granite 3.0 8B Instruct model was the best performer on all three cybersecurity benchmarks against the same Llama and Mistral models mentioned above.
Future Granite mixture-of-experts models
At some point in the future, IBM plans to release several smaller, more efficient models, including the Granite 3.0 1B A400M, a 1 billion parameter model, and the Granite 3.0 3B A800M, a 3 billion parameter model. Unlike the Granite 3.0 models discussed above, these future models will not be based on the dense transformer architecture, but will instead use a mixture-of-experts architecture.
The mixture-of-experts (MoE) architecture divides a model into several specialized sub-networks, called experts, for greater efficiency. MoE models are small and light, but still considered best in class for efficiency, with a good balance between cost and performance. These models use only a fraction of their overall parameters for inference. For example, the MoE model with 3 billion parameters uses only 800 million parameters during inference, and the MoE model with 1 billion parameters uses only 400 million parameters during inference. IBM developed them for applications such as edge servers and CPU deployments.
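To make the "only a fraction of the parameters is active" point concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustrative toy, not the Granite architecture: the class name `TinyMoELayer`, the layer sizes, the random weights, and the choice of `top_k=2` are all assumptions made for the example.

```python
import math
import random
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a small list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix: List[List[float]], vec: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TinyMoELayer:
    """Toy mixture-of-experts layer: a router picks top_k experts per input."""

    def __init__(self, dim: int, n_experts: int, top_k: int, seed: int = 0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one score per expert, computed from the input vector.
        self.router = [[rng.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Each expert is a small dim x dim linear map (random toy weights).
        self.experts = [[[rng.uniform(-1, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x: List[float]) -> Tuple[List[float], List[int]]:
        scores = matvec(self.router, x)
        # Keep only the top_k highest-scoring experts for this input.
        chosen = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)[: self.top_k]
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for gate, idx in zip(gates, chosen):
            expert_out = matvec(self.experts[idx], x)  # only chosen experts run
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen

    def active_expert_fraction(self) -> float:
        """Share of expert parameters actually used for a single input."""
        return self.top_k / len(self.experts)


if __name__ == "__main__":
    layer = TinyMoELayer(dim=4, n_experts=8, top_k=2)
    output, chosen = layer.forward([1.0, 0.5, -0.5, 2.0])
    print("experts used:", chosen)
    print("active expert fraction:", layer.active_expert_fraction())
```

In this toy layer, two of eight experts run per input, so a quarter of the expert weights are active. In real models, shared components such as attention layers and embeddings are always active, so the overall ratio differs; with the figures quoted above, 800 million of 3 billion parameters is roughly 27 percent, and 400 million of 1 billion is 40 percent.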
In 2025, IBM plans to scale its largest mixture-of-experts models from 70 billion parameters to 200 billion parameters. Initially, the models will have language, code and multilingual capabilities. Video and audio will be added later. All of these upcoming Granite models will also be available under the Apache 2.0 license.
Granite Guardian models
Along with the Granite 3.0 2B and 8B models, IBM also announced a Granite Guardian 3.0 model, which acts as a guardrail for the inputs and outputs of other Granite 3.0 models. When monitoring inputs, Granite Guardian looks for jailbreaking attacks and other potentially harmful prompts. To ensure safety standards are met, Granite Guardian also monitors LLM outputs for bias, fairness and violence.
These models also provide task-based hallucination detection, which grounds model outputs in specific data sources. In a RAG workflow, Granite Guardian verifies whether a response is based on the provided grounding context. If the response is not grounded in that context, the model flags it.
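The shape of such a groundedness check in a RAG pipeline can be sketched as follows. This is only a stand-in: Granite Guardian is a trained model, whereas this example substitutes a crude word-overlap heuristic so that it runs without any model. The function name `check_groundedness`, the `threshold` value, and the scoring rule are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Set


def _content_words(text: str) -> Set[str]:
    """Lowercased words longer than three characters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


@dataclass
class GroundednessResult:
    score: float   # share of the response's content words found in the context
    flagged: bool  # True when the response looks unsupported by the context


def check_groundedness(response: str, context: str,
                       threshold: float = 0.6) -> GroundednessResult:
    """Flag a response whose content words are mostly absent from the context.

    Hypothetical heuristic: a real guardrail model judges meaning,
    not word overlap.
    """
    response_words = _content_words(response)
    if not response_words:
        # Nothing substantive to verify, so there is nothing to flag.
        return GroundednessResult(score=1.0, flagged=False)
    supported = response_words & _content_words(context)
    score = len(supported) / len(response_words)
    return GroundednessResult(score=score, flagged=score < threshold)


if __name__ == "__main__":
    context = "Granite 3.0 models are released under the Apache 2.0 license."
    for answer in (
        "Granite models use the Apache license.",
        "The models were trained exclusively on medical records.",
    ):
        result = check_groundedness(answer, context)
        print(f"{answer!r}: score={result.score:.2f}, flagged={result.flagged}")
```

In a pipeline, a check like this would sit between the generator and the user: unflagged responses pass through, while flagged ones can be blocked, regenerated, or routed for review.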
By 2025, IBM plans to reduce the size of Granite Guardian models to somewhere between 1 billion and 4 billion parameters. The smaller size will make them more versatile and accessible. It will also allow for wider reach across industries and applications as diverse as edge devices, healthcare, education and finance.
The continued evolution of IBM Granite models
IBM’s Granite 3.0 models are high-performance open source models with benchmarks to back up their performance and security. IBM plans to add new developer-friendly features to these models, such as structured JSON prompting. As with previous Granite models, updates will be made regularly to ensure the models remain current. This means we can expect a steady stream of new features as they are developed. Unlike some competing open source models with custom licenses, Granite models carry no such restrictions under their Apache 2.0 license, which makes them adaptable for a wide variety of applications.
It looks like IBM has big plans for the future of the entire Granite 3.0 collection.