Fujitsu has developed a new generative AI reconfiguration technology, a core technology of its AI service “Fujitsu Kozuchi,” that reduces the size and power consumption of large language models (LLMs), and has used it to enhance its LLM “Takane.” The technology consists of two core components: the world’s most accurate quantization technology, which compresses the numerical precision of the weights assigned to the connections between the neurons that underpin AI reasoning; and the world’s first specialized AI distillation technology, which makes models lighter while achieving accuracy exceeding that of the original model. Applying the quantization technology to “Takane” achieved a world-leading accuracy retention rate of 89% relative to the unquantized model under 1-bit quantization (reducing memory consumption by up to 94%), together with three times faster inference. This far exceeds the accuracy retention rate of less than 20% achieved by GPTQ, the mainstream quantization method to date. As a result, large generative AI models that previously required four high-end GPUs can now run at high speed on a single low-end GPU.
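As a rough sanity check of the memory figure, the sketch below estimates weight storage under the stated bit-widths. The parameter count and the small per-group scale-factor overhead are illustrative assumptions, not figures disclosed by Fujitsu.

```python
# Back-of-the-envelope memory estimate for 1-bit quantization.
# Assumes the baseline stores weights in FP16 (16 bits each); the
# parameter count and ~3% scale overhead are illustrative assumptions.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9

params = 100e9  # hypothetical 100B-parameter model (not Takane's actual size)
fp16 = weight_memory_gb(params, 16)           # baseline: 16-bit weights
one_bit = weight_memory_gb(params, 1 + 0.03)  # 1-bit weights + assumed overhead

print(f"FP16 : {fp16:.0f} GB")
print(f"1-bit: {one_bit:.1f} GB ({1 - one_bit / fp16:.0%} smaller)")
# Going from 16 bits to ~1 bit per weight is a ~94% reduction,
# consistent with the figure quoted above.
```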
The dramatic reduction in model size achieved by this technology makes it possible to run AI agents on edge devices such as smartphones and factory equipment. This improves real-time responsiveness, strengthens data security, and dramatically reduces power consumption during AI operation, contributing to a sustainable AI society. Fujitsu will begin gradually offering trial environments for “Takane” models with the quantization technology applied in the second half of fiscal year 2025. Furthermore, starting today, it will progressively release quantized models of Cohere’s research-oriented open-weight model “Command A,” produced with this technology, via Hugging Face. Fujitsu will continue to dramatically improve the capabilities of generative AI while advancing research and development to ensure its reliability, thereby helping to solve the more difficult challenges facing customers and society and opening up new possibilities for the use of generative AI.
In recent years, generative AI has evolved into AI agents that perform tasks autonomously, and its industrial adoption has progressed rapidly. However, the underlying LLMs have grown ever larger, requiring large numbers of high-performance GPUs and raising significant issues: rising development and operating costs and the environmental burden of high power consumption. Furthermore, for companies to make full use of generative AI in their operations, it is essential not simply to use general-purpose models, but to improve the accuracy of models tailored to specific tasks and to make them lightweight enough to run on edge devices in factories and stores.
Two core technologies that make up the generative AI reconfiguration technology
Many of the tasks performed by AI agents require only a small portion of an LLM’s general-purpose capabilities. The generative AI reconfiguration technology developed here is inspired by the human brain, which reorganizes its neural circuits in response to learning, experience, and environmental change in order to specialize in particular skills. It efficiently extracts only the knowledge needed for a specific task from a huge model with general knowledge, creating a lightweight, highly efficient, and highly reliable AI model, akin to the brain of an expert. This is made possible by the following two core technologies.
Quantization technology that makes AI thinking more efficient and reduces power consumption
This technology compresses the vast parameter information that underpins generative AI reasoning, significantly reducing the size and power consumption of generative AI models while increasing their speed. Previous methods struggled with deep neural networks such as LLMs because quantization error accumulates exponentially across the many layers. Building on theoretical insights, Fujitsu Laboratories developed a new quantization algorithm, Quantization Error Propagation (QEP), which prevents quantization error from growing by propagating it across layers so that each layer can compensate for the error introduced upstream. Furthermore, by combining QEP with QQA, the world’s most accurate optimization algorithm for large-scale problems, also developed by Fujitsu Laboratories, the company achieved 1-bit quantization of LLMs.
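The release does not publish QEP’s formulation, but the general idea of propagating quantization error so that downstream layers can absorb it can be sketched as follows. The calibration setup, the least-squares compensation step, and the simple sign-based 1-bit quantizer are all illustrative assumptions for a toy ReLU network, not Fujitsu’s actual algorithm.

```python
import numpy as np

def quantize_weights(w: np.ndarray, bits: int = 1) -> np.ndarray:
    """Symmetric quantizer; at 1 bit this reduces to sign(w) * mean|w|."""
    if bits == 1:
        return np.sign(w) * np.mean(np.abs(w))
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

def quantize_model(layers, x_calib):
    """Quantize layer by layer, compensating upstream quantization error.

    layers:  list of weight matrices of a toy ReLU MLP
    x_calib: calibration inputs, shape (n_samples, d_in)
    """
    x_fp = x_q = x_calib
    q_layers = []
    for w in layers:
        # Error propagation: rather than quantizing w against full-precision
        # inputs, solve for weights that map the *quantized-path* activations
        # x_q onto the full-precision targets x_fp @ w, then quantize those.
        # Each layer thereby absorbs the error propagated from earlier layers
        # instead of letting it compound.
        w_comp, *_ = np.linalg.lstsq(x_q, x_fp @ w, rcond=None)
        w_q = quantize_weights(w_comp, bits=1)
        x_fp = np.maximum(x_fp @ w, 0.0)   # full-precision forward pass
        x_q = np.maximum(x_q @ w_q, 0.0)   # quantized forward pass
        q_layers.append(w_q)
    rel_err = np.linalg.norm(x_fp - x_q) / np.linalg.norm(x_fp)
    return q_layers, rel_err

rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 64)) / 8 for _ in range(4)]
x = rng.standard_normal((256, 64))
_, rel_err = quantize_model(layers, x)
print(f"relative output error after 1-bit quantization: {rel_err:.2f}")
```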
Specialized AI distillation technology that condenses specialized knowledge and improves accuracy
This technology optimizes the structure of an AI model in the way the brain reinforces necessary knowledge and clears out unneeded memories. First, a diverse set of candidate models is generated by pruning the base AI model to remove unnecessary knowledge and adding Transformer blocks to impart new capabilities. Next, a proprietary proxy-evaluation-based Neural Architecture Search (NAS) automatically selects from these candidates the optimal model balancing customer requirements (GPU resources, speed) and accuracy. Finally, knowledge is distilled from teacher models such as “Takane” into the selected model, as sketched below. This unique approach goes beyond simple compression, achieving accuracy on specialized tasks that exceeds that of the base generative AI model.
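The distillation step itself is proprietary; the sketch below shows only the standard temperature-scaled knowledge-distillation loss (Hinton et al., 2015) that such a pipeline would typically build on, where the pruned, NAS-selected student is trained to match the teacher’s output distribution. The temperature T and mixing weight alpha are illustrative choices, not Fujitsu’s settings.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      T=2.0, alpha=0.5):
    """alpha blends soft teacher targets with hard ground-truth labels."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the original formulation
    kd = np.mean(np.sum(p_teacher * (np.log(p_teacher + 1e-9)
                                     - np.log(p_student + 1e-9)),
                        axis=-1)) * T * T
    # standard cross-entropy on the hard labels
    p_hard = softmax(student_logits)
    ce = -np.mean(np.log(p_hard[np.arange(len(labels)), labels] + 1e-9))
    return alpha * kd + (1 - alpha) * ce

rng = rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 10))   # teacher logits for a batch of 8
student = rng.standard_normal((8, 10))   # student logits for the same batch
labels = rng.integers(0, 10, size=8)
print(f"distillation loss: {distillation_loss(student, teacher, labels):.3f}")
```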
In a demonstration on a text QA task that predicts the outcome of individual sales deals from Fujitsu’s CRM (customer relationship management) data, a model distilled solely with task-specific knowledge based on past data increased inference speed 11-fold and improved accuracy by 43%. By achieving high accuracy and model compression simultaneously, the lightweight student model, with 1/100th the parameter count, exceeded the accuracy of the teacher model while cutting required GPU memory and operating costs by 70% and enabling more reliable predictions of deal outcomes. Furthermore, in image recognition tasks, the technology improved detection accuracy on untrained objects by 10% over existing distillation methods, a breakthrough amounting to more than triple the accuracy gains reported in this field over the past two years.
Source: Fujitsu