Ricoh has announced a multimodal large language model that can read Japanese business documents containing diagrams and charts. It is built on Alibaba Cloud's Qwen2.5-VL-32B-Instruct model. The work was carried out under the second phase of GENIAC, a project run by Japan’s Ministry of Economy, Trade and Industry and NEDO. Ricoh had earlier released a basic 70-billion-parameter model free of charge.
Drawing on feedback from that model, Ricoh has now built a more compact, higher-performance version that is designed to be easier to deploy and use in real applications. Ricoh also offers a 4-bit quantized version for lighter setups.
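As a rough illustration of why a 4-bit version suits lighter setups, the sketch below estimates how much memory the model weights alone would occupy at different precisions. It assumes a nominal 32 billion parameters (taken from the base model's name) and ignores activations, caches, and the small overhead that quantization formats add, so the figures are approximations rather than Ricoh's published requirements. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in decimal gigabytes (weights only)."""
    total_bytes = num_params * bits_per_param / 8  # 8 bits per byte
    return total_bytes / 1e9


# Assumption: nominal parameter counts implied by the model names.
PARAMS_32B = 32e9
PARAMS_70B = 70e9

print(f"70B at 16-bit: {weight_memory_gb(PARAMS_70B, 16):.0f} GB")
print(f"32B at 16-bit: {weight_memory_gb(PARAMS_32B, 16):.0f} GB")
print(f"32B at 4-bit:  {weight_memory_gb(PARAMS_32B, 4):.0f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the main reason a quantized build can run on smaller hardware.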
To train the model, Ricoh used about 600,000 images drawn from business documents, covering text, pie charts, bar graphs, and flowcharts. The model was evaluated on benchmarks including JDocQA, a Japanese question-answering dataset that draws on both text and visual information, and it outperformed other models.
The new large multimodal model (LMM) will be available to customers on its own and will also be included in the ‘RICOH On-Premise LLM Starter Kit’ offered by Ricoh Japan.

