Researchers at Panasonic R&D Company of America (PRDCA) and Panasonic Holdings Corporation (Panasonic HD), working with Peking University, Fudan University, the University of California, Berkeley, and Shanghai Jiao Tong University, have developed "SparseVLM," a technology that reduces the computational weight of Vision-Language Models (VLMs), AI models that understand visual information (images and video) through language.
In recent years, VLMs, AI models that process visual and text information together and answer questions about visual content, have been actively developed. However, the amount of information such a model must handle grows sharply, especially with high-resolution images and long videos, driving up inference time and computation. The newly developed "SparseVLM" takes a different approach, sparsifying the visual input so that only the visual information relevant to the input prompt is processed, and it significantly reduces inference time and computation while maintaining high accuracy in answering questions about images.
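To make the idea of prompt-guided sparsification concrete, the following is a minimal sketch of pruning visual tokens by their relevance to the prompt. It is an illustration of the general concept described above, not the actual SparseVLM algorithm; the function name, the similarity-based scoring, and the `keep_ratio` parameter are assumptions introduced for this example.

```python
import torch

def prune_visual_tokens(visual_tokens, text_tokens, keep_ratio=0.5):
    """Illustrative prompt-guided pruning of visual tokens.

    visual_tokens: (num_visual, dim) patch/frame embeddings
    text_tokens:   (num_text, dim)   prompt embeddings
    keep_ratio:    fraction of visual tokens to retain

    Each visual token is scored by how strongly the prompt tokens
    attend to it; only the highest-scoring tokens are kept, so the
    language model processes far fewer visual tokens downstream.
    """
    # Cross-attention-style relevance: text queries against visual keys.
    scores = text_tokens @ visual_tokens.T            # (num_text, num_visual)
    relevance = scores.softmax(dim=-1).mean(dim=0)    # (num_visual,)

    k = max(1, int(keep_ratio * visual_tokens.shape[0]))
    keep_idx = relevance.topk(k).indices.sort().values
    return visual_tokens[keep_idx], keep_idx

# Example: 576 image-patch tokens, a 32-token prompt, hidden size 1024.
visual = torch.randn(576, 1024)
text = torch.randn(32, 1024)
pruned, kept = prune_visual_tokens(visual, text, keep_ratio=0.25)
print(pruned.shape)  # torch.Size([144, 1024]) -- 75% of visual tokens dropped
```

Because the pruning is driven by the prompt, a question about one region of an image keeps only the tokens relevant to that region, which is how this kind of approach can cut computation without a large loss in answer accuracy.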
The advanced nature of this technology has been recognized internationally, and it has been accepted for presentation at the 42nd International Conference on Machine Learning (ICML 2025), a top conference in AI and machine learning. The technology is scheduled to be presented at the conference, which will be held in Vancouver, Canada, from July 13 to July 19, 2025.
The newly developed "SparseVLM" roughly doubles processing speed while maintaining question-answering accuracy by taking the input prompt into account, something conventional VLM lightweighting methods do not do. It is expected to be used in many fields that require rapid recognition and verbalization of a user's state and surrounding environment from visual information.
Panasonic HD will continue to accelerate the implementation of AI in society and to promote research and development of AI technologies that contribute to improving its customers' lives and workplaces.
Source: PR Times


