Turing Inc. has developed a technology (patent pending) that efficiently compresses large amounts of video and image data while preserving it with high accuracy in a format suited to AI. The technology combines a training technique that concentrates important information in one part of the data with an allocation of data according to importance. This enables fast, accurate use of data in autonomous driving AI, multimodal AI, and other applications.
In recent years, multimodal large language models (MLLMs), which handle multiple types of data such as images and text at the same time, have been attracting attention, and there is a growing need for more advanced systems that can take in large amounts of data. However, conventional image embedding techniques have made it difficult to pass information to AI models efficiently in a form optimized for them.
Overview of the technology
The technology developed by Turing efficiently compresses huge amounts of data while retaining the necessary information with high accuracy. It converts inputs such as text and images into a sequence of tokens (the smallest units an AI model processes) and introduces a variable-length compression mechanism that can increase or decrease the number of tokens as needed. This makes it possible to reduce data volume significantly while maintaining the required image quality and analysis accuracy.
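To make "increase or decrease the number of tokens as needed" concrete, here is a minimal sketch of variable-length compression as prefix truncation. It assumes the token sequence is already ordered so that the most important tokens come first, and that compression simply keeps a leading fraction of it. The function name `compress_to_ratio`, the `keep_ratio` parameter, and the 256-token example are hypothetical illustrations, not Turing's actual interface or settings.

```python
import math
from typing import List, Sequence


def compress_to_ratio(tokens: Sequence[int], keep_ratio: float) -> List[int]:
    """Keep the leading fraction of an importance-ordered token sequence.

    Hypothetical sketch: assumes earlier tokens carry more important
    information, so dropping the tail loses the least.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []
    # Always keep at least one token so the decoder has something to work with.
    keep = max(1, math.ceil(len(tokens) * keep_ratio))
    return list(tokens[:keep])


# Stand-in for 256 token IDs produced by a hypothetical image tokenizer.
tokens = list(range(256))
for ratio in (1.0, 0.5, 0.25):
    print(ratio, len(compress_to_ratio(tokens, ratio)))
# 1.0 256
# 0.5 128
# 0.25 64
```

Because the same sequence serves every budget, the sender can pick the ratio at transmission time instead of re-encoding the image for each quality level.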
As part of this, Turing introduced a technique called “Tail Token Drop”: during training, the end of the token sequence is randomly deleted and the model is optimized by comparing the resulting differences, so that important information becomes concentrated at the beginning of the sequence. As a result, important content is less likely to be lost even when the compression rate is increased.
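The article does not describe how Tail Token Drop is implemented, so the following is only a toy sketch under stated assumptions. `tail_token_drop` samples a keep length uniformly at random and drops the tail, returning a mask a decoder could use. `expected_tail_drop_error` is a hypothetical toy objective that assumes each token carries an independent coefficient, so dropping a token costs its squared value. Averaging that cost over random drop lengths shows why training under random tail drops rewards putting important information first. All names, the uniform sampling, and the error model are assumptions, not Turing's method.

```python
import random
from typing import List, Sequence, Tuple


def tail_token_drop(
    tokens: Sequence[int], rng: random.Random, min_keep: int = 1
) -> Tuple[List[int], List[bool]]:
    """Randomly keep a prefix of `tokens` and drop the tail (training time only).

    Returns the kept prefix and a mask over the full sequence (True = kept).
    The uniform choice of keep length is an assumption for illustration.
    """
    n = len(tokens)
    if not 1 <= min_keep <= n:
        raise ValueError("min_keep must be between 1 and len(tokens)")
    keep = rng.randint(min_keep, n)  # randint is inclusive on both ends
    mask = [i < keep for i in range(n)]
    return list(tokens[:keep]), mask


def expected_tail_drop_error(coeffs: Sequence[float]) -> float:
    """Toy objective: squared error of dropped tokens, averaged over keep lengths 1..n.

    Hypothetical simplification: each token holds an independent coefficient,
    so the reconstruction error from dropping it is just its squared value.
    """
    n = len(coeffs)
    total = 0.0
    for keep in range(1, n + 1):
        total += sum(c * c for c in coeffs[keep:])
    return total / n


rng = random.Random(0)
tokens = [17, 4, 92, 33, 8, 51, 60, 2]
kept, mask = tail_token_drop(tokens, rng, min_keep=2)
print(kept, mask)

important_first = [3.0, 2.0, 1.0, 0.5]
print(expected_tail_drop_error(important_first))                  # 1.6875
print(expected_tail_drop_error(list(reversed(important_first))))  # 9.0
```

Under this toy model the important-first ordering has a much lower expected error than the reversed one, which is the kind of pressure that would push a model trained with random tail drops to place important information at the front.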
The technology can also reconstruct images from token sequences, producing visually natural images from fewer bytes than conventional image formats such as JPEG and WebP. In the future, it is expected to be applied to autonomous driving and cloud-connected systems, where real-time performance and communication costs are especially important.
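To see how the byte size of a token sequence might be estimated, here is a small worked calculation. It assumes the tokens are discrete IDs drawn from a codebook of a fixed size and stored with fixed-width codes and no entropy coding. The article does not specify Turing's token format, so the function `token_payload_bytes` and the codebook size of 1024 are hypothetical. Under these assumptions, the payload is the number of tokens multiplied by ⌈log₂ K⌉ bits, rounded up to whole bytes.

```python
def token_payload_bytes(num_tokens: int, codebook_size: int) -> int:
    """Bytes needed to store `num_tokens` discrete token IDs with fixed-width codes.

    Hypothetical sketch: assumes a codebook of `codebook_size` entries and
    no entropy coding, so each token costs ceil(log2(codebook_size)) bits.
    """
    if num_tokens < 0 or codebook_size < 1:
        raise ValueError("num_tokens must be >= 0 and codebook_size >= 1")
    # Exact integer ceil(log2(K)): e.g. 1024 IDs -> 10 bits, 1000 IDs -> 10 bits.
    bits_per_token = (codebook_size - 1).bit_length()
    total_bits = num_tokens * bits_per_token
    return (total_bits + 7) // 8  # round up to whole bytes


# Shorter token sequences (higher compression) shrink the payload linearly.
for n in (256, 128, 64):
    print(n, token_payload_bytes(n, 1024))
# 256 320
# 128 160
# 64 80
```

Real systems would likely add entropy coding and headers, so these figures are only a lower-level illustration of how token count drives size, not a comparison with JPEG or WebP.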
SOURCE: PRTimes