Data Analytics Lab Co., Ltd. is pleased to announce the results of joint research on technology for detecting AI-generated audio content, conducted with Evixar Co., Ltd. under the “Development and Demonstration Project of Countermeasures against False and Misleading Information on the Internet” selected by the Ministry of Internal Affairs and Communications.
This research aims to strengthen countermeasures against false and misleading information (such as deepfakes), a problem that is growing more serious as generative AI becomes more sophisticated, by combining Evixar’s acoustic signal processing technology with our company’s AI and data analysis technology.
Summary of Research Results
In this study, we reproduced and analyzed the characteristics of speech content generated by AI, and constructed verification data and an AI model verification environment for determining synthesized speech.
Main Achievements
Construction of a verification platform that supports diverse speech generation models
For the analysis of synthesized speech, including Japanese, we investigated and compared the following cutting-edge speech generation models:
- Tortoise
- XTTS (multilingual model)
- Qwen3-TTS
We then conducted verification tests on each of these generation methods.
In particular, by supporting multilingual, large-scale learning-based speech generation technologies such as XTTS, we conducted verification under conditions close to those of a real-world generative AI environment.
Systematic generation and feature analysis of synthesized speech data
In this study, we undertook the following to quantitatively capture the characteristics of synthesized speech:
- Organizing and systematizing the conditions for generating synthesized speech data
- Analyzing audio signals (spectrograms, etc.)
- Extracting structural differences from natural speech
This work underpins our research and development of a general-purpose detection technique that does not depend on any particular generative model.
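As an illustration of the kind of signal-level analysis described above, the sketch below derives a spectrogram-based feature (spectral flatness) that tends to differ between overly regular synthetic signals and noisier natural ones. The function names and the toy signals are our own illustrative assumptions, not the actual analysis pipeline used in this project:

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    frames = [
        signal[start:start + frame_len] * window
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectral_flatness(mag):
    """Geometric mean / arithmetic mean of the power spectrum, averaged over
    frames. Near 1.0 = noise-like spectrum; near 0.0 = tonal, highly
    structured spectrum."""
    power = mag ** 2 + 1e-12  # epsilon avoids log(0)
    flatness = np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)
    return float(np.mean(flatness))

# Toy stand-ins: an overly regular "synthetic" tone vs. a noisier
# "natural" signal (real speech analysis would use recorded audio).
sr = 16000
t = np.arange(sr) / sr
synthetic = np.sin(2 * np.pi * 220 * t)
rng = np.random.default_rng(0)
natural = np.sin(2 * np.pi * 220 * t) + 0.3 * rng.standard_normal(sr)

flat_syn = spectral_flatness(stft_magnitude(synthetic))
flat_nat = spectral_flatness(stft_magnitude(natural))
print(flat_syn < flat_nat)  # the pure tone is far more tonal
```

Features of this kind can then be fed to a classifier to separate synthesized from natural speech.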
Verification of a deep learning-based synthesized speech recognition model
For the detection of generated speech, we carried out:
- Research and validation of deep learning models
- Construction of a training dataset
- Development of a process for evaluating detection accuracy
Through these tests, we confirmed to a certain extent the effectiveness of a detection model that exploits the characteristics unique to AI-generated voice.
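The detection step above can be sketched as a small neural classifier trained on acoustic feature vectors. The following is a minimal NumPy illustration using assumed toy data (2-D feature clusters standing in for real spectral features); it is not the model validated in the project:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2-D feature vectors (e.g. spectral flatness, high-frequency
# energy) for natural (label 0) and synthesized (label 1) clips.
natural = rng.normal(loc=[0.6, 0.5], scale=0.08, size=(200, 2))
synth = rng.normal(loc=[0.2, 0.8], scale=0.08, size=(200, 2))
X = np.vstack([natural, synth])
y = np.array([0] * 200 + [1] * 200, dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-hidden-layer MLP trained with full-batch gradient descent on
# binary cross-entropy loss.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(500):
    h = np.tanh(X @ W1 + b1)              # hidden layer
    p = sigmoid(h @ W2 + b2).ravel()      # predicted P(synthesized)
    grad_logit = (p - y)[:, None] / len(y)  # d(BCE)/d(logit)
    grad_h = grad_logit @ W2.T * (1 - h ** 2)  # backprop through tanh
    W2 -= lr * h.T @ grad_logit
    b2 -= lr * grad_logit.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

accuracy = float(np.mean((p > 0.5) == (y == 1)))
print(f"training accuracy: {accuracy:.2f}")
```

In practice a detection model of this kind would be trained on labeled corpora of real and synthesized speech and evaluated on held-out data.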
Technological advancement through the integration of acoustic signal processing and AI
In this research, we combined Evixar’s acoustic signal processing technology with our AI technology to provide verification support for enhancing Evixar’s synthesized speech recognition system (EAF). Specifically,
- Generation of synthesized speech data and operational verification of various generative models
- Understanding the differences between synthesized speech and natural speech through feature analysis of audio signals
- Verification of detection accuracy using deep learning models, and construction of evaluation datasets
Through these initiatives, we contributed technical knowledge aimed at improving the accuracy of EAF’s judgments.
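The accuracy-evaluation process mentioned above can be illustrated with standard detection metrics computed from a confusion matrix. The function below is a generic sketch with hypothetical toy scores, not the project’s actual evaluation code:

```python
import numpy as np

def evaluate(scores, labels, threshold=0.5):
    """Basic detection metrics for a synthesized-speech classifier.
    scores: model scores in [0, 1]; labels: 1 = synthesized, 0 = natural."""
    pred = scores >= threshold
    tp = int(np.sum(pred & (labels == 1)))   # synthesized, flagged
    fp = int(np.sum(pred & (labels == 0)))   # natural, wrongly flagged
    fn = int(np.sum(~pred & (labels == 1)))  # synthesized, missed
    tn = int(np.sum(~pred & (labels == 0)))  # natural, correctly passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(labels)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# Toy held-out evaluation set.
scores = np.array([0.9, 0.8, 0.3, 0.1, 0.7, 0.2])
labels = np.array([1, 1, 1, 0, 0, 0])
metrics = evaluate(scores, labels)
print(metrics)
```

Sweeping the threshold over such scores yields the trade-off curves (e.g. false-acceptance vs. false-rejection rates) commonly used to tune detection systems.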
Source: PR Times


