NTT and NTT DOCOMO have built something that could quietly redefine how personalization works in marketing. They call it a ‘Large Action Model’ (LAM): an AI system trained not on text, as ChatGPT is, but on sequences of human actions over time. The premise is sharp: predict what a person intends to do next from how they behave across digital and physical touchpoints, then tailor the outreach accordingly.
This isn’t theory. Applied to DOCOMO’s own customer base, LAM doubled the success rate of telemarketing orders for mobile and smart life services. The model learned from real-world data in the ‘4W1H’ format (Who, When, Where, What, and How), spanning app usage, store visits, purchases, and call interactions. It didn’t just flag leads; it also worked out the best timing and method for engagement, so agents could reach people when they were actually ready to act.
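To make the ‘4W1H’ idea concrete, here is a minimal Python sketch of how such records could be represented and turned into per-customer, time-ordered sequences. It is an illustration only, not NTT’s actual schema or pipeline: the `ActionEvent` class, the `build_sequences` helper, the field values, and the sample events are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActionEvent:
    """One 4W1H record. Field values below are invented for illustration."""
    who: str        # customer identifier
    when: datetime  # timestamp of the action
    where: str      # touchpoint, e.g. "app", "store", "call_center"
    what: str       # action taken, e.g. "browse_plan", "purchase"
    how: str        # channel or means, e.g. "smartphone", "in_person"


def build_sequences(events):
    """Group events by customer and sort each group chronologically."""
    sequences = defaultdict(list)
    for event in events:
        sequences[event.who].append(event)
    for customer_events in sequences.values():
        customer_events.sort(key=lambda e: e.when)
    return dict(sequences)


# Made-up sample data, deliberately out of order.
events = [
    ActionEvent("c1", datetime(2025, 1, 3, 10, 0), "store", "purchase", "in_person"),
    ActionEvent("c1", datetime(2025, 1, 1, 9, 0), "app", "browse_plan", "smartphone"),
    ActionEvent("c2", datetime(2025, 1, 2, 14, 0), "call_center", "inquiry", "inbound_call"),
    ActionEvent("c1", datetime(2025, 1, 2, 18, 30), "call_center", "inquiry", "inbound_call"),
]

sequences = build_sequences(events)
print([e.what for e in sequences["c1"]])  # ['browse_plan', 'inquiry', 'purchase']
```

The point of the sketch is the ordering step: a sequence model can only learn from the order of behavior if each customer’s actions are first lined up in time.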
The best part is the efficiency. NTT and NTT DOCOMO pulled this off with just 145 GPU hours on eight NVIDIA A100s, about 1/568th of the training cost of a standard large language model such as Llama 1 7B. For enterprise AI teams, that’s a quiet but powerful flex, proving not only that generative AI can extend beyond text, but also that it can be built lean and purpose-driven.
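A quick back-of-the-envelope check shows what those figures imply. The sketch below uses only the numbers quoted above and rests on two assumptions: that all eight GPUs ran concurrently for the whole job, and that the 1/568 ratio refers to GPU hours on comparable hardware.

```python
gpu_hours = 145     # total GPU hours reported for training LAM
num_gpus = 8        # NVIDIA A100s used
cost_ratio = 568    # reported ratio versus Llama 1 7B

# Assumption: all GPUs run in parallel for the full duration.
wall_clock_hours = gpu_hours / num_gpus

# Assumption: the ratio is measured in GPU hours.
implied_baseline_gpu_hours = gpu_hours * cost_ratio

print(f"Wall-clock time: about {wall_clock_hours:.1f} hours")
print(f"Implied baseline: about {implied_baseline_gpu_hours:,} GPU hours")
```

Under those assumptions, the run fits in well under a day of wall-clock time, against a baseline on the order of eighty thousand GPU hours.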
LAM learns intent from the order of behavior. A call before a purchase suggests awareness, a call after browsing suggests persuasion, and a call after a purchase signals support. The model captures these contextual cues to predict which action is most likely to convert next. That’s what sets it apart from conventional recommendation systems, which treat user actions as isolated dots instead of connected signals.
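The three cases above can be written as a toy rule to show why order matters. This is a hand-written illustration, not how LAM works: the model learns such patterns from data rather than following fixed rules, and the `label_call` function and action names here are hypothetical.

```python
def label_call(actions, call_index):
    """Label a call by what happened before it in the customer's timeline.

    `actions` is a list of action names in chronological order;
    `call_index` is the position of the call to interpret.
    """
    if actions[call_index] != "call":
        raise ValueError("call_index must point at a 'call' action")

    earlier = actions[:call_index]
    if "purchase" in earlier:
        return "support"      # the call follows a purchase
    if "browse" in earlier:
        return "persuasion"   # the call follows browsing, no purchase yet
    return "awareness"        # the call comes before anything else


print(label_call(["call", "purchase"], 0))            # awareness
print(label_call(["browse", "call", "purchase"], 1))  # persuasion
print(label_call(["browse", "purchase", "call"], 2))  # support
```

The same action, a call, gets three different readings depending only on where it sits in the sequence, which is exactly the information lost when actions are treated as isolated events.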
While marketing is the current playground, NTT is already looking at applying the same logic to healthcare and energy. In healthcare, LAM could analyze treatment histories to predict patient outcomes or optimize prescriptions. In the energy sector, it could learn from time-series data on solar radiation and power generation to forecast energy output more precisely.
At its core, LAM is a glimpse into the next chapter of AI, one that shifts from understanding language to understanding behavior. If large language models helped machines read us, large action models might finally help them anticipate us.

