Generative AI gained acceptance quickly after its appearance, and many services have since been released, including OpenAI's "ChatGPT", Google's "Gemini", and Microsoft's "Copilot". SoftBank is engaging with this movement on a broad front: developing large language models (LLMs), providing corporate services such as "Gen-AX" and "TASUKI", offering the consumer generative AI service "satto", and partnering with "Perplexity". One of its most active efforts is the development of an AI computing platform that trains on large amounts of data and performs complex calculations in order to develop foundational AI technology.

In September 2023, SoftBank began operating an AI computing platform delivering 0.7 exaflops (1 exaflop is one quintillion, or 10^18, floating-point operations per second) using over 2,000 NVIDIA Ampere GPUs. By October 2024, it had deployed more than 4,000 NVIDIA Hopper GPUs, raising its computing power to 4.7 exaflops.

Ashiq Khan, head of SoftBank's Common Platform Development Division, said, "At SoftBank, we position AI as an important pillar of our business strategy, and we proceeded with the construction of the AI computing infrastructure at great speed. It actually took about two to three months to build. It is currently being used to train the Japanese LLM 'Sarashina' at our subsidiary SB Intuitions, and will be offered to domestic companies and research institutions in the future." Explaining how an AI computing infrastructure of a scale unprecedented in Japan was built in such a short period, Khan said, "It was because of the technical capabilities we have accumulated in building data centers. We promote in-house production, with our own employees taking on a wide range of technical management and integration, such as designing network equipment and high-speed storage."
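The article's throughput figures can be sanity-checked with back-of-the-envelope arithmetic. The per-GPU peak numbers below are illustrative assumptions (dense BF16 figures commonly cited for A100- and H100-class parts), not SoftBank's published configuration; the quoted 0.7 and 4.7 exaflop totals may reflect different precisions or sparsity modes.

```python
# Rough aggregate-compute estimate for a GPU cluster.
# Per-GPU peaks are ASSUMED values (dense BF16), for illustration only.

def cluster_exaflops(gpu_count: int, per_gpu_tflops: float) -> float:
    """Aggregate peak throughput in exaflops (1 exaflop = 1e18 FLOP/s)."""
    return gpu_count * per_gpu_tflops * 1e12 / 1e18

ampere = cluster_exaflops(2000, 312.0)  # A100-class peak, assumed
hopper = cluster_exaflops(4000, 989.0)  # H100-class peak, assumed

print(f"2,000 Ampere GPUs ~ {ampere:.2f} exaflops")
print(f"4,000 Hopper GPUs ~ {hopper:.2f} exaflops")
```

Both estimates land in the same ballpark as the figures the article reports, which suggests the quoted exaflop counts are straightforward sums of per-GPU peak throughput.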
The company also places great importance on relationships with partner companies. "Many stakeholders are involved in building a data center, including NVIDIA, server vendors, and switch vendors. By parallelizing the build through collaboration with each company, we can shorten the construction period," he explains. Such advanced parallelization is possible because of the detailed knowledge and experience gained in building large-scale high-performance computing (HPC) infrastructure.

SoftBank set up a team of internal and external members to build the AI computing infrastructure. "The team does not have many members, yet we handle the design and integration in-house: equipment and rack planning in the data center, GPU servers, networks inside and outside the data center, and HPC storage. Because the partner ecosystem is very diverse, international and advanced project management is also essential," he says. Khan's organization has members who can handle every area of infrastructure design, including IT, telecommunications, and enterprise cloud, and Khan himself has over 20 years of expertise in infrastructure design. "Working with members who are skilled in each platform's design allows us to complement each other. That is how we can proactively tackle new challenges such as this AI computing platform," he said of the diverse team. Having all the platform know-how within one organization is what allowed them to build a completely new large-scale AI computing platform at this speed.

The need for high computing power is sure to increase

Although the company built the largest AI computing platform in Japan, it faced many challenges during construction. "Because of its large scale, there were a lot of devices, and therefore a considerable amount of cabling, but SoftBank has the know-how to build its own data centers, so it was easy to find a solution.
We were able to find solutions by drawing on data accumulated from many platform designs and by working with partners both in Japan and overseas," he said, crediting past experience at every step. On the significance of building an AI computing platform in uncharted territory, Khan said, "AI development, including LLMs, requires an enormous amount of training. Around the world, AI computing platforms are currently used mainly for training LLMs, and SoftBank is working on developing the largest LLM in Japan.
By introducing an AI computing platform quickly, development can proceed quickly, and the independence and uniqueness of the technology in our business will increase. I hope this trend in AI development spreads across Japan," he said, looking forward to further growth in demand for AI development.

A key concern is the computing power of the AI computing platform. The larger a generative AI model, the more computing power it requires; with sufficient computing power, AI development can advance quickly and new services can launch one after another. Khan said, "There is definitely a need (for more computing power)," and further increases are expected.

Electricity is another requirement. "Compared to conventional data centers, AI data centers require more electricity. To address concerns about insufficient power, SoftBank has announced a policy of using renewable energy and other nature-derived energy. Japan's data centers are concentrated in the Kanto and Kansai regions, a two-pole structure that puts a strain on power companies. Data centers, which will become the next generation of social infrastructure, should be distributed throughout the country," he said.

SoftBank is currently strengthening its AI computing infrastructure with the "NVIDIA DGX B200" (B200). Its existing infrastructure, built with the "NVIDIA DGX H100" (H100) and other systems, ranked 16th in the November 2024 edition of the "TOP500", the ranking of the world's fastest supercomputers, and second in Japan after "Fugaku", developed by RIKEN and Fujitsu. Compared to the H100, the B200 has more than 2.25 times the computing power.
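The quoted per-GPU factor implies a simple upper-bound estimate for a like-for-like upgrade. Applying the 2.25x figure to the whole 4.7-exaflop platform is a hypothetical illustration (the existing platform also includes non-H100 systems), not a SoftBank roadmap figure.

```python
# Hypothetical scaling illustration using figures quoted in the article.
h100_cluster_eflops = 4.7  # current platform total, per the article
b200_speedup = 2.25        # per-GPU H100 -> B200 factor, per the article

b200_cluster_eflops = h100_cluster_eflops * b200_speedup
print(f"Same-scale B200 platform ~ {b200_cluster_eflops:.1f} exaflops")
```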
The computing platform built with the B200 will be used for SB Intuitions' goal of a 1-trillion-parameter LLM, and the company is also considering offering it to external parties in the future. Khan said, "We are actively developing AI data centers, which are still rare, because SoftBank is a cutting-edge company globally, both technologically and in business, and we want to maintain that position. By introducing an AI computing platform ahead of other companies, we were able to start LLM development early and increase the uniqueness of our own business. That is one reason we took the initiative on the AI computing platform. We want to continue tracking domestic and international trends and keep building cutting-edge AI computing platforms and data centers."
SOURCE: Yahoo