To develop an NPU inference chipset, I gathered data on how companies like NVIDIA approached specific parts of the attention mechanism: which components to accelerate and which methods to use.
DeepNetwork CEO / Seokwon Jang / HP: 010 3350 6509 / sayhi7@daum.net
Hello, my name is Seokwon Jang, and I am the CEO of the one-person AI startup DeepNetwork. Our team has analyzed roughly 700-800 papers on LLMs. Models like ChatGPT are shaking up the global AI market, aren't they? In the United States, Sam Altman secured massive investments to cover the infrastructure costs of developing a high-risk service like ChatGPT. In Korea, large companies have not pursued such development as aggressively, likely because of the risk and the enormous infrastructure costs, and because Korea does not have as much capital as the United States.
Now, let me tell you about DeepNetwork. When I started analyzing LLMs, I believed the foundation of LLMs was Google's Transformer model, so I spent a great deal of time analyzing its detailed implementation and working principles. While doing so, I realized that understanding how the TensorFlow development environment is constructed and operates would clarify the Transformer's operating principles even further. I then asked myself what I needed to study to build a distributed and parallel training environment for TensorFlow, and learned that NVIDIA's CUDA development environment is required for distributed or parallel training. As I delved deeper into these analyses, I began to ponder how to apply these design structures to the infrastructure for developing NPU AI chipsets; the sketch below shows the kind of data-parallel setup I mean.
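To make the distributed-training point concrete, here is a minimal sketch of TensorFlow's single-machine data-parallel path (illustrative only; the toy model and data are my own, not DeepNetwork's setup): tf.distribute.MirroredStrategy replicates the model across the CUDA-backed GPUs on one machine and synchronizes gradients with all-reduce.

```python
import tensorflow as tf

# MirroredStrategy replicates the model onto every visible GPU
# (each backed by CUDA) and all-reduces gradients across replicas.
# On a CPU-only machine it falls back to a single replica.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across devices.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Dummy data stands in for a real dataset; each batch is split
# across the replicas automatically during fit().
x = tf.random.normal((256, 784))
y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=64, epochs=1)
```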
Then I wondered how to design an inference-specialized NPU chipset with specific functions. By examining NVIDIA's efforts and concerns in implementing parallel matrix-operation mechanisms, I could understand how they arrived at their design structure. For the NPU inference chipset, I gathered data on how NVIDIA and others considered handling specific parts of the attention mechanism; the sketch below shows the computation in question. In doing so, I realized that I must consider all the trade-offs NVIDIA weighed while trying to profit, not just a few of them. I am now evaluating which parts of my envisioned NPU design structure and principles, and of the LLM design structure and principles, are sound and which parts are still lacking.
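For context, here is a minimal sketch of the scaled dot-product attention from the Transformer paper, written in plain NumPy (the function name and toy shapes are my own illustration, not from any NVIDIA design). The two large matrix multiplications inside it are precisely the workload whose parallelization an inference NPU designer must decide on:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    The two matrix multiplications (Q K^T and the weighted sum
    with V) dominate the cost and are exactly the operations an
    inference NPU must parallelize, e.g. with systolic arrays.
    """
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq, seq)
    # Numerically stable softmax over the key dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                               # (..., seq, d_v)

# Toy shapes: batch of 2, sequence length 4, head dimension 8.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4, 8))
k = rng.normal(size=(2, 4, 8))
v = rng.normal(size=(2, 4, 8))
print(scaled_dot_product_attention(q, k, v).shape)   # (2, 4, 8)
```

Even this tiny example makes the hardware question visible: the softmax is element-wise and cheap, while the two matmuls scale quadratically with sequence length, which is why the matrix-operation parallelism I studied in NVIDIA's designs sits at the center of any NPU plan.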