
[Introduction to the Know-how of Handling Key Issues in Detail] When implementing an LLM (Large Language Model) with the Google Transformer model in a TensorFlow environment…

파란새 2024. 3. 15. 15:16

I am Seokwon Jang, a technical advisor at DeepNetwork specializing in ultra-large model technology. Three years ago, I began preparing for the commercialization of ultra-large language models such as ChatGPT with only a vague sense of direction. Indeed, many corporate officials may wonder whether a one-person company like mine can really grasp the implementation know-how behind an ultra-large language model like ChatGPT.

For more than three years, I have reviewed and analyzed two foreign papers related to LLMs (Large Language Models) every day. Through this, I learned what global companies worry about when implementing ultra-large models; in all, I have carefully studied roughly 100 key issues in the field of deep-learning implementation design. On that basis, I first analyzed the design structure of the LLM itself.

Once I understood the design structure of the LLM to some extent, I learned that Facebook, for example, reportedly carried out LLM training by designing and building a cluster of 16,000 NVIDIA A100 GPUs. Because such enormous infrastructure costs can make it difficult to secure profits, I examined a considerable number of papers on reducing an LLM's parameter count while maintaining performance. I also reviewed and analyzed papers on what preparation is needed to implement on-device AI, that is, model lightweighting.

I also came to understand that lightweighting is largely handled through quantization design and knowledge-distillation techniques. The quantization part is absolutely necessary for on-device AI design and for integration into an SoC, and NVIDIA has already built FP8 support into the H100 GPU.
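As a rough sketch of those two techniques in the TensorFlow environment this post refers to (the student model below and its layer sizes are hypothetical placeholders, not anything from DeepNetwork): knowledge distillation trains a small student to match a larger teacher's softened output distribution, and post-training quantization via the TFLite converter stores the weights as int8 for on-device deployment.

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-label term used in knowledge distillation (Hinton et al.)."""
    t = tf.nn.softmax(teacher_logits / temperature)
    log_s = tf.nn.log_softmax(student_logits / temperature)
    # Cross-entropy between softened teacher and student distributions
    # (equivalent to KL divergence up to a constant), scaled by T^2.
    return -tf.reduce_mean(tf.reduce_sum(t * log_s, axis=-1)) * temperature ** 2

# Hypothetical small student model standing in for a distilled sLLM.
student = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(32000),  # vocabulary-sized output head
])

# Post-training dynamic-range quantization with the TFLite converter:
# weights are stored as int8, shrinking the model for on-device AI.
converter = tf.lite.TFLiteConverter.from_keras_model(student)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("student_quantized.tflite", "wb") as f:
    f.write(tflite_bytes)
```

Full int8 quantization of activations additionally requires passing a representative dataset to the converter; the dynamic-range variant shown here is just the simplest starting point.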

Did DeepNetwork's review and analysis stop there? That is not all. When implementing an LLM or sLLM, DeepNetwork can review and analyze which issues must be considered when customizing each part of the several layers of the Google Transformer model in a TensorFlow environment, and how those issues can be solved.
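To make that kind of per-layer customization concrete, below is a minimal sketch of one Transformer encoder block written as a custom Keras layer; the pre-layer-normalization arrangement, GELU activation, and all sizes are illustrative choices, not details taken from the post.

```python
import tensorflow as tf

class TransformerBlock(tf.keras.layers.Layer):
    """Minimal pre-LN Transformer encoder block; all sizes are illustrative."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="gelu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop = tf.keras.layers.Dropout(dropout)

    def call(self, x, training=False):
        # Self-attention sub-layer with residual connection.
        h = self.norm1(x)
        x = x + self.drop(self.attn(h, h, h), training=training)
        # Position-wise feed-forward sub-layer with residual connection.
        h = self.norm2(x)
        return x + self.drop(self.ffn(h), training=training)
```

Each sub-layer (attention, feed-forward, normalization, dropout) is a separate attribute that can be swapped or re-parameterized independently, which is the level at which per-layer customization decisions are usually made.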

I cannot speak English, but I can send and receive emails in English. I ask the AI managers of domestic and foreign global companies for a careful review. I have learned how to catch fish, so please do not underestimate DeepNetwork's LLM and sLLM technology just because there are no fish caught yet.

 

Deep Network, a one-person startup specializing in consulting on ultra-large language models

E-mail: sayhi7@daum.net

Representative of the one-person startup / SeokWeon Jang