DeepNetwork - Deep Learning Model Analysis / Network Communication / Camera 3A Tuning



I am Seok-won Jang, the head of DeepNetwork, a sole proprietorship that has achieved some degree of success in securing detailed implementation strategies in the TensorFlow environment, based on a detailed analysis of LLM architecture design.

파란새 2024. 4. 7. 10:38

I am 60 years old this year… I have worked in the information and communication technology (ICT) field for 30 years… For the past 10 years I have been self-employed, providing development services in the firmware side of IT… After 30 years of working life, I want to talk about how I spend my days now… For the past 3-4 years I have been reviewing and analyzing two or three papers a day related to large language models (LLMs), which are my biggest area of interest… Leaving all the other stories aside, I want to explain why I have recently been interested in, and analyzing, how to build a GPU cloud development environment…

I have been reviewing and analyzing LLM-related papers for about 3 years… Through that paper review and analysis I have gained a certain understanding of the design structure of LLMs… I have also analyzed how to implement that structure with the TensorFlow API, that is, in Python… The analysis is not perfect, but it has reached a reasonable depth… What I could not find anywhere, however, was a detailed account of the development environment needed for the core issues of distributed and parallel training when implementing an LLM…

I judge that the key to building an LLM development environment that addresses distributed training is to use Docker containers to separate each software development environment… A container is a process that runs in an isolated environment, and each container has its own independent file system, network, and execution space. This allows multiple different software development environments to operate independently on a single server PC… When I talk about an LLM distributed development environment, I mean supporting developers so they can work in TensorFlow containers isolated with Docker…

Why this matters is as follows… The reason you only need the NVIDIA driver on the host to use GPU-enabled TensorFlow is that an official GPU-enabled TensorFlow image is published on Docker Hub. That image already contains a CUDA environment matched to its TensorFlow version. In other words, TensorFlow and the CUDA toolkit are installed inside the Docker image, so as long as the NVIDIA driver is present on the host, you can use the GPU. This avoids the hassle of installing the CUDA toolkit and matching its version… I can also build containers for distributed training in an isolated environment with Docker; not everything is ready yet, but I have largely grasped the key points…
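As a minimal sketch of this point (assuming a container started from the official GPU image, for example with docker run --gpus all -it tensorflow/tensorflow:latest-gpu, and the NVIDIA driver plus NVIDIA Container Toolkit on the host), the following Python snippet run inside the container confirms that TensorFlow can see the GPU through the host driver:

# Run inside a GPU-enabled TensorFlow container; only the NVIDIA driver
# and the NVIDIA Container Toolkit are assumed to exist on the host.
import tensorflow as tf

# List the physical GPUs that TensorFlow can see through the host driver.
gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

# The CUDA/cuDNN libraries bundled in the image handle this computation.
if gpus:
    with tf.device("/GPU:0"):
        x = tf.random.normal((1024, 1024))
        y = tf.matmul(x, x)
    print("Matrix multiply ran on:", y.device)

If the list comes back empty, the usual suspects are the host driver or the container runtime configuration, not TensorFlow itself.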
To understand how the TensorFlow image, the NVIDIA driver, and the CUDA environment work together, you need to know the role of each and how they interact.
TensorFlow Image: The TensorFlow image is distributed through Docker Hub, with a CUDA environment matching its TensorFlow version already set up. The image contains all the software components and libraries needed to run TensorFlow applications, so users can start using TensorFlow immediately without a complicated setup process.
NVIDIA Driver: The NVIDIA driver is installed on the host system and acts as an intermediary between the GPU hardware and the operating system. The driver receives GPU requests from the operating system or an application and converts them into commands the GPU can understand. The NVIDIA driver is therefore the key element that allows TensorFlow to use the GPU.
CUDA Environment: CUDA is a parallel computing platform and API set developed by NVIDIA. CUDA enables high-performance parallel computing by harnessing the computational power of the GPU. The CUDA toolkit is included in the TensorFlow image and supports TensorFlow operations on the GPU.
These three elements work together so that Docker can run GPU-enabled TensorFlow on Linux. The TensorFlow image provides all the necessary software and libraries, the NVIDIA driver lets that software communicate with the GPU hardware, and the CUDA environment accelerates TensorFlow's operations by exploiting the parallel processing power of the GPU. All three elements must be in place because they interact with one another to enable TensorFlow's GPU-accelerated computation.
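Once the GPUs are visible inside the container, the distributed-training issue raised earlier can be sketched at the API level. The following is a minimal, hedged example of single-node data-parallel training with tf.distribute.MirroredStrategy; the model and data are hypothetical toys used only to show where the strategy scope goes, not the configuration of any real LLM:

import tensorflow as tf

# Data-parallel training across all GPUs visible inside the container.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

# Hypothetical toy model; model creation and compilation must happen
# inside the strategy scope so variables are mirrored onto every replica.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Random toy data; each batch is split across the replicas automatically.
x = tf.random.normal((512, 64))
y = tf.random.uniform((512,), maxval=10, dtype=tf.int32)
model.fit(x, y, batch_size=64, epochs=1)

For multi-node training, MultiWorkerMirroredStrategy plays the same role, typically with one such container running per node.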
 
I started analyzing deep learning papers about 4-5 years ago… Initially I analyzed papers in order to apply deep learning to the vision field… Then I also analyzed papers to understand the detailed structure of very large language models… For implementing very large language models, I looked at what techniques the global corporations described in their papers: how the detailed implementation of the Google Transformer model is expressed with the Google TensorFlow API, how the individual layers of the Transformer model are implemented, and how the Python syntax is applied… As I dug into this, I realized that it is very important to answer the question of how the GPU cloud server infrastructure is designed… I have succeeded in understanding how the GPU cloud server infrastructure is designed, not 100%, but the big core principles… These days even large corporations seem to realize the importance of this and are focusing on securing know-how in GPU cloud server infrastructure design…
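To give a concrete feel for what "how the detailed layers of the Transformer model are implemented with the TensorFlow API" means, here is a simplified sketch of a single Transformer encoder block in Keras. The layer sizes are arbitrary illustrative values, not the configuration of any particular published model:

import tensorflow as tf

class TransformerEncoderBlock(tf.keras.layers.Layer):
    """One simplified encoder block: self-attention plus a feed-forward
    network, each wrapped with a residual connection and layer norm."""

    def __init__(self, d_model=512, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(d_ff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(dropout)
        self.drop2 = tf.keras.layers.Dropout(dropout)

    def call(self, x, training=False):
        # Multi-head self-attention with residual connection and layer norm.
        attn_out = self.attn(query=x, value=x, key=x)
        x = self.norm1(x + self.drop1(attn_out, training=training))
        # Position-wise feed-forward network, again with residual and norm.
        ffn_out = self.ffn(x)
        return self.norm2(x + self.drop2(ffn_out, training=training))

# Example: a batch of 2 sequences, length 16, embedding dimension 512.
block = TransformerEncoderBlock()
print(block(tf.random.normal((2, 16, 512))).shape)  # (2, 16, 512)

A decoder-only model such as GPT stacks blocks of this kind (with causal masking in the attention layer) on top of token and position embeddings.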
Docker uses the concept of containers to separate each software development environment. As described above, a container is a process that runs in an isolated environment with its own file system, network, and execution space, so multiple different software development environments can operate independently on a single server PC. Docker creates and manages these containers using Docker images. A Docker image contains the file system, settings, dependencies, and everything else needed to run a container. You can create and run containers from an image and, if necessary, commit the changed state of a container as a new image. Images can be shared through remote registries such as Docker Hub, and any system with Docker installed can pull the image and run a container from it. In this way Docker keeps the development environment consistent and makes it easy to run software in various environments.
Therefore, with Docker you can install multiple different software development environments on a single server PC and run them completely independently of one another.
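As a hedged illustration of this workflow (assuming the docker Python SDK, installed with pip install docker, and a host with the NVIDIA driver and NVIDIA Container Toolkit configured), the following sketch pulls the GPU-enabled TensorFlow image and runs a throwaway container that simply lists the GPUs it can see:

import docker  # the docker-py SDK; assumed to be installed on the host

client = docker.from_env()

# Pull the official GPU-enabled TensorFlow image from Docker Hub.
image = "tensorflow/tensorflow:latest-gpu"
client.images.pull(image)

# Run a one-off container; DeviceRequest exposes all host GPUs, which
# requires the NVIDIA driver and NVIDIA Container Toolkit on the host.
output = client.containers.run(
    image,
    command=[
        "python", "-c",
        "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))",
    ],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)
print(output.decode())

The same thing can be done from the shell with docker pull and docker run --gpus all; the SDK version is shown here only because the rest of this post assumes a Python-centered workflow.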

 

Lately there is a buzz that Nvidia's market capitalization is around 2 trillion dollars… Nvidia has been manufacturing GPUs such as the V100 / A100 / H100 at TSMC in Taiwan… The topic of very large language models is flooding the media in Korea and abroad… Global companies are desperate to secure the underlying technology for developing something like the GPT-3.5 super-large model… In the case of the GPT-3.5 model, the detailed design of the model architecture is confidential, and as far as I know the model architecture of GPT-3 has not been disclosed either… So EleutherAI trained and released GPT-J as an open-source counterpart to GPT-3, and quite a few places seem to be preparing development based on it… With this model you can study what characteristics GPT-3 gains from a given structure. GPT-J and GPT-NeoX are open-source GPT models made by EleutherAI. EleutherAI is an open-source AI organization formed by volunteer researchers, engineers, and developers, and it is known for releasing large language models (LLMs) as open source… I also have a considerable understanding of the detailed structure of the GPT-3 model, which is needed to understand and distill the model structure of GPT-3.5 or GPT-3, and of how to apply and implement this with the TensorFlow API…

But after studying these super-large models for about 3 years, I felt I also needed to understand how to operate a GPT-3-class model on a cloud GPU server, and how to operate a TensorFlow development environment on A100 / H100 GPUs in such a cloud GPU server… Implementing this requires Docker design technology… Nvidia GPUs can be operated with Docker technology… To operate an Nvidia GPU through Docker, you need the software that drives it, namely the functions of the Nvidia libraries. These days there is a boom among Korea's fabless companies in making NPUs (Neural Processing Units), and if they are to grow the way Nvidia has, they will likewise need to develop the operating-library software for their NPUs, just as Nvidia's libraries operate Nvidia's GPUs. I have a plan for this, but I have not yet made it widely known, so there has been no proposal to jointly identify the technical issues; I hope you will contact me after seeing this blog…
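As a hedged pointer for readers who want to examine EleutherAI's open-source models directly (this assumes the Hugging Face transformers library and the EleutherAI/gpt-j-6b checkpoint, and uses the library's default PyTorch backend rather than the TensorFlow workflow discussed above), GPT-J can be loaded and sampled roughly like this:

# pip install transformers torch  -- both assumed, neither part of this post's own code
from transformers import AutoModelForCausalLM, AutoTokenizer

# EleutherAI's open-source GPT-J 6B checkpoint hosted on the Hugging Face Hub.
model_name = "EleutherAI/gpt-j-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # roughly 24 GB of float32 weights

# Generate a short continuation to confirm the model loads and runs.
inputs = tokenizer("The Transformer architecture is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Reading the GPT-J configuration and source in the transformers repository is one practical way to see which GPT-3 design choices the open-source model reproduces.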


Deep Network, a one-person startup specializing in consulting for super-large language models
E-mail : sayhi7@daum.net
Representative of a one-person startup / SeokWeon Jang