As the article above shows, this is a report that Korea's major AI companies have also applied this kind of design idea and improved LLM performance as a result.

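I am Seokweon Jang, 61, CEO of DeepNetwork, a one-person AI startup. I have 30 years of experience in IT development and have been analyzing LLMs in detail for well over three years. On this blog I have posted dozens of articles arguing that certain design principles could be good ideas for LLMs. How to design LLM training and inference so that performance improves is a major issue these days. I first concentrated on understanding the basic operating structure and principles of LLM training and inference. Once I had a reasonable grasp of that, I began looking into which parts hold back the LLM inference performance that everyone is talking about, and why. I have also seen NVIDIA's site describe in detail performance improvement approaches similar to the one in the article above.

As far as I can tell, the fundamental issue in improving LLM performance is the NVIDIA GPU. So I analyzed how NVIDIA GPUs are designed for LLM training and inference, and which parts need work for performance to improve. OpenAI has already demonstrated top performance on these GPUs, and NVIDIA's soaring stock price bears that out. I believe the basic idea behind the Naver and Intel collaboration described in the article above is the same as what I have been trying to improve.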

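People ask whether someone with no deep learning degree and almost no development track record is qualified to talk about these things. My answer is this: for about a year and a half I have carried out this detailed analysis with the help of generative AI from OpenAI and others. Let me ask a question in return: do you think researchers at large corporate AI labs can afford to ignore the high-quality analysis produced by OpenAI's multi-trillion-won AI infrastructure? I run a one-person company that uses expert help from AI to analyze the detailed design of AI systems and ways to improve their performance. My knowledge of LLM design and implementation is not a perfect 100 points, but I can now present my design ideas with some confidence. Those ideas come from analyzing the LLM design ideas of global big tech companies.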

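I am 61 now, but with the help of multi-trillion-won AI infrastructure I can offer analysis and implementation consulting that surpasses most AI experts in Korea and abroad. I welcome emails from people in the industry about technical cooperation.

Seokweon Jang (61), CEO/CTO of DeepNetwork, 30 years in IT / sayhi7@daum.net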

Hello. I am Seokweon Jang, 61, CEO and CTO of a one-person AI startup. I graduated from a graduate program in electronic engineering and have 30 years of experience in software and hardware development, with what must be thousands of trials and errors along the way. Around 2005 I set out to commercialize an H.264 video decoder, but it ended up about 2% short and I took a considerable loss. My interest in deep learning began when Google announced the Transformer model in 2017.

Put bluntly, LLM commercialization comes down to two questions that industry experts ask. First, can you secure training data at a scale that lets you compete with global big tech? Second, can you raise enough capital for AI infrastructure to compete with them? Even the Korean government and Korean conglomerates would find it hard to give a confident answer to either. Being recognized as number one, at home and abroad, also matters for companies large and small. NVIDIA's stock rose about 20-fold in a few years because people came to accept that no other company could surpass its technology. OpenAI's ChatGPT likewise succeeded in attracting enormous AI infrastructure investment and in securing training data that no one else can match, and I consider those the foundation of its success.

I am well aware of these industry trends. So I have thought hard about what DeepNetwork can offer in a field that demands this much capital and manpower. That is why I started analyzing how NVIDIA GPUs are designed and operated. A commonly cited limitation of ChatGPT is the high cost of its inference service, and NVIDIA has announced solutions describing how the GPU should operate to reduce inference cost. Improving the LLM model architecture is just as important. I have analyzed both NVIDIA's design ideas for cutting inference cost and design ideas for improving the model architecture. As everyone knows, a perfect solution for reducing inference cost is hard to find. I do not have a perfect one either, but I have identified design ideas that I believe can get close. I could not say this without my 30 years of IT development experience, and the same goes for my claim to have core implementation ideas for NPU design.

I welcome inquiries about technical discussions on these topics.

Seokweon Jang, CEO/CTO of DeepNetwork - sayhi7@daum.net

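The cost of running ChatGPT's inference service is a hot issue these days. Inference has been expensive partly because it is not fully optimized at the lowest level, the GPU, and fixing that is hard without technical support from NVIDIA, which owns the underlying GPU design technology. From what I have seen recently, NVIDIA says it will handle GPU-level optimization for LLM inference itself. In other words, NVIDIA supports GPU processing for training but has also signaled that it intends to dominate the more profitable inference side. This GPU optimization is closely related to improving inference performance on NPUs, which global companies are now racing to build. Because those companies have struggled to overtake NVIDIA GPUs in training, they are staking everything on inference performance with NPUs.

While analyzing NVIDIA's GPU processing for training in detail, DeepNetwork found that it is connected to inference cost reduction as well. My analysis so far shows that reducing inference cost on GPUs requires a closer understanding of the NVIDIA GPU architecture, and that the GPU architecture, the CUDA architecture, and NVIDIA's high-speed interconnect technology work together as one system. The key question is which parts of the GPU architecture to exploit and how to parallelize the work with CUDA, and I have in-depth technical information on several specific issues in this area. I do not know all of NVIDIA's core GPU technology, but I do have information on the specific issues that matter for inference performance, so I welcome inquiries about working with DeepNetwork.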

Seokweon Jang, CEO/CTO of DeepNetwork, a one-person AI startup / sayhi7@daum.net

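After a fabless company designs a system-on-chip (SoC), the design file is turned into a circuit pattern on a photomask, and that pattern is then transferred to the wafer with extreme ultraviolet (EUV) lithography equipment. The process is as follows:

  1. Design data preparation and conversion:
    • The fabless company produces the SoC design data as a GDSII or OASIS file.
    • The data goes through mask data preparation (MDP), which converts it into a format suitable for photomask fabrication.
  2. Optical proximity correction (OPC):
    • OPC modifies the design data to compensate for the pattern distortion that occurs during lithography.
  3. Photomask fabrication:
    • Blank mask preparation: For a conventional transmissive mask, a chrome layer is deposited on a quartz substrate and coated with photoresist to form the blank mask. (EUV masks are reflective instead: they use a multilayer mirror with an absorber layer on top rather than chrome on quartz.)
    • Electron-beam lithography: An electron beam writes the designed pattern directly into the photoresist.
    • Develop and etch: The exposed photoresist is developed to remove the exposed areas, and the chrome layer is etched to form the pattern.
    • Photoresist strip: The remaining photoresist is removed to complete the photomask.
  4. Wafer patterning with EUV lithography:
    • Photoresist coating: An EUV-sensitive photoresist is applied evenly to the wafer surface.
    • Exposure: The EUV light source transfers the photomask pattern onto the wafer.
    • Develop: The exposed photoresist is developed to reveal the desired pattern.
    • Etch: The revealed areas are etched to form the actual circuit pattern on the wafer.
    • Photoresist strip: The remaining photoresist is removed to complete the step.

Through these steps, the fabless company's design data is realized as fine semiconductor circuitry on the wafer by way of the EUV lithography process.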

 

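What is introduced above is only part of what I, at the one-person AI startup DeepNetwork, have worked out.

DeepNetwork holds analysis material on EUV, deposition, and etch equipment that goes considerably deeper than the technical issues summarized above. I also understand in detail which technical issues matter when a foundry operates EUV, deposition, and etch equipment, and how they need to be handled.

Seokweon Jang, CEO/CTO of DeepNetwork, a one-person AI startup / sayhi7@daum.net

I welcome consulting inquiries.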

In analyzing the design of deposition and etch equipment, I have studied two things in detail: how the wafer temperature is controlled so that it stays constant, and how process gases are delivered onto the wafer. I have built up fairly deep know-how on how the equipment is structured to do both, and on the principles behind that structure.

I can offer in-depth technical consulting on these design structures and principles, so please send inquiries to the email address below.

Seokweon Jang, CEO/CTO of DeepNetwork / sayhi7@daum.net

I am Seokweon Jang, the 60-year-old CEO/CTO of the one-person AI startup DeepNetwork, and I am working through the night to prepare for commercializing LLM (large language model) technology.

Global big tech companies are making every effort to secure LLM technology, because owning it can mean value worth hundreds of trillions of won. But to be taken seriously in LLM technology you have to be a full-stack architect who covers everything, from the first floor of the LLM to the tenth.

That is why I have spent the past three years studying day and night to become that kind of full-stack architect. The ability to write LLM source code alone is not enough to hold a conversation with global big tech. It also takes infrastructure design expertise: understanding how H100 GPUs, CUDA, and TensorFlow relate to one another and how they have to be operated together.

This matters especially for today's hot topic, reducing the cost of ChatGPT's inference service. Designing the inference NPUs that everyone is now pursuing is not possible without analyzing how H100 GPUs, CUDA, and TensorFlow interact, and I have been spending sleepless nights on exactly that analysis.

I would like to discuss these matters with the executives who make AI (LLM/NPU) commercialization decisions at large corporations and global big tech. Would such a conversation be possible?

Sincerely,
Seokweon Jang
CEO/CTO of DeepNetwork, a one-person AI startup
sayhi7@daum.net


Hello, this is Seokweon Jang of DeepNetwork.

I can relate to the remarks below by Ha Jung-woo, a center head at Naver, which I brought over from Facebook.

DeepNetwork has secured implementation know-how on the basic design of a prototype, to the point of understanding how and why it works the way it does, though not down to every improved technique applied over the past two years for performance updates, as in the o1 series.

Frankly, improved techniques keep being published and I find it increasingly hard to keep up, so I have decided to focus on thoroughly mastering the basic working prototype. Judging by recent performance results, a dataset of at least a few trillion tokens is needed. On top of that, an understanding of the GPT-3 model architecture is necessary.

For GPT-3, my know-how extends only as far as the first few improvement ideas that followed the original model. Training GPT-3 requires collecting approximately 500 billion tokens of training data from the web, along with detailed know-how on building GPU cloud infrastructure, including distributed and parallel training techniques. It would be a lie to say I know all of it, but I understand it to a certain level.

Building even something on the scale of GPT-3 would require at least several hundred billion won in funding. For a small company like mine, the most that is feasible is to analyze hundreds of GPT-3 papers in detail and work out the implementation know-how from them.

Seokweon Jang, CEO/CTO of DeepNetwork / sayhi7@daum.net

Hello. I am Seokweon Jang, CEO and CTO of the one-person AI startup DeepNetwork. In just a few days I will be turning 61. In any field there are already people who have established their own territory, and surpassing them is never as easy as it sounds. The key to becoming the best in a field is whether you hold its core technology.

At 60, I find that I can get to about 95 points in a field, but I have never managed to fill in the remaining 5, so I have yet to become number one in any single domain. Filling in those last 5 points is easier said than done. A little over a decade ago I thought I could do it, but I fell a few points short and took a heavy loss. Taking that kind of loss when I was nearly 50 made recovery hard, so for the next ten years or so I got by on small jobs while waiting for an opportunity, and I studied steadily for six or seven of those years. With so little financial room, it was difficult to attempt anything new.

When I first began studying AI, I started with speech recognition. When I entered graduate school in 1987, the lab next to mine was testing speech recognition algorithms on an 8-bit Apple computer, and I had followed the field out of interest, so I assumed that studying AI meant studying speech recognition. Google had already announced the Transformer model in 2017, which was when I first got into deep learning, but at the time I still thought speech recognition was all there was. I tried to do something for speech recognition with traditional Gaussian models, not realizing that the world had already moved on to the Transformer.

Learning about large language models (LLMs) did not come easily either. After several years of trial and error, I have now secured LLM know-how that I believe rivals that of major corporate AI labs. My main interests these days are how to handle inference so that accuracy is high, and on what principles on-device AI is implemented, and I have secured know-how on both. Getting an LLM to run on a smartphone for language translation or interpretation, for example on a Samsung Galaxy phone, is no easy task. After about a year and a half of hard trial and error, I secured the know-how in this area as well.

It has been more than two years since ChatGPT was launched, and it has turned the world upside down. Its greatest strength is that, however massive a technology is, analyzing it through hundreds or thousands of steps eventually yields the know-how. ChatGPT does not give that know-how away easily, though. Only after dozens or hundreds of steps, once ChatGPT judges the request to be reasonable, does it pass the know-how on.

If the head of a major corporate AI lab reads this, I ask that you take an interest in my work. I have received a great deal of help from ChatGPT through careful logical thinking. Without that kind of meticulous reasoning, it is impossible to get real help from ChatGPT.

Seokweon Jang,
CEO/CTO of DeepNetwork
sayhi7@daum.net

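Shor's algorithm is a quantum algorithm designed to solve the integer factorization problem for large numbers very efficiently. It can undermine the security foundations of modern cryptosystems such as RSA and, through its discrete-logarithm variant, ECDSA. The algorithm exploits the quantum-mechanical principles of superposition and entanglement, and it reduces a problem that takes super-polynomial time with the best known classical algorithms to polynomial time.

Seokweon Jang, CEO of DeepNetwork, a one-person AI startup / sayhi7@daum.net

Why is it fast?

  • Quantum Fourier transform: It extracts periodicity efficiently, relying on the superposition and entanglement that let a quantum computer act on many inputs at once.
  • A classical algorithm has to try many cases for N one by one to find the period, whereas Shor's algorithm uses quantum parallelism to handle them in a single superposed computation.

Once this technology matures, existing cryptosystems will need to be replaced, and post-quantum cryptography is being developed in preparation.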
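The classical half of Shor's algorithm can be illustrated with a small sketch. The code below is only an illustration, not a quantum implementation: the helper names `find_period` and `factor_from_period` are hypothetical, and the period is found by brute force, which is exactly the step a quantum computer would replace with period finding based on the quantum Fourier transform.

```python
from math import gcd


def find_period(a: int, n: int) -> int:
    """Return the smallest r > 0 with a**r % n == 1 (requires gcd(a, n) == 1).

    This brute-force loop stands in for the quantum period-finding step.
    """
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_period(n: int, a: int):
    """Try to split n using the period of a modulo n; return (p, q) or None."""
    g = gcd(a, n)
    if g != 1:
        return g, n // g          # a already shares a factor with n
    r = find_period(a, n)
    if r % 2 == 1:
        return None               # odd period: pick another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None               # trivial square root: pick another a
    p = gcd(half - 1, n)
    if p in (1, n):
        return None
    return p, n // p
```

For small inputs such as `factor_from_period(15, 7)` the period is found instantly; for RSA-sized moduli the brute-force loop is what becomes infeasible on classical hardware.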

 

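Implementation principle and necessity of the quantum Fourier transform (QFT)

The quantum Fourier transform (QFT) is the quantum-computer version of the discrete Fourier transform (DFT): it transforms an input state into the frequency domain. It plays an important role in quantum algorithms, especially Shor's algorithm and the quantum phase estimation algorithm.


Necessity and applications

  1. Cryptanalysis: The QFT solves the period-finding problem, which in turn makes it possible to factor the large numbers used in cryptosystems such as RSA. It is the core component of Shor's algorithm.
  2. Quantum phase estimation: The QFT is the basis of the quantum phase estimation algorithm, which is applied to eigenvalue computation, molecular simulation, and physics problems on quantum computers.
  3. Efficiency: It provides a dramatic speedup for large computations. On n qubits the QFT needs on the order of n^2 gates, whereas a classical fast Fourier transform over the same 2^n amplitudes needs on the order of n·2^n operations.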
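To make the link between the QFT and period finding concrete, here is a minimal classical simulation. It assumes the usual convention in which the QFT on N amplitudes is the unitary DFT with a positive exponent and a 1/sqrt(N) factor. The function names `qft` and `periodic_state` are illustrative only, and the simulation costs O(N^2) classical operations, so it shows the mathematics rather than the quantum speedup.

```python
import cmath
import math


def qft(amplitudes):
    """Apply the unitary DFT (the matrix the QFT implements) to a state vector."""
    n = len(amplitudes)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(amplitudes[j] * cmath.exp(2j * math.pi * j * k / n)
                    for j in range(n))
        for k in range(n)
    ]


def periodic_state(n, r):
    """Uniform superposition over 0, r, 2r, ... below n (a state with period r)."""
    support = set(range(0, n, r))
    amp = 1 / math.sqrt(len(support))
    return [amp if j in support else 0.0 for j in range(n)]


def measurement_peaks(state, threshold=1e-9):
    """Indices whose measurement probability is non-negligible."""
    return [k for k, a in enumerate(state) if abs(a) ** 2 > threshold]
```

With N = 16 and period r = 4, `measurement_peaks(qft(periodic_state(16, 4)))` lands on multiples of N/r, which is how a measurement after the QFT reveals the period.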

 

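When people talk about generative AI like ChatGPT, the Korean cultural angle matters, but I think what matters more is getting accurate information that actually makes money.

What I care about most is whether generative AI (such as ChatGPT) can give me information that truly pays off, meaning real practical help, when I am trying to make money in business.

In my judgment, generative AI earns its place only when it can deliver accurate, profitable information.

Seokweon Jang, CEO of DeepNetwork / sayhi7@daum.net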

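The following is content I brought over from Facebook. When Upstage builds RAG, I think a close review of the issues below, also taken from Facebook, will be a great help in implementing the core embedding function (Korean embeddings).

Seokweon Jang, CEO of DeepNetwork, a one-person AI LLM (RAG) startup / sayhi7@daum.net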

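The representative tools for precise CAN bus debugging are Vector's CANoe and PCAN-Explorer. Beyond these, expensive tools such as CANlink and a high-end oscilloscope are also necessary.

Seokweon Jang, CEO of DeepNetwork / sayhi7@daum.net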

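I am Seokweon Jang, CEO of DeepNetwork.

I spent about a year and a half analyzing the detailed implementation structure and operating principles of the GPT-3 LLM, down to its detailed design and how it is implemented, and after a grueling effort I am proud to say I succeeded. GPT-3 is the basis of LLMs, so once you understand how it works and how it is designed, you can also work out the details of something like ChatGPT Pro.

Seokweon Jang, CEO of DeepNetwork, a one-person AI startup / sayhi7@daum.net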

I am Seokweon Jang, CEO of the solo AI startup Deep Network.

GPT-3 LLM AI one-person startup Deep Network / sayhi7@daum.net

Even when GPT-3 was announced in June 2020, it was already clear that the fundamental requirement for implementing an LLM is securing a large amount of training data. In the case of GPT-3, 90% of the 500B-token training data was collected and processed by web crawling. Significant know-how in web backend design is therefore also important when developing a GPT-3 foundation model.

I focused mainly on implementing Korean inference on top of the GPT-3 model, including Korean tokenizing and embedding, and successfully secured the know-how. After some hardship, I now understand the key steps and methods for tokenizing and embedding Korean at the morpheme level so that it can be applied to the GPT-3 model.

If I were to implement RAG search, I planned a limited search function, because I lack web crawling skills: it would target specific sites such as arXiv and use the API that arXiv provides to obtain the metadata of PDF papers. I know that a major company in Korea has been developing a document parser using deep learning OCR models for nearly 10 years. I intend to parse PDF documents directly. There are several open-source libraries for this, and I also have key information of my own on parsing PDF documents. Nowadays global companies are focusing on specific inference technology issues as part of LLM commercialization. I believe the core issue among them is parsing PDF documents, and I understand the key issues involved.

I also understand the core techniques for training and running inference on multiple (tens of) benchmark datasets in a multitask structure within the GPT-3 model architecture. There is much more I could tell you, but I will mention only this much. The details are confidential to my solo AI startup Deep Network and cannot be disclosed.

I am Seokweon Jang, CEO of the one-person AI startup DeepNetwork. It has already been two years since the ChatGPT service launched, and I have been analyzing the detailed design of generative AI like ChatGPT for more than three years. Because ChatGPT-3.5 is said to be built on the design described in the RLHF paper, that is, on reinforcement learning, I spent a long time analyzing how RLHF works.

ChatGPT has to be trained on a wide variety of datasets, so how to handle them is an important issue. It also has to perform dozens of tasks or more, so the architecture used to train for them is another. When an LLM based on the GPT-3 model architecture is trained on what can be hundreds of datasets, multi-task training has to be considered. I have worked out how the basic GPT-3 model architecture should be organized in such a multi-task setting. Related research includes the Mixture-of-Experts (MoE) papers.

GPT-3 alone is reported to have been trained on a dataset of 500B tokens, and collecting that much data requires web crawling design know-how, which I have not yet fully secured.

Seokweon Jang, CEO of DeepNetwork, a one-person AI startup / sayhi7@daum.net

For more than three years I have been studying everything needed to build a GPT-3 foundation model. Over nearly three years I have gone through hundreds of papers on implementing GPT-3 in TensorFlow, and while I am no expert commanding a salary of 1 billion or 10 billion won, I have worked out the implementation details to my own satisfaction. Preparing GPT-3 training data requires a detailed grasp of web crawling techniques, which is no small task: nearly 90% of GPT-3's training data was obtained by web crawling. Even this first step, preparing the training data, turned out to be harder than it looks.

Solving web crawling is not the end of it either. The next big barrier is the know-how to build cloud GPU server infrastructure. GPT-3, announced in 2020, was trained on a dataset of about 500B tokens, nearly 90% of it English. Taking GPT-3 as the example, I have analyzed how cloud servers need to be clustered to train and run inference on a model with 500B tokens of training data. Building cloud GPU infrastructure requires understanding how NVIDIA's distributed and parallel training works within the A100 GPU development environment.

Within the model architecture itself, the tokenization and embedding of language (English and Korean) is especially important. I have secured the know-how for tokenizing Korean at the morpheme level. GPT-3 can process a context of at most 2048 tokens, so its comprehension is said to drop sharply as the text gets longer. I have also worked out design techniques for increasing the context length so that GPT-3 can handle longer text.

I hope someone at a large company will read this and get in touch. Although I am a one-person AI startup, please understand that I cannot disclose my company's know-how here.

Seokweon Jang (60), CEO of DeepNetwork, a one-person AI startup / E-Mail: sayhi7@daum.net

Portfolio of Jang Seok-Won, Age 60
Founder of DeepNetwork, a one-person startup preparing for the commercialization of LLM-based AI and robotic joint control technology.

CEO of DeepNetwork, a specialized one-person IT development startup
Contact:


Controlling d-axis and q-axis Currents of a PMSM with a PI Control Loop in Robotic Joint Control

The control of d-axis and q-axis currents in Permanent Magnet Synchronous Motors (PMSMs) through a PI control loop plays a pivotal role in robotic joint control. By employing Maximum Torque Per Ampere (MTPA) and Field Oriented Control (FOC) using Pulse Width Modulation (PWM), motor efficiency is maximized and performance is optimized. Below is a detailed explanation of the process, how torque and position control are managed, and how to tune the PI control gains.


1. Overview of FOC (Field Oriented Control)

FOC is used to independently control the flux and torque of a motor by transforming motor control variables into orthogonal axes, namely:

  • d-axis: Flux direction
  • q-axis: Torque direction

Through FOC, motor currents are separated into d-axis and q-axis components and controlled individually to regulate the flux and torque independently.


2. PI Control Loop and MTPA

The PI (Proportional-Integral) control loop manages the d-axis and q-axis currents to maintain desired values:

  • d-axis current (Id): Affects flux control. For surface-mounted PMSMs, Id is usually held close to zero because only Iq produces torque; interior PMSMs typically run a negative Id under MTPA to exploit reluctance torque.
  • q-axis current (Iq): Generates torque. It is adjusted to produce the desired torque.

MTPA is applied to achieve maximum torque output for a given current.
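The current loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a drive implementation: it assumes balanced three-phase currents (ia + ib + ic = 0), the amplitude-invariant Clarke transform, and a PI controller with simple clamping anti-windup. The names `clarke`, `park`, and `PIController` and the gain and limit values are hypothetical.

```python
import math


def clarke(ia: float, ib: float):
    """Amplitude-invariant Clarke transform for balanced currents."""
    alpha = ia
    beta = (ia + 2.0 * ib) / math.sqrt(3.0)
    return alpha, beta


def park(alpha: float, beta: float, theta: float):
    """Rotate the stationary alpha/beta frame into the rotor d/q frame."""
    d = alpha * math.cos(theta) + beta * math.sin(theta)
    q = -alpha * math.sin(theta) + beta * math.cos(theta)
    return d, q


class PIController:
    """PI regulator with output clamping; integration is undone on saturation."""

    def __init__(self, kp: float, ki: float, limit: float):
        self.kp, self.ki, self.limit = kp, ki, limit
        self.integral = 0.0

    def step(self, reference: float, measured: float, dt: float) -> float:
        error = reference - measured
        self.integral += error * dt
        output = self.kp * error + self.ki * self.integral
        if abs(output) > self.limit:
            self.integral -= error * dt          # anti-windup: roll back
            output = math.copysign(self.limit, output)
        return output
```

In a drive, one `PIController` would regulate Id toward its reference (zero, or the MTPA value) and a second would regulate Iq toward the torque command, with the two outputs passed through the inverse Park transform to the PWM stage.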


3. Robotic Joint Control

In robotic joints, torque control and position control serve distinct purposes:

  1. Torque Control
    • PI Control Loop: Manages the q-axis current (Iq) to control motor torque.
    • PI Control Gain Tuning:
      • Proportional Gain (Kp): Adjusts response speed. Excessive values may cause instability, while too low values result in slower response.
      • Integral Gain (Ki): Enhances precision. Excessively high Ki can cause overshoot or oscillations, while too low Ki reduces accuracy.
      • Tuning Method: Gains are adjusted using techniques like the Ziegler-Nichols method or through experimental optimization.
  2. Position Control
    • Position Control Loop: Combines a velocity control loop with a torque control loop to manage motor position. Velocity control is usually achieved via q-axis current, which then supports position control.
    • PI Control Gain Tuning:
      • Proportional Gain (Kp): Affects position accuracy. High values may induce oscillations, while low values slow down response.
      • Integral Gain (Ki): Compensates for positional error. Excessively high values may cause instability, while low values reduce error compensation.
      • Tuning Method: Similar to PID tuning, adjustments depend on the system dynamics and motor characteristics.

Kalman Filter Algorithm for Attitude Control: 85% Analysis Complete

Prediction Step:

  1. Data Collection: Real-time acquisition of accelerometer and gyroscope data using the ICM20948 9-axis sensor.
  2. State Prediction: Forecast the next state of the missile using the Kalman filter's state transition matrix (A) and control input matrix (B).
  3. Error Covariance Prediction: Use the process noise covariance matrix (Q) to predict the error covariance matrix.

Update Step:

  1. Measurement Update: Compute the Kalman gain from the predicted error covariance and the measurement noise covariance (R), then refine the state using the sensor measurements.
  2. State Correction: Adjust the state for precise missile trajectory.
  3. Error Covariance Update: Update the error covariance matrix using the Kalman gain.

Noise Modeling and Compensation:

  1. Noise Analysis: Analyze sensor noise characteristics to develop a noise model.
  2. Compensation Algorithm: Enhance accuracy by applying a data correction algorithm based on the noise model.
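The predict and update steps can be illustrated with a one-dimensional sketch. It is a simplification under stated assumptions: a single attitude angle is the state, the gyroscope rate is the control input (so A = 1 and B = dt), and an accelerometer-derived angle is the measurement. The class name `ScalarKalman` and the values chosen for Q and R are hypothetical; a real attitude filter would use the matrix form with more states.

```python
class ScalarKalman:
    """One-state Kalman filter: angle driven by gyro rate, corrected by accel angle."""

    def __init__(self, q: float, r: float, x0: float = 0.0, p0: float = 1.0):
        self.q = q      # process noise covariance (Q)
        self.r = r      # measurement noise covariance (R)
        self.x = x0     # state estimate (angle)
        self.p = p0     # error covariance (P)

    def predict(self, gyro_rate: float, dt: float) -> None:
        # State prediction with A = 1 and B = dt: x = A*x + B*u
        self.x = self.x + dt * gyro_rate
        # Error covariance prediction: P = A*P*A + Q
        self.p = self.p + self.q

    def update(self, measured_angle: float) -> float:
        # Kalman gain from predicted covariance and measurement noise
        k = self.p / (self.p + self.r)
        # State correction using the measurement residual
        self.x = self.x + k * (measured_angle - self.x)
        # Error covariance update
        self.p = (1.0 - k) * self.p
        return k
```

Raising R makes the filter trust the gyro integration more, while raising Q makes it follow the accelerometer more closely, which is where the noise model from the sensor analysis enters.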

Phased Array Antenna System and Beamforming Control

DeepNetwork has successfully analyzed the detailed implementation of beamforming control in phased array antenna systems. This technology precisely adjusts the direction of signals in real time using multiple independently controlled antenna elements. It is essential for optimal signal quality and performance in radar, satellite communication, and military communication systems.

Key Features:

  1. Precise Beam Steering: Controls the phase of each phase shifter to focus signals in specific directions, maximizing communication quality.
  2. Dolph-Chebyshev Window Function Application: Minimizes interference by tapering the element amplitudes so that all side lobes are held at a chosen level below the main lobe, at the cost of a slightly wider main lobe.

Design and Optimization:

  1. Phased Array Design: Develop high-performance phased arrays suited for defense and communication systems, ensuring reliability and stability in diverse applications.
  2. Beamforming Control Algorithm: Real-time signal direction adjustment guarantees optimal performance in various communication environments.

Role of Phase Shifters

Phase shifters adjust the phase of transmitted or received signals to combine or separate waves from multiple modules, enabling beamforming and multidirectional signal transmission.

  1. Transmission: Adjusts wave phases from all T/R modules to focus transmission in specific directions.
  2. Reception: Analyzes the phases of reflected signals to determine the position and strength of received signals.
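Beam steering with phase shifters can be sketched for a uniform linear array. The code assumes isotropic elements with spacing given in wavelengths and steering angles measured from broadside. The function names `steering_phases` and `array_factor` are illustrative, and the optional `weights` argument is where an amplitude taper such as Dolph-Chebyshev coefficients would be supplied (the coefficients themselves are not computed here).

```python
import cmath
import math


def steering_phases(num_elements: int, spacing_wl: float, steer_deg: float):
    """Phase (radians) each phase shifter applies to point the beam at steer_deg."""
    kd = 2.0 * math.pi * spacing_wl
    s = math.sin(math.radians(steer_deg))
    return [-kd * n * s for n in range(num_elements)]


def array_factor(phases, spacing_wl: float, look_deg: float, weights=None) -> float:
    """Magnitude of the array response in direction look_deg."""
    if weights is None:
        weights = [1.0] * len(phases)       # uniform amplitude taper
    kd = 2.0 * math.pi * spacing_wl
    s = math.sin(math.radians(look_deg))
    total = sum(
        w * cmath.exp(1j * (kd * n * s + phase))
        for n, (w, phase) in enumerate(zip(weights, phases))
    )
    return abs(total)
```

With eight elements at half-wavelength spacing steered to 20 degrees, the response peaks at the element count in the steered direction and drops elsewhere, which is the focusing effect described above.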

Masked Image Modeling (MIM) is a self-supervised learning method commonly used in the training of transformer-based vision models. This technique takes an image as input, masks certain pixels or patches, and trains the model to reconstruct the masked regions. It operates similarly to Masked Language Modeling (MLM) in text models. Below is a detailed explanation of the datasets used for training/inference in MIM, its operational principles, and its connection to the multimodal architecture of Grok-2.

 

Below is the analysis conducted by my one-person AI startup, Deep Network.


1. Datasets Used for MIM Training and Inference

(1) Composition of Training Dataset Pairs

  • Image Dataset:
    • Large-scale image datasets such as ImageNet, COCO, or OpenImages are typically used.
    • High-resolution RGB images are utilized to ensure model input diversity and generalization.
  • Mask Generation Data:
    • Masks are created to occlude certain pixels or patches in the images.
    • The usual masking ratio ranges from 40% to 75%, challenging the model to solve complex problems.
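The mask generation and the masked-only reconstruction loss can be sketched as follows. This is a simplified illustration: patches are represented by a single scalar each instead of pixel blocks, and the helper names `random_patch_mask` and `masked_reconstruction_loss`, the 14x14 grid, and the 0.75 ratio are assumptions chosen for the example.

```python
import random


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float, seed=None):
    """Return a grid of booleans; True marks a patch hidden from the model."""
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = round(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def masked_reconstruction_loss(target, prediction, mask) -> float:
    """Mean squared error computed over masked patches only."""
    errors = [
        (t - p) ** 2
        for t_row, p_row, m_row in zip(target, prediction, mask)
        for t, p, m in zip(t_row, p_row, m_row)
        if m
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

Computing the loss only on masked patches is the common choice in this family of methods, since visible patches would otherwise let the model score well by copying its input.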

(2) Application Method

 


2. Training and Inference Processes of MIM


3. MIM's Operational Principles and Design Rationale

The core principle of MIM lies in "learning structural and contextual information of the image."

(1) Operational Principles

  1. Self-Supervised Learning:
    • MIM uses unlabeled data to train the model.
    • By reconstructing the masked regions, the model understands the relationships and spatial structures within the image.
  2. Transformer's Global Characteristics:
    • Transformer-based models excel at learning relationships among all input patches.
    • This makes them particularly effective for inferring masked regions using surrounding context.
  3. Impact of Masking Ratios:
    • High masking ratios force the model to solve more challenging reconstruction problems, leading to richer representation learning.

(2) Why MIM is a Core Design Principle for Multimodal Architectures

  1. Image-Text Correlation Learning:
    • Grok-2 adopts a multimodal architecture that learns the relationships between images and text.
    • Image representations learned through MIM play a crucial role in mapping visual information to text, enabling deep semantic understanding.
  2. Information Restoration in Multimodal Learning:
    • MIM's ability to reconstruct missing data is leveraged in multimodal tasks to recover missing information (e.g., parts of text or images).
    • For instance, in Grok-2, if parts of an image are missing, it can use text information to restore the image or perform the reverse task.
  3. Contextual Learning:
    • Models trained with MIM understand relationships among image patches, enabling them to serve as robust encoders in multimodal structures by effectively linking text and image modalities.

4. MIM's Role in Grok-2

Grok-2's multimodal model processes text and images simultaneously, integrating their features and relationships. MIM contributes to this process by enhancing image representation learning, which facilitates mapping these representations to textual data.

For example:

  • Grok-2 can restore masked images using textual descriptions or generate appropriate text from visual inputs.
  • MIM principles underpin this bidirectional learning, enabling the model to handle complex tasks involving both vision and language.

5. Conclusion

Masked Image Modeling (MIM) trains models to learn the overall context and structure of images by reconstructing masked pixels or patches. Its principles, rooted in self-supervised learning, are effective for understanding and restoring images. By combining MIM's capabilities with the global information-learning characteristics of Transformers, it achieves remarkable performance.

In multimodal models like Grok-2, MIM-based image representation learning strengthens the integration of image and text features. This allows the model to tackle complex multimodal tasks through complementary learning and inference, making it a cornerstone of such architectures.

Investment Proposal for AI-Based Academic Information Retrieval, Summarization, and Analysis Solution Development

1. Company Introduction

  • Company Name: Deep Network
  • Established: 2023
  • CEO: Seokweon Jang  / sayhi7@daum.net
  • Business Area: Development of AI-Based Academic Information Retrieval, Summarization, and Analysis Solutions

Deep Network is a one-person startup specializing in the construction of academic paper and algorithm summarization services based on AI-powered search and summarization technology. Since its establishment, the company has analyzed and advanced numerous cutting-edge AI models and machine learning (ML) algorithms over approximately two years, laying a technical foundation for service implementation. Additionally, Deep Network has successfully designed and prototyped an AI search engine capable of parsing academic papers and automatically summarizing algorithms and formulas, and now proposes the business potential of this service.

2. Business Background and Problem Definition

Currently, numerous researchers and corporate personnel face difficulties in searching and reviewing a vast number of papers in academic databases such as arXiv. The sheer volume of information, along with the complexity of the formulas and algorithms within the papers, makes efficient learning and rapid insight extraction challenging.

Thus, Deep Network focuses on developing an AI-based service that analyzes and summarizes key algorithms and formulas within papers, proposing a solution that enhances user productivity and supports new research and development ideas.

3. Service Overview and Key Features

Deep Network's AI-based academic paper summarization service consists of the following key features:

  • Automatic Parsing and Text Extraction: Extracts text and formulas from PDF files crawled from academic sites like arXiv.
  • Formula Recognition and Conversion: Converts formulas within papers into LaTeX or MathML, parses them, and processes them in a formula analysis engine to derive key operations and concepts.
  • Core Content Summarization: Uses deep learning-based natural language processing (NLP) models to generate summaries of the paper's key concepts, results, and formulas.
  • AI-Based Search and Filtering: Allows users to quickly find relevant papers through customized searches and filtering based on titles, keywords, and topics.
  • User-Customized Interface: Visualizes paper analysis results according to research purposes, allowing users to selectively view topics or algorithms of interest.

4. Service Architecture

Deep Network's AI service is composed of the following key modules:

  • Crawling and Data Collection System: Runs crawlers and automated jobs that collect paper links and metadata from sites like arXiv, with rate limiting and automated scheduling to prevent excessive requests. Delivers user-customized paper updates.
  • PDF Parsing and Text Extraction System: Identifies and converts text and formulas within papers using libraries such as PDFBox and PyMuPDF. Utilizes a self-developed parsing algorithm for accurate extraction of text and formulas.
  • Formula Recognition and Analysis Engine: Recognizes and interprets complex formulas from papers through a formula parsing engine, summarizing key algorithms. Converts formulas in MathML and LaTeX formats into text for user comprehension and summarization.
  • Deep Learning-Based Summarization and NLP Model: Summarizes the important sentences and key content of papers using the latest Transformer-based large language models (LLMs). Develops summarization models specific to academic papers by fine-tuning models like BERT and GPT.
  • Data Management and Search System: Optimizes indexing and filtering performance of papers using Elasticsearch, supporting searches by paper title, author, and keywords. Manages paper data efficiently using NoSQL DBs like MongoDB.
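As a rough illustration of the crawling module's rate limiting, the sketch below builds a query URL for the public arXiv API and spaces requests out with a small limiter. It is a minimal sketch, not Deep Network's crawler: the names `build_query_url` and `RateLimiter` are hypothetical, the 3-second interval is an assumed polite delay, and no network request is actually sent.

```python
import time
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"  # public arXiv API endpoint

def build_query_url(keywords, start=0, max_results=10):
    """Build an arXiv API URL that searches all fields for every keyword."""
    search = " AND ".join(f"all:{k}" for k in keywords)
    params = {"search_query": search, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"

class RateLimiter:
    """Ensure at least `min_interval` seconds pass between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now

# Assumed polite delay between requests; adjust to the site's published policy.
limiter = RateLimiter(min_interval=3.0)
print(build_query_url(["LoRA", "transformer"], max_results=5))
```

In a crawler loop, `limiter.wait()` would be called immediately before each HTTP request, so that bursts of scheduled jobs never exceed the chosen request rate.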

5. Technical Know-How

Deep Network secures core technologies for service implementation through the following technical differentiators:

  • Customized PDF parsing technology for various paper formats
  • Formula parsing and complex algorithm analysis technology
  • Implementation and optimization of trained transformer models for automatic paper summarization
  • Construction of a high-performance search engine based on Elasticsearch

6. Target Market and Business Expansion Potential

  • Research and Academic Institutions: Main customers include researchers, universities, and research institutes, supporting improved research outcomes through efficient academic information provision.
  • Corporate R&D Departments: High expected utilization of paper analysis services in AI research and technology development departments.
  • Education Sector: Useful for university and graduate-level lectures and research processes through the paper summarization service.

7. Commercialization Strategy

  • Subscription Model: Users pay a subscription fee for regularly provided paper formula and algorithm summaries.
  • API Provision: Provides paper search and summarization APIs to corporate research labs and educational institutions for use as a research platform.
  • Partnership Strategy: Expands and improves accessibility of the paper summarization service through partnerships with academic databases.

8. Revenue Model and Expected Revenue

  • Subscription Service for Research Institutions and Corporations: Expects stable revenue generation through monthly subscription services with advanced summarization and formula analysis features.
  • API Usage Fees: Generates additional revenue by providing customized search and summarization features to corporations based on API usage volume.

9. Purpose of Investment and Expected Use of Funds

Deep Network aims to secure [amount] won in investment for initial service launch and technical expansion. The main funding usage plan includes:

  • Expansion of Development Personnel: Hiring specialists for deep learning model improvement and system development
  • Infrastructure Expansion: Building high-performance GPU servers and cloud infrastructure
  • Marketing and Sales: Strengthening marketing and promotional activities targeting research institutions and corporations

10. Conclusion

Deep Network's AI-based academic paper summarization service holds the potential to play a significant role in the rapidly developing AI research and information utilization market. Beyond simple paper search, this service provides a core understanding of algorithms and formulas, significantly enhancing researchers' efficiency and supporting new research and development ideas. Thus, Deep Network seeks to improve technical completion through initial investment attraction and lay the groundwork for commercialization.

[Investment Inquiries] Contact: sayhi7@daum.net /  Contact Person: CEO / Seokweon Jang

I am the CEO of a one-person AI startup, DeepNetwork. Over the past six months, I have worked out in detail how LoRA (Low-Rank Adaptation) trains efficiently: it keeps the pre-trained weight matrices frozen and learns two low-rank matrices whose product is added to them as an update. I would greatly appreciate your interest in our LoRA implementation expertise.

 

DeepNetwork CEO / Seokweon Jang  /   sayhi7@daum.net

 

Over the past 1-2 years, I have also spent a great deal of time analyzing the detailed principles behind designing a GPT-3 foundation model. Securing technical expertise in GPT-3 foundation model design involves addressing a crucial component: implementing Korean embeddings. I have spent several months understanding the principles behind implementing Korean embeddings. I firmly believe that mastering the embedding implementation process, enabling AI to understand the ten major world languages, is the core of how generative AI models like ChatGPT function.

 

Since I am Korean, I dedicated significant effort to understanding the know-how of Korean embedding implementation. Additionally, to build features that analyze the contents of academic papers, I also spent months delving into how PDF documents are structured and how they should be parsed. Did I only spend time worrying? Absolutely not! The depth of my efforts has led to tangible results and the acquisition of critical expertise, which is why I’m writing this now.

I started studying LLMs in earnest back in 2020 when the GPT-3 model was first introduced. As GPT-3 was developed by OpenAI, its primary supported language is English. For this reason, I focused on gaining expertise in designing Korean tokenization and embedding processes. And I mean it—I’ve truly mastered this area.

Korean tokenization was particularly challenging because Hangul (Korean characters) is inherently composed of initial consonants, medial vowels, and final consonants. Tokenization must work at the morpheme level to handle Korean effectively. Understanding this system was no easy feat—it took tremendous effort to grasp.
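The initial/medial/final structure mentioned above is visible directly in Unicode, where each precomposed Hangul syllable is encoded by a fixed arithmetic rule. The sketch below decomposes syllables into their jamo using that rule. The function names are my own, and this covers only character-level decomposition; it is not a morpheme analyzer.

```python
# Jamo tables in Unicode order: 19 initials, 21 medials, 27 finals plus "no final".
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00   # code point of '가', the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of '힣', the last precomposed syllable

def decompose_syllable(ch):
    """Return (initial, medial, final) for one Hangul syllable, or None otherwise."""
    code = ord(ch)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        return None
    offset = code - HANGUL_BASE
    initial = offset // (21 * 28)          # 588 syllables per initial consonant
    medial = (offset % (21 * 28)) // 28    # 28 finals per medial vowel
    final = offset % 28                    # index 0 means "no final consonant"
    return (CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final])

def decompose(text):
    """Decompose every Hangul syllable in text; pass other characters through."""
    return [decompose_syllable(ch) or (ch,) for ch in text]

print(decompose("한글"))
```

For "한글" this yields ('ㅎ', 'ㅏ', 'ㄴ') and ('ㄱ', 'ㅡ', 'ㄹ'). Morpheme-level tokenization sits one layer above this and needs a dictionary or a trained analyzer.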

I initially asked ChatGPT various questions but found its answers unsatisfactory, so for almost a year I used it only occasionally. About a year after the service launched, its developers appeared to have been monitoring its performance and improving its reasoning capabilities. Recently, using ChatGPT and Microsoft Copilot, I have noticed that their responses and reasoning now meet my expectations to a reasonable degree.

I believe that the foundation of implementing a system like ChatGPT lies in training it on the languages of the world’s ten major countries. Regarding English, I understand that U.S. big tech companies have identified tokenization and embedding as key technologies, and they encourage developers to use their APIs to build such functionality. Being Korean, I naturally invested considerable effort into understanding the implementation of tokenization and embedding for the Korean language, and I successfully acquired the detailed know-how for their implementation.

Over the past three years, I have focused on acquiring the foundational model design techniques for GPT-3. I have now succeeded in mastering the details of foundational model design. Additionally, I spent over a year analyzing how to customize GPT-3-based foundational models for developing Korean-English translation services. I am proud to say that I have successfully understood the intricate details necessary for such customization.

I would like the opportunity to discuss these topics with Korean conglomerates and U.S. big tech companies. Understanding such know-how is simple once you grasp it, but infinitely challenging if you do not.

Currently, I have yet to present Proof of Concept (PoC) verification results for the areas of deep learning I mentioned. This is because, as an individual, I lack the financial resources to do so. However, I firmly believe that in terms of foundational model design for GPT-3, my technological expertise rivals that of U.S. big tech and Korean conglomerates. That is why I am sharing this message.

 

Deep Network CEO / SeokWeon Jang / sayhi7@daum.net

Detailed Explanation of the Technical Capabilities of the One-Person AI Startup DeepNetwork

CEO: Seokwon Jang / Contact: sayhi7@daum.net

GPT-3 Model Foundation Design Know-How

  • Model Architecture: GPT-3 is a decoder-only Transformer language model with 175 billion parameters. The largest configuration stacks 96 Transformer layers, each with 96 attention heads and a hidden size of 12,288.
  • Training Data: GPT-3 was trained using a large-scale text dataset collected from the internet. This data includes a variety of languages and expressions.
  • Training Method: GPT-3 was trained with autoregressive language modeling: given the preceding tokens, the model learns to predict the next token.
  • Training Cost: Training GPT-3 was very expensive. OpenAI has not published an exact figure; outside estimates of the compute cost alone run to millions of dollars.
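The 175-billion-parameter figure can be checked with a back-of-the-envelope count. The sketch below uses the published GPT-3 configuration (96 layers, hidden size 12,288, vocabulary of 50,257 tokens, context length 2,048) and the common approximation of 12·d² weights per Transformer layer. It deliberately ignores biases and layer-norm parameters, so it is an estimate rather than an exact count, and the function name is my own.

```python
def approx_decoder_params(n_layers, d_model, vocab_size, n_ctx):
    """Approximate weight count of a decoder-only Transformer (no biases or norms)."""
    attention = 4 * d_model * d_model   # query, key, value, and output projections
    mlp = 8 * d_model * d_model         # two linear layers with a 4x hidden width
    per_layer = attention + mlp         # 12 * d_model^2
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position tables
    return n_layers * per_layer + embeddings

total = approx_decoder_params(n_layers=96, d_model=12288, vocab_size=50257, n_ctx=2048)
print(f"approx. {total / 1e9:.1f}B parameters")
```

The estimate lands at roughly 174.6 billion, which is consistent with the 175B figure and shows that almost all of the parameters sit in the per-layer attention and feed-forward matrices rather than in the embeddings.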

LoRA Model Fine-Tuning Know-How

  • Fine-Tuning: LoRA (Low-Rank Adaptation) is a method for fine-tuning large language models for specific tasks. It freezes the pre-trained weights and adds a pair of small trainable low-rank matrices to selected layers.
  • Training Data: LoRA fine-tuning uses a dataset tailored to the target task.
  • Training Method: For a frozen weight matrix W, LoRA learns an update ΔW = BA, where B and A have a small rank r, and only B and A are trained. The original weights are never modified, and the update can be merged into W after training.
  • Training Cost: LoRA fine-tuning is comparatively cheap, because only the low-rank matrices receive gradients and optimizer state.
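The LoRA points above can be made concrete with a small forward-pass sketch in plain Python. It assumes the usual formulation from the LoRA paper, h = Wx + (alpha / r)·B(Ax), with A initialized randomly and B initialized to zero. The class name `LoRALinear`, the layer sizes, and the hyperparameters are illustrative, and no training loop is included.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        self.weight = weight                      # frozen, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A: r x d_in, small random values; B: d_out x r, zeros,
        # so the update B @ A is exactly zero at the start of fine-tuning.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

rng = random.Random(0)
d = 8  # illustrative layer width
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
layer = LoRALinear(W, rank=2, alpha=4, rng=rng)
x = [1.0] * d
print("trainable:", layer.trainable_params(), "frozen:", d * d)
print("matches frozen layer at init:", layer.forward(x) == matvec(W, x))
```

With d = 8 and r = 2 the adapter trains 32 values against 64 frozen ones; at realistic widths, such as d = 12,288 with a small r, the trainable share becomes a small fraction of each matrix, which is where the cost saving comes from.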

Based on this technical know-how, the one-person AI startup DeepNetwork can leverage GPT-3 and LoRA models to provide various AI services. This enables better performance and efficiency.

 

 

 

To process Korean text effectively, a GPT-3-style LLM has to break each input sentence into the smallest meaningful subword units, at the morpheme level, before mapping them to embedding vectors.

Recently, we successfully analyzed how Korean text is broken into meaningful morpheme-level subword units for a GPT-3-style LLM.

 

DeepNetwork  /  CEO  SeokWeon Jang  /  One-Person AI Startup /  sayhi7@daum.net 

I am Seokwon Jang, the CEO of DeepNetwork, a one-person AI startup.

 

Over the past three years, I have dedicated myself to securing the foundational technology for building GPT-3 models.

I have been analyzing the detailed design and principles of GPT-3's architecture for more than three years. Initially, I struggled to understand why the GPT-3 model was designed in such a way. Specifically, I couldn't grasp how the large language model (LLM) functions with only the decoder part of the transformer model, while the encoder part is omitted.

Now, after three years of detailed analysis, I know GPT-3 inside and out. Although I haven't been able to conduct practical experiments due to the lack of deep learning server infrastructure, I have achieved an expert-level understanding of the TensorFlow implementation of GPT-3, and I am capable of working on its development at a professional level.

I thoroughly understand the structural design of GPT-3, why it is built the way it is, and how each part processes and operates. Initially, I thought that understanding GPT-3's architecture would be enough, but as I dug deeper, I realized that the processing of tokenization and embedding, especially for Korean and English, is the core of its functionality. It took me months to fully understand this critical aspect.

When I analyze a system, I focus on breaking down its principles—its algorithms, design structures, and operational mechanisms. In particular, I have invested a significant amount of time and effort into analyzing and understanding Korean tokenization and embedding processes. This was a challenging task, but ultimately, I succeeded in mastering it.

Based on this extensive effort to secure the foundational technology of GPT-3 models, my one-person AI startup, DeepNetwork, is now ready to pursue commercialization of this expertise.

 

One_person AI Startup DeepNetwork CEO /  SeokWeon Jang  /  sayhi7@daum.net

 

 

 

 

 

 

Why Each CUDA Stream Can Execute Commands Independently: A Detailed Explanation Based on A100 GPU Architecture and CUDA SDK Functionality

The independent execution of commands in each CUDA stream is made possible by the interaction between the architectural design of the NVIDIA A100 GPU and the functionality of the CUDA SDK. Together, they efficiently manage GPU resources and maximize parallelism. Below is a detailed explanation:

 

One-Person AI  Startup DeepNetwork  CEO /  SeokWeon Jang  /  sayhi7@daum.net  

 

1. Architectural Features of the A100 GPU

The A100 GPU is built on NVIDIA's Ampere architecture and includes several features that enable concurrency and parallelism, forming the foundation for CUDA streams:

(1) Independent Execution by Streaming Multiprocessors (SMs)

The A100 GPU has 108 Streaming Multiprocessors (SMs), each capable of independently executing thread blocks. SMs are the basic units for parallel processing, and their independence allows the following:

  • Scheduling: Each SM independently schedules and executes thread blocks assigned to it. This means kernels assigned to different streams can execute on separate SMs concurrently.
  • Independent Resource Allocation: Each SM has its own registers, shared memory, and warp schedulers, ensuring no interference between streams while executing tasks in parallel.

(2) Independent DMA Engines (Direct Memory Access)

The A100 GPU includes multiple DMA engines that allow memory transfers between the GPU and host to occur independently and asynchronously from kernel execution. These DMA engines enable:

  • Asynchronous memory transfers for different streams.
  • Overlapping of memory copy operations and kernel executions, improving overall throughput.

(3) Multi-Instance GPU (MIG) Capability

The A100 GPU supports MIG, which allows a single GPU to be partitioned into up to seven GPU instances. Each instance has its own dedicated SMs, memory, and scheduling, so workloads in different instances run concurrently without interfering with one another. This feature is especially useful in high-performance computing and multi-tenant environments.


2. Interaction Between CUDA SDK and A100 Hardware

The CUDA SDK provides software tools that allow developers to utilize A100 GPU resources effectively. Key aspects of this interaction include:

(1) Stream Concept in CUDA

In CUDA, a stream acts as a queue for execution commands. Commands within a stream execute sequentially, but commands in different streams can execute concurrently. The principles are as follows:

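  • Default Stream: Commands issued without an explicit stream go to the default stream and execute sequentially, each waiting for the previous one to complete. The legacy default stream also synchronizes implicitly with other blocking streams, so work placed in it can prevent overlap.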
  • Asynchronous Streams: Developers can create multiple streams to enable asynchronous execution. Commands in these streams operate independently, allowing overlapping execution across streams.

(2) CUDA Runtime and Driver APIs

  • CUDA Runtime API: This high-level API simplifies GPU resource management for developers. It supports stream creation, kernel execution, and memory transfer, enabling efficient asynchronous workflows.
  • CUDA Driver API: This lower-level API directly interacts with GPU hardware, providing fine-grained control over SMs and DMA engines. Together, these APIs facilitate parallel execution.

(3) Concurrency and Scheduling

The CUDA scheduler manages the allocation of commands from different streams to GPU resources:

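  • Block and Warp Scheduling: Thread blocks are distributed among SMs, and within each SM the threads of a block are grouped into warps (groups of 32 threads) that the SM's warp schedulers issue for execution.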
  • Multi-Stream Support: CUDA schedulers can handle multiple streams simultaneously, allowing kernel execution in one stream and memory transfers in another stream to overlap.

(4) Asynchronous Execution and Synchronization

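CUDA offers APIs like cudaMemcpyAsync for asynchronous memory transfers, enabling independent execution of tasks across streams. For a host-device copy to actually overlap with kernel execution, the host buffer must be page-locked (pinned) memory, for example allocated with cudaMallocHost. Synchronization mechanisms such as cudaStreamSynchronize allow developers to wait for the completion of all tasks in a specific stream when needed.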
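The benefit of splitting work across streams can be illustrated with a toy timeline model in Python. This is a simulation, not CUDA code: it assumes three independent engines (host-to-device copy, compute, device-to-host copy), each handling one operation at a time, keeps operations within a stream in order, and uses made-up durations. The greedy choice of the next operation is a simplification of how real hardware dispatches work.

```python
def simulate(streams):
    """Return the finish time when each stream runs its ops in order and
    each engine processes one op at a time (toy model of stream overlap)."""
    engine_free = {}                       # engine name -> time it becomes idle
    stream_ready = [0.0] * len(streams)    # time each stream's previous op ends
    next_op = [0] * len(streams)
    remaining = sum(len(ops) for ops in streams)
    makespan = 0.0
    while remaining:
        best = None
        for i, ops in enumerate(streams):
            if next_op[i] == len(ops):
                continue
            engine, duration = ops[next_op[i]]
            start = max(stream_ready[i], engine_free.get(engine, 0.0))
            if best is None or start < best[0]:
                best = (start, i, engine, duration)
        start, i, engine, duration = best
        end = start + duration
        engine_free[engine] = end
        stream_ready[i] = end
        next_op[i] += 1
        remaining -= 1
        makespan = max(makespan, end)
    return makespan

# One data chunk: copy in (2 units), run kernel (3 units), copy out (2 units).
chunk = [("copy_h2d", 2), ("compute", 3), ("copy_d2h", 2)]

one_stream = simulate([chunk * 4])                    # four chunks, single stream
four_streams = simulate([chunk, chunk, chunk, chunk]) # one chunk per stream
print("single stream:", one_stream, "four streams:", four_streams)
```

With these durations the single-stream schedule takes 28 time units and the four-stream schedule takes 16, because copies for one chunk overlap with the kernel of another. In real CUDA code the same pattern would be expressed with cudaStreamCreate, cudaMemcpyAsync on pinned host buffers, kernel launches on each stream, and cudaStreamSynchronize at the end.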


3. Foundation for Concurrency and Parallelism

The combination of A100 GPU’s architectural design and CUDA SDK functionality enables parallelism and concurrency in the following ways:

(1) Hardware Independence

  • A100’s SMs and DMA engines operate independently, allowing commands from different streams to execute on separate resources without interference.

(2) Software-Level Asynchronous Management

  • CUDA SDK uses streams to manage kernel execution and memory transfers asynchronously, efficiently allocating hardware resources for concurrent execution.

Conclusion

The independent design of SMs and DMA engines in the A100 GPU, combined with the CUDA SDK’s ability to manage asynchronous commands, enables streams to execute kernels and perform memory transfers concurrently. This maximizes GPU performance, reduces execution time, and enhances resource efficiency.

DeepNetwork CEO / Seokwon Jang / HP: 010 3350 6509 / sayhi7@daum.net

 

Hello, my name is Seokwon Jang, and I am the CEO of the one-person AI startup DeepNetwork. DeepNetwork has analyzed around 700-800 papers on LLMs. Models like ChatGPT are shaking up the global AI market, aren't they? Even in the United States, Sam Altman had to secure massive investments to cover infrastructure and other costs in order to develop a risky service like ChatGPT. In Korea, large companies do not seem to have pursued such development aggressively because of the risk and the high infrastructure costs, likely because Korea does not have as much capital as the United States.

Now, let me tell you about DeepNetwork. When I started analyzing LLMs, I believed that the foundation of LLMs was the Google Transformer model. I spent a lot of time analyzing the detailed implementation and working principles of the Google Transformer model. While analyzing the principles of the Google Transformer model, I realized that understanding how the TensorFlow development environment is constructed and operates would clarify the operating principles of the Google Transformer model even more. To build a distributed and parallel learning environment for TensorFlow, I wondered what I needed to study. I learned that NVIDIA's CUDA development environment is required for handling distributed or parallel learning. As I delved deeper into these analyses, I began to ponder how to apply these design structures to construct the infrastructure for developing NPU AI chipsets.

Then I wondered how to design inference-specialized NPU chipsets with specific functions. By examining the effort NVIDIA put into its parallel matrix-computation mechanisms, I could understand how it arrived at its design structure. For an NPU inference chipset, I gathered material on how NVIDIA and others handle specific parts of the attention mechanism. In doing so, I realized that I have to take into account every concern NVIDIA faced in building a profitable product, not just a few of them. I am now weighing which parts of my envisioned NPU and LLM design structures and principles are sound and which are lacking.

 

 

 

Next year, I'll be 61... Lately, my daily routine revolves around pondering how to enhance performance during inference using GPT-3 model structures, which is a hot topic. I've come across some trending papers from overseas and realized how much thought goes into inference by big tech companies in the US. It's impressive to see how many AI developers from these companies are deeply engaged and how they put in substantial effort.

Recently, I even received inquiries from global big tech companies interested in me after looking at my company blog. I've also shared my review opinions on commercializing LLM AI with these big techs, but I'm not sure how they will evaluate it.

Currently, Korea is facing a tough economic situation with Samsung's stock dropping by 40%, making it quite challenging for me as well. In such times, receiving interest from big tech companies gives me some encouragement.

Here is my company blog: https://videocodec.tistory.com/. Please take a look at it in detail.

 

DeepNetwork CEO / SeokWeon Jang / sayhi7@daum.net

 

Investment Proposal: Deep Network - Expertise in GPT-3 Based LLM Foundation Model with Korean Morpheme Tokenization

1. Company Overview

Company Name :  Deep Network
CEO :  Seokweon Jang   /   sayhi7@daum.net  


Mission: Development and analysis of multi-language foundational models, specializing in both Korean and English, based on advanced deep learning and AI model techniques.
Core Expertise: Design and implementation of LLM (Large Language Model) based on GPT-3

Deep Network is a specialized one-person tech startup focused on the development of large-scale language models (LLMs) utilizing the latest AI technologies. Through two years of dedicated research and development, we have achieved 90% proficiency in implementing Korean and English tokenization on GPT-3 based models. In particular, we have pioneered proprietary algorithms and design principles for morpheme-based Korean tokenization, creating unique technological value.


2. Purpose of Investment

Deep Network seeks to commercialize our Korean-centric LLM model to provide AI solutions that meet the needs of diverse industries. Through this investment, we aim to achieve the following goals:

  • Commercialization of Morpheme-Based Korean Tokenization Model: Our advanced tokenization system accurately parses Korean’s complex grammar and diverse expressions to enable natural and precise text processing.
  • Optimization of Korean/English LLM Foundation Model: Reconstructing the GPT-3 model to provide a Korea-optimized, multilingual LLM that is competitive both domestically and globally.
  • Further R&D Investment: Continued research to maximize NLP performance for structurally complex languages like Korean.

3. Distinctiveness of Korean Tokenization Technology

Deep Network’s Korean tokenization approach is built upon morpheme analysis, tailored specifically to Korean’s unique grammatical structure. Key advantages include:

  • Reflecting Korean Grammar: The model handles postpositions and endings accurately, decomposing sentences while preserving meaning, essential to Korean’s nuanced structure.
  • Context Preservation: Ensures that meaning is retained as each morpheme is analyzed and tokenized, enabling the model to maintain context and generate accurate responses.
  • High-Performance and Efficiency: A lightweight morpheme analysis algorithm maximizes computational efficiency, accelerating Korean text processing.
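To make the idea of separating postpositions from stems concrete, here is a deliberately tiny toy in Python. It is not Deep Network's algorithm and not a real morpheme analyzer: the particle list is a small hypothetical sample, and simple suffix matching will mis-split words that merely end in one of these syllables (for example 아이). It only illustrates the kind of split a morpheme-based tokenizer aims for.

```python
# Small hypothetical sample of Korean particles, longest first so that
# "에서" is tried before "에".
PARTICLES = sorted(
    ["에서", "에게", "으로", "은", "는", "이", "가", "을", "를", "에", "의", "도", "로"],
    key=len,
    reverse=True,
)

def split_particle(eojeol):
    """Split one space-delimited word into [stem, particle] if it ends in a known particle."""
    for particle in PARTICLES:
        if len(eojeol) > len(particle) and eojeol.endswith(particle):
            return [eojeol[: -len(particle)], particle]
    return [eojeol]

def tokenize(sentence):
    """Whitespace-split the sentence, then peel one trailing particle off each word."""
    tokens = []
    for eojeol in sentence.split():
        tokens.extend(split_particle(eojeol))
    return tokens

print(tokenize("나는 학교에서 책을 읽는다"))
```

For the sample sentence this produces 나 / 는 / 학교 / 에서 / 책 / 을 / 읽는다. A production system would also need to analyze verb endings such as 읽 + 는다 and resolve ambiguous cases with a dictionary or a statistical model.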

Our technology is designed to be readily applicable across various industries requiring Korean language processing and can be adapted for future expansion into global markets.


4. Core Achievements and Technical Implementation

Deep Network has achieved significant milestones in optimizing GPT-3 based LLM models for the Korean language environment:

  • Over 90% Completion of Tokenization Design: Tailored tokenization implementation for both English and Korean, understanding unique linguistic features of each.
  • Mastery of Morpheme-Based Korean Tokenization Design: Developed a methodology for decomposing Korean tokens while retaining context, providing the foundation for the LLM to understand and generate Korean text naturally.
  • Model Training with Large-Scale Datasets: Established a training pipeline to effectively apply our custom morpheme tokenization to large-scale Korean datasets.

5. Future Plans

With this investment, Deep Network has set the following goals:

  1. Multilingual Support Expansion and Performance Enhancement: Research expansion to additional languages beyond English and Korean.
  2. Development of Korean-Specific Application Models: Custom AI solutions tailored for businesses that primarily use Korean, enhancing business applicability.
  3. Commercialization and Market Entry: Aiming to commercialize the morpheme-based Korean LLM model, demonstrate Deep Network’s technological strength in both domestic and global language processing markets, and launch products.

6. Investment Request and Allocation Plan

Investment Request :  2 Billion KRW
Allocation Plan:

  • Infrastructure Expansion for R&D (30%)
  • Acquisition of High-Performance Korean Datasets and Further Training (30%)
  • Marketing and Operational Infrastructure for Commercialization (20%)
  • Team Expansion and Recruitment of Talent (20%)

Conclusion

Deep Network has successfully developed a GPT-3 based LLM model with outstanding performance in processing complex languages like Korean. Through our unique morpheme-based tokenization technology, we enable the LLM model to understand and process Korean text naturally, setting the foundation for wide-ranging applications across various industries. With this investment, we aim to achieve even greater results in the global and domestic AI technology markets.

 

Thank you for considering this opportunity to join Deep Network in advancing AI for Korean language innovation.

 

CEO : Seokwon Jang

Investment Proposal for Building and Fine-Tuning a GPT-3 Based Foundation Model

Presented by Seokweon Jang, CEO of Deep Network  / From South Korea


Introduction

Dear Investor,

 

Thank you for considering this investment opportunity in Deep Network. We are a specialized AI startup focused on building and fine-tuning foundation models based on the GPT-3 architecture. As foundation models become increasingly central to AI, our mission is to create and enhance high-performance language models tailored for advanced generative AI applications.

 

I am pleased to introduce Deep Network, my one-person AI startup, which specializes in the development and implementation of foundational models based on advanced Large Language Models (LLMs) such as GPT-3. My focus is to secure expertise in generative AI by building on the GPT-3 model and thoroughly analyzing the architectural and algorithmic techniques used by OpenAI to achieve effective model training and inference.

The independent development of foundational model technology like GPT-3 is critical in the AI field. Deep Network has successfully established proprietary design knowledge for implementing these complex models, positioning us with unique expertise in this space. Given the rising industry focus on refining inference accuracy and performance, we are actively investigating multiple methods for enhancing the precision of model responses during inference. This is an ongoing challenge for even the largest tech companies, yet we have made significant progress in analyzing and understanding various performance improvement algorithms proposed in the latest academic research.

 

Project Overview

Developing a GPT-3-based foundation model requires significant expertise and resources. At Deep Network, we are committed to independently establishing proprietary technology that enables us to develop, deploy, and fine-tune these models. Our foundation model framework emphasizes not only accuracy but also flexibility, ensuring it can adapt to diverse industry applications. By leveraging the GPT-3 model architecture, we aim to create robust AI solutions capable of understanding and generating human-like language with unprecedented precision.

 

Technical Goals and Model Development

Our project has two core phases:

  1. Model Construction: We will focus on the architecture and essential algorithms required to build a high-quality GPT-3-based foundation model. This phase includes detailed work on data pre-processing, model training, and the foundational model's fine-tuning capacity.
  2. Fine-Tuning and Optimization: Post-construction, we will implement fine-tuning methodologies designed to enhance the model’s inference accuracy and response quality across a variety of use cases. Research-driven algorithmic optimizations will further boost model performance, making it competitive with existing industry solutions.

Current Progress and Investment Need

Deep Network has already established a solid foundation of expertise in GPT-3 model construction and initial implementation. However, to reach the final stages of PoC (Proof of Concept) and to conduct extensive validation tests, additional funding is essential. This investment will allow us to:

  • Scale training resources and optimize computational infrastructure
  • Refine fine-tuning techniques to ensure high accuracy and adaptability
  • Conduct performance testing to confirm commercial readiness

Market Potential and Competitive Edge

With the global AI industry increasingly prioritizing language model innovation, a GPT-3-based foundation model offers vast commercial potential. By focusing on precision, adaptability, and the latest fine-tuning techniques, Deep Network is positioned to meet diverse client needs in sectors ranging from customer service automation to advanced content creation.

 

Investment Opportunity

Deep Network seeks an investment to complete this ambitious project and bring our foundation model to market. This funding will allow us to validate our concept, optimize the model, and establish Deep Network as a leader in the next generation of language model technology.

Thank you for considering Deep Network’s vision for the future of generative AI. We look forward to the possibility of a fruitful partnership.

 

Sincerely,
Seokweon Jang  /  sayhi7@daum.net 
CEO, Deep Network
