
Build your own LLaMA3 application in 20 minutes.

On April 19th, Meta released its latest large language model, LLaMA3, in 8B and 70B versions, both supporting an 8K context length, making it the most powerful open-source language model to date and a "bombshell" for the open-source community, aimed squarely at GPT-4. LLaMA3 has posted striking results across evaluation tasks: the 8B model beats Gemma 7B and Mistral 7B Instruct on multiple metrics, while the 70B model outscores the closed-source Claude 3 Sonnet and Gemini Pro 1.5. For the detailed evaluation report, see: eval_details.md

image

Since Llama 3 sticks with a fairly standard decoder-only Transformer architecture, the performance gains are thought to come mainly from improvements in data quality. First, it was pre-trained on 15T tokens, about 7 times more than Llama 2, with a much larger share of code to strengthen the model's reasoning ability. Second, it uses a tokenizer with a 128K vocabulary, which encodes text more efficiently than the 32K tokenizer used in Llama 2. In addition, grouped-query attention (GQA) is adopted in both the 8B and 70B models, improving Llama 3's inference efficiency.
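As a quick sanity check on the tokenizer point, you can compare how the two tokenizers encode the same sentence. Below is a minimal sketch, assuming you have transformers installed and access to the gated meta-llama repositories on Hugging Face; the model IDs are the official Hugging Face names and are not specific to this tutorial's setup.

# Minimal sketch: compare the Llama 3 (128K vocab) and Llama 2 (32K vocab)
# tokenizers on the same sentence. Requires `pip install transformers` and
# access to the gated meta-llama repositories on Hugging Face.
from transformers import AutoTokenizer

text = "Build your own LLaMA3 application in 20 minutes."
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print("Llama 3 vocab size:", llama3_tok.vocab_size)         # ~128K
print("Llama 2 vocab size:", llama2_tok.vocab_size)         # 32K
print("Llama 3 token count:", len(llama3_tok.encode(text)))
print("Llama 2 token count:", len(llama2_tok.encode(text)))

The larger vocabulary typically produces fewer tokens for the same text, which is part of where the encoding-efficiency gain comes from.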

The open-source community has responded enthusiastically: more than 1,000 variants appeared on Hugging Face within just 5 days, and the number is still growing.

image

Faced with wave after wave of big news from the AI industry, what can we do besides marvel and worry? Even if we can't dive all the way in, we still want to feel the ripples of the AI wave. As the saying goes, it's not AI that will replace humans, but people who understand AI who will replace those who don't.

So I searched Baidu and found the official LLaMA3 demo: https://www.meta.ai/. But after typing it into the browser and waiting 10 minutes, I gave up... network issues mercilessly blocked my way. I went back to Baidu and looked into installing LLaMA3 locally, but the 60GB of model weights, the expensive GPU compute, and a string of program errors made me retreat once again. To persist or to give up, that is the question...

Until I saw this product on JD Cloud...

What? Build your own LLaMA3 application in 20 minutes!

So, I opened the timer on my phone and started my journey to explore the AI wave.

Step 1: Enter the JD Cloud Intelligent Computing Service Console: https://gcs-console.jdcloud.com/instance/list

Step 2: Click the create button to purchase a GPU instance. Make sure to choose "By Configuration" as the billing method, which charges based on usage duration. It's only 1.89 yuan per hour, and with a recharge of 2 yuan, you can play for 2 hours. It's really affordable. Click "Buy Now" to place an order.

Step 3: On the instance list page, wait for the instance status to become "Running", then click Jupyter to enter the AI development environment.

image

Step 4: On the Jupyter page, click Terminal to open a terminal and run the following command:

cp -r /gcs-pub/llama-factory/ /data/

Step 5: In the directory tree on the left, find the llama-factory/src/web_demo.py file, double-click to open it, and modify the server_port to 28888. Save the modification by pressing Ctrl+S.

image
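For reference, the edit in Step 5 just changes the port passed to the Gradio launch call inside web_demo.py. The exact code differs between LLaMA-Factory versions, so treat the following as an illustrative sketch rather than a copy of the file:

# Illustrative sketch of the relevant lines in src/web_demo.py: the Gradio demo
# object (here called `demo`) is launched with a fixed server_port so the
# console's custom-application entry can reach it. Variable names and other
# launch arguments may differ in your copy of the file.
demo.queue()
demo.launch(server_name="0.0.0.0", server_port=28888, share=False)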

Step 6: Open the terminal again and execute the following commands:

cd /data/llama-factory
conda create -n liandan python=3.10 -y
conda activate liandan
pip install -e .[metrics]
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py --model_name_or_path /gcs-pub/Meta-Llama-3-8B-Instruct --template llama3
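The --template llama3 flag tells LLaMA-Factory to wrap conversations in Llama 3's chat format. If you are curious what that format looks like, the tokenizer shipped with the weights can render it; here is a small sketch that reuses the model path from the command above:

# Small sketch: render a conversation in Llama 3's chat format using the
# tokenizer bundled with the weights at the path used in the command above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("/gcs-pub/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # shows the <|start_header_id|> ... <|eot_id|> structure that the llama3 template reproduces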

The platform was impressively fast, noticeably quicker than other platforms I've tried. After a few minutes, I saw the dawn of victory...

Step 7: On the console's instance list page (https://gcs-console.jdcloud.com/instance/list), find the instance's last column, go to Actions -> Apply -> Custom Application, and voila, there is LLaMA3 in the flesh.

image
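If the custom application page doesn't respond right away, you can confirm from the Jupyter terminal that the demo is actually listening on the port set in Step 5. A tiny check, assuming you run it on the instance itself:

# Tiny check: confirm the Gradio web demo is listening on port 28888 locally.
import urllib.request

resp = urllib.request.urlopen("http://127.0.0.1:28888", timeout=5)
print(resp.status)  # 200 means the web demo is up and serving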

I heard that this platform can also launch document generation applications without code. I'll try it next time. Now, I can't wait to play with LLaMA3. Perfect!

image
