Recently, Meta released its open-source Large Language Model (LLM), Meta Llama 3. There are two major features of this model:
First, Llama 3 is an open-source LLM, which means anyone can download and use it.
Second, the Meta Llama 3 model comes in two parameter sizes, 8B and 70B, meaning it is available with either 8 billion or 70 billion parameters.
Meta also said that in the coming months it expects to introduce longer context windows, new capabilities, enhanced performance, and additional model sizes, and that it will share the Llama 3 research paper.
There is a lot to know about the Llama model, and personally, I am very excited about it. So, let’s dive into the article and learn more about the model.
What is Meta Llama 3?
Llama 3 is a family of powerful language models created by Meta AI (formerly Facebook AI Research). It is the successor of Llama 2, which was released in 2023.
LLMs are a type of artificial intelligence trained on massive amounts of text data. This training allows them to communicate and generate human-like text.
Meta has released the Llama 3 models as open source, making them more accessible to researchers and smaller companies compared to closed-source models from companies like Google and OpenAI.
This release includes new language models, in both pre-trained and instruction-tuned variants, at 8 billion and 70 billion parameter sizes.
They’re highly versatile and can help with many different tasks. This newer version of Llama performs well on a wide range of industry benchmarks and adds new capabilities, such as improved reasoning.
Key features of the model
There are a few key features of the Meta Llama 3 models. Let’s go through them one by one:
Improved Performance, especially in Smaller Models: Meta focused on significantly improving the performance of its smaller models. This means that even the 8B parameter version of Llama 3 may outperform larger models from other companies.
Better Reasoning: It can follow more complex instructions and engage in more nuanced reasoning tasks.
Stronger Code Generation: It has better abilities when it comes to generating and understanding computer code.
Increased Accuracy: It performs better on various benchmarks that test a model’s ability to understand and respond to language (examples include ARC, DROP, and MMLU).
Question Answering: It can provide informative answers to your questions, even if they are complex or open-ended.
Text Generation: Llama 3 can write different kinds of creative text formats, like poems, code, scripts, musical pieces, emails, letters, etc.
Image Generation: The model can generate images very quickly, but the resolution and quality are not on par with dedicated image-generation models like DALL·E.
Here’s a sample image generated by the Meta Llama 3 model.
You can clearly see that the image is decent, but the quality is not as good as images generated by DALL·E and other dedicated image-generation tools. Considering it comes from an open-source model, though, the quality is respectable.
You can also see a watermark on the image that reads ‘Imagined with AI’. You will find this watermark on every image generated by Meta.ai.
4 key ingredients
In creating a top-notch language model, Meta thinks it’s crucial to innovate, grow, and keep things simple.
This philosophy guided their work on Llama 3, where they concentrated on four main aspects:
- The model architecture
- The pre-training data
- Scaling up pre-training
- Instruction fine-tuning
The architecture of the Meta Llama 3 model
Meta kept things simple with Llama 3 by using a standard decoder-only transformer design, sticking to what works. But they didn’t stop there.
They upgraded from Llama 2 by making some important changes. Llama 3 uses a tokenizer with a vocabulary of 128K tokens that encodes language much more efficiently, which substantially improves model performance.
To make Llama 3 faster at inference, they adopted grouped query attention (GQA) across both the smaller (8B) and larger (70B) versions of Llama 3.
During training, Meta used sequences of 8,192 tokens, with a mask to ensure self-attention does not cross document boundaries.
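To make the GQA idea concrete, here is a minimal PyTorch sketch in which several query heads share a single key/value head. This is an illustrative toy under my own assumptions (dimensions, weight shapes, and names are invented), not Meta’s implementation:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    """Minimal grouped query attention: several query heads share one K/V head."""
    bsz, seqlen, dim = x.shape
    head_dim = dim // n_heads

    # Project inputs; K/V use fewer heads than Q, shrinking the KV cache.
    q = (x @ wq).view(bsz, seqlen, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat each K/V head so every group of query heads has a matching key/value.
    group = n_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)

    # Standard scaled dot-product attention (causal, as in a decoder-only model).
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(bsz, seqlen, dim)

# Toy usage: 8 query heads sharing 2 KV heads.
dim, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(1, 16, dim)
wq = torch.randn(dim, dim)
wk = torch.randn(dim, dim * n_kv_heads // n_heads)
wv = torch.randn(dim, dim * n_kv_heads // n_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)  # (1, 16, 64)
```

The benefit of sharing K/V heads is that the key/value cache shrinks by the group factor, which is what speeds up inference.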
Training data of the model
To make sure Llama 3 is the best it can be, Meta focused on getting the best training data. They gathered a huge amount of text, over 15 trillion tokens, from sources available to the public.
That’s seven times more data than was used for Llama 2, including four times more code. And to prepare Llama 3 for use in different languages, over 5% of the dataset consists of high-quality non-English data covering more than 30 languages.
However, they know it won’t perform as well in these languages as it does in English.
To ensure their training data is high quality, Meta built several data-filtering pipelines, including NSFW filters, semantic deduplication approaches, and text classifiers that predict data quality. These filters screen out inappropriate content and repeated information and score how good the text is.
Interestingly, they discovered that previous versions of Llama are really good at spotting high-quality data, so they used Llama 2 to help them create these filters for Llama 3.
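As a toy illustration of one of those filters, semantic deduplication, here is a hedged sketch that drops documents whose embeddings are nearly identical. This is purely illustrative and not Meta’s pipeline; real systems operate at billion-document scale with approximate nearest-neighbor search:

```python
import numpy as np

def semantic_dedup(embeddings, threshold=0.95):
    """Toy semantic deduplication: keep a document only if its embedding is
    not too similar (by cosine similarity) to any already-kept document."""
    # Normalize so plain dot products equal cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if all(vec @ normed[j] < threshold for j in kept):
            kept.append(i)
    return kept

# Toy usage: three "documents", the first two near-duplicates of each other.
docs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(semantic_dedup(docs))  # [0, 2] -- the near-duplicate is dropped
```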
Meta AI also tried out different ways of mixing data from different sources to see what works best.
This helped them choose a mix of data that makes sure Llama 3 can handle all sorts of tasks, like answering trivia, dealing with science and technology, coding questions, and knowing about history.
Pre-training
To make the most of its pre-training data in Llama 3, Meta focused on scaling up its training efforts.
They developed precise scaling laws to guide their evaluations of different tasks. These laws help them choose the right mix of data and make smart decisions about how to use their computing power efficiently.
Using scaling laws, they can predict how well their biggest models will perform on important tasks, like generating code, before they start training them.
This ensures that their final models excel in a wide range of uses and abilities. As they worked on Llama 3, developers noticed some interesting things about how scaling affects performance.
For instance, they found that even though the Chinchilla-optimal amount of training data for an 8B parameter model is about 200B tokens, performance keeps improving with even more data. Both their 8B and 70B parameter models showed steady improvement even after being trained on up to 15 trillion tokens of data.
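As a rough worked example of what “Chinchilla-optimal” means here, the commonly cited rule of thumb is about 20 training tokens per model parameter. This back-of-the-envelope framing is mine, not a figure from Meta:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# ~20 tokens per model parameter. Llama 3 trains far beyond this point.
params = 8e9                      # Llama 3 8B
chinchilla_tokens = 20 * params   # ~160B, in the same ballpark as the ~200B above
actual_tokens = 15e12             # Llama 3's 15T-token training run

print(f"Chinchilla-optimal: ~{chinchilla_tokens / 1e9:.0f}B tokens")
print(f"Actual training:    ~{actual_tokens / 1e12:.0f}T tokens "
      f"({actual_tokens / chinchilla_tokens:.0f}x the optimal amount)")
```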
Which parameter size to choose depends entirely on your requirements.
If you want to perform lighter tasks, the Llama 3 8B model will be suitable for you, but if you need complex tasks handled with better results, you should go for the Llama 3 70B model.
Instruction fine-tuning
To make their pre-trained models work really well in chat situations, they came up with new ways to fine-tune them after training.
They use a mix of methods like supervised fine-tuning, rejection sampling, proximal policy optimization, and direct preference optimization.
But what really makes a difference is the quality of the prompts they use for fine-tuning and the rankings they use for optimization.
They saw significant improvements in model performance by carefully selecting and checking these prompts and rankings with help from human annotators.
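Of those methods, direct preference optimization (DPO) has the most compact objective, so here is a minimal sketch of the standard published DPO loss. This follows the formulation from the DPO paper, not Meta’s internal training code; the log-probabilities in the toy usage are made up:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: push the policy to prefer the human-chosen
    response over the rejected one, relative to a frozen reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)); minimized when the chosen response
    # gains probability faster than the rejected one.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up sequence log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)
```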
Llama 3 benchmark
Meta’s new Llama 3 models with 8 billion and 70 billion parameters are a big step forward from Llama 2, setting a new standard for large language models of these sizes.
Thanks to improvements in both pre-training and post-training, their models are the best-performing models available at these parameter scales.
They’ve made significant progress in how the models respond: fewer false refusals, more consistent behavior, and more varied answers.
Plus, they’re much better at capabilities like reasoning, code generation, and instruction following, which makes them easier to steer.
In creating Llama 3, Meta wanted to make sure it performed well not only on standard tests but also in real-life situations.
So, they built a special set of tests evaluated by humans to check its quality. This set has 1,800 prompts covering 12 key use cases, like asking for advice, writing code, answering open-ended questions, reasoning, rewriting, and summarization.
They keep this test set private even from their own team to avoid any bias.
The chart below shows how well Llama 3 does compared to other models across these different uses and questions.
The rankings from human evaluators show that their 70B instruction-following model outperforms other models of similar size in real-life situations. Their pre-trained model also sets a new standard for large language models at these sizes.
Meta Llama 3 system requirements
If you want to run the Llama model locally, your computer should meet the minimum system requirements below for the model to run smoothly.
Llama 3 – 8B:
- Storage: ~16GB of disk space.
- GPU: Dedicated NVIDIA GPU, ideally from the Ampere architecture (e.g., NVIDIA A10), with a minimum of 20GB VRAM.
- RAM: 30GB+ system RAM is recommended for efficient loading and inference.

Llama 3 – 70B:
- Storage: ~140GB of disk space.
- GPUs: Multi-GPU system with a minimum of 8x NVIDIA A10 GPUs (or equivalent) providing a combined VRAM of at least 160GB.
- RAM: Similar recommendations as for the 8B model.

Software:
- Operating System: Linux-based distributions (e.g., Ubuntu) are strongly recommended for optimal compatibility.
- Programming Language: Python (3.6+)
- Deep Learning Framework: PyTorch (1.10+)
- Libraries: Hugging Face Transformers or alternatives like vLLM for model interaction. Distributed inference libraries may be necessary for effective load balancing across multiple GPUs for the 70B model.

Deployment and Optimization:
- Cloud Infrastructure: For researchers and organizations lacking dedicated high-performance hardware, cloud platforms like AWS EC2 (g5 instances), Google Cloud, or Microsoft Azure offer scalable GPU-accelerated solutions.
- Technical Expertise: Deploying and maintaining LLM infrastructure, or managing complex cloud environments, requires specialized knowledge and experience.
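To give a concrete feel for that software stack, here is a minimal sketch of loading and prompting the 8B instruct model with Hugging Face Transformers. It assumes you have approved access to the gated meta-llama/Meta-Llama-3-8B-Instruct repository (see the access sections below), a recent transformers release, and the accelerate library installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo: requires approved access

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory versus float32
    device_map="auto",           # spreads weights across available GPUs
)

messages = [{"role": "user", "content": "Explain grouped query attention briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```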
How to access Llama 3
If you want to experience the Meta Llama 3 model without downloading any files to your computer, you can visit the official Meta AI website, which has a user interface similar to ChatGPT and Gemini.
Meta has integrated the model into the website so that users can try it directly.
NOTE: If you’re not in Australia, Canada, Ghana, Jamaica, Malawi, New Zealand, Nigeria, Pakistan, Singapore, South Africa, Uganda, Zambia, or Zimbabwe, you’ll see a message saying “Meta AI isn’t available yet in your country”, because Meta AI is currently available only in those countries.
Meta Llama 3 download
If you want to download the model, whether the 8B or the 70B parameter size, and use it on your computer, there are three ways to get Llama 3.
Whether you get the Meta Llama models directly from Meta, Hugging Face, or Kaggle, you’ll need to agree to their license agreements first.
1. Meta:
To access the Meta Llama models directly from Meta, visit the Meta Llama download form.
Fill in your details, including your email address. Choose the models you’re interested in and carefully review and accept the relevant license agreements.
After making your request, you’ll receive an email for each model you’ve chosen. These emails will contain instructions along with a pre-signed URL to download the model.
You can use the same URL to download multiple model weights, like 8B and 70B.
Remember, the URL expires either after 24 hours or after five downloads, whichever comes first.
But don’t worry, you can always request the models again to get fresh pre-signed URLs if needed.
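Meta’s official instructions use their download.sh script; as a rough illustration of what a download from a pre-signed URL looks like, here is a hedged Python sketch (the URL and filename below are placeholders, to be replaced with the values from your email):

```python
import requests

# Placeholder: paste the pre-signed URL from Meta's email here.
presigned_url = "https://example.com/llama-3-8b/consolidated.00.pth?Signature=..."

# Stream the checkpoint to disk so the multi-GB file never sits in memory.
with requests.get(presigned_url, stream=True) as resp:
    resp.raise_for_status()
    with open("consolidated.00.pth", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)
```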
2. Hugging Face:
To access the models from Hugging Face (HF), start by logging into your HF account.
Then, choose the model you’re interested in, like Llama 3 GGUF, Instruct, Guard, etc.
You’ll be directed to a page where you can enter your information and review the relevant license agreement.
Once you’ve accepted the agreement, your information will be reviewed. This process may take a few days.
Once your request is approved, you’ll receive an email notifying you that you now have access to the HF repository for the model.
It’s important to note that cloning the HF repository to your local computer won’t give you all the model files, because the largest files are stored with Git LFS.
In the local clone, these files contain only pointer metadata for the actual file.
To get the larger files, you’ll need to go to the specific file in the repository on the HF site and download it directly from there.
For instance, to obtain consolidated.00.pth for the Meta Llama 3 8B model, you can download it from the model’s Hugging Face page.
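Alternatively, once your access request is approved, you can fetch an entire repository, large weight files included, with the huggingface_hub library. A short sketch, with the access token left as a placeholder:

```python
from huggingface_hub import snapshot_download

# Downloads all files, including the large LFS-backed weights that a
# plain git clone would leave as pointer stubs.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    local_dir="llama-3-8b",
    token="hf_...",  # your Hugging Face access token (placeholder)
)
```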
3. Kaggle:
To obtain the models from Kaggle, including the Hugging Face versions, start by logging into your Kaggle account.
Before accessing the models on Kaggle, you’ll need to submit a request for model access.
This requires accepting the model license agreement on the Meta site.
Make sure the email address you provide when accepting the license agreement matches the one you use for your Kaggle account.
After accepting the license agreement, return to Kaggle and submit your request for model access.
Approval may take a few days. Once approved, you’ll receive an email confirming your access.
To access a specific model on Kaggle, choose it from the Model Variations dropdown menu, then click the download icon.
An archive file containing the model will begin downloading.
I hope you liked the article. I tried to cover almost all the important points related to Llama 3, but if there is any point you think is not covered, please let me know in the comment box. I will surely include it in the article.
Related Questions
Is Llama 3 open source?
Yes, Meta AI has released the core parameters (weights) of Llama 3 as open-source. This means researchers can analyze, experiment with, and even fine-tune the model.
Is Llama 3 multimodal?
It doesn’t have multimodal capabilities at the moment. However, Meta is actively working on solving this and plans to introduce multimodal capabilities soon, starting with the Ray-Ban Meta smart glasses.
What is Llama 3 context length?
The models support a context length of 8,192 tokens, double the 4,096 of Llama 2. This means they can remember and use more information from previous interactions, making them better at understanding conversations, following complex instructions, and remembering important details.