Advanced Computing in the Age of AI | Sunday, May 26, 2024

Meta Joins the AI Race with the Llama 3 Model 

The race is on to create the best AI tool. While the likes of Google and OpenAI, the maker of ChatGPT, are already deeply invested in the field, other companies have lagged behind. As of this Thursday, Meta is no longer one of those companies.

Meta, the parent company of Facebook, Instagram, and WhatsApp, is joining the competitive field of AI tools with its powerful model Llama 3. The model powers Meta AI, a new AI assistant that is now integrated into the search bars of the company’s apps. Meta AI can also be accessed via a dedicated website built for the tool.

Much like Llama 2, Llama 3 is an open-weight large language model. However, it is important to note that Llama 3 does not technically qualify as open source. While it does rely on open-source software and technologies such as PyTorch and Transformers, the Llama 3 model itself has certain intellectual property protections that disqualify it from official open-source status.

Llama 3 was trained on two data-center-scale clusters, each with 24,576 NVIDIA H100 Tensor Core GPUs. This is a major improvement over the original clusters for this work, which contained 15,000 NVIDIA A100 GPUs, the generation of hardware used to train Llama 2.

Currently, Llama 3 is available in 8 billion and 70 billion parameter sizes, each in two versions: pre-trained and instruction-tuned, with the latter being fine-tuned to follow user instructions. All of these models have an 8,192-token context limit.

Meta also announced several major improvements in Llama 3 over Llama 2. For instance, Llama 3 uses a tokenizer with a vocabulary of 128,000 tokens that is meant to encode language much more efficiently. Additionally, the company sought to improve inference efficiency, which it says it achieved by adopting grouped query attention (GQA) across both the 8 billion and 70 billion sizes.
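To make the inference benefit of GQA concrete, the following is a minimal sketch of how sharing key/value heads across groups of query heads shrinks the key/value cache a model must hold in memory while generating text. The layer count, head counts, head dimension, and 16-bit storage used here are illustrative assumptions, not figures from Meta's announcement; only the 8,192-token context length comes from the article.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate key/value cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values
    in every layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Return the index of the shared key/value head a query head reads from."""
    group_size = n_query_heads // n_kv_heads  # query heads per shared KV head
    return query_head // group_size


# Illustrative configuration (assumed values, not taken from the article).
N_LAYERS = 32
N_QUERY_HEADS = 32
N_KV_HEADS = 8      # GQA: four query heads share each key/value head
HEAD_DIM = 128
SEQ_LEN = 8192      # the context limit stated in the article

# Standard multi-head attention keeps one KV head per query head.
mha_bytes = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Grouped query attention keeps only N_KV_HEADS key/value heads.
gqa_bytes = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN)

print(f"MHA cache: {mha_bytes / 1024**3:.1f} GiB")
print(f"GQA cache: {gqa_bytes / 1024**3:.1f} GiB")
print(f"Reduction: {mha_bytes // gqa_bytes}x")
```

Under these assumed settings the cache shrinks by the ratio of query heads to key/value heads, which is why GQA matters most for long contexts and large batch sizes.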

Llama 3 is also pretrained on over 15 trillion tokens collected from publicly available sources, a dataset seven times larger than the one used for Llama 2. What’s more, Meta is hoping to stay ahead of the curve by preparing for upcoming multilingual use cases. The company states that over 5% of the Llama 3 pretraining dataset consists of non-English data covering more than 30 languages.
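As a rough back-of-the-envelope check on these figures, the short sketch below works out what they imply. It treats "over 15 trillion" and "over 5%" as exact values, so the results are approximate lower bounds, and the variable names are purely illustrative.

```python
# Figures stated in the article, treated as exact for this estimate.
llama3_tokens = 15 * 10**12   # "over 15 trillion tokens"
size_ratio = 7                # "seven times larger" than Llama 2's dataset
non_english_percent = 5       # "over 5%" non-English data

# Implied size of the Llama 2 pretraining dataset.
implied_llama2_tokens = llama3_tokens / size_ratio

# Implied amount of non-English pretraining data (integer arithmetic).
non_english_tokens = llama3_tokens * non_english_percent // 100

print(f"Implied Llama 2 dataset: ~{implied_llama2_tokens / 1e12:.2f} trillion tokens")
print(f"Non-English data: at least {non_english_tokens / 1e9:.0f} billion tokens")
```

In other words, even a 5% share of a dataset this size amounts to hundreds of billions of non-English tokens spread across the 30-plus languages.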

Additionally, Meta said that admitting only high-quality data into the Llama 3 training regimen was of the utmost importance. The company developed a series of data-filtering pipelines that include heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality.
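Meta has not published the details of these pipelines, so the following is only a toy sketch of how such filtering stages might be chained together. Every function name, threshold, and the placeholder blocklist is hypothetical. Exact-match hashing of normalized text stands in for semantic deduplication, and a simple unique-word ratio stands in for a trained quality classifier.

```python
import hashlib

# Hypothetical placeholder; a real NSFW filter would be far more sophisticated.
BLOCKLIST = {"nsfw_term"}


def passes_heuristics(text, min_words=5, max_symbol_ratio=0.3):
    """Heuristic filter: drop very short or symbol-heavy documents."""
    words = text.split()
    if len(words) < min_words:
        return False
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return symbols / len(text) <= max_symbol_ratio


def passes_blocklist(text):
    """Stand-in for an NSFW filter: reject documents containing blocked terms."""
    return not any(w.strip(".,!?").lower() in BLOCKLIST for w in text.split())


def normalized_fingerprint(text):
    """Hash of lowercased, whitespace-collapsed text.

    This catches only near-exact duplicates; it is a simplification of
    the semantic deduplication described in the article.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def quality_score(text):
    """Toy stand-in for a quality classifier: fraction of unique words."""
    words = text.lower().split()
    return len(set(words)) / len(words)


def filter_corpus(docs, min_quality=0.5):
    """Run each document through every stage and keep the survivors."""
    seen = set()
    kept = []
    for doc in docs:
        if not passes_heuristics(doc) or not passes_blocklist(doc):
            continue
        fingerprint = normalized_fingerprint(doc)
        if fingerprint in seen or quality_score(doc) < min_quality:
            continue
        seen.add(fingerprint)
        kept.append(doc)
    return kept


sample_docs = [
    "Llama 3 is an open-weight large language model.",
    "llama 3 is an  open-weight large language model.",  # duplicate after normalization
    "too short",
    "spam spam spam spam spam spam",
    "This text contains an nsfw_term that should be filtered.",
    "$$$ ### @@@ !!! %%% ^^^",
]
filtered_docs = filter_corpus(sample_docs)
print(filtered_docs)
```

Ordering the cheap checks first, as this sketch does, is a common design choice because it avoids spending classifier time on documents that simple rules would reject anyway.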

Only time will tell how valuable the Meta AI platform is. However, Meta appears to be working hard to position the AI assistant as a major development in the field. The company stated that it is currently training models with over 400 billion parameters. What’s more, Meta expects to release new capabilities over the next few months, including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities.

EnterpriseAI