Covering Scientific & Technical AI | Wednesday, October 9, 2024

Meta Releases LLaMA Foundation Language Models to Researchers 

Meta has released a collection of foundation language models dubbed LLaMA, which is short for “Large Language Model Meta AI.”

“Today we’re releasing a new state-of-the-art AI large language model called LLaMA designed to help researchers advance their work. LLMs have shown a lot of promise in generating text, having conversations, summarizing written material, and more complicated tasks like solving math theorems or predicting protein structures,” said Meta CEO Mark Zuckerberg in a Facebook post. “Meta is committed to this open model of research and we’ll make our new model available to the AI research community.”

LLaMA is an auto-regressive language model based on the transformer architecture and was developed by Meta’s Fundamental AI Research (FAIR) team. It comes in four sizes: 7B, 13B, 33B, and 65B parameters. For comparison, GPT-3, the predecessor of the GPT-3.5 model that ChatGPT is based on, has 175B parameters, which makes LLaMA 13B more than 10x smaller.
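To put the size comparison in concrete terms, the ratio of the 175B figure to each LLaMA size can be computed directly. This is a minimal sketch using only the nominal parameter counts cited above; the function name `size_ratio` is an illustrative choice, not anything from Meta.

```python
# Nominal parameter counts in billions, as cited in the article.
REFERENCE_PARAMS_B = 175
LLAMA_SIZES_B = [7, 13, 33, 65]


def size_ratio(reference_b, model_b):
    """Return how many times smaller a model is than the reference."""
    return reference_b / model_b


for size in LLAMA_SIZES_B:
    ratio = size_ratio(REFERENCE_PARAMS_B, size)
    print(f"LLaMA {size}B is {ratio:.1f}x smaller than a 175B model")
```

By this arithmetic, only the 7B and 13B models are more than 10x smaller than a 175B-parameter model; the 33B and 65B models are roughly 5x and 3x smaller.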

Meta trained LLaMA on tokens, which are pieces of words rather than full words, and says that smaller models trained on more tokens are easier to retrain and fine-tune for specific use cases: “We trained LLaMA 65B and LLaMA 33B on 1.4 trillion tokens. Our smallest model, LLaMA 7B, is trained on one trillion tokens.” The company chose text from the 20 most widely spoken languages, focusing on those that use Latin and Cyrillic alphabets.
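The idea of tokens as "pieces of words" can be illustrated with a toy example. The sketch below is not LLaMA's tokenizer; it is a simple greedy longest-match splitter over a made-up vocabulary (`TOY_VOCAB` and `tokenize` are hypothetical names), meant only to show how a word can be broken into reusable subword pieces.

```python
def tokenize(word, vocab):
    """Greedily split `word` into the longest pieces found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            # Nothing in the vocabulary matched: emit a single character.
            pieces.append(word[start])
            start += 1
    return pieces


# A made-up vocabulary for illustration only.
TOY_VOCAB = {"token", "iz", "ation", "re", "train", "ing"}

print(tokenize("tokenization", TOY_VOCAB))  # ['token', 'iz', 'ation']
print(tokenize("retraining", TOY_VOCAB))    # ['re', 'train', 'ing']
```

Because pieces such as "ing" or "ation" recur across many words, a model can cover a large vocabulary of words with a much smaller set of tokens.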

In a company blog post, Meta says that smaller models like LLaMA can enable those in the research community who lack access to large amounts of infrastructure to study these models: “Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases.”


Like ChatGPT and Bard, LLaMA is not free from the problems plaguing LLMs, including hallucinations, bias, and generating harmful content. Meta asserts that full research access to these models remains limited due to resource constraints, hindering progress in understanding them and mitigating these known issues.

LLaMA is being released under a noncommercial license focused on research use cases, and access will be granted on a case-by-case basis to academic researchers, civil and governmental organizations, and industry research labs, according to Meta.

Meta hopes that by sharing the code for LLaMA, researchers can test new approaches to limiting these problems in LLMs. In its research paper, the company provides a set of evaluations on benchmarks that measure model bias and toxicity, both to show LLaMA’s limitations and to support further research in this area.

The company noted that these foundation models were trained on a large set of unlabeled data, making them ideal for fine-tuning for different tasks. The FAIR team trained the model with publicly available data from CCNet, C4, GitHub, Wikipedia, books, ArXiv, and Stack Exchange, with 67% of the total data coming from CCNet.

Meta claims its LLaMA 13B model can outperform GPT-3 on benchmarks such as BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, and OpenBookQA while running on a single GPU. That could set the stage for applications built on this model that run on consumer-level hardware in the future.
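A rough back-of-the-envelope calculation helps show why model size matters for single-GPU use. The sketch below estimates the memory needed just to hold the weights, assuming 2 bytes per parameter (16-bit) or 1 byte per parameter (8-bit). Those precisions are assumptions for illustration; the article does not state what precision Meta used, and real usage also needs memory for activations and other runtime state.

```python
# Nominal parameter counts in billions, as cited in the article.
LLAMA_SIZES_B = [7, 13, 33, 65]


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param is an assumption: 2 for 16-bit weights, 1 for 8-bit.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9


for size in LLAMA_SIZES_B:
    sixteen_bit = weight_memory_gb(size)
    eight_bit = weight_memory_gb(size, bytes_per_param=1)
    print(f"LLaMA {size}B: ~{sixteen_bit:.0f} GB at 16-bit, "
          f"~{eight_bit:.0f} GB at 8-bit")
```

Under these assumptions, the 13B model's weights take about 26 GB at 16-bit, while a 175B-parameter model would need about 350 GB, far more than any single GPU provides.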

“We believe that the entire AI community—academic researchers, civil society, policymakers, and industry—must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular. We look forward to seeing what the community can learn — and eventually build — using LLaMA,” the company said.

Meta has made the research paper available for download and is accepting requests for access to LLaMA.

This article first appeared on sister publication Datanami.
