🏠 Llama 3

Meta is catching up

Sponsored by

GM! Brett here. If you have a startup and are fundraising (or thinking about it), I'd love to put you in front of some investors and give you early access to this sweet fundraising tool we've been building. Please fill out this short survey.

🏠 AI

Meta: almost there!

Meta dropped their Llama 3 models a couple of days ago and they're really good.

I found some interesting stuff under the hood.

On the model itself, they've made solid gains with the 8B and 70B parameter versions.

Key things they've done:

  • Way more training data (7x vs Llama 2)

  • Tweaks to model architecture like expanded tokenizer vocab and grouped query attention

  • A blend of post-training approaches: supervised fine-tuning, rejection sampling, PPO, and direct preference optimization (DPO)
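Grouped query attention, mentioned above, cuts memory by letting several query heads share one key/value head (so the KV cache shrinks by the group factor). A minimal NumPy sketch of the idea — head counts and dimensions here are illustrative, not Llama 3's actual config:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each KV head serves n_q_heads // n_kv_heads query heads."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kh = k[h // group]  # query heads in the same group share this KV head
        vh = v[h // group]
        scores = q[h] @ kh.T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ vh
    return out
```

With 8 query heads and 2 KV heads you'd store a quarter of the KV cache of standard multi-head attention, at a small quality cost.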

But the bits that really caught my eye were:

  1. Using Llama 2 to filter the training data for Llama 3. Kinda meta.

  2. Detailed scaling laws let them predict the performance of their biggest models before training them.

  3. On 16K GPUs, they're pushing 400 TFLOPS per GPU. That is a LOT of compute.

  4. Instruction tuning / RLHF with preference rankings on reasoning tasks teaches the model to use its own reasoning to pick the best answers. Interesting.
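The scaling-laws point boils down to fitting a power law to the losses of cheap small runs, then extrapolating to a frontier-scale compute budget. A toy sketch with entirely made-up numbers (the actual functional form and data Meta used aren't public in detail):

```python
import numpy as np

# Hypothetical (compute, loss) pairs from small training runs.
compute = np.array([1e19, 1e20, 1e21, 1e22])  # training FLOPs (made up)
loss    = np.array([3.2, 2.7, 2.3, 1.95])     # validation loss (made up)

# Fit loss ≈ a * C^b by linear regression in log-log space (b < 0).
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
a = np.exp(log_a)

# Extrapolate to a frontier-scale budget before spending it.
predicted = a * (1e25) ** b
```

The payoff is exactly what the newsletter describes: you can estimate how good the big model will be before committing the GPU budget to train it.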

Responsibility-wise, it's good to see things like Llama Guard 2 for input/output filtering, CyberSecEval 2 for evaluating security risks, and Code Shield for filtering insecure generated code.

The new Meta AI assistant rolling out on Facebook, Instagram, WhatsApp, etc. is powered by Llama 3. It can help with all sorts of tasks: planning trips, generating images, explaining complex topics. Tighter search integration means you don't have to jump between apps.

The image gen in particular sounds slick - faster, higher quality, animations and remixing.

Zooming out, the revelation that Meta's largest Llama 3 models, still in training, are 400B+ parameters and already posting impressive numbers really underscores the stakes. The technical moat around the "closed-source first" crowd looks to be thinning faster than anticipated.

The Big Picture: Open-source models are slowly but steadily catching up to OpenAI's.

Fun fact: Today is Sam Altman's birthday and I'm wondering if the below tweet might turn out correct...

What’s the secret to staying ahead of the curve in the world of AI? Information. Luckily, you can join early adopters reading The Rundown– the free newsletter that makes you smarter on AI with just a 5-minute read per day.

🤝 THE LATEST IN…

TECH

AI

SCIENCE

CRYPTO

🏃‍♀️ QUICKIES

Raise: Uniuni, an e-commerce last-mile logistics company, raised $50M in Series C funding to expand delivery coverage across the United States.

Stat: $370M. The amount bitcoin evangelist Michael Saylor has made from MicroStrategy stock sales this year.

Rabbit Hole: The Acceleration of Addictiveness (Paul Graham)

🤩 MONDAY MOTIVATION

“The person who is willing to suffer the longest wins.”

🛠️ FOUNDERS CORNER

The best resources we recently came across that will help you become a better founder, builder, or investor.

📿 Limitless’ wearable AI lets you preserve all your conversations

💵 Narrative helps you standardize invoices and catch errors

🔧 Creo helps you build your internal tools 10x faster

❓️ AI GENERATED OR NOT

Is this AI generated?


Last week’s answer: If you guessed AI-generated… you were wrong.

💉 DOPAMINE HIT

HOW WAS TODAY'S NEWSLETTER?


REACH 40K+ FOUNDERS, INVESTORS & OPERATORS

If you’re interested in advertising with us, send an email over to [email protected] with the subject “Homescreen Ads”.