This is an illustration based on Llama 3.1's scaling laws, which apply to their data mix on their dense architecture. LLMs at this next scale will use a different data mix and a different architecture, which will most likely reduce the optimal token count. They can also use synthetic data and multiple epochs, though; GPT-4 used 2-4 epochs.
They may also invest more of the compute budget in FLOPs per forward/backward pass relative to the number of training tokens.
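To make the illustration concrete, here is a minimal sketch of how a Chinchilla-style power-law fit maps a compute budget to an optimal token count and an implied model size. The coefficients A and alpha below are assumed for illustration only, calibrated so that a ~3.8e25 FLOP budget lands near the ~15.6T tokens Llama 3.1 405B was trained on; they are not the fitted values reported in the Llama 3 paper, and C ≈ 6·N·D is the usual training-FLOPs approximation.

```python
# Minimal sketch of a Chinchilla-style compute-optimal fit, D*(C) = A * C**alpha.
# A and alpha are ASSUMED illustrative values, calibrated so a ~3.8e25 FLOP
# budget gives roughly the ~15.6T tokens Llama 3.1 405B was trained on; they
# are not the coefficients reported in the Llama 3 paper.

def optimal_tokens(compute_flops: float, A: float = 0.43, alpha: float = 0.53) -> float:
    """Compute-optimal number of training tokens predicted by the power-law fit."""
    return A * compute_flops ** alpha

def implied_params(compute_flops: float, tokens: float) -> float:
    """Model size implied by the standard C ~= 6 * N * D training-FLOPs rule."""
    return compute_flops / (6 * tokens)

if __name__ == "__main__":
    # 1x, 10x, and 100x a Llama-3.1-405B-scale budget
    for budget in (3.8e25, 3.8e26, 3.8e27):
        d = optimal_tokens(budget)
        n = implied_params(budget, d)
        print(f"C = {budget:.1e} FLOPs -> ~{d/1e12:.0f}T tokens, ~{n/1e9:.0f}B params")
```

Under a dense-architecture fit like this, a ~100x budget lands in the 150-200T-token range, which is roughly where the figures discussed below come from; a different data mix, architecture, or fit would shift A and alpha and hence the optimum.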
Very intriguing. Much obliged for the updates.
Really interesting. Thanks for these updates.
150 Trillion training tokens?
Who has that?
Sorry, 140 trillion. After cleaning/post-processing, text tops out at around 20 trillion tokens; multimodal (audio, video, image) will be 5-10 trillion max.
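For context, here is a quick back-of-the-envelope check of the gap those figures imply, using the rough estimates quoted above (not measured corpus sizes):

```python
# Rough check of the data gap implied by the figures above; all inputs are the
# thread's back-of-the-envelope estimates, not measured corpus sizes.
text_tokens = 20e12                     # ~20T cleaned/post-processed text tokens
multimodal_tokens = (5e12, 10e12)       # ~5-10T from audio/video/image
required = 140e12                       # compute-optimal budget discussed above

for mm in multimodal_tokens:
    unique = text_tokens + mm
    print(f"{unique/1e12:.0f}T unique tokens -> ~{required/unique:.1f} epochs "
          f"(or equivalent synthetic data) to reach {required/1e12:.0f}T")
```

That is roughly 5-6 passes over all unique data, which is why multiple epochs and synthetic data come up above.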