5 Comments
Benjamin124

Llama 3.1 and new models show that AI is entering a period of explosive growth in computing power and applications. https://fnfmods.io

Maryann Fisher

Very intriguing. Much obliged for the updates.

https://doodle-jump.co

Kc

150 trillion training tokens?

Who has that?

Kc

Sorry, 140 trillion. After cleaning and post-processing, text comes out to 20 trillion tokens tops. Including multimodal data (audio, video, image) adds 5-10 trillion max.

Towards AI

This is an illustration based on Llama 3.1's scaling laws, which apply to their data mix on their dense architecture. LLMs at this next scale will use a different data mix and a different architecture, which will most likely reduce the optimal token count. They can also use synthetic data and multiple epochs, though; GPT-4 used 2-4 epochs.

They may also invest more compute into FLOPs per forward/backward pass relative to tokens.
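As a rough sketch of the arithmetic behind this (not from the article): under the common C ≈ 6·N·D training-compute approximation, a fixed FLOP budget split toward more parameters (more FLOPs per forward/backward pass) implies fewer optimal tokens, and repeating data over multiple epochs shrinks the unique-token requirement further. The FLOP budget, parameter count, and epoch counts below are illustrative assumptions, not figures from the post.

# Hypothetical sketch: compute-optimal tokens under C ~= 6 * N * D,
# and the effect of reusing data over multiple epochs.

def optimal_tokens(compute_flops: float, params: float) -> float:
    # Tokens D implied by the C ~= 6 * N * D training-compute approximation.
    return compute_flops / (6.0 * params)

def unique_tokens_needed(total_tokens: float, epochs: int) -> float:
    # Unique tokens required if the dataset is repeated for the given number of epochs.
    return total_tokens / epochs

compute = 4e26   # assumed next-scale budget, roughly 10x Llama 3.1 405B's ~3.8e25 FLOPs
params = 1e12    # assumed 1-trillion-parameter dense model

total = optimal_tokens(compute, params)
print(f"Tokens implied by C = 6ND: {total / 1e12:.0f}T")   # ~67T for these assumptions

# Repeating data (the comment notes GPT-4 used 2-4 epochs) cuts the unique-data need.
for epochs in (1, 2, 4):
    print(f"{epochs} epoch(s): {unique_tokens_needed(total, epochs) / 1e12:.1f}T unique tokens")

With these assumed numbers, four epochs bring the unique-token requirement down to roughly 17T, in the same ballpark as the ~20T of cleaned text Kc estimates is available.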