This AI newsletter is all you need #64
What happened this week in AI by Louie

This week in AI, we followed more model releases in the open-source LLM space, including the recently unveiled Falcon 180B, along with more teasing of upcoming models from the tech giants. Falcon 180B has already topped the Hugging Face leaderboard and, at 180 billion parameters, is the largest openly available language model to date. Its training processed a massive 3.5 trillion tokens on up to 4,096 GPUs using Amazon SageMaker, consuming approximately 7 million GPU hours in the process. Developed as part of the Falcon family by the Technology Innovation Institute in Abu Dhabi, the model's dataset is primarily composed of web data from RefinedWeb (accounting for 85% of the data), supplemented with a carefully curated blend of conversations, technical papers, and a small fraction of code (around 3%).

In terms of performance, Falcon 180B reportedly beats both Llama 2 70B and OpenAI's GPT-3.5 on the Massive Multitask Language Understanding (MMLU) benchmark and measures on par with Google's PaLM 2-Large. Falcon 180B is available in the Hugging Face ecosystem, starting with Transformers version 4.33. However, it's essential to note that commercial use of Falcon 180B is currently subject to stringent conditions, with "hosting use" explicitly excluded.

While open source is clearly still some distance from challenging GPT-4 in terms of performance and compute intensity, we expect to see increasing availability of open-source models to compete with GPT-3.5, and we are excited to see what can be built with the increased flexibility this brings.
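For readers who want to try the model, the Hugging Face availability mentioned above can be sketched with the standard Transformers API. This is a minimal illustration, assuming transformers >= 4.33, an accepted model license on the Hub, and hardware with enough GPU memory for a 180B-parameter model; the Hub identifier "tiiuae/falcon-180B" and the generation settings are our own assumptions for the sketch, not a prescription from TII.

```python
# Sketch: text generation with Falcon 180B via Hugging Face Transformers.
# Assumes transformers >= 4.33, torch installed, the model license accepted
# on the Hub, and substantial multi-GPU memory (this is a 180B model).
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


def generate_with_falcon(prompt: str, model_id: str = "tiiuae/falcon-180B") -> str:
    """Load the model, run greedy generation, and return the decoded text."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory
        device_map="auto",           # shard layers across available GPUs
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Note that `device_map="auto"` (via the accelerate library) is what makes a model of this size tractable at all, spreading the weight shards across every available GPU rather than attempting a single-device load.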