This AI newsletter is all you need #76
What happened this week in AI by Louie
This week, our focus was on significant AI developments beyond the realm of transformers and large language models (LLMs). While the momentum of new diffusion-based video generation model releases continued, what excited us most was DeepMind’s latest materials model, GNoME.
GNoME is a new large-scale graph neural network designed to discover new crystal structures, dramatically increasing the speed and efficiency of materials discovery. The model’s results were announced and made available by DeepMind this week. Incredibly, this expands the number of likely stable materials known to humanity by roughly 10x! GNoME’s 2.2 million discovered materials would be equivalent to about 800 years’ worth of knowledge at the previous pace of discovery. Of these 2.2 million predictions, 380,000 are estimated to be stable, making them promising candidates for experimental synthesis. Despite this huge breakthrough in human knowledge, there is still a bottleneck in the number of labs and experts available to synthesize these materials and test them for useful properties. Encouragingly, a second paper released alongside GNoME demonstrated how AI could also be used to help produce these materials.
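At its core, GNoME belongs to the graph neural network family: atoms become graph nodes, bonds become edges, and repeated message passing over the graph produces a stability-related score for the whole crystal. The sketch below is purely illustrative; the function names, feature sizes, and random weights are our own placeholders, not DeepMind’s actual architecture or parameters.

```python
import numpy as np

def message_passing_step(node_feats, adjacency, weight):
    """One illustrative message-passing step.

    node_feats: (num_atoms, dim) per-atom feature vectors
    adjacency:  (num_atoms, num_atoms) 0/1 bond connectivity
    weight:     (dim, dim) learnable transform
    """
    messages = adjacency @ node_feats            # sum features over bonded neighbors
    return np.tanh((node_feats + messages) @ weight)

def predict_stability(node_feats, adjacency, weight, readout):
    """Pool per-atom features into a single stability-style score."""
    h = message_passing_step(node_feats, adjacency, weight)
    graph_embedding = h.mean(axis=0)             # permutation-invariant pooling
    return float(graph_embedding @ readout)

# Toy 4-atom structure with random features and weights.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
w = rng.normal(size=(8, 8)) * 0.1
r = rng.normal(size=8)
score = predict_stability(feats, adj, w, r)
```

Because the score depends only on graph connectivity and pooled features, the same network can be applied to candidate crystals of any size, which is what makes screening millions of structures tractable.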
Why should you care?
Human history has regularly been segmented and described by the new materials discovered and used, and even today, many new technologies are driven by the discovery of new materials, from clean energy and computer chips to fusion power and even room-temperature superconductors. We think there is a high chance a game-changing new material is contained within DeepMind’s new data release; however, it will still take a lot of time to discover which of these materials have useful properties and can be manufactured affordably at scale. More broadly, the success of scaling up graph neural networks here suggests the recent AI GPU buildout is likely to lead to breakthroughs beyond the world of scaling LLMs.
- Louie Peters — Towards AI Co-founder and CEO
Meta introduced SeamlessM4T v2, a model family that enables end-to-end expressive and multilingual translation. Seamless, the system built on it, advances automatic speech translation: it translates across 76 languages while preserving the speaker’s unique vocal style and prosody, making conversations feel more natural.
Stability AI introduced SDXL Turbo, a new text-to-image model that uses adversarial diffusion distillation (ADD) to rapidly generate high-quality images in a single step. It enables the quick and precise creation of 512 x 512 images in just over 200 milliseconds.
Pika Labs has released Pika 1.0, an impressive AI video generation tool. It has advanced features like Text-to-Video and Image-to-Video conversion. The company has also raised $55 million in funding to compete against giants Meta, Adobe, and Stability AI.
Berkeley unveiled Starling-7B, a powerful language model trained with Reinforcement Learning from AI Feedback (RLAIF). It harnesses Berkeley’s new GPT-4-labelled ranking dataset, Nectar. The model outperforms every model to date on MT-Bench except OpenAI’s GPT-4 and GPT-4 Turbo.
AWS unveiled its next generation of AI chips for model training and inference — Graviton4 and Trainium 2. Trainium 2 is designed to deliver up to 4x better performance and 2x better energy efficiency, while Graviton4 provides up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the previous generation.
This week saw the release of several new generative models for image, audio, and video generation. Which one seems the most promising to you and why? Share it in the comments.
Five 5-minute reads/videos to keep you learning
This article showcases visual and interactive representations of renowned Transformer architectures, including nanoGPT, GPT-2, and GPT-3. It provides clear visuals and illustrates the connections between all the blocks.
This video guides developers and AI enthusiasts on improving LLMs, offering methods for both minor and significant advancements. It also helps choose between training from scratch, fine-tuning, (advanced) prompt engineering, and Retrieval Augmented Generation (RAG) with Activeloop’s Deep Memory.
It has been a year since OpenAI quietly launched ChatGPT. This article traces the timeline of the AI evolution in the past year and how these technologies may upend creative and knowledge work as we know it.
AI wrappers are practical tools that leverage AI APIs to generate output and have proven to be financially rewarding for creators. Examples like Formula Bot and PhotoAI have annual revenues ranging from $200k to $900k.
Prasad Ramakrishnan, CIO of Freshworks, highlights several practical AI use cases for startups. This article explores five ways organizations leverage AI for effective problem-solving, from improving user experience to streamlining onboarding processes and optimizing data platforms.
Repositories & Tools
1. Whisper Zero by Gladia is a complete rework of Whisper ASR to eliminate hallucinations.
2. Taipy is an open-source Python library for building your web application front-end & back-end.
3. GPT-fast is a simple and efficient PyTorch-native implementation of transformer text generation in under 1,000 lines of Python.
4. GitBook is a technical knowledge management platform that centralizes the knowledge base for teams.
Top Papers of the Week!
GPT-4 has surpassed Med-PaLM 2 at answering medical questions using a new prompting methodology, Medprompt. By combining three advanced prompting strategies — dynamic few-shot example selection, self-generated chain-of-thought, and choice-shuffle ensembling — GPT-4 achieved a remarkable 90.2% accuracy on the MedQA dataset.
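One of the Medprompt strategies, choice-shuffle ensembling, is simple enough to sketch: shuffle the multiple-choice options across several runs, query the model each time, and take a majority vote over the answer texts. The `ask_model` function below is a stand-in placeholder for a real GPT-4 call, not part of the paper’s code.

```python
import random
from collections import Counter

def ask_model(question, options):
    # Placeholder for an LLM call. For this sketch, the "model"
    # deterministically picks the lexicographically smallest option.
    return min(options)

def choice_shuffle_vote(question, options, n_rounds=5, seed=0):
    """Ask the model n_rounds times with shuffled options; majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        shuffled = options[:]
        rng.shuffle(shuffled)       # present options in a new order each round
        answer = ask_model(question, shuffled)
        votes[answer] += 1          # tally by option text, not by position
    return votes.most_common(1)[0][0]

best = choice_shuffle_vote("Which drug?", ["aspirin", "ibuprofen", "codeine"])
# best == "aspirin" given the placeholder model
```

Shuffling counteracts positional bias (models favoring, say, option A), so the vote reflects the model’s belief about the content of an answer rather than its position.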
“Merlin,” a new multimodal LLM (MLLM), demonstrates enhanced visual comprehension, future reasoning, and multi-image input analysis. The researchers propose adding future modeling to MLLMs to improve their understanding of fundamental principles and subjects’ intentions, using two techniques inspired by existing learning paradigms: Foresight Pre-Training (FPT) and Foresight Instruction-Tuning (FIT).
Dolphins is a vision-language model designed as a conversational driving assistant. Trained using video data, text instructions, and historical control signals, it offers a comprehensive understanding of difficult driving scenarios for autonomous vehicles.
This paper introduces the Diffusion State Space Model (DiffuSSM), an architecture that replaces attention mechanisms with a more scalable state-space backbone. This approach effectively handles higher resolutions without global compression, preserving detailed image representation throughout the diffusion process.
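The scalability argument rests on the linear state-space recurrence that such backbones use in place of attention: h_t = A·h_{t-1} + B·x_t, y_t = C·h_t, whose cost grows linearly with sequence length rather than quadratically. The sketch below shows that recurrence with random placeholder matrices; it is not DiffuSSM’s specific parameterization.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run a sequence of shape (seq_len, d_in) through a linear SSM.

    h_t = A @ h_{t-1} + B @ x_t   (state update)
    y_t = C @ h_t                 (per-step readout)
    """
    state = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        state = A @ state + B @ x_t
        ys.append(C @ state)
    return np.stack(ys)

rng = np.random.default_rng(1)
seq = rng.normal(size=(6, 4))     # 6 tokens, 4 features each
A = 0.9 * np.eye(8)               # stable, slowly decaying state
B = rng.normal(size=(8, 4)) * 0.1
C = rng.normal(size=(3, 8))
out = ssm_scan(seq, A, B, C)      # shape (6, 3), one readout per step
```

Because each step touches only a fixed-size state, a high-resolution image flattened into a long token sequence stays affordable, which is the property the paper exploits to avoid global compression.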
This is a comprehensive survey of LLM-based agents. It traces the concept of agents from its philosophical origins to its development in AI and explains why LLMs are suitable foundations for agents. It also presents a general framework for LLM-based agents, comprising three main components: brain, perception, and action.
1. Alibaba Cloud introduces the Tongyi Qianwen AI language model with 72 billion parameters. Qwen-72B competes with OpenAI’s ChatGPT and excels in English, Chinese, math, and coding.
2. Nvidia’s CEO Jensen Huang has led the company’s AI growth, resulting in a staggering $200B increase in value. With a strong focus on AI and its applications across industries, Nvidia has surpassed major companies like Walmart in market value.
3. Google is responding to pressure from generative AI tools and legal battles by making changes to its search experience. They are testing a “Notes” feature for public comments on search results and introducing a “Follow” option, allowing users to subscribe to specific search topics.
Who’s Hiring in AI!
Interested in sharing a job opportunity here? Contact firstname.lastname@example.org.
If you are preparing for your next machine learning interview, don’t hesitate to check out our leading interview preparation website, Confetti!