This AI newsletter is all you need #79

Dec 27, 2023

What happened this week in AI by Louie

This week in AI, we were excited to see the release of the latest generation of the Midjourney model. Midjourney V6 features enhanced prompt accuracy, improved coherence, and, in particular, greatly improved text generation capabilities.

This comparison from v1 to v6 models shows significant improvement aided by feature updates such as longer prompt length, more control over color, shading, and text, and the ability to fine-tune the output through a conversation. One of the most notable components of V6 is its improved text-drawing ability, with anecdotally far fewer spelling errors than previous leading image generation models. Another notable development is in the way it interprets and understands prompts. For example, it can now understand more nuances in both punctuation and grammar.

In related news, we saw a new draft bill, the AI Foundation Model Transparency Act, calling for training data transparency so that copyright holders know when their data is used. The bill explicitly mentions the lawsuits against Stability AI, Midjourney, and DeviantArt for copyright infringement. Generative AI art models are trained on billions of images collected from the web, generally without the creators’ knowledge or consent. There is no simple way to find a picture online and trace it back to its original owners. Whether or not these systems infringe on copyright law is a complicated question.

The bill would also require AI developers to report efforts to “red team” the model to prevent it from providing “inaccurate or harmful information” for vulnerable populations, including children. This step comes at a time when concerns about the safety of image generation models are growing, as a Stanford University Cyber Policy Center report disclosed over 3,200 potentially harmful images in the LAION-5B database used to train AI generators like Stable Diffusion 1.5.

Why should you care?

As AI models become more powerful and capable, such as with much better (though still imperfect) text generation in Midjourney v6, more human jobs, such as graphic design work, can be significantly impacted by AI. Establishing a firm legal framework for how copyright, design ownership, and permission should function in this new paradigm becomes even more critical.

- Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. Midjourney V6 Is Here With In-Image Text

Midjourney V6 features enhanced prompt accuracy, improved coherence, and, in particular, improved text generation capabilities. The latest model shows significant improvement aided by feature updates such as longer prompt length, more control over color, shading, and text, and the ability to fine-tune the output through a conversation.

2. AI Companies Would Be Required To Disclose Copyrighted Training Data Under the New Bill

The AI Foundation Model Transparency Act aims to establish rules for reporting training data transparency. Companies would be required to report the sources of their training data and how the data is retained during the inference process, describe the limitations or risks of the model, and explain how the model aligns with NIST’s planned AI Risk Management Framework.

3. OpenAI in Talks To Raise New Funding at $100 Bln Valuation — Bloomberg News

OpenAI is in early talks to raise a fresh round of funding at a valuation at or above $100 billion, Bloomberg News reported. OpenAI has also held discussions to raise funding for a new chip venture with Abu Dhabi-based G42, according to the report.

4. Anthropic Seeking To Raise $750 Mln in Funding Round Led by Menlo Ventures

Anthropic is in discussions to raise $750 million in a funding round led by Menlo Ventures. The company already has investments from Google and Amazon, and the valuation before the funding round came to $18.4 billion.

5. ChatGPT Ends the Year With a Long List of New Competitors

When OpenAI released ChatGPT a year ago, there was nothing else like it. Now, well-funded tech companies are launching AI services that rival ChatGPT almost every month — this year marked the release of powerful open-source and multimodal language models such as Gemini, Mistral, Grok, and more.

Five 5-minute reads/videos to keep you learning

1. The 8 Major AI Moments That Defined 2023

2023 will be looked back on as a significant turning point in the adoption of AI into society. This article shares the most important events of the last 12 months — in terms of the implications they could have on the future of AI and our lives, like the release of GPT-4, the EU AI Act, and more.

2. Explainer: What’s Next for the EU AI Act?

European Union countries and lawmakers agreed on a provisional deal for AI rules to establish guardrails for the rapid development of AI. The Explainer guide presents a timeline, points of agreement, what happens if it becomes a law, and more.

3. Langchain vs LlamaIndex vs OpenAI GPTs: Which One Should You Use?

This video tutorial dives into LLM application development, comparing the paths of building your own framework from scratch with utilizing established platforms like LangChain, LlamaIndex, and OpenAI Assistants.

If you want to learn how to build better RAG-based applications and leverage LlamaIndex, LangChain, or OpenAI toolsets, our Retrieval Augmented Generation for Production with LlamaIndex and LangChain course is for you, and it is completely free!

4. Efficient Data Import Into MySQL and PostgreSQL: Mastering Command-Line Techniques

This article is a guide for understanding the command-line methods for importing data. It also provides a deeper understanding of the underlying processes and covers efficiently importing data into MySQL and PostgreSQL.

5. How Machine Learning Algorithms Work in Face Recognition Deep Learning?

Deep learning-based face recognition can be used on all kinds of platforms, and its usage is increasing daily due to its benefits. This article explains the face identification system, how machine learning facial recognition works, and its benefits for security, fraud detection, etc.

Repositories & Tools

1. Aim is an open-source AI metadata tracker. It also provides a UI to compare and inspect the tracked metadata and an SDK to query it programmatically.

2. Pythagora-io / gpt-pilot is a dev tool that writes scalable apps from scratch while the developer oversees the implementation.

3. Helix allows training, fine-tuning, and generating AI using open-source software and personal datasets.

4. TLDR is an IDE plugin that utilizes AI to explain code in plain English to build the mental context of complex methods.

Top Papers of The Week!

1. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

MetaGPT is an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. It encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, allowing agents with human-like domain expertise to verify results and reduce errors.

2. DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

DiffMorpher is the first approach enabling smooth and natural image interpolation using diffusion models. The key idea is to capture the semantics of the two images by fitting two LoRAs to them and interpolating between the LoRA parameters and the latent noises to ensure a smooth semantic transition.
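
As a rough illustration of the interpolation idea described above (not the authors' implementation), the sketch below blends two sets of LoRA weights linearly and two latent noise tensors spherically. The function names, the dictionary layout of the LoRA weights, and the use of spherical interpolation for the noise are assumptions made for this example; the summary only states that both are interpolated.

```python
import numpy as np

def lerp(a, b, alpha):
    """Linear interpolation between two arrays of the same shape."""
    return (1.0 - alpha) * a + alpha * b

def slerp(z0, z1, alpha, eps=1e-7):
    """Spherical interpolation between two noise tensors (assumed choice)."""
    v0, v1 = z0.ravel(), z1.ravel()
    cos_theta = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return lerp(z0, z1, alpha)
    s = np.sin(theta)
    w0 = np.sin((1.0 - alpha) * theta) / s
    w1 = np.sin(alpha * theta) / s
    return w0 * z0 + w1 * z1

def morph_inputs(lora_a, lora_b, noise_a, noise_b, alpha):
    """Return blended LoRA weights and latent noise for one morphing frame.

    lora_a / lora_b: hypothetical dicts mapping parameter names to arrays,
    one LoRA fitted per source image. alpha runs from 0 (image A) to 1 (image B).
    """
    lora = {name: lerp(lora_a[name], lora_b[name], alpha) for name in lora_a}
    noise = slerp(noise_a, noise_b, alpha)
    return lora, noise
```

Sweeping alpha from 0 to 1 and feeding each blended pair to the diffusion model would yield one intermediate frame per step; the generation step itself is omitted here.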

3. Generative Multimodal Models are In-Context Learners

This paper demonstrates that effective scaling-up can significantly enhance large multimodal models’ task-agnostic in-context learning capabilities. It introduces Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences with a unified autoregressive objective.

4. PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU

This paper introduces PowerInfer, a high-speed LLM inference engine for a personal computer with a single consumer-grade GPU. The evaluation shows that PowerInfer attains an average token generation rate of 13.20 tokens/s on a single NVIDIA RTX 4090 GPU, which is only 18% lower than the rate achieved on an A100 GPU.

5. StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

This paper introduces StreamDiffusion, a real-time diffusion pipeline for interactive image generation. This approach transforms the original sequential denoising into a batched denoising process. Stream Batch eliminates the conventional wait-and-interact approach and enables fluid, high-throughput streams.
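
To make the batching idea more concrete, here is a toy sketch of how frames at different denoising stages might share a single batched call. It is an illustration under stated assumptions, not the paper's pipeline: `denoise_batch` is a hypothetical stand-in for a batched model call (it simply halves each value), and `stream_batch` is a name invented for this example.

```python
from collections import deque

def denoise_batch(latents, steps_done):
    """Hypothetical stand-in for one batched denoiser call.

    A real pipeline would run the model once on the whole batch, with a
    different timestep per sample; here each latent is simply halved.
    """
    return [x * 0.5 for x in latents]

def stream_batch(frames, num_steps):
    """Denoise a stream of frames, advancing all in-flight frames per call.

    Returns the finished (frame_id, latent) pairs in order and the number
    of batched calls that were made.
    """
    pending = deque(enumerate(frames))
    in_flight = deque()  # entries: [frame_id, latent, steps_done]
    outputs = []
    calls = 0
    while pending or in_flight:
        if pending:
            # A new frame joins the batch at step 0 without waiting
            # for earlier frames to finish.
            frame_id, latent = pending.popleft()
            in_flight.append([frame_id, latent, 0])
        # One batched call moves every in-flight frame forward one step.
        updated = denoise_batch([e[1] for e in in_flight],
                                [e[2] for e in in_flight])
        calls += 1
        for entry, latent in zip(in_flight, updated):
            entry[1] = latent
            entry[2] += 1
        # Frames that have completed all steps leave the batch.
        while in_flight and in_flight[0][2] == num_steps:
            frame_id, latent, _ = in_flight.popleft()
            outputs.append((frame_id, latent))
    return outputs, calls
```

In this toy setup, 3 frames with 4 denoising steps each finish after 6 batched calls, whereas denoising them one after another would take 12 separate calls.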

Quick Links

1. Humane says it will start shipping its AI Pin, an AI-powered wearable device, in March 2024. The AI Pin can make phone calls, text your friends, play music, and catch you up on your email.

2. Apple’s latest AI research could enable more immersive visual experiences and allow complex AI systems to run on consumer devices like the iPhone and iPad.

3. Stable Video Diffusion is now available through its API, allowing third-party developers to incorporate it into their own apps, websites, software, and services.

Who’s Hiring in AI!

AI Technical Writer and Developer for Large Language Models @Towards AI (Remote)

Data QA Lead, Amazon Search @Amazon (Palo Alto, CA, USA)

Junior Software Engineer in AI @robusta (Remote)

Data Science PhD Internship @BetterHelp (Remote)

Full-Stack Software Engineer @super.AI (Remote)

Software Engineer, Machine Learning @Whatnot (Remote)

Junior Data Operations Engineer @Clarity AI (Remote)

Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.

If you are preparing your next machine learning interview, don’t hesitate to check out our leading interview preparation website, confetti!

This AI newsletter is all you need #79 was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
