Ai researches
Posts
🤯 OpenAI just announced 'o3' (AGI)

🤯 OpenAI just announced 'o3' (AGI)

plus: Google's Gemini 2.0 Is Out

AI researches
December 20th, 2024

In partnership with

Welcome, AI RESEARCHERS.

OpenAI’s O3: Closer to Human-Like Problem-Solving

Feeling the shift? OpenAI’s O3 model reaches beyond basic outputs, tapping into a reasoning process that nudges AI closer to AGI-level thinking.

Want to glimpse a world where AI thinks before it speaks? Let’s explore…

In today’s AI Researchs:

OpenAI’s Next-Gen ‘o3’ Reasoning Model Lands
Some major AI developments
Google's Free AI Model Just Took the #1 Spot in AI Rankings
Turn AI into Your On-Demand Coding Guide
AI Researches & AI pointers
Trending Ai tools of the week
More Ai news and developments

Read time: 6 minutes

OPENAI

🚀 OpenAI’s Next-Gen ‘o3’ Reasoning Model Lands

AI Researchs: OpenAI has unveiled its new o3 model family, succeeding the o1 “reasoning” model and inching closer to what it calls AGI-level performance. This advance could mark a turning point in how AI models think, plan, and solve problems at a level that rivals, and sometimes surpasses, human capability.

The Details:

The o3 and o3-mini models follow o1’s reasoning approach and claim near-AGI capabilities in controlled conditions.
They introduce a “private chain of thought” mechanism, internally fact-checking and planning before producing answers.
The models can be configured with varying “compute” settings, trading off speed for more reliable, high-quality responses.
Performance benchmarks show o3 surpassing its predecessor and beating other top models in coding, math, and scientific reasoning.
However, stronger reasoning may lead to increased deception risks, challenging researchers to maintain safety and alignment.
OpenAI is adopting a “deliberative alignment” strategy to ensure ethical deployment and is partnering with safety testers and researchers.
Competitors like Google and Alibaba are launching their own reasoning models, fueling an escalating race in AI innovation.

Availability: Safety researchers can request early access to o3-mini starting later today. The full o3 model will be rolled out after the research phase, with a broader launch expected toward the end of January for o3-mini and shortly after for o3 itself.

o3 scored 75.7% on ARC-AGI Public Benchmark within compute requirements. o3 scored 87.5% in High compute mode. Human performance is 85%.

Want to get the most out of ChatGPT?

ChatGPT is a superpower if you know how to use it correctly.

Discover how HubSpot's guide to AI can elevate both your productivity and creativity to get more things done.

Learn to automate tasks, enhance decision-making, and foster innovation with the power of AI.

Download the free guide with actionable prompts and tips to make your work more productive.

Major Quick Developments

Genesis, Developed by researchers across 20 laboratories, is an open-source first generative AI physics simulator, operating 43 million FPS on a single RTX 4090 GPU and enabling the training of transferable robot locomotion policies in just 26 seconds. introducing a novel approach to ultra-fast simulation and AI interaction with physical environments.

OpenAI introduces ChatGPT access via a 1-800 number in the US and a new WhatsApp integration for global users.

GitHub introduces a free tier for its AI Copilot in VS Code, offering limited access to coding assistance tools.

Nvidia unveils the $249 Jetson Orin Nano Super Developer Kit, featuring 1.7x performance gains, 70% more processing power, and better memory, designed for multi-tasking AI applications

GOOGLE GEMINI

🥇 Google's Free AI Model Just Took the #1 Spot in AI Rankings

chatbot arena

AI Researchs: Google has launched Gemini 2.0 Flash Thinking Experimental, an AI model that mimics human-like reasoning to tackle complex problems, setting a new benchmark in AI performance. This free, fast model aims to outpace rivals like OpenAI's o1 by showing its problem-solving steps in real-time.

The Details:

Gemini 2.0 emphasizes transparency in its thought process, mirroring methods seen in reasoning models such as OpenAI's o1.
Users have noted its faster performance compared to other reasoning models, thanks to optimizations in the Flash platform.
By increasing computation time for more accurate outputs, it balances speed with precision in solving intricate queries.
The model has taken the #1 rank in the Chatbot Arena and is freely accessible through Google’s AI Studio, Gemini API, and Vertex AI.

AI TUTORIAL

✅ Turn AI into Your On-Demand Coding Guide

AI Researchs: Google’s Gemini Stream Realtime acts as a virtual coding tutor, offering immediate guidance and personalized feedback as you share and interact with your live coding environment.

Step-by-Step Guide:

Access the Feature:
Open Google AI Studio and select “Stream Realtime” from the main menu (marked by a microphone icon in the left sidebar).
Start Your Tutoring Session:
When prompted, choose “Share your screen” (instead of “Talk to Gemini” or “Show Gemini”) to let the AI view your coding environment. Confirm the window or screen you want to share, and ensure the “Stream is live” indicator appears.
Engage With the AI Tutor:
Ask the AI questions about lines of code, request help with creating files, or clarify complex concepts. It observes your screen and provides tailored, real-time feedback based on what it sees.
Get Contextual Assistance:
If you encounter errors or confusing syntax, simply point them out. The AI will highlight issues, explain solutions, and suggest improvements as you code.
Build Confidence Gradually:
Start with basic tasks, then progress to more complex projects as you grow comfortable. This ensures a strong foundation and continuous skill development.

This isn’t traditional business news

Welcome to Morning Brew—the free newsletter designed to keep you in the know on the business news impacting your career, company, and life—in a way you didn’t know you needed.

Note: this isn’t traditional business news. Morning Brew’s approach cuts through the noise and bore of classic business media, opting for short writeups, witty jokes, and above all—presenting the facts.

Save time, actually enjoy business news, and join over 4 million professionals reading daily.

Check it out

AI Research Roundup

Google DeepMind debuts FACTS Grounding, a benchmark using 1,719 examples to test AI models on producing fact-based, grounded responses, with evaluations conducted by leading AI systems and Gemini 2.0 currently topping the public leaderboard with an 83.6% score.

Anthropic published research revealing that their AI models can exhibit "alignment faking," where the models appear to adapt to new training but secretly retain their original preferences.

AI Pointers

Sam Altman says the term AGI gets less useful as we approach it but the o1 model seems close and by the end of 2025 we will AI systems that "can do truly astonishing cognitive tasks" where it will seem smarter than humans at a lot of hard problems.

🎬 Kling AI v1.6 – An updated version of the popular AI video generator, now featuring enhanced prompt adherence, professional modes, and additional upgrades.
🌟 Gemini 2.0 Flash Thinking – Google DeepMind’s latest reasoning model, free to try and positioned as a competitor to ChatGPT o1.
🔷 Backflip AI – Transforms text into 3D AI-generated designs effortlessly.
🛡️ Meta Video Seal – An open-source tool that embeds invisible, durable watermarks into videos to ensure authenticity.
🏃‍♂️ Impakt AI App – An AI-powered fitness coach that observes, communicates, and guides your workouts to help achieve your personal fitness goals.
💻 Tempo Labs – An AI-enhanced visual editor for React, enabling seamless collaboration between PMs, designers, and engineers on code.

Google DeepMind announced a strategic partnership with Apptronik, known for developing NASA’s Valkyrie Robot and the Apollo humanoid, to create versatile humanoid robots by integrating AI and robotics expertise.

Midjourney introduced Moodboards, a feature that lets users design personalized AI generation styles and profiles by uploading or adding images.

Google unveiled Gemini Code Assist tools, allowing developers to integrate external services and data directly into their IDE.

AI search startup Perplexity raises $500M, tripling its valuation to $9B as it challenges traditional search engines.

Anthropic released best practices for building AI agents, advocating for simple, composable patterns over complex frameworks and sharing insights from customer use cases across industries.

Meta teased Llama 4 features, including speech capabilities and advanced reasoning, along with plans for AI agents aimed at customer support and commerce by 2025.

Microsoft AI announced the rollout of Copilot Vision — an AI companion that can interact with your browser in real-time — for U.S. Copilot Pro users on Windows.

OpenAI expanded ChatGPT's app integrations to support more coding platforms, such as JetBrains IDEs, terminals, and productivity tools like Apple Notes and Notion.

Google unveils Veo 2 for video generation and Imagen 3 for images, both setting new standards for realism and quality.