
AI in 2025

Sean Horgan
Draft: this doc is a compilation of rough notes as I review 2024 and think about the world of AI in 2025.
Uncategorized references

Tools will increasingly be built for AI-powered agents

Note: I’m going to follow Simon Willison’s definition of an agent: an LLM with access to tools that it uses to achieve a goal.
For most of the last 20+ years, we’ve focused on building tools for human end-users (e.g. websites, apps) or for other human engineers who are building tools for others (e.g. operating systems, APIs). With the emergence of more independent reasoning capabilities in LLMs, there will be a material shift toward building tools for LLMs themselves, and that will require some real changes, e.g. new purchasing flows for autonomous agents.
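To make the "tools for agents" idea concrete, here is a minimal sketch of what exposing a purchasing flow to an agent might look like. Everything here is hypothetical: `purchase_item` and its schema are illustrative stand-ins, with the schema style loosely following current LLM tool-calling APIs, not any specific vendor's.

```python
# Hypothetical sketch: describing a purchasing tool so an LLM agent can
# discover and call it. Names and fields are illustrative, not a real API.
purchase_tool = {
    "name": "purchase_item",
    "description": "Buy an item on behalf of the user, within a spend limit.",
    "parameters": {
        "type": "object",
        "properties": {
            "sku": {"type": "string"},
            "max_price_usd": {"type": "number"},
        },
        "required": ["sku", "max_price_usd"],
    },
}

def purchase_item(sku: str, max_price_usd: float) -> dict:
    # Stand-in implementation; a real agent-facing flow would need
    # agent-specific auth, spending caps, and an audit trail.
    return {"status": "ok", "sku": sku, "charged": min(9.99, max_price_usd)}

print(purchase_item("ABC-123", 20.0))
```

The interesting design work is in the parts the stub skips: authorization scoped to an agent rather than a human session, and enforceable spend limits.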
References

The AI pricing wars will put some companies out of business

Costs are all over the place. Model training costs are going down in some areas as small models gain traction. But overall spend is probably going up.
Tech companies without a cloud offering, e.g. Apple and Facebook, will try to commoditize cloud offerings wherever they can.
Google is in a unique position with DeepMind, Google Cloud, and Isomorphic. It’s not clear yet how that position will translate into a durable advantage, but it’s a hard combo to follow.
VCs and big tech have largely subsidized the current generation of AI solutions, and that probably won’t last forever.

References

Mergers & consolidation will accelerate as companies realize their limits

This current phase of AI development can be broadly characterized as divergent: because AI is lowering the barriers, many more people can now create their own AI-powered solutions, leading to a proliferation of new companies with competing value propositions.
Examples of this in previous eras and domains
Mobile. Think back to when BlackBerry, Nokia, Ericsson, and Siemens ruled mobile. That now feels like the stone age. There was a time in the early 2000s, pre-iPhone, when these companies seemed unbeatable.
Going way back: horses used for transportation were replaced by ICE-powered vehicles. Mass transit emerged unevenly, and many older cities remain constrained by street designs that haven’t evolved since the days when horses and blacksmiths dominated the streets.
Healthcare...
Will h/w advances dwarf s/w concerns, similar to Moore’s Law and the CISC/RISC tradeoffs in the 1990s/2000s?
Will a first-mover advantage (e.g. MS-DOS/Windows) be critical as it’s difficult for other layers of the stack to change?
Where in the value chain is this most likely going to take place? OpenAI appears to be gaining traction as the s/w layer between AI models and AI applications.
However, AI itself is lowering the barrier to entry in software development. Could AI make it easier to port applications from one stack to another, e.g. from OpenAI to Claude?
What are the parallels to mobile app development in the 2010s, when companies looked to solutions like PhoneGap to build apps across iOS and Android? Or Facebook’s bet on HTML5, which is now widely considered a mistake?
Where does NVIDIA’s CUDA fit in? How wide of a moat does CUDA provide NVIDIA?


References

Winners will continue to be attacked on all fronts

NVIDIA’s dominance in hardware (GPUs) and software (CUDA)
In response, will NVIDIA compete more directly with public clouds like AWS?
AWS, Google, and Microsoft are developing their own chips, but also probably account for ~50% of NVIDIA’s revenues (if reporting is accurate).

Governments will spend real time and money trying to get a handle on AI, often with nationalistic aims

Screenshots from 3 Jan 2025.

References

Talent & leadership will be real constraints on AI adoption

As technology barriers and costs continue to drop and pressure to put AI to work mounts from boardrooms and investors, many companies will struggle to find people to implement new programs.
It’s becoming trivial to run remarkably sophisticated models on consumer-grade hardware. For example, using Ollama, download the package and simply run:
ollama run llama3.2
I have an M4 MacBook with just 16GB of RAM, so I focused on smaller models.
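Beyond the CLI, the same local model can be driven programmatically. A minimal sketch, assuming Ollama's default local REST server on port 11434 (the actual `generate` call requires `ollama serve` or the desktop app to be running):

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if you changed the port.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Non-streaming request: ask for the whole completion in one response.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    # Requires a running local Ollama server with the model pulled.
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Inspect the request payload without needing a live server:
req = build_request("llama3.2", "Say hello in five words.")
print(json.loads(req.data))
```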

Data scaling laws favor companies that can amortize training and inference costs


Chinchilla scaling refers to the optimal ratio of data to parameter count as compute increases. Too little data causes the model to generalize poorly, while too much results in overtraining, which wastes compute. There are some instances where deviating from the optimal ratio makes sense: over-training models (e.g. GPT-4o and Llama) can decrease inference costs significantly and is preferable for providers with a large user base to serve.
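The Chinchilla arithmetic can be sketched in a few lines, assuming the usual approximations from the paper: training cost is roughly 6·N·D FLOPs, and the compute-optimal data budget is roughly 20 tokens per parameter.

```python
# Back-of-envelope Chinchilla arithmetic (assumptions: training FLOPs ~ 6*N*D,
# compute-optimal ratio D ~ 20*N; both are rough rules of thumb).
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    # With C = 6*N*D and D = r*N, solve for N = sqrt(C / (6*r)).
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

n, d = chinchilla_optimal(5.76e23)  # roughly Chinchilla's own compute budget
print(f"params ~{n / 1e9:.0f}B, tokens ~{d / 1e12:.1f}T")  # → params ~69B, tokens ~1.4T
```

That recovers Chinchilla's own 70B-parameter / 1.4T-token shape, which is why over-trained models like Llama (far more than 20 tokens per parameter) are trading extra training compute for a smaller, cheaper-to-serve model.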
A key point here is that the costs companies are willing to spend on training are directly tied to the value they are capturing on inference. For example, a company like OpenAI or Google can afford to invest heavily in training that reduces inference costs as they directly benefit from those lower serving costs AND they can offer cheaper model services.
Another key factor will be the Jevons paradox, which occurs when the increasing efficiency of resource consumption leads to increased consumption.
References

The shift toward inference over training will shake up the current AI stacks

It appears that NVIDIA’s stack is optimized for training, not for inference. As AI focus shifts toward inference, is there an opportunity for new stacks to take hold? Where does that leave companies like Cerebras?
This means that LLM inference throughput is largely determined by how large a batch you can fit into high-bandwidth GPU memory; see the NVIDIA docs for more details.
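A rough sketch of why batch size is memory-bound: each sequence in the batch holds a KV cache whose size scales with layers, heads, and context length. The shapes below are assumed (roughly Llama-8B-like) and the numbers are illustrative, not measured.

```python
# Why inference batch size is bounded by GPU memory, not compute.
# Assumed Llama-8B-like shapes: 32 layers, 8 KV heads, head_dim 128, fp16.
def kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Factor of 2 covers both the K and V tensors cached per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def max_batch(gpu_bytes, weight_bytes, seq_len):
    # Memory left after weights, divided by KV-cache cost per sequence.
    free = gpu_bytes - weight_bytes
    per_seq = kv_bytes_per_token() * seq_len
    return free // per_seq

# 80 GB GPU, ~16 GB of fp16 weights, 4096-token contexts:
print(max_batch(80_000_000_000, 16_000_000_000, 4096))  # → 119
```

At 128 KiB of KV cache per token, a 4096-token sequence costs ~0.5 GB of memory on its own, which is why high-bandwidth memory capacity, not raw FLOPs, caps the batch.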
As inference demands increase, models will be subsumed further into compound systems. This will be driven by first principles, e.g. prompts and sampling methods are intrinsically connected to specific models, and commercial interests, e.g. Databricks and Snowflake aim to build unique value propositions around open source AI capabilities.
A Compound AI System is a system that tackles AI tasks using multiple interacting components, including multiple calls to models, retrievers, or external tools. In contrast, an AI model is simply a statistical model, e.g. a Transformer that predicts the next token in text.
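The definition above can be made concrete with a tiny sketch. All three components here are hypothetical stand-ins: `retriever` does naive word-overlap matching, and `model` echoes its prompt where a real system would call an actual LLM.

```python
# Minimal illustration of a compound AI system: retrieval + prompt assembly
# + a model call, wired together. All components are illustrative stand-ins.
def retriever(query, corpus):
    # Naive retrieval: pick the document sharing the most words with the query.
    return max(corpus, key=lambda doc: len(set(query.split()) & set(doc.split())))

def model(prompt):
    # Stand-in for an LLM call; a real system would hit a model API here.
    return f"Answer based on: {prompt}"

def compound_system(query, corpus):
    context = retriever(query, corpus)   # component 1: retrieval
    prompt = f"{context}\nQ: {query}"    # component 2: prompt assembly
    return model(prompt)                 # component 3: model call

corpus = ["GPUs accelerate training", "inference batch size is memory bound"]
print(compound_system("what bounds inference batch size", corpus))
```

The point of the definition is that the value proposition lives in the system wiring (which retriever, which tools, how calls are sequenced), not only in the model weights.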


Practical Applications


AI in Bio



Industry Marketing

Benedict Evans’s “AI eats the world”


Some specific slides