You need way more than attention
“Attention Is All You Need” is the title of the famous 2017 paper that introduced the Transformer and kicked off the work on large language models. While it’s remarkable how much a model can learn from training data alone (intuitively, it still feels like LLMs shouldn’t work as well as they do), the way I use LLMs makes it clear that this is not enough to get something actually useful. Or rather, it might be enough if all you want is to summarise or clean up some text, but for more interesting tasks you definitely want something more: tool calling and guardrails.
Tool calling opens up a lot of opportunities. Web search, for instance, helps reduce hallucinations by giving the model a chance to look up information it doesn't know. It also gives the model access to knowledge past its training cutoff. And tool calling is what makes agentic workflows possible, which frankly are more interesting than having to deal with a chat interface.
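To make the idea concrete, here is a minimal sketch of the host side of tool calling. It assumes the model asks for a tool by emitting a small JSON object; that request format, the TOOLS registry, the dispatch function and the stubbed web_search are all hypothetical names made up for illustration, not any particular vendor's API.

```python
import json
from typing import Callable, Dict

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def web_search(query: str) -> str:
    # Stand-in for a real search backend; it only returns a canned string.
    return f"(stub) top results for: {query}"


def dispatch(model_output: str) -> str:
    """Run the tool named in a JSON request such as
    {"tool": "web_search", "arguments": {"query": "..."}}.

    Errors are returned as text so they can be fed back to the model.
    """
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: tool request was not valid JSON"
    if not isinstance(request, dict):
        return "error: tool request must be a JSON object"

    name = request.get("tool")
    if name not in TOOLS:
        # Only registered tools can run; anything else is refused.
        return f"error: unknown tool {name!r}"

    arguments = request.get("arguments", {})
    if not isinstance(arguments, dict):
        return "error: arguments must be a JSON object"
    try:
        return TOOLS[name](**arguments)
    except TypeError as exc:
        return f"error: bad arguments for {name}: {exc}"
```

The point of the design is that the model only ever proposes a call; the deterministic code around it decides whether the call exists, whether the arguments are valid, and what text goes back into the conversation.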
As soon as you start using agentic AI, you'll realise you need guardrails to prevent the models from doing dangerous things. You might have guessed that I am not the kind of person who uses --dangerously-skip-permissions. I don't always let agents write code for me, but when I do, you can be sure they have to run tests, formatters and linters.
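A guardrail of this kind can be as simple as a gate that refuses an agent's change unless every check passes. The sketch below is one way to do it in Python; the choice of ruff and pytest in DEFAULT_CHECKS is an assumption about the project, and run_checks and gate are hypothetical helper names.

```python
import subprocess
from typing import List, Sequence, Tuple

# Assumed checks for a Python project; swap in whatever your project uses.
DEFAULT_CHECKS: List[List[str]] = [
    ["ruff", "format", "--check", "."],  # formatter, in check-only mode
    ["ruff", "check", "."],              # linter
    ["pytest", "-q"],                    # test suite
]


def run_checks(checks: Sequence[Sequence[str]]) -> List[Tuple[str, bool]]:
    """Run each command and record whether it exited with status 0."""
    results = []
    for cmd in checks:
        try:
            completed = subprocess.run(list(cmd), capture_output=True, text=True)
            passed = completed.returncode == 0
        except FileNotFoundError:
            # A missing tool counts as a failure, not as a silent pass.
            passed = False
        results.append((" ".join(cmd), passed))
    return results


def gate(checks: Sequence[Sequence[str]]) -> bool:
    """Return True only if every check passed."""
    return all(passed for _, passed in run_checks(checks))
```

Calling gate(DEFAULT_CHECKS) after the agent finishes, and discarding or sending back the change when it returns False, keeps the decision in deterministic code rather than in the model.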
I see both tools and guardrails as a way to drastically rein in the stochastic nature of the models and bring the workflow back to something more deterministic and understandable. I think this is highly desirable, and I wouldn't feel very comfortable letting an agent loose without knowing that there is a system in place to catch its mistakes.
This brings us to the question of what the best way forward is to get more useful AI tools. From my experience, the model alone isn't enough. Increasing the number of parameters helps, but I think there is a ceiling to this approach. No number of additional parameters is going to solve hallucinations, and it's certainly not going to give models knowledge past their cutoff point. This raises the question of when we will reach a model size that, combined with good tool calling and guardrails, will be good enough. I hope that's going to be really soon. And I hope that's going to be a size that can run on the laptops and phones that normal people use. Local, relatively small models that make good use of deterministic tools are definitely more appealing to me than depending on an API call to get a stochastic answer.