Agents
Recently, I have been putting more time into building my voice-recognition software called alzo. It uses the SpeechRecognition library, which allows modular selection between a web API and a local STT model. STT models have an underlying statistical model; nowadays the state of the art converts audio to a spectrogram, converts the spectrogram to words, and "cleans" the result up with a Transformer model to generate an output.
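As a rough illustration of what "modular selection" of a backend can look like, here is a minimal sketch. The function names and the dispatch dictionary are my own illustrative assumptions, not alzo's actual code or the SpeechRecognition library's API:

```python
# Hypothetical sketch: each STT backend is a callable that takes raw audio
# bytes and returns text, and a dictionary selects between them at runtime.

def web_api_stt(audio: bytes) -> str:
    # Placeholder: a real implementation would send audio to a hosted API.
    return "recognized via web api"

def local_model_stt(audio: bytes) -> str:
    # Placeholder: a real implementation would run a local STT model.
    return "recognized via local model"

BACKENDS = {
    "web": web_api_stt,
    "local": local_model_stt,
}

def transcribe(audio: bytes, backend: str = "local") -> str:
    # Swapping backends is just a key lookup, not a code change.
    return BACKENDS[backend](audio)

print(transcribe(b"\x00\x01", backend="web"))
```

The point of the pattern is that the rest of the program only ever calls `transcribe`, so the choice of web API versus local model stays a one-line configuration decision.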
The reason I brought my software up is to discuss how one would talk about and classify agents in the current field of discourse. I, like many others, think that in some sense no one really knows what they mean when they use the term "agent" in a discussion; or rather, because there are so many different understandings, using the term isn't all that helpful. It resembles the earlier period when people cared about the distinctions between "Artificial Intelligence", "Machine Learning", and "Deep Learning". The other reason is that I'm thinking of integrating Langchain to see if there is any benefit to it.
Agents vs Models
The distinction I’m willing to commit to between Machine/Deep Learning and AI is a functional one. Machine and Deep Learning are thought of as modelling: you interact with them by calling some pred(X) function provided by a model library, whether that is linear regression or a neural network. Of course, what distinguishes machine learning from deep learning is whether the model is a "traditional" stats model or a neural network. Agents, in this sense, utilize a model but do something extra based on its outputs, whether that is running tools or re-prompting.
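The functional distinction can be sketched in a few lines: a model is just a `pred(X)` call, while an agent wraps that call and acts on the result. Everything here is an illustrative assumption, not any particular library's API:

```python
def pred(x: str) -> str:
    # Stand-in for any model call, from linear regression to an LLM.
    # Here it sometimes "asks" for a tool by emitting a marker string.
    return "TOOL:upper" if "shout" in x else "plain answer"

def run_tool(name: str, payload: str) -> str:
    tools = {"upper": str.upper}
    return tools[name](payload)

def agent(x: str) -> str:
    # The "something extra": inspect the model output and act on it.
    out = pred(x)
    if out.startswith("TOOL:"):
        return run_tool(out.removeprefix("TOOL:"), x)
    return out

print(agent("please shout this"))  # the agent runs a tool on the input
print(agent("hello"))              # the agent just returns the model output
```

Calling `pred` directly is "using a model"; the extra branch in `agent` that runs tools or could re-prompt is what the functional distinction picks out.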
Agentic Frameworks
The question I have, then, is what it means for an AI company to deploy "Agents" rather than models. From my understanding, things like "Skills" or "Tools" are fed into an LLM as part of its context. If the output contains a "tool call", some special formatting for the software to parse and call code, then that is what enables "agentic" behaviour. So the initial context passed into the LLM is essentially prompting it to "think" about what tools or skills it would need to execute. The software then runs the tool and prompts the LLM again to actually generate a response1.
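The two-step loop described above can be sketched as follows. I'm assuming, purely for illustration, that the model emits tool calls as JSON; `fake_llm`, `get_time`, and the prompt strings are all hypothetical stand-ins:

```python
import json

def fake_llm(context: str) -> str:
    # Stand-in for a real model call. First pass: emit a tool call.
    # Second pass (tool result present in context): emit a final answer.
    if "TOOL RESULT" in context:
        return "It is 14:00 where you are."
    return json.dumps({"tool": "get_time", "args": {}})

def get_time() -> str:
    return "14:00"

TOOLS = {"get_time": get_time}

def run_agentic_step(user_msg: str) -> str:
    # The initial context tells the model which tools exist.
    context = f"Tools available: get_time. User: {user_msg}"
    out = fake_llm(context)
    try:
        call = json.loads(out)  # did the model emit a tool call?
    except json.JSONDecodeError:
        return out  # plain answer, no tool needed
    result = TOOLS[call["tool"]](**call["args"])
    # Second prompt: feed the tool result back in for the actual response.
    return fake_llm(context + f" TOOL RESULT: {result}")

print(run_agentic_step("what time is it?"))
```

The shape is the same regardless of the real tool-call format: prompt with tool descriptions, parse the output for the special formatting, run the tool, then prompt again with the result appended.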
Reflecting on how one would implement an agentic framework, it seems relatively simple. You associate the string representation, i.e. the function name, with the function pointer, and when the parser recognizes the format it can search through the tool names to find the correct function. In an interpreted language this might be easier, as functions themselves automatically carry metadata about their names.
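In Python, for instance, every function carries its own name via `__name__`, so the name-to-function mapping can be built automatically with a decorator. This is a sketch of the idea, not any framework's actual registration mechanism:

```python
TOOL_REGISTRY = {}

def tool(fn):
    # Register the function under its own name; no manual string needed,
    # because the function object already carries its name as metadata.
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def add(a: int, b: int) -> int:
    return a + b

@tool
def shout(text: str) -> str:
    return text.upper()

def dispatch(name: str, *args):
    # What the parser would do after extracting a tool name from model output.
    return TOOL_REGISTRY[name](*args)

print(dispatch("add", 2, 3))    # 5
print(dispatch("shout", "hi"))  # HI
```

In a compiled language without this kind of runtime metadata, you would typically maintain the string-to-function-pointer table by hand or generate it at build time.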
-
An interesting thought: they would have had to train the models to know how to use tools, right? ↩