TL;DR - This is the first post in a hands-on series on building your own AI agents. We break down how LLMs work, what makes AI agents powerful, and where their current limitations lie (e.g., hallucinations, cost, slowness, weak math). You'll learn why agents are ideal for complex, multi-step tasks in finance - and what's needed to design them, from orchestration to tool integration. In upcoming parts, we'll guide you step-by-step through building real agents.
We're adding another team member to QuantJourney - Andy, who will focus on AI in QuantJourney. He is a computer scientist with 30+ years in IT and an expert in AI architectures and agentic systems. He designs practical AI tools and frameworks, currently focusing on AI-driven solutions for financial workflows.
FREE WEBCAST ON AI TOOLS - if you want to learn how to use AI tools - primarily Cursor or Codex - to speed up the development of your trading algorithms, we have a FREE session led by Andy next week, on the 25th of June. Just sign up at: https://us02web.zoom.us/meeting/register/7TYqicZqSFSZ9z7Md7mDEg#/registration
Let’s start
Imagine this: it's 3 AM, and while you're sound asleep, your AI agent is diligently reviewing your stock portfolio. It meticulously analyzes market data (OHLCV records), scrutinizes various computed indicators, identifies technical patterns, and cross-references news related to each company. It might even assess the performance of the entire sector and relevant news. It researches sectors you've pre-selected as areas of interest, suggesting alternative stock options. By morning, you wake up to a comprehensive report on your portfolio, complete with actionable suggestions. All of this is accomplished autonomously, without your intervention, and using methods you don't need to fully grasp.
This isn't a futuristic fantasy - it's the reality of what well-constructed AI agents are capable of today. But before you rush off to build trading bots that will make you rich overnight, let's clarify what AI agents truly are, what they are not, and why they should be on your radar in 2025.
In this post, we:
Introduce LLMs and AI in general.
Discuss what AI agents truly are.
Explain the limitations of this technology.
Present how to build AI agents and explain the challenges involved in building them.
Understanding AI (LLMs)
If you already know this part, you can safely skip ahead.
Before diving into agents, let's first understand AI models, specifically Large Language Models (LLMs). You're likely familiar with OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini. Most of them are chatbots with a message-response interaction - but what exactly happens behind the scenes?
Fundamentally, LLMs are massive neural networks trained on vast amounts of human-derived text data. This training enables them to learn patterns - probabilistic relationships between chunks of characters called tokens - and generate meaningful responses based on these patterns. Through exposure to trillions of tokens, they develop an internal representation of language, concepts, and their relationships. While we understand the general principle, the exact reason why increasing network size produces intelligent-seeming responses remains partially mysterious. This gap in understanding has spawned "mechanistic interpretability," a new field dedicated to decoding what happens inside these models.
LLMs operate in query-response mode: they receive a prompt (text broken down into tokens and converted into numerical values), process it through a network of billions of parameters (the weights connecting its simulated neurons), and generate a response (predicted tokens converted back to words). Each interaction is independent - the model receives input, processes it, and produces output without any inherent state between calls. Crucially, LLMs have no memory and don't learn during use, unlike the biological neural networks our brains are made of.
The chatbots we interact with create an illusion of memory through a simple but effective trick: they include the conversation history with each new prompt. When you type a message, the system actually sends your entire conversation so far (or as much as fits) along with your new message. This context allows the model to respond coherently as if it "remembers" what you discussed. However, this memory is limited by the model's context window - the maximum amount of text it can process at once. Modern models can handle 100,000+ tokens, but eventually, older parts of the conversation must be truncated - or summarized.
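To make this concrete, here is a minimal sketch of that trick in Python. The `call_llm()` wrapper is hypothetical - a stand-in for whatever provider API you use - and is stubbed here; the point is that the chatbot's "memory" is just a list that gets resent in full on every turn.

```python
def call_llm(messages: list[dict]) -> str:
    """Hypothetical wrapper around a chat-completion API - stubbed for illustration."""
    return f"(model reply given {len(messages)} messages of context)"

history: list[dict] = []  # the only "memory" the chatbot has

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    reply = call_llm(history)  # the *entire* conversation so far is sent every time
    history.append({"role": "assistant", "content": reply})
    return reply

chat("What's NVIDIA's ticker symbol?")
print(chat("And which sector is it in?"))  # coherent only because turn 1 was resent
```

Once `history` outgrows the model's context window, real systems start truncating or summarizing the oldest turns - exactly the limitation described above.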
Another important fact to understand is that models lack data types - they process tables of numbers and poetry identically. Everything becomes tokens, processed through the same neural pathways. Think of all input as being flattened into a text file (possibly with Markdown formatting for structure), vectorized, processed, and returned as structured text (usually with Markdown formatting). This unified approach is both a strength and limitation: LLMs excel at recognizing and reproducing patterns across any text-representable domain, but they don't inherently understand that numbers in a financial statement require different handling than words in a poem.
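As a small illustration of that flattening: a quant's price table has to be serialized into plain text (often Markdown) before the model ever sees it. The data below is made up, and `to_markdown()` requires the optional `tabulate` package.

```python
import pandas as pd

# Made-up OHLCV-style rows - by the time the model sees them, they are just tokens
bars = pd.DataFrame(
    {"close": [101.2, 102.8, 100.9], "volume": [1_200_000, 980_000, 1_450_000]},
    index=pd.to_datetime(["2025-06-02", "2025-06-03", "2025-06-04"]),
)

# Serialize to Markdown text; the LLM receives this string, not typed numeric data
prompt_fragment = bars.to_markdown()
print(prompt_fragment)
```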
For financial applications, note that training data for leading LLMs isn't finance-focused. While it includes financial articles, reports, and market data, these represent only a fraction of the total training corpus. Your typical LLM knows more about Shakespeare than S&P futures.
Specialized models can be trained for specific domains. Image models, for instance, train on visual data paired with text descriptions to link visual patterns to language, which enables them to reverse the process - producing images from text input. Similarly, a model trained exclusively on trading data and financial reports would excel at finance tasks but struggle with general conversation. Such specialized models need to be integrated with broader systems or general-purpose LLMs that feed them data before they become practically useful.
Summary of this section:
LLMs are neural networks trained on vast text data, learning probabilistic relationships in this data to generate responses.
They operate in a query-response mode, processing prompts and producing output without inherent memory or learning during use.
Chatbots simulate memory by sending conversation history with each new prompt, limited by context window size.
LLMs process all input as tokens, lacking specific data type understanding, which is both a strength and a limitation.
General LLMs are not finance-focused, and specialized models for domains like finance require integration with broader systems to be practically useful.
What are AI Agents?
The fundamental difference between a simple chatbot and an AI agent lies in the latter's ability to drive its own actions through a series of internal Large Language Model (LLM) calls. Essentially, an AI agent engages in a simulated internal dialogue, autonomously deciding on the next steps to achieve the user's initial goal without continuous human intervention.
At its core, an agent is therefore an AI system that can:
Maintain context and state across multiple operations
Decide what to do next based on intermediate results
Use external function calls (tools) to fetch data, run calculations, or trigger actions
Work toward a goal through multiple steps without constant human intervention
Think of the difference this way: asking a chatbot "What's NVIDIA's P/E ratio?" gets you an answer based on its training data - most likely outdated. But telling an agent "What is the current economic status of NVIDIA?" sets off a chain of autonomous actions - the agent will fetch current data, calculate metrics, compare them to historical ranges, check sector trends, read the news - and only come back to you when it has a complete answer.
This technical magic happens when we give LLMs the ability to generate not just responses for humans, but instructions for themselves or other systems. In an agentic system, the LLM's output drives the subsequent conversation and actions. The AI literally decides what to do next.
This is achieved through two key innovations:
Tool calling - modern LLMs can now reliably recognize when they need external data or capabilities, choose the appropriate tool (function), and provide the right parameters. When analyzing a stock, the agent might decide it needs current prices, so it calls fetch_market_data(). Then based on those results, it might determine it should calculate technical indicators, calling compute_RSI() or detect_patterns(). Each decision is made by the AI based on its understanding of the goal and intermediate results. (A minimal sketch of this dispatch pattern follows after these two points.)
Multi-agent orchestration - we can create multiple "personalities" within the system by giving the same or different LLMs specific prompts and roles. One agent might be a fundamental analyst, another a technical analyst, and a third a risk manager. They can pass information between each other, critique each other's analysis, call tools and collectively work toward a solution. Think of it as assembling a virtual team where each member has specialized knowledge and perspectives.
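To make the first of these innovations concrete, here is a framework-agnostic sketch of how tool calling is typically wired up. The tool bodies are stubs and the function names merely echo the illustrative ones above; in a real system they would call market-data APIs or a library such as QuantJourney.

```python
import json

# Stubbed tools - illustrative only
def fetch_market_data(ticker: str) -> dict:
    return {"ticker": ticker, "close": 131.2, "volume": 48_000_000}

def detect_patterns(ticker: str, lookback_days: int = 90) -> dict:
    return {"ticker": ticker, "patterns": ["ascending triangle"]}

TOOLS = {
    "fetch_market_data": fetch_market_data,
    "detect_patterns": detect_patterns,
}

def dispatch(tool_call_json: str) -> str:
    """Run the tool the LLM selected. The model only produces this small JSON
    payload; the actual work stays in deterministic code."""
    call = json.loads(tool_call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps(result)  # fed back to the model as the tool's observation

# When analyzing a stock, the model might emit something like:
print(dispatch('{"name": "detect_patterns", "arguments": {"ticker": "NVDA", "lookback_days": 90}}'))
```

Provider APIs and agent frameworks formalize this pattern with tool schemas and structured function-call outputs, but the mechanics boil down to this kind of dispatch.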
In practice, when you interact with an agentic system, here's what happens under the hood (sketched in code right after this list):
Your request triggers the primary agent
It analyzes what information and actions are needed
It may call tools to get data and/or spawn other specialized agents
Each result informs its next decision and next step
The process continues until the goal is achieved or a final answer is produced
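Stripped of framework details, that flow is a single decide-act-observe loop. The sketch below is a minimal illustration under stated assumptions: `call_llm()` is a hypothetical (stubbed) model call that returns either a tool request or a final answer, and the one tool is a stub as well.

```python
import json

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical LLM call - in practice this hits a provider's API. It returns
    either {"tool": ..., "arguments": {...}} or {"final": ...}. Stubbed here."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "fetch_market_data", "arguments": {"ticker": "NVDA"}}
    return {"final": "Based on the fetched data, NVDA trades near its recent highs."}

def fetch_market_data(ticker: str) -> dict:
    return {"ticker": ticker, "close": 131.2}  # stubbed tool

TOOLS = {"fetch_market_data": fetch_market_data}

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]        # 1. your request triggers the agent
    for _ in range(max_steps):                            # cap steps so the loop can't run away
        decision = call_llm(messages)                     # 2. the LLM decides what is needed
        if "final" in decision:                           # 5. goal achieved - return the answer
            return decision["final"]
        result = TOOLS[decision["tool"]](**decision["arguments"])          # 3. call the chosen tool
        messages.append({"role": "tool", "content": json.dumps(result)})   # 4. result informs next step
    return "Stopped: step limit reached."

print(run_agent("What is the current economic status of NVIDIA?"))
```

Real frameworks add structured tool schemas, retries, memory, and multi-agent hand-offs on top of this, but the decide-act-observe loop is the heart of every agentic system.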
This autonomous decision-making and action-taking capability transforms AI from a sophisticated Q&A system into something that can actually work on your behalf - researching, analyzing, monitoring, and alerting based on complex, multi-step workflows that would typically require human intervention at each stage.
Early AI agents followed pre-determined workflows based on established patterns. However, current architectures have evolved significantly. Modern systems can spawn new agents dynamically, create tools on-the-fly by executing generated code, and even learn from their work by adjusting their own prompts and maintaining long-term memory.
In essence, “agents” are sophisticated orchestrations of glue code, carefully crafted prompts, and available tools that enable AI to autonomously decide on actions and shape outputs based on the meaning and interpretation of data encountered along the way.
Limitations
While this technology can work wonders, there are some limitations and challenges to keep in mind when we want to use it in an investing context.
First, as discussed above, LLMs excel at processing natural language (e.g., news reports, company reports) and structured data (e.g., JSON-wrapped tool outputs with financial data, code, structured tables). They can comprehend meaning, interpret numerical data, and identify patterns in data streams. Their ability to call tools - deciding which to use and what parameters to provide - is also quite sophisticated. However, LLMs are not adept at calculations. This is because, despite being computational systems, they generate rather than calculate.
So, when we design an agentic system, we cannot rely on LLMs to perform correct calculations of the indicators quants rely on. These calculations must be performed using traditional programming methods - by developing our own code, by using specialized libraries like QuantJourney, or by calling a trusted external API like the upcoming QuantJourney API.
The difference, of course, is that it will be the AI, not a human, deciding that it needs to look at an indicator and interpreting its value.
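As an illustration of such a deterministic tool, here is a minimal RSI implementation using pandas (a simplified Wilder-style smoothing). The function name and approach are ours, not a specific QuantJourney API; the agent's role is to decide to call something like this and to interpret the number, not to compute it.

```python
import pandas as pd

def compute_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Deterministic RSI - the agent requests this calculation and only interprets the result."""
    delta = close.diff()
    gains = delta.clip(lower=0.0)
    losses = -delta.clip(upper=0.0)
    # Wilder-style smoothing approximated with an exponential moving average
    avg_gain = gains.ewm(alpha=1 / period, min_periods=period).mean()
    avg_loss = losses.ewm(alpha=1 / period, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Usage with made-up closing prices
closes = pd.Series([100, 101, 102, 101, 103, 104, 103, 105, 106, 107, 106, 108, 109, 110, 111])
print(compute_rsi(closes).iloc[-1])
```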
Secondly, LLMs - and therefore agents - do not possess ongoing, steady awareness. In other words, despite their name, agents are not capable of being observers, aware but waiting to act. The agentic workflow is always triggered - either manually, by a human asking for something, or algorithmically from code, in response to an event or at a predetermined interval.
Therefore, agentic systems don't replace traditional event-triggered market responses. Instead, they enhance these systems by adding intelligent, adaptive capabilities to what would otherwise be purely algorithmic reactions.
Thirdly, agentic systems are relatively slow. Since each agentic workflow involves a number of calls to the underlying LLM with an increasing amount of data (as the system gathers new data from tools and its internal dialogue grows longer), response times are measured not in seconds but in minutes - sometimes more than 10-15 minutes for complex tasks.
Fourthly, while LLMs are reasonably proficient at coding, they frequently make errors. They are therefore best suited for interpreting data and indicators, requesting the calculation of indicators and indices, and even formulating strategies for backtests and interpreting their results. They should not, however, perform calculations or write ad-hoc SQL queries, as the potential for error is too significant.
Fifth, and perhaps most critically, AI agents can hallucinate - generating plausible-sounding but entirely fictitious information. While LLMs have improved dramatically, they can still confidently assert false facts, misinterpret data, or create non-existent correlations. In financial contexts, this poses serious risks. An agent might 'remember' an earnings report that never existed, cite fictional regulatory changes, or misinterpret technical patterns. This is why validation layers and sanity checks are essential - never trust an agent's analysis without verification, especially when real money is at stake. The agent should augment human decision-making, not replace it entirely - at least not yet.
And, finally, agentic systems can be expensive. Since each run requires sending and receiving large numbers of tokens across successive calls - and the APIs we use are billed per token - a single run might cost not cents but dollars. If, for example, the portfolio review outlined at the beginning cost $20 to run, that still seems quite cheap - but run it every hour while the US markets are open and the monthly bill comes to around $2,700. Of course, costs vary significantly based on model choice, task complexity, and optimization strategies, and for larger deployments self-hosted models become economically viable.
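A quick back-of-the-envelope check of that figure, using the illustrative numbers from the example above (an assumed $20 per run, the ~6.5-hour regular US session, and roughly 21 trading days per month):

```python
cost_per_run = 20            # USD - illustrative figure from the example above
market_hours_per_day = 6.5   # regular US trading session
trading_days_per_month = 21

runs_per_month = market_hours_per_day * trading_days_per_month  # ~136 hourly runs
monthly_cost = runs_per_month * cost_per_run
print(f"~{runs_per_month:.0f} runs/month -> ~${monthly_cost:,.0f}")  # roughly $2,730
```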
The Implementation Challenge
Now that you understand what AI agents are and their limitations, you might be eager to build that portfolio analyzer from the introduction. But hold on - creating effective AI agents isn't just about stringing together LLM calls and hoping for the best.
The real challenges in building AI agents are:
Finding the right use case - Not every problem benefits from an agentic approach. Simple queries don't need agents, and ultra-high-frequency trading can't afford their latency.
In general, there are two major benefits, which are not mutually exclusive:
Repeatable workflows - like the portfolio evaluation - that an AI agent can complete much faster than a human analyst; the key benefit here is speed.
Bridging the knowledge gap - the AI agent leverages the LLM's knowledge and intelligence, allowing human users to benefit from expertise they do not have; a good example here would be the interpretation of technical patterns or quant indicators.
The sweet spot lies therefore in relatively complex, multi-step analytical tasks that require judgment and interpretation - like comprehensive portfolio reviews, market research, or strategy backtesting analysis.
Orchestration that works - You need a system that's flexible enough to handle unexpected scenarios yet predictable enough to trust with your money. How do you ensure your agent doesn't go rogue, making hundreds of unnecessary API calls or getting stuck in loops? (A small guardrail sketch follows this list.)
Cost-performance balance - With each run potentially costing dollars, not cents, you need to optimize token usage without sacrificing analysis quality. This means careful prompt engineering, smart context management, and knowing when to summarize versus when to preserve detail. Possibly it would also involve using different LLM providers in the same agentic workflow or even using locally hosted open-source or customized models for some tasks.
Tool integration at scale - Your agent needs access to market data, news feeds, calculation engines, maybe even execution systems. But give an LLM too many tools and it gets confused. Too few, and it can't do its job. Finding the right balance and presenting tools clearly is crucial.
Architectural simplicity - It's tempting to create dozens of specialized agents - one for technicals, one for fundamentals, one for sentiment, and so on - and stack them in layers. But complexity is not always the right answer. The best systems often use just a handful of well-designed agents with clear responsibilities.
These aren't insurmountable problems - they're engineering challenges that already have some proven solutions, despite this being a very new field.
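As a tiny illustration of the kind of guardrail mentioned under orchestration above, a run-level budget can cap LLM calls and token spend so a misbehaving agent fails loudly instead of quietly burning money. The thresholds here are made up.

```python
class RunBudget:
    """Abort the workflow once it exceeds a call or token budget (illustrative limits)."""

    def __init__(self, max_llm_calls: int = 25, max_tokens: int = 200_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        self.llm_calls += 1
        self.tokens += tokens_used
        if self.llm_calls > self.max_llm_calls or self.tokens > self.max_tokens:
            raise RuntimeError("Agent run aborted: call/token budget exceeded")

budget = RunBudget()
budget.charge(tokens_used=3_500)  # called after every LLM round-trip
```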
Next steps
In Part 2, we'll explain the frameworks that make building robust AI agents much easier. We'll explore CrewAI's intuitive approach, LangChain's comprehensive toolkit, and especially AG2 (formerly known as AutoGen), which I've found strikes the best balance between power and simplicity for financial applications.
Please join Andy’s webcast:
FREE WEBCAST ON AI TOOLS - if you want to learn how to use AI tools - primarily Cursor or Codex - to speed up the development of your trading algorithms, we have a FREE session led by Andy next week, on the 25th of June. Just sign up at: https://us02web.zoom.us/meeting/register/7TYqicZqSFSZ9z7Md7mDEg#/registration
Happy trading,
Andy & Jakub