1 Introduction to LLM Agents
This section covers the basics of what LLM agents are and how they differ from standard chat interfaces.
1.1 What is an LLM Agent?
Unlike a standard LLM chat interface (such as OpenAI's ChatGPT, Google's Gemini, or the Claude web interface), an agent is a system that uses an LLM as its “reasoning engine” to perform tasks by interacting with the world. An agent essentially adds hands to an LLM, allowing it to execute code, use tools, and access external information.
Most LLM chat interfaces today already have some tools (e.g., they can write and execute code, mostly Python, or access a search engine). They can also be connected to external tools or data via plugins or APIs. However, they are still primarily designed for conversational interaction and may not be optimized for complex task execution or multi-step reasoning, although nothing fundamentally prevents them from doing so in the future.
1.1.1 Key Characteristics
Tool Use: Agents can use tools such as search engines, calculators, or terminal commands.
Autonomy: They can break down a complex goal into smaller steps and execute them sequentially.
Feedback Loop: They can observe the output of a tool (e.g., a code error, the output of a statistical model, a plot) and adjust their next action accordingly.
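The three characteristics above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not how any particular agent framework works: `scripted_model` is a hypothetical stand-in for a real LLM call, and `column_mean`, `DATA`, and `run_agent` are illustrative names invented for this example.

```python
from typing import Callable

# Hypothetical in-memory dataset, used only for illustration.
DATA = {"age": [30, 40, 50], "height": [170, 182, 165]}

def column_mean(column: str) -> str:
    """Tool: return the mean of a column, or an error message the model can read."""
    if column not in DATA:
        return f"error: unknown column {column!r}; available: {', '.join(DATA)}"
    values = DATA[column]
    return str(sum(values) / len(values))

# Tool use: the agent can only call what is registered here.
TOOLS: dict[str, Callable[[str], str]] = {"column_mean": column_mean}

def scripted_model(history: list[str]) -> dict:
    """Stand-in for an LLM (assumption): chooses the next action from the history."""
    last = history[-1]
    if len(history) == 1:
        # First attempt uses the wrong capitalization on purpose.
        return {"tool": "column_mean", "input": "Age"}
    if last.startswith("error"):
        # Feedback loop: the error message prompts a corrected call.
        return {"tool": "column_mean", "input": "age"}
    return {"final": f"The mean age is {last}"}

def run_agent(goal: str, model: Callable[[list[str]], dict], max_steps: int = 5) -> str:
    """Autonomy: keep choosing and executing actions until done or out of steps."""
    history = [goal]
    for _ in range(max_steps):
        action = model(history)
        if "final" in action:
            return action["final"]
        observation = TOOLS[action["tool"]](action["input"])
        history.append(observation)  # the next decision sees this result
    return "stopped: step limit reached"

print(run_agent("What is the mean age?", scripted_model))
```

In a real agent, `scripted_model` would be replaced by a call to an LLM that receives the history and the tool descriptions; the step limit is one simple way to keep a misbehaving loop from running indefinitely.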
1.2 Why Use Agents for Research?
In academic research, agents can:
Automate Data Cleaning: Write and run scripts to fix inconsistencies in large datasets.
Literature Review: Search, summarize, and synthesize findings from hundreds of papers.
Reproducible Analysis: Generate documented code that follows best practices.