Building an AI Research Assistant with LLM Tool Calling
Large Language Models (LLMs) have revolutionized how we interact with artificial intelligence, but their true power is unlocked when they can use external tools. In this tutorial, we'll build a practical AI research assistant that helps students search and summarize academic papers from arXiv, using Python and a local LLM served through Ollama with tool-calling capabilities.
📁 Full Project Repository
You can access the complete source code for this project on GitLab: AI Research Companion Repository

What We're Building

Our chatbot will serve as an intelligent research companion capable of searching arXiv—an open-access repository for scholarly preprints in fields like physics, mathematics, computer science, and economics. The assistant will not only find relevant papers but also extract and summarize detailed information about specific research articles.

Setting Up the Development Environment

Before we begin coding, we need to set up a proper Python environment with all necessary dependencies.

Creating a Python Virtual Environment

We'll use Python 3.13.2 for this project. Virtual environments help isolate project dependencies and prevent conflicts with system-wide packages.

# Create a new virtual environment
python -m venv venv

# Activate the environment (Windows)
.\venv\Scripts\activate

# For Linux/Mac, use:
# source venv/bin/activate

Installing Required Libraries

Create a requirements.txt file in your project directory with the following dependencies:

arxiv==2.2.0
PyPDF2==3.0.1
python-dotenv==1.1.1
typing==3.7.4.3
ollama==0.6.0

Install all dependencies with a single command:

pip install -r requirements.txt
💡 Pro Tip:
Whenever you install new libraries using pip, update your requirements.txt file automatically with:
pip freeze > requirements.txt

Installing and Configuring Ollama

We're using Ollama with the gpt-oss:20b model because it can run locally on your machine, providing privacy and eliminating API costs. Follow these steps to set it up:

  1. Visit https://ollama.com/download and install Ollama for your operating system
  2. Verify the installation:
ollama --version
  3. Download the language model:
ollama pull gpt-oss:20b
  4. Verify the model is downloaded:
ollama list
⚠️ Important:
If Ollama isn't running automatically, start the server manually with:
ollama serve

Building the Core Functionality

Now we'll create the functions that our LLM will use as tools to search and retrieve paper information.

1. The Paper Search Function

This function searches arXiv for papers matching a given topic and stores their information locally for future reference.

# Imports needed by the core module (json is also used by extract_info and execute_tool below)
import json
from typing import List

import arxiv

import utils  # project helper for loading/saving paper metadata (sketched below)


def search_papers(topic: str, max_results: int = 5) -> List[str]:
    """
    Search for papers on arXiv based on a topic and store their information.
    
    Args:
        topic: The topic to search for
        max_results: Maximum number of results to retrieve (default: 5)
    
    Returns:
        List of paper IDs found in the search
    """
    # Initialize the arXiv client
    client = arxiv.Client()
    
    # Configure search parameters
    search = arxiv.Search(
        query = topic,
        max_results = max_results,
        sort_by = arxiv.SortCriterion.Relevance
    )
    
    # Execute the search
    papers = client.results(search)
    
    # Load existing papers information
    papers_info = utils.load_papers_info(topic)
    
    # Process each paper and build metadata
    paper_ids = []
    for paper in papers:
        paper_ids.append(paper.get_short_id())
        paper_info = {
            'title': paper.title,
            'authors': [author.name for author in paper.authors],
            'summary': paper.summary,
            'pdf_url': paper.pdf_url,
            'published': str(paper.published.date())
        }
        papers_info[paper.get_short_id()] = paper_info
    
    # Save the updated information to a JSON file
    file_path = utils.save_papers_info(topic, papers_info)
    
    return paper_ids

This function performs several important tasks: it queries arXiv for relevant papers, extracts key metadata (title, authors, summary, publication date, and PDF URL), stores this information in a structured JSON format, and returns a list of paper IDs for further processing.

2. The Information Extraction Function

This function retrieves detailed information about a specific paper using its unique ID.

def extract_info(paper_id: str) -> str:
    """
    Search for information about a specific paper across all topic directories.
    
    Args:
        paper_id: The ID of the paper to look for
    
    Returns:
        JSON string with paper information if found, error message if not found
    """
    paper_info = utils.find_paper_info_by_id(paper_id)
    
    if paper_info:
        return json.dumps(paper_info, indent=2)
    else:
        return f"There's no saved information related to paper {paper_id}."

3. Tool Execution Framework

To allow the LLM to call these functions dynamically, we need to create a mapping system and an execution function.

# Map tool names to their corresponding functions
mapping_tool_function = {
    "search_papers": search_papers,
    "extract_info": extract_info
}

def execute_tool(tool_name, tool_args):
    """
    Execute a tool function by name with provided arguments.
    
    Args:
        tool_name: Name of the tool to execute
        tool_args: Dictionary of arguments to pass to the tool
    
    Returns:
        String representation of the tool's result
    """
    result = mapping_tool_function[tool_name](**tool_args)
    
    # Handle different return types
    if result is None:
        result = "The operation completed but didn't return any results."
    elif isinstance(result, list):
        result = ', '.join(result)
    elif isinstance(result, dict):
        result = json.dumps(result, indent=2)
    else:
        result = str(result)
    
    return result

The execute_tool function acts as a dispatcher that calls the appropriate function based on the tool name provided by the LLM, ensuring that the results are properly formatted as strings for the LLM to process.
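
You can exercise the dispatcher directly before wiring it to the LLM. The snippet below is a hypothetical manual test; the arguments mirror what the model would supply:

# Manual smoke test of the dispatcher
ids = execute_tool("search_papers", {"topic": "machine learning", "max_results": 3})
print(ids)  # comma-separated paper IDs, e.g. "1909.03550v1, 1811.04422v1, ..."

details = execute_tool("extract_info", {"paper_id": ids.split(", ")[0]})
print(details)  # JSON string with title, authors, summary, pdf_url, published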

Defining the Tool Schema

For the LLM to understand what tools are available and how to use them, we need to provide a detailed schema in JSON format. This schema describes each tool's name, purpose, parameters, and requirements.

[
    {
        "function": {
            "name": "search_papers",
            "description": "Search for papers on arXiv based on a topic and store their information.",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {
                        "type": "string",
                        "description": "The topic to search for"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to retrieve",
                        "default": 5
                    }
                },
                "required": ["topic"]
            }
        },
        "type": "function"
    },
    {
        "function": {
            "name": "extract_info",
            "description": "Search for information about a specific paper across all topic directories.",
            "parameters": {
                "type": "object",
                "properties": {
                    "paper_id": {
                        "type": "string",
                        "description": "The ID of the paper to look for"
                    }
                },
                "required": ["paper_id"]
            }
        },
        "type": "function"
    }
]
Understanding Tool Schema Components:
  • name: The exact function name the LLM should reference
  • description: A clear explanation helping the LLM understand when to use this tool
  • parameters: Defines the input structure, including parameter types, descriptions, default values, and which parameters are required
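
In the chatbot code below, this schema is loaded through a load_tools helper imported from ai_core.tools.loader. That module isn't shown here; a minimal sketch, assuming the schema above is saved as tools.json next to the loader (the file name and path are assumptions), could look like this:

# ai_core/tools/loader.py -- hypothetical sketch of the schema loader
import json
import os

TOOLS_FILE = os.path.join(os.path.dirname(__file__), "tools.json")

def load_tools(*tool_names):
    """Return the schema entries whose function name matches the requested tools."""
    with open(TOOLS_FILE, "r", encoding="utf-8") as f:
        all_tools = json.load(f)
    if not tool_names:
        return all_tools
    return [tool for tool in all_tools if tool["function"]["name"] in tool_names]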

Implementing the Chatbot

Now we'll bring everything together to create an interactive chatbot that can engage in conversations, decide when to use tools, and provide intelligent responses.

from dotenv import load_dotenv
import ollama
import os
from ai_core.tools.loader import load_tools
from ai_core.core import execute_tool

load_dotenv()

# Define the system prompt to guide the LLM's behavior
system_message = {
    "role": "system",
    "content": """You are a helpful research assistant that finds and summarizes 
academic papers from arXiv.

You have access to two tools:
1. search_papers(topic: str, max_results: int=5): Search for papers on a given topic.
2. extract_info(paper_id: str): Get detailed information about a specific paper.

Behavior rules:
- Start by greeting the user and explaining what you can do.
- When a topic is given, call 'search_papers' with that topic.
- After receiving results, call 'extract_info' for each paper ID you want to summarize.
- Finally, summarize the findings clearly (paper ID, titles, authors, focus, and date)."""
}

messages = []

def process_query(query):
    """Process a user query and handle tool calls."""
    messages.append({'role': 'user', 'content': query})
    
    # Load available tools
    tools = load_tools("search_papers", "extract_info")
    
    # Get initial response from LLM
    response = ollama.chat(model='gpt-oss:20b', messages=messages, tools=tools)
    
    process_query = True
    
    # Loop until we get a final text response
    while process_query:
        if response.message:
            # Check if LLM wants to call a tool
            if response.message.role == 'assistant' and response.message.tool_calls:
                tool_call = response.message.tool_calls[0]
                tool_args = tool_call.function.arguments
                tool_name = tool_call.function.name
                
                # Record the tool call in message history
                messages.append({
                    "role": "assistant",
                    "content": response.message.content,
                    "tool_calls": [{
                        "type": "function",
                        "function": {"name": tool_name, "arguments": tool_args}
                    }]
                })
                
                # Execute the tool
                result = execute_tool(tool_name, tool_args)
                
                # Add tool result to message history
                messages.append({
                    "role": "tool",
                    "name": tool_name,
                    "content": str(result)
                })
                
                # Get LLM's response to the tool result
                response = ollama.chat(
                    model='gpt-oss:20b',
                    tools=tools,
                    messages=messages
                )
            
            # If no more tool calls, print final response
            if response.message and not response.message.tool_calls:
                print(response.message.content)
                process_query = False
        else:
            # No message came back from the model; stop processing
            process_query = False

Understanding the Message Flow

The chatbot uses a message-based system where each message has a specific role:

  • user: Messages from the human user containing queries or requests
  • system: Hardcoded instructions that provide context and behavioral guidelines to the LLM
  • assistant: Responses generated by the LLM, which may include tool call requests
  • tool: Return values from executed tools that the LLM needs to process
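
After one tool-calling round, the history might look roughly like this (contents abbreviated for illustration):

# Illustrative message history after a single search_papers round
messages = [
    {"role": "system", "content": "You are a helpful research assistant..."},
    {"role": "user", "content": "research paper related to machine learning"},
    {"role": "assistant", "content": "", "tool_calls": [{
        "type": "function",
        "function": {"name": "search_papers",
                     "arguments": {"topic": "machine learning", "max_results": 5}}
    }]},
    {"role": "tool", "name": "search_papers",
     "content": "1909.03550, 1811.04422, 1707.04849, 1909.09246, 2301.09753"},
    {"role": "assistant", "content": "I found five papers on machine learning: ..."},
]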

Creating the Interactive Loop

To enable continuous conversation, we implement a chat loop that repeatedly processes user queries.

def chat_loop():
    """Main chat loop for continuous interaction."""
    print("Type your queries or 'quit' to exit.")
    messages.clear()
    messages.append(system_message)
    
    while True:
        try:
            query = input("\nQuery: ").strip()
            if query.lower() == 'quit':
                break
            process_query(query)
            print("\n")
        except Exception as e:
            print(f"\nError: {str(e)}")

Example Conversation

Here's what a typical interaction with the chatbot looks like:

Type your queries or 'quit' to exit.

Query: research paper related to machine learning

Hello! 👋 I'm here to help you find and summarize research papers from arXiv 
on any topic you're interested in. Just let me know a keyword or a more specific 
subject area (e.g., "deep learning for medical imaging," "reinforcement learning 
algorithms," etc.) and I'll pull up the latest papers for you.

For now, I ran a quick search for "machine learning" and got the following five 
paper IDs:
- 1909.03550
- 1811.04422
- 1707.04849
- 1909.09246
- 2301.09753

I'm currently unable to retrieve the full metadata for these papers automatically, 
so if you'd like a detailed summary (title, authors, focus, date, etc.) I'll need 
a bit more information from you.

Would you like me to:
1. **Look up these specific papers one by one** (you can tell me which IDs interest 
   you most), or
2. **Refine the search** to a narrower topic or sub‑field within machine learning?

Just let me know how you'd like to proceed!

Query: give me details of the paper 2301.09753

Hello! I'm here to help you find, retrieve, and summarize academic papers from arXiv. 
Let me pull up the details for the paper you requested.

---

### 📄 Paper Details

| Field | Information |
|-------|-------------|
| **arXiv ID** | `2301.09753v1` |
| **Title** | *Towards Modular Machine Learning Solution Development: Benefits and Trade‑offs* |
| **Authors** | Samiyuru Menik, Lakshmish Ramaswamy |
| **Publication Date** | January 23, 2023 |
| **PDF Link** | [Download PDF](http://arxiv.org/pdf/2301.09753v1) |

### 🔍 Focus & Key Takeaways

- **Problem Addressed**: The high cost and low reusability of custom, monolithic 
  machine‑learning (ML) pipelines in industry.
- **Core Idea**: Introduce *modular* ML solutions—composable, reusable components 
  that can be assembled for specific business problems.
- **Benefits Explored**:
  1. **Engineering Efficiency**: Faster iteration, easier debugging, and clearer 
     separation of concerns.
  2. **Cost Reduction**: Reusing existing modules lowers the need to rebuild 
     pipelines from scratch.
  3. **Performance Gains**: Modules can be tuned or swapped independently, 
     potentially improving accuracy or runtime.
  4. **Data Advantages**: Shared modules enable better data sharing and reuse 
     across projects.
- **Trade‑offs**:
  - Potential overhead in designing well‑defined interfaces.
  - Risk of under‑performance if modules are not perfectly aligned.
  - Integration complexity when combining heterogeneous modules.
- **Experimental Validation**: The authors tested the approach on three deep‑learning 
  tasks (one NLP and two computer‑vision problems), demonstrating that modular 
  pipelines matched or exceeded monolithic ones in accuracy while offering 
  engineering benefits.

### 📌 Summary

This paper argues that moving away from monolithic ML systems toward modular, 
component‑based architectures can significantly lower development costs and 
improve system maintainability without sacrificing performance. It provides both 
conceptual motivation and empirical evidence, making it a valuable read for data 
scientists, ML engineers, and product managers looking to scale ML solutions 
across an organization.

---

Let me know if you'd like a deeper dive into any particular section, a comparison 
with related works, or help locating a different paper!

Query: quit

Important Considerations

🎯 Model Selection Matters
The quality of your chatbot's responses directly depends on the LLM you choose. Not all models support tool calling effectively: weaker models may struggle to follow context and can return inaccurate responses, while stronger models can often infer context from the tool definitions themselves, sometimes making detailed system prompts unnecessary.
🤖 Manual vs. Agentic Tool Calling
In this implementation, we call tools manually: the LLM tells us which tool to call and with what parameters, but our code executes the tool and returns the result. In contrast, agentic AI systems decide which tools to call and execute them on their own, enabling more sophisticated, autonomous workflows.

Conclusion

Building an AI research assistant with tool-calling capabilities demonstrates the powerful synergy between LLMs and external data sources. By combining Ollama's local LLM capabilities with the arXiv API, we've created a practical tool that can help students and researchers efficiently discover and analyze academic papers.

This project serves as a foundation that can be extended with additional tools such as PDF parsing, citation network analysis, or integration with other academic databases. The principles of tool schema definition, message handling, and iterative LLM interaction apply broadly to many AI assistant applications.

As you continue developing with LLMs, remember that the effectiveness of your assistant depends on three key factors: choosing an appropriate model, crafting clear tool descriptions, and designing intuitive conversation flows. With these elements in place, you can create AI assistants that are both intelligent and genuinely useful.

Happy coding, and may your AI assistant help unlock new research discoveries! 🚀
