    How to Keep MCPs Useful in Agentic Pipelines

    By Awais · January 3, 2026

    Intro

    Applications powered by Large Language Models (LLMs) require integration with external services: for example, integration with Google Calendar to set up meetings, or with PostgreSQL to access data.

    Function calling

    Initially these kinds of integrations were implemented through function calling: we built special functions that an LLM could invoke by emitting specific tokens following a pattern we defined, which the application then parsed and executed. To make this work, we implemented authorization and API-calling methods for each tool. Importantly, we also had to manage all the instructions that told the model how to call these tools, and build the internal logic of the functions, including default or user-specific parameters. But the hype around “AI” demanded fast, sometimes brute-force solutions to keep pace, and that is where Anthropic introduced MCPs.
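As a minimal sketch of this pattern (all names here are illustrative, not tied to any specific provider SDK): the model emits a JSON tool call following a format we defined, and the application parses it and dispatches to a registered function.

```python
import json

# Illustrative registry of callable tools; names and signatures are
# hypothetical, not tied to any specific provider SDK.
TOOLS = {}

def tool(fn):
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def create_meeting(title: str, start: str) -> str:
    # In a real integration this would call the Google Calendar API.
    return f"Created '{title}' at {start}"

def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool call and execute the matching function."""
    call = json.loads(model_output)  # e.g. {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Simulated model output following the pattern defined in the prompt:
print(dispatch('{"name": "create_meeting", "arguments": {"title": "Sync", "start": "2026-01-05T10:00"}}'))
```

Everything outside the `dispatch` call (authorization, retries, per-tool instructions in the prompt) is exactly the per-tool plumbing the paragraph above describes.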

    MCPs

    MCP stands for Model Context Protocol, and today it is the standard way of providing tools to the majority of agentic pipelines. MCPs manage both the integration functions and the LLM instructions for using the tools. At this point some may argue that Skills and Code Execution, which Anthropic also introduced recently, have killed MCPs, but in fact these features still tend to use MCPs for integration and instruction management (see “Code execution with MCP” from Anthropic). Skills and Code Execution focus on the context management problem and tool orchestration, which is a different problem from the one MCPs address.

    MCPs provide a standard way to integrate different services (tools) with LLMs, along with the instructions LLMs use to call those tools. However, there are a couple of problems:

    1. The current Model Context Protocol assumes that all tool-calling parameters are exposed to the LLM and that all their values are generated by the LLM. For example, the LLM has to generate a user id value if the function call requires one. That is overhead: the application already knows the user id without the LLM having to generate it, and to make the LLM aware of that value we have to put it into the prompt. (There is a “hiding arguments” approach in FastMCP from gofastmcp that targets exactly this problem, but I haven’t seen it in the original MCP implementation from Anthropic.)
    2. No out-of-the-box control over instructions. MCPs provide a description for each tool and for each of its arguments, and agentic pipelines use these values blindly as LLM API calling parameters. The descriptions are written by each separate MCP server developer.
    System prompt and tools

    When you call an LLM, you usually pass tools to the call as an API parameter. The value of this parameter is retrieved from the MCP’s list_tools function, which returns the JSON schema for the tools the server exposes.
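A rough sketch of that flow, assuming a simplified tool-schema shape rather than the exact MCP wire format:

```python
# Convert tool schemas (as an MCP list_tools call might return them) into
# the "tools" parameter of a chat-completion-style API. The input shape is
# illustrative, not the exact MCP wire format.
def to_api_tools(mcp_tools: list[dict]) -> list[dict]:
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object", "properties": {}}),
            },
        }
        for t in mcp_tools
    ]

# Hypothetical tool, shaped like the Airbnb-style example discussed below:
mcp_tools = [{
    "name": "search_listings",
    "description": "Search for listings with filters.",
    "inputSchema": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "City, state, etc."}},
        "required": ["location"],
    },
}]

api_tools = to_api_tools(mcp_tools)
```

Whatever text sits in those `description` fields is passed through untouched, which is why it ends up directly in the model's context.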

    At the same time, this “tools” parameter is used to put additional information into the model’s system prompt. For example, the Qwen3-VL model has a chat_template that inserts tools into the system prompt the following way:

    “...You are provided with function signatures within <tools></tools> XML tags:
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}...”

    So the tool descriptions end up in the system prompt of the LLM you are calling.

    The first problem is indeed partially solved by the mentioned “hiding arguments” approach in FastMCP, but I have still seen solutions where values like the user id were pushed into the model’s system prompt for use in tool calling; it is simply faster and much easier to implement from an engineering point of view (in fact, no engineering is required to put a value into the system prompt and rely on the LLM to use it). So here I focus on the second problem.

    At the same time, I am leaving aside the problems related to the tons of low-quality MCPs on the market: some do not work, and some have auto-generated tool descriptions that can confuse the model. The problem I focus on here is non-standardised tool and parameter descriptions, which can be the reason LLMs misbehave with some tools.

    Instead of a conclusion to the introductory part:

    If your agentic LLM-powered pipeline fails with the tools you have, you can:

    1. Just choose a more powerful, modern and expensive LLM API;
    2. Revisit your tools and the instructions overall.

    Both can work. Make your decision, or ask your AI assistant to make it for you…

    Formal part of the work — research

    1. Examples of different descriptions

    Searching through real MCPs on the market and checking their tool lists and descriptions, I found many examples of the mentioned issue. Here I provide just a single example from each of two MCPs in different domains (in real-life cases, the list of MCPs a model uses tends to span different domains):

    Example 1: 

    Tool description: “Generate a area chart to show data trends under continuous independent variables and observe the overall data trend, such as, displacement = velocity (average or instantaneous) × time: s = v × t. If the x-axis is time (t) and the y-axis is velocity (v) at each moment, an area chart allows you to observe the trend of velocity over time and infer the distance traveled by the area’s size.”,

    “Data” property description: “Data for area chart, it should be an array of objects, each object contains a `time` field and a `value` field, such as, [{ time: ‘2015’, value: 23 }, { time: ‘2016’, value: 32 }], when stacking is needed for area, the data should contain a `group` field, such as, [{ time: ‘2015’, value: 23, group: ‘A’ }, { time: ‘2015’, value: 32, group: ‘B’ }].”

    Example 2:

    Tool description: “Search for Airbnb listings with various filters and pagination. Provide direct links to the user”,

    “Location” property description: “Location to search for (city, state, etc.)”

    I am not saying that either of these descriptions is incorrect; they are just very different in terms of format and level of detail.

    2. Dataset and benchmark

    To show that different tool descriptions can change a model’s behavior, I used NVIDIA’s “When2Call” dataset. From it I took the test samples that offer the model multiple tools, exactly one of which is the correct choice (according to the dataset, it is correct to call that specific tool rather than any other, or rather than answering in text without any tool call). The idea of the benchmark is to count correct and incorrect tool calls; I also count “no tool call” cases as incorrect answers. As the LLM I selected OpenAI’s “gpt-5-nano”.
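The scoring rule can be sketched as follows (the sample structure here is illustrative, not the dataset's actual format):

```python
# A sample is correct only if the model called the expected tool; a wrong
# tool, or no tool call at all, both count as incorrect.
def accuracy(samples: list[dict]) -> float:
    correct = 0
    for s in samples:
        predicted = s.get("predicted_tool")  # None means no tool call
        if predicted is not None and predicted == s["expected_tool"]:
            correct += 1
    return correct / len(samples)

samples = [
    {"expected_tool": "get_weather", "predicted_tool": "get_weather"},
    {"expected_tool": "get_weather", "predicted_tool": "search_web"},
    {"expected_tool": "get_weather", "predicted_tool": None},  # no call -> incorrect
    {"expected_tool": "book_flight", "predicted_tool": "book_flight"},
]
print(accuracy(samples))  # 0.5
```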

    3. Data generation

    The original dataset provides just a single description per tool. To create alternative descriptions for each tool and parameter, I used “gpt-5-mini” to generate them from the current ones, with the following instruction to make them more elaborate (after generation there was an additional validation step, with re-generation when necessary):

     “””You will receive the tool definition in JSON format. Your task is to make the tool description more detailed, so it can be used by a weak model.

    One of the ways to complicate — insert detailed description of how it works and examples of how to use.

    Example of detailed descriptions:

    Tool description: “Generate a area chart to show data trends under continuous independent variables and observe the overall data trend, such as, displacement = velocity (average or instantaneous) × time: s = v × t. If the x-axis is time (t) and the y-axis is velocity (v) at each moment, an area chart allows you to observe the trend of velocity over time and infer the distance traveled by the area’s size.”,

    Property description: “Data for area chart, it should be an array of objects, each object contains a `time` field and a `value` field, such as, [{ time: ‘2015’, value: 23 }, { time: ‘2016’, value: 32 }], when stacking is needed for area, the data should contain a `group` field, such as, [{ time: ‘2015’, value: 23, group: ‘A’ }, { time: ‘2015’, value: 32, group: ‘B’ }].”

    Return the updated detailed description strictly in JSON format (just change the descriptions, do not change the structure of the inputted JSON). Start your answer with:

    “New JSON-formatted: …”

    “””

    4. Experiments

    To test the hypothesis I did a couple of tests, namely:

    • Measure the baseline model performance on the selected benchmark (Baseline);
    • Replace the correct tool’s descriptions (both the tool description itself and its parameter descriptions, the same for all experiments) with the generated ones (Correct tool replaced);
    • Replace the incorrect tools’ descriptions with the generated ones (Incorrect tool replaced);
    • Replace all tools’ descriptions with the generated ones (All tools replaced).

    Here is a table with the results of these experiments (for each experiment, 5 evaluation runs were executed, so the standard deviation (std) is provided in addition to accuracy):

    | Method | Mean accuracy | Accuracy std | Maximum accuracy over 5 experiments |
    |---|---|---|---|
    | Baseline | 76.5% | 0.03 | 79.0% |
    | Correct tool replaced | 80.5% | 0.03 | 85.2% |
    | Incorrect tool replaced | 75.1% | 0.01 | 76.5% |
    | All tools replaced | 75.3% | 0.04 | 82.7% |

    Table 1. Results of the experiments. Table prepared by the author.

    Conclusion

    From the table above it is evident that complicating tool descriptions introduces a bias: the selected LLM tends to choose the tool with the more detailed description. At the same time, we can see that an extended description can also confuse the model (as in the all-tools-replaced case).

    The table shows that tool descriptions provide a mechanism to manipulate and significantly shift a model’s behaviour and accuracy, especially considering that the selected benchmark operates with a small number of tools per model call (4.35 tools per sample on average).

    It also clearly indicates that LLMs can have tool biases that MCP providers could potentially misuse, similar to the style biases I reported before. Studying these biases and their misuse may be important for further work.

      Engineering a solution

    I’ve prepared a PoC of tooling to address the mentioned issue in practice: Master-MCP. Master-MCP is a proxy MCP server that can be connected to any number of MCPs and can itself be connected to an agent / LLM as a single MCP server (currently an stdio-transport MCP server). The default features I’ve implemented:

    1. Ignoring some parameters. The implemented mechanics exclude every parameter whose name starts with “_” from the tool’s parameter schema. Such a parameter can later be inserted programmatically or take a default value (if provided).
    2. Tool description adjustments. Master-MCP collects all the tools and their descriptions from the connected MCP servers and gives the user a way to adjust them. It exposes a method with a simple UI for editing this list (JSON schema), so the user can experiment with different tool descriptions.
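The first mechanic can be sketched like this (a simplified illustration of the idea, not Master-MCP's actual code):

```python
# Strip every property whose name starts with "_" from the schema exposed
# to the LLM, so the application can inject those values programmatically
# (or fall back to defaults) when the tool is actually executed.
def hide_private_params(schema: dict) -> dict:
    props = schema.get("properties", {})
    visible = {k: v for k, v in props.items() if not k.startswith("_")}
    return {
        **schema,
        "properties": visible,
        "required": [r for r in schema.get("required", []) if not r.startswith("_")],
    }

# Hypothetical tool schema with an application-managed user id:
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "_user_id": {"type": "string"},  # injected by the application, not the LLM
    },
    "required": ["query", "_user_id"],
}
exposed = hide_private_params(schema)
# exposed["properties"] now contains only "query"
```

The LLM never sees `_user_id`, so it never has to generate it, which is exactly the overhead described in problem 1 above.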

    I invite everyone interested to join the project. With community support, the plans include extending Master-MCP’s functionality, for example:

    • Logging and monitoring, followed by advanced analytics;
    • Tool hierarchy and orchestration (including ML-powered) to combine modern context-management techniques with smart algorithms.

      Current github page of the project: link
