Good afternoon. My name is Tetsuya Hirata. Today I'll show you how to transform machine learning APIs into LLM-compatible MCP servers using Python. We'll compare three different approaches: REST API, Function Calling, and MCP. I'll demonstrate each approach with live code.
The agenda for today is as follows. First, background on the challenges of integrating multiple ML models. Second, an examination of three approaches: REST API, Function Calling, and MCP. Third, architecture diagrams comparing these approaches. Fourth, a live demonstration showing all three approaches running in the terminal. Fifth, a real-world LINE Bot use case. And finally, future work and next steps. The majority of our time will be spent on the live demonstration, where you'll see actual code and terminal output for each approach.
First, the background.
Machine learning projects typically have three key components. First, train.py, which is where you train and evaluate your models. Second, preprocessing.py and postprocessing.py for data preparation and output formatting. And third, inference.py, which is what runs in production to serve predictions. When you deploy ML systems, you're primarily working with the inference code. This is what your users actually interact with.
The common approach to integrating ML into applications is to wrap each model as a REST API endpoint. For example, you might have /api/translate for text translation, /api/stt for speech-to-text, /api/ocr for optical character recognition, and /api/tts for text-to-speech. This works, but there's a better way to integrate ML models and make them conversational with LLMs.
When you have multiple endpoints, you face manual endpoint management. Each feature requires a dedicated API call. The client code grows with every new endpoint you add. And the client must know all the endpoint URLs. The key question is: what if LLMs could choose the right tool automatically? That's what we'll explore today.
The idea is to transform multiple REST endpoints into unified tools that LLMs can discover and use automatically. What's the result? End users keep the same interface they're used to. But now, the LLM picks the appropriate tool based on the context of what the user is asking for. And you get simplified tool management through a standardised protocol. This is exactly what MCP enables.
Before we dive into the three approaches, let me briefly explain the Python MCP SDK. MCP has three main concepts. First, Prompt. These are reusable instruction templates that the LLM uses as context for better responses. Second, Resource. This gives you access to local files, databases, or API responses. The LLM can read these as context. And third, Tool. These are executable actions like translation, OCR, or TTS. These are what the LLM can actually invoke and execute. In this talk, we'll focus on Tools because that's where ML inference happens.
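To make these three concepts concrete, here's a minimal sketch using the Python MCP SDK's FastMCP. The server name and the prompt, resource, and tool bodies are all illustrative assumptions, not the code from today's demo.

```python
# Minimal sketch of the three MCP concepts with the Python MCP SDK (FastMCP).
# The names and bodies below are illustrative, not the demo code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ml-demo")

@mcp.prompt()
def translation_prompt(text: str) -> str:
    """Prompt: a reusable instruction template the LLM can use as context."""
    return f"Translate the following text into Thai:\n{text}"

@mcp.resource("config://glossary")
def glossary() -> str:
    """Resource: local data the LLM can read as context."""
    return "sawatdee = hello"

@mcp.tool()
def translate(text: str) -> str:
    """Tool: an executable action the LLM can invoke (where ML inference lives)."""
    # A real tool would call the translation model here.
    return f"[th] {text}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```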
Now we'll compare three approaches to ML integration.
First, REST API. This is the standard approach. Each model is exposed as an /api/service endpoint. The client makes HTTP requests with JSON payloads. What are the pros? It's simple and straightforward. You don't need any special client. It's just direct HTTP communication. But what are the cons? There's no intelligent tool selection. You have to do manual routing. The client must know all the endpoint URLs. And when you add new endpoints, you have to update your client code. That's tight coupling.
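To make the pattern concrete, here's a rough sketch of such a server in FastAPI, assuming the request and response models used in the demo; the stubbed translation is mine.

```python
# Sketch of the plain REST pattern: one endpoint per model.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class TranslateRequest(BaseModel):
    text: str

class TranslateResponse(BaseModel):
    translated_text: str

@app.post("/api/translate")
def translate(req: TranslateRequest) -> TranslateResponse:
    # A real server would run the translation model here.
    return TranslateResponse(translated_text=f"[th] {req.text}")
```

And the matching client has the URL hardcoded, which is exactly the tight coupling I mentioned:

```python
# Client side (a separate process): the endpoint URL is hardcoded.
import requests

resp = requests.post("http://localhost:8000/api/translate", json={"text": "Hello"})
print(resp.json()["translated_text"])
```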
The second approach is Function Calling. This adds LLM intelligence. You define functions with type signatures. The LLM decides which function to call based on the user input. And the SDK can execute these functions either automatically or manually. What are the pros? The LLM handles tool selection for you. You get structured parameters via JSON Schema. And it's great for single-model integration. But what are the cons? It's vendor-specific. If you're using Gemini, you're locked into Gemini. If you're using OpenAI, you're locked into OpenAI. It's less portable across different LLMs. And you still need custom HTTP handling on the client side.
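A rough sketch of this flow with the google-genai SDK might look like the following. The tool declaration, model name, and response handling are my assumptions for illustration, not the demo code.

```python
# Sketch of Function Calling with the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client()  # expects an API key in the environment

translate_decl = {
    "name": "translate",
    "description": "Translates the given text to Thai language.",
    "parameters": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Hello",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[translate_decl])],
        system_instruction="You must use the appropriate tool to respond.",
    ),
)

# Gemini returns which function to call and with what arguments;
# actually executing it (e.g. an HTTP call to the server) is still up to the client.
call = response.candidates[0].content.parts[0].function_call
print(call.name, call.args)
```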
The third approach is MCP, which stands for Model Context Protocol. This is a standardised approach. The server exposes tools via the MCP protocol. The client discovers tools automatically. The LLM selects which tool to use, and you execute it via call_tool. What are the pros? It's a standardised protocol that works across different LLMs. You get automatic tool discovery. There's no URL management needed. And you have loose coupling between client and server. What are the cons? Well, it's a newer protocol, so the ecosystem is still smaller compared to REST APIs. And you do need an MCP-compatible client. But the benefits far outweigh these drawbacks.
Now we'll look at the architecture differences.
This first diagram shows the basic MCP architecture. On the left, you have the Client PC running multiple MCP clients. Each client process, like Claude Desktop or ChatGPT Desktop, has its own MCP client. These clients communicate with MCP servers via the Internet. On the right side, you have MCP servers. MCP Server A provides Prompt templates. MCP Server B provides Resource access to local files, databases, or external web APIs. And MCP Server C provides Tools, which connect to external ML APIs for inference, or to ML modules written inside the server itself for things like speech-to-text (automatic speech recognition), translation, and summarisation. At the bottom, you have the LLM, such as Claude Opus, which communicates with all the MCP clients. This is the overall MCP ecosystem.
The next diagram compares REST API and MCP for ML integration. In the top half, you see the REST API approach. The Client PC connects to a Web Application Frontend via the Internet. The frontend then connects to multiple REST API endpoints: Endpoint A, B, and C. Each endpoint connects to a separate Audio ML Module, another Audio ML Module, and an Image ML Module. The key point here is that the Web Application must know all the endpoint URLs. It must hardcode which endpoint to call for which feature. In the bottom half, you see the MCP approach. The Client PC runs an MCP Host with MCP Client A. The client communicates with the LLM, and the LLM is connected to a single MCP Server for Tool. This server provides access to four ML modules: Audio ML Module, Image ML Module, another Audio ML Module, and Text ML Module. The key difference is that the MCP client doesn't need to know which module to use. It discovers all available tools from the server, and the LLM decides which tool to invoke based on the user's request. This is loose coupling.
The final diagram compares Function Calling and MCP. In the top half, you see Function Calling. The Client PC runs an MCP Host with MCP Client A. The client communicates with the LLM. But here, instead of connecting to an MCP server, it connects to a Function Calling endpoint. This Function Calling layer then connects to the four ML modules. One important aspect is that Function Calling is vendor-specific. It's tied to your LLM provider, whether that's Gemini or OpenAI. In the bottom half, you see MCP. The architecture is similar, but now the MCP Client connects directly to an MCP Server for Tool. This server provides the same four ML modules. The advantage is that MCP is a standardised protocol. It works with any MCP-compatible LLM. You're not locked into one vendor. You can switch from Claude to Gemini to any future LLM that supports MCP, and your server code remains the same.
Now for the main part of today's talk: the live demo. I'm going to show you all three approaches running in the terminal. We'll see the code structure and the actual execution for each one. Before we start, let me explain the setup. You might be familiar with Claude Desktop connecting to MCP servers. That's a common use case. But for this presentation, I wanted to demonstrate the full MCP architecture. So I've implemented both the MCP client and MCP server myself, without using an MCP host like Claude Desktop. For the LLM agent, I'm using Gemini.
DEMO PART 1: REST API

Let's start with the REST API approach. I have VS Code open with a two-panel layout. On the left, the client code. On the right, the server code. First, we run this to see how it works. I'll switch to iTerm and start the REST server with `python server/rest.py`. The server is starting up. It shows the server URL and the API docs URL. Now I run the REST client with `python client/rest.py`. When I type "Hello", watch what happens. The client calls the translate endpoint directly with the hardcoded URL, and returns the Thai translation: sawatdee.

Now let's go back to VS Code and look at the server code on the right panel. Here's `code/server/rest.py`. You can see the translate endpoint at line 68. It's decorated with `@app.post`, and the route is `/api/translate`. The function receives a TranslateRequest, calls `translate_text` with the text and destination language Thai, and returns a TranslateResponse with the translated text.

Now the client code on the left panel. Here's `code/client/rest.py`. Look at the `translate_text` function at line 67. It has the full endpoint URL hardcoded at line 79: `http://localhost:8000/api/translate`. The client must know the exact URL. If the server changes this URL, the client code breaks. This is tight coupling. The translation works perfectly. But notice one thing: the client must know exactly which endpoint to call. There's no automatic tool discovery. The client has hardcoded URLs for each feature.

DEMO PART 2: Function Calling

Now let's see Function Calling with Gemini. This is where we add LLM intelligence. Same VS Code layout: client on the left panel, server on the right panel. First, we run this to see the difference. I'll go back to iTerm, stop the REST server, and start the Function Calling server with `python server/fc.py`. The server starts up. Now I run the Function Calling client with `python client/fc.py`. The client connects to the server and discovers the available tools. You can see it says "Discovered two tools from server: translate and stt". When I type "Hello", watch what happens. Gemini receives the input and the tool definitions. It decides that the translate tool is the right one to use. You can see "Tool selected: translate". Then the client calls the server endpoint. And we get the result: sawatdee.

Now let's go back to VS Code and look at the server code on the right panel. Here's `code/server/fc.py`. You can see this endpoint called `/tools` at line 123. It's a GET endpoint that returns a list of ToolMetadata objects. Each tool has three key components. First, name: the tool identifier like "translate". This name corresponds to the actual endpoint path we defined earlier at line 68, which is `/tools/translate`. So the name field connects the tool metadata to the actual implementation. Second, description: what the tool does, like "Translates the given text to Thai language". This helps Gemini understand when to use this tool. And third, parameters: a JSON Schema definition that tells Gemini what arguments this tool expects. For example, the translate tool's parameters specify that it requires a text field of type string. This is crucial because Gemini needs to know not just which tools exist, but also what data each tool needs to function. The server now provides complete tool metadata that Gemini can understand and use to decide which tool to call with the correct parameters.
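Based on that description, a sketch of the server's shape might look like this; the endpoint paths and metadata fields mirror what I just described, while the implementation details are assumptions.

```python
# Sketch of the Function Calling server's shape: a /tools metadata endpoint
# plus one endpoint per tool. Implementation details are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ToolMetadata(BaseModel):
    name: str          # matches the tool's endpoint path, e.g. /tools/translate
    description: str   # tells Gemini when to use the tool
    parameters: dict   # JSON Schema for the tool's arguments

class TranslateRequest(BaseModel):
    text: str

@app.get("/tools")
def list_tools() -> list[ToolMetadata]:
    return [
        ToolMetadata(
            name="translate",
            description="Translates the given text to Thai language.",
            parameters={
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        ),
    ]

@app.post("/tools/translate")
def translate(req: TranslateRequest) -> dict:
    # A real server would run the translation model here.
    return {"translated_text": f"[th] {req.text}"}
```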
Now the client code on the left panel. Here's `code/client/fc.py`. First, the client fetches the tools metadata from the server by calling `fetch_tools_from_server` at line 70. Then it converts these to Gemini format. And here's the key part: when the user enters input, we call `client.models.generate_content` at line 154 with the Gemini model. We pass the user input and we pass the tools in the config. Notice the `system_instruction` parameter at line 158. It says "You must use the appropriate tool to respond." This forces Gemini to always use a tool rather than responding directly. In a real application, the LLM would decide whether to use a tool or not based on the user's message. But in this demo, I wanted to focus on showing how the LLM selects which tool to use. So we guarantee that a tool is always called. But the important part is: Gemini still decides which tool is appropriate. It looks at the user input, looks at the available tools, and intelligently selects the right one. Then the client executes it manually by making an HTTP call to the server endpoint.

Notice the big difference here. Gemini automatically selected the translate tool. We didn't have to write any manual routing logic with if statements or keyword matching. Gemini understood the user's intent and picked the right tool. But one thing to note: we're still managing HTTP endpoints. We still have URLs hardcoded in the client. Let's see how MCP addresses this.

DEMO PART 3: MCP

Finally, let's see the MCP approach. Same VS Code layout: client on the left panel, server on the right panel. First, we run this to see how different it is. I'll go back to iTerm and stop the Function Calling server. Now watch this: we only need iTerm for this one command because the client automatically starts the server. I run `python client/mcp_client.py`. The client starts up. It automatically launches the server as a child process using stdio communication. The client discovers two tools from the MCP server: translate and stt. When I type "Hello", the client uses the LLM to decide which tool to use. You can see "LLM deciding which tool to use". It selects the translate tool. Then it calls the tool via the MCP protocol. And you can see the result comes back as JSON: `translated_text` is sawatdee.

Now let's go back to VS Code and look at the server code on the right panel. Here's `code/server/mcp_server.py`. Look at how simple this is. We have a function called translate at line 90. It's decorated with `@mcp.tool` at line 89. It takes a text parameter as a string. It returns a TranslateOut object. And that's it. Just use the `@mcp.tool` decorator, and FastMCP handles everything else. Look at the comment section above the function starting at line 63. This explains what FastMCP automatically generates. First, the name is extracted from the function name. Second, the description comes from the docstring. And third, the inputSchema is generated from type hints. For example, `text: str` becomes `{"type": "string"}` in JSON Schema. FastMCP also analyzes the return type TranslateOut and its Pydantic fields to generate the output schema.

The key advantage here is that this is vendor-neutral. In Function Calling, you have to write tool metadata differently for Gemini, OpenAI, and Claude. Each vendor has their own format. With MCP, you write it once using Python type hints and docstrings, and FastMCP converts it to the standard MCP format. The same server code works with Claude Desktop, Gemini via MCP adapters, or any future MCP-compatible LLM. You don't need to rewrite your metadata for each vendor.
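A minimal sketch of that FastMCP pattern looks like this; the TranslateOut field follows the demo's output, and the stubbed translation is mine.

```python
# Sketch of the FastMCP server pattern: the decorator, docstring, and type
# hints are all FastMCP needs to generate the tool metadata.
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("ml-tools")

class TranslateOut(BaseModel):
    translated_text: str

@mcp.tool()
def translate(text: str) -> TranslateOut:
    """Translates the given text to Thai language."""
    # Name from the function name, description from this docstring,
    # input/output schemas from the type hints and the Pydantic model.
    # A real tool would call the translation model here.
    return TranslateOut(translated_text=f"[th] {text}")

if __name__ == "__main__":
    mcp.run()  # stdio transport; an MCP client can launch this as a child process
```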
Now the client code on the left panel. Here's `code/client/mcp_client.py`. First, we define server parameters at line 121. We specify that we want to run a Python command with the argument `server/mcp_server.py`. This tells the MCP client to automatically start the server as a child process. Then we call `session.list_tools` at line 139 to discover what tools are available. Notice that we're using a `prompt` variable to send a text message to Gemini. This is similar to the `system_instruction` approach in Function Calling, forcing the LLM to always use a tool for demo purposes. And when we want to execute a tool, we just call `session.call_tool` at line 179 with the tool name and arguments. There's no URL management. There's no HTTP client code. It's just pure MCP protocol.

Notice three really important differences here. First, server auto-start. The client started the server automatically as a child process via stdio. We didn't have to run the server manually in iTerm separately. Second, JSON output. MCP uses the JSON-RPC protocol, so results are structured data. And third, stdio communication. The communication happens via standard input and output, not HTTP requests. This is exactly the same mechanism that Claude Desktop uses to auto-start MCP servers.

But MCP actually supports two communication methods. First, stdio for local process communication, which is what we're using in today's demo. And second, HTTP/SSE for remote server communication, which you would use in production environments when your MCP server is running on a different machine or in the cloud. When you use HTTP/SSE to connect to cloud-hosted MCP servers, HTTP communication does occur under the hood. But here's the key difference: the MCP SDK abstracts all of that away. You still just call `session.call_tool` with the tool name and arguments. You don't have to write custom HTTP client code for each endpoint. You don't have to manage different URLs for different services. The MCP SDK handles the HTTP/SSE transport layer, and you work with a clean, unified interface.

So what's the key point here? With MCP, you get automatic tool discovery. You get a standardised protocol that works across different LLM providers. You don't have to write custom HTTP client code for individual endpoints. And it works with any MCP-compatible LLM, whether that's Claude, Gemini via an adapter, or other future LLMs.
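Here's a minimal sketch of that client flow with the Python MCP SDK over stdio. The Gemini tool-selection step is omitted; a direct call_tool invocation stands in for it.

```python
# Sketch of an MCP client over stdio: launch the server, discover tools, call one.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="python",
    args=["server/mcp_server.py"],  # the client starts the server as a child process
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Automatic tool discovery: no URLs, just the MCP protocol.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # In the demo, Gemini picks the tool; here we call it directly.
            result = await session.call_tool("translate", {"text": "Hello"})
            print(result.content)

asyncio.run(main())
```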
Let me show you a real-world application and when MCP actually adds value.
I built a Japanese-Thai translation LINE bot that's serving real users in production. The ML modules I integrated are: OCR to extract text from images, Translation for Japanese to Thai and Thai to Japanese, Gemini Chat to answer general questions, and TTS for text-to-speech conversion. You can try it yourself at jpthaitransai.com/th.

Here's an important point: the current B2C version uses direct Google ML API calls with rule-based selection. For example, if the user uploads an image, we call OCR. If they send Japanese text, we call translation. There's no LLM deciding which tool to use because the rules are clear. In this case, MCP doesn't add value. Direct REST API calls are simpler and more efficient.

So when does MCP become valuable? When we expand to B2B and build an ecosystem architecture. Imagine multiple enterprises each needing specialized translation tools. A hospital needs medical translation. A law firm needs legal translation. A tech company needs technical translation. With MCP, we get two key benefits. First, easier endpoint management. Instead of hardcoding partner server URLs in the code, we use configuration files. When Company A signs up, we give them access to medical and general translation MCP servers. When Company B signs up, they get legal and technical translation servers. No code deployment needed. Just update their config file. Second, context-aware translation. The LLM analyzes the user's message and selects the appropriate specialized MCP server. If a doctor sends a message about heart disease treatment, the LLM routes it to the medical translation MCP server. If a lawyer sends a message about contract terms, the LLM routes it to the legal translation MCP server. This gives much better translation quality than a generic translation API.

This is when MCP shines: not for simple rule-based routing, but for building composable AI systems where context determines which specialized tool to use.
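As a purely hypothetical sketch of that config-driven idea (the file name, registry, and company identifiers are all invented for illustration):

```python
# Hypothetical sketch: per-customer config decides which MCP servers are available.
import json
from mcp import StdioServerParameters

# companies.json might look like:
# {"company_a": ["medical_translation", "general_translation"],
#  "company_b": ["legal_translation", "technical_translation"]}
SERVER_REGISTRY = {
    "medical_translation": StdioServerParameters(
        command="python", args=["servers/medical_translation.py"]),
    "legal_translation": StdioServerParameters(
        command="python", args=["servers/legal_translation.py"]),
}

def servers_for(company_id: str) -> list[StdioServerParameters]:
    # Onboarding a new customer means editing the config, not redeploying code.
    with open("companies.json", encoding="utf-8") as f:
        config = json.load(f)
    return [SERVER_REGISTRY[name] for name in config[company_id] if name in SERVER_REGISTRY]
```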
I'll discuss what you need for production readiness and what advanced features you can add.
For production readiness, you'll need several things. First, error handling. You want exponential backoff retry logic so that temporary failures don't break the user experience. Second, logging. You should track tool invocation metrics so you can monitor which tools are being used, how long they take, and whether they're failing. Third, security. You need JWT authentication and rate limiting to prevent abuse. And fourth, testing. You should write pytest tests and integration tests to make sure everything works correctly.

For advanced features, there are four main areas to explore. First, streaming. Instead of waiting for the entire result, you can yield partial results for long-running tasks. This gives users faster feedback. Second, tool chaining. You can chain multiple tools together in one flow. For example: OCR to extract text from an image, then translate that text, then convert it to speech with TTS. All in one seamless operation. Third, context. You can maintain conversation state across multiple requests so the bot remembers what you talked about earlier. And fourth, caching. You can use Redis to cache frequently translated phrases so you don't have to call the ML model every time.
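As one example, a minimal sketch of exponential-backoff retries around an MCP tool call could look like this; the delays and attempt count are arbitrary choices.

```python
# Sketch: retry a tool call with exponential backoff and a little jitter.
import asyncio
import random

async def call_tool_with_retry(session, name: str, args: dict,
                               attempts: int = 4, base_delay: float = 0.5):
    for attempt in range(attempts):
        try:
            return await session.call_tool(name, args)
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            # 0.5s, 1s, 2s, ... plus jitter so retries don't pile up together.
            await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```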
To summarise what we've covered today. We examined three approaches to integrating ML models with LLMs. REST API uses manual routing and has tight coupling. Function Calling adds LLM routing but is vendor-specific. And MCP uses a standardised protocol with automatic discovery. The benefits of MCP are clear. You get automatic tool selection by the LLM. You don't have hardcoded URLs in your client. It's cross-LLM compatible so you're not locked into one vendor. And it's straightforward to extend with new tools. The key takeaway is this: MCP makes your ML tools LLM-discoverable. With the Python SDK, you simply use the `@mcp.tool` decorator. And MCP supports two communication methods: stdio for local processes, and HTTP/SSE for remote servers in production.
Thank you for your attention. The code and slides are available at github.com/tetsuya0617/pyconthailand2025-public. I'm happy to discuss this further after the session.