The Good Tech Companies - Tool Calling for Local AI Agents in C#

Episode Date: October 22, 2025

This story was originally published on HackerNoon at: https://hackernoon.com/tool-calling-for-local-ai-agents-in-c. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #dotnet, #llm, #csharp, #local-ai, #local-ai-agents-calling-tool, #tool-calling, #good-company, and more. This story was written by: @lcarrere. Learn more about this writer by checking @lcarrere's about page, and for more stories, please visit hackernoon.com. LM-Kit .NET SDK now supports tool calling for building AI agents in C#. Create on-device agents that discover, invoke, and chain tools with structured JSON schemas, safety policies, and human-in-the-loop controls, all running locally with full privacy. Works with thousands of local models from Mistral, LLaMA, Qwen, Granite, GPT-OSS, and more. Supports all tool calling modes: simple function, multiple function, parallel function, and parallel multiple function. No cloud dependencies, no API costs, complete control over your agent workflows.

Transcript
Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Tool Calling for Local AI Agents in C#, by Loic Carrere. Tools are a fundamental part of agentic AI. While language models excel at understanding and generating text, tools extend their abilities by letting them interact with the real world: searching the web for current information, executing code for calculations, accessing databases, reading files, or connecting to external services through APIs. Think of tools as the hands and eyes of an AI agent. They transform a conversational system into an agent that can accomplish tasks by bridging the gap between reasoning and action. When an agent needs to check the weather,
Starting point is 00:00:44 analyze a spreadsheet, or send an email, it invokes the appropriate tool, receives the result, and incorporates that information into its response. This moves AI beyond pure text generation toward practical, real-world problem-solving. Interested in how agents retain and use context over time? Explore our deep dive on agent memory. Why local agents have been hard: building AI agents that can actually do things locally has been surprisingly hard. You need models that understand when and how to call external functions; privacy without sending data to the cloud; a runtime that can parse tool calls, validate arguments, and inject results; model-specific flows, because each model has different tool-calling formats and interaction patterns, requiring
Starting point is 00:01:29 custom logic for interception, result injection, and action ordering; safety controls to prevent infinite loops and runaway costs; and clear observability so you know what your agent is doing. Until now, most agentic frameworks forced a choice: powerful cloud-based agents with latency and privacy concerns, or limited local models without proper tool support. Today, that changes. Why tool calling changes everything: with LM-Kit's new tool-calling capabilities, your local agents can ground answers in real data. No more hallucinated weather forecasts or exchange rates; agents fetch actual API responses and cite sources. Chain complex workflows. For example, check the weather, convert temperature to the user's preferred units, then suggest activities, all in one conversational
Starting point is 00:02:18 turn. Maintain full privacy. Everything runs on device; your users' queries, tool arguments, and results never leave their machines. Stay deterministic and safe. Typed schemas, validated inputs, policy controls, and approval hooks prevent agents from going rogue. Scale with your domain. Add business APIs, internal databases, or external MCP catalogs as tools.
Starting point is 00:02:42 The model learns to use them from descriptions and schemas alone. What's new at a glance: state-of-the-art tool calling, right in chatbot flows. Models decide when to call tools, pass structured JSON args, and use results to answer users accurately. Dedicated flow support across model families like Mistral, GPT-OSS, Qwen, Granite, Llama, and more, all via one runtime. Three ways to add tools: implement the ITool interface, annotate methods with the LMFunction attribute, or import catalogs from MCP servers.
Starting point is 00:03:12 A unified API that runs local SLMs with per-turn policy, guardrails, and events for human-in-the-loop control and observability at every stage. All function-calling modes supported: simple function, multiple function, parallel function, and parallel multiple function; choose strict sequencing or safe parallelism. Model-aware tool call flow: modern SLMs emit structured tool calls. LM-Kit parses calls, routes them to your tools, and feeds results back with correlation and clear result types for a reliable inference path. How it works, getting started: here's a complete working example in under 20 lines. The model catalog includes GPT-OSS and many other families and lets you pull a named model card. You can also check a model's tool-calling support before you rely on tools. See the model catalog documentation for details.
Starting point is 00:04:02 Try it now. GitHub sample: a production-ready console sample demonstrates multi-turn chat with tool calling (currency, weather, unit conversion), per-turn policies, progress feedback, and special commands. Jump to "Create multi-turn chatbot with tools in .NET applications." Three ways to add tools. One: implement ITool. Full control; best when you need clear contracts and custom validation.
Starting point is 00:04:29 This snippet demonstrates implementing the interface so an LLM can call your tool directly. It declares the tool contract, parses JSON args, runs your logic, and returns structured JSON to the model. Why use ITool? Complete control over validation, async execution, error handling, and result formatting. Two: annotate methods with the LMFunction binding. Best for rapid prototyping and simple synchronous tools.
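The ITool approach (option one above) can be sketched roughly as follows. The interface shape here is a hypothetical stand-in for illustration, not LM-Kit's actual contract; check the SDK documentation for the real member names. The pattern is what matters: declare a name, description, and JSON schema, parse the model-supplied JSON arguments, run your logic, and return structured JSON.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for LM-Kit's tool contract; the real ITool
// interface may differ — this only illustrates the pattern.
public interface ITool
{
    string Name { get; }
    string Description { get; }
    string JsonSchema { get; }            // parameters the model must supply
    string Invoke(string jsonArguments);  // receives and returns structured JSON
}

// Example tool: convert a length in kilometers to miles.
public sealed class KmToMilesTool : ITool
{
    public string Name => "km_to_miles";
    public string Description => "Converts kilometers to miles.";
    public string JsonSchema =>
        """{"type":"object","properties":{"km":{"type":"number"}},"required":["km"]}""";

    public string Invoke(string jsonArguments)
    {
        // Parse and validate the JSON arguments the model constructed.
        using var doc = JsonDocument.Parse(jsonArguments);
        if (!doc.RootElement.TryGetProperty("km", out var km))
            return """{"error":"missing required argument: km"}""";

        double miles = km.GetDouble() * 0.621371;
        // Return structured JSON so the model can ground its answer.
        return JsonSerializer.Serialize(new { miles = Math.Round(miles, 3) });
    }
}
```

A clear error object (rather than an exception leaking to the model) gives the runtime a deterministic failure mode it can feed back into the conversation.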
Starting point is 00:05:00 What it does: add the attribute to public instance methods. LM-Kit discovers them and exposes each as a tool, generating a JSON schema from method parameters. How it's wired: reflect and bind the methods, then register the resulting tools. Why use LMFunction? Less boilerplate; the binder generates schemas from parameter types and registers everything in one line. Three: import MCP catalogs (external services). Best for connecting to third-party tool ecosystems via the Model Context Protocol. What it does: uses an MCP client to establish a JSON-RPC session with an MCP server, fetch its tool catalog, and adapt those tools so your agent can call them. How it's wired: create the client, optionally set a bearer token, then import the catalog. LM-Kit manages retries and session persistence. Why use MCP? Instant access to curated tool catalogs.
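The attribute-binding approach (option two above) rests on reflection: the binder scans for annotated methods and derives a parameter schema from each signature. A minimal sketch of that idea, with a hypothetical LMFunctionAttribute standing in for the SDK's real attribute and binder:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for LM-Kit's [LMFunction] attribute; the real
// attribute and binder live in the SDK — this only shows the mechanism.
[AttributeUsage(AttributeTargets.Method)]
public sealed class LMFunctionAttribute : Attribute
{
    public string Description { get; }
    public LMFunctionAttribute(string description) => Description = description;
}

public class WeatherTools
{
    [LMFunction("Returns the current temperature in Celsius for a city.")]
    public double GetTemperature(string city) => 21.5; // demo stub

    [LMFunction("Converts Celsius to Fahrenheit.")]
    public double CelsiusToFahrenheit(double celsius) => celsius * 9.0 / 5.0 + 32.0;
}

public static class Binder
{
    // Enumerate annotated public instance methods and render a simple
    // signature per tool — a real binder would emit full JSON schemas.
    public static string[] DiscoverTools(object instance) =>
        instance.GetType()
            .GetMethods(BindingFlags.Public | BindingFlags.Instance)
            .Where(m => m.GetCustomAttribute<LMFunctionAttribute>() != null)
            .Select(m => $"{m.Name}({string.Join(", ",
                m.GetParameters().Select(p => $"{p.ParameterType.Name} {p.Name}"))})")
            .ToArray();
}
```

Because the schema is derived from the method signature, renaming a parameter or changing its type automatically updates what the model sees — that is where the "90% less boilerplate" claim later in the article comes from.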
Starting point is 00:05:52 The server handles execution over JSON-RPC; LM-Kit validates schemas locally. See the MCP client documentation. Execution modes that match your workflow: choose the right policy for each conversational turn. Simple function: one tool, one answer. Example: "What is the weather in Tokyo?" calls the weather tool once and answers. Multiple function: chain tools sequentially. Example: "Convert 75 degrees Fahrenheit to Celsius, then tell me if I need a jacket." One: calls the converter and gets 23.9 degrees Celsius. Two: calls the weather tool and gets conditions. Three: synthesizes the answer: "It is 24 degrees Celsius and sunny; a light jacket should be fine." Parallel function: execute multiple tools concurrently. Example: "Compare weather in Paris, London, and Berlin" calls the weather tool for each city simultaneously, waits for all results, compares, and answers.
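The parallel function mode above maps naturally onto plain Task composition: fan out independent calls, await them all, then let the model reason over the collected results. This sketch uses a hypothetical FetchWeatherAsync stub in place of a real weather tool:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Sketch of "parallel function" execution: fan out independent tool
// calls with Task.WhenAll. FetchWeatherAsync is a demo stub, not a
// real API client.
public static class ParallelToolCalls
{
    static async Task<(string City, double TempC)> FetchWeatherAsync(string city)
    {
        await Task.Delay(10); // stand-in for network latency
        double temp = city switch { "Paris" => 18.0, "London" => 15.0, _ => 20.0 };
        return (city, temp);
    }

    public static async Task<(string City, double TempC)[]> CompareAsync(params string[] cities)
    {
        // Safe only because the tool is idempotent and thread safe;
        // Task.WhenAll preserves the input order in its result array.
        var calls = cities.Select(FetchWeatherAsync);
        return await Task.WhenAll(calls);
    }
}
```

This is exactly why the next caveat in the article matters: if a tool mutates shared state or is not idempotent, concurrent fan-out like this is unsafe and strict sequencing should be chosen instead.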
Starting point is 00:06:48 Only enable this if your tools are idempotent and thread safe. Parallel multiple function: combine chaining and parallelism. Example: "Check weather in three cities, convert all temps to Fahrenheit, and recommend which to visit." One: parallel, fetches weather for three cities. Two: parallel, converts all temperatures. Three: sequential, recommends based on results. See the tool call policy documentation for all options. Defaults are conservative: parallel off, max calls capped. Safety, control, and observability. Policy controls: configure safe defaults and per-turn limits; see the tool call policy documentation. Human in the loop: review, approve, or block tool execution. Hooks: before tool invocation,
Starting point is 00:07:35 after tool invocation, before token sampling, memory recall. Structured data flow: every call flows through a typed pipeline for reproducibility and clear logs; incoming calls carry stable identifiers, and outgoing results carry a clear success or error status. Try it: multi-turn chat sample. Create a multi-turn chatbot with tools in .NET applications. Purpose: demonstrates LM-Kit.NET's agentic tool calling.
Starting point is 00:08:01 During a conversation, the model can decide to call one or multiple tools to fetch data or run computations, pass JSON arguments that match each tool's schema, and use each tool's JSON result to produce a grounded reply while preserving full multi-turn context. Tools implement the tool interface and are managed by a registry; per-turn behavior is shaped via policy. Why tools in chatbots? Reliable, source-backed answers (weather, FX, conversions, business APIs). Agentic chaining: call several tools in one turn and combine results. Determinism and safety: typed schemas, clear failure modes, policy control. Extensibility: implement tools for domain logic; keep code auditable. Efficiency: offload math and lookups to tools; keep the model focused on
Starting point is 00:08:49 reasoning. Target audience: product and platform teams, DevOps and internal tools, B2B apps, educators and demos. Problems solved: actionable answers, deterministic conversions and quotes, multi-turn memory, easy extensibility. The sample app lets you choose a local model or a custom URI, registers three tools (currency, weather, unit conversion), runs a multi-turn chat where the model decides when to call tools, and prints generation stats: tokens, stop reason, speed, context usage. Key features: tool calling via JSON arguments,
Starting point is 00:09:27 full dialogue memory, progress feedback (download and load bars), special commands, and multiple tool calls per turn and across turns. Built-in tools (tool name, purpose, online, notes): currency, ECB rates via Frankfurter, latest or historical plus an optional trend; online, no API key; business days, rounding, and date support. Weather, Open-Meteo current weather plus an optional short hourly forecast;
Starting point is 00:09:51 online, no API key; geocoding plus metric/SI units. Unit conversion, offline conversions (length, mass, temperature, speed, etc.); no network needed; note that temperature conversion is nonlinear, and the tool can list supported units. Tools implement the tool interface, declare a JSON schema, and return JSON. Extend with your own tool; use unique, stable, lowercase names. Supported models (pick per your hardware): Mistral Nemo 2407 12.2B, around 7.7 gigabytes VRAM. Meta Llama 3.1 8B, around 6 gigabytes VRAM. Google Gemma 3
Starting point is 00:10:26 4B Medium, around 4 gigabytes VRAM. Microsoft Phi-4 Mini 3.8B, around 3.3 gigabytes VRAM. Alibaba Qwen 3 8B, around 5.6 gigabytes VRAM. Microsoft Phi-4 14.7B, around 11 gigabytes VRAM. IBM Granite 7B, around 6 gigabytes VRAM. OpenAI GPT-OSS 20B, around 16 gigabytes VRAM. Or provide a custom model URI (GGUF). Commands: clear conversation, continue last assistant message, new answer for last user input. Example prompts: "Convert 125 United States dollars to euros and show a seven-day trend." "Weather in Toulouse, next six hours, metric." "Convert 65 miles per hour to kilometers per hour." "List pressure units." "Now 75.5
Starting point is 00:11:23 degrees Fahrenheit to degrees Celsius, then two kilometers to miles." Behavior and policies, quick reference. Tool selection policy: by default the sample lets the model decide; you can require, forbid, or force a specific tool per turn. Multiple tool calls: several tool invocations per turn are supported; outputs are injected back into context. Schemas matter: precise, concise schemas improve argument construction. Networking: currency and weather require internet; unit conversion is offline. Errors: clear exceptions for invalid inputs, units, dates, and locations. Getting started, prerequisites: .NET Framework 4.6.2 or .NET 6.0. Download and run, then pick a model or paste a custom URI. Chat naturally; the assistant will call one or multiple tools as needed. Use it anytime.
Starting point is 00:12:18 project link GitHub repository complete example all three integration paths why go local with LM kit versus cloud agent framework zero API costs no per token charges run unlimited conversations complete privacy user data never leaves the device GDPR HIPAA friendly sub 100 mislatency local inference eliminates network round trips entirely works offline Agents function without internet connectivity. No rate limits. Scale to millions of requests without throttling. Full control. Own the stack. No vendor lock-in or API deprecations versus basic prompt engineering type safe schemas. Jason schema validation catches bad arguments before execution. Deterministic results. Clear success, error states, not fragile reg X
Starting point is 00:13:12 parsing. Parallel execution. Run multiple tools concurrently when safe. full observability structured events at every stage not log archaeology testable contracts mock tools inject results replay conversations error boundaries graceful failures with retry logic and fallbacks versus manual function calling model decides agent autonomously picks tools and arguments no brittle if else chains auto chaining multiple tool calls per turn results fed back automatically 90% less boilerplate. Register tools once, not per model or per prompt. Built in safety. Loop Prevention. Max calls limits. Approval hooks out of the box. Model agnostic API. Same code works across Mistral, Lama, Quinn, Granite, GPT offs. Progressive enhancement. Add tools without refactoring
Starting point is 00:14:07 conversation logic. Performance and limitations. Performance expectations tool in vocation overhead. Around 2 to 5 milliseconds per call, parsing plus validation. Network tools. 50 to 500 milliseconds depending on API. Local tools. Less than 1 MIZ. Model inference remains the primary latency factor. Requirements models must support tool calling, check. Network dependent tools require internet connectivity. Parallel execution requires thread safe. Item potent tools. Recommended GPU memory, 6 to 16 gigabytes VRAM depending on model size. Known limitations tool selection quality depends on clear descriptions and schemas. Complex nested objects and arguments may confuse smaller models. Very long tool chains, more than 10 calls, may exceed context windows. Ready to build?
Starting point is 00:15:00 1. Clone the sample. 2. Pick your integration approach need full control? Use. Prototyping quickly? use using external catalogs use three add your domain logic replace demo tools with your APIs databases or business logic four set policies that fit your use case simple lookups complex workflows with approval hooks five ship agents that actually work on device private reliable observable start building agentic workflows that respect user privacy run anywhere and stay under your control thank you for listening to this hackernoon story read by artificial intelligence. Visit hackernoon.com to read, write, learn and publish.
