With a Master’s degree in Artificial Intelligence in hand, software developer Kelly Hellinx from Teal Partners set out to explore the practical applications of language models for their clients. Intern Mathijs Custers, a student of Applied Computer Science specializing in AI, assisted him in this effort. For his internship project, Mathijs built a chatbot powered by AI.
Mathijs: “The goal of my internship was to build a proof-of-concept around LLM tool calling. Is it technically feasible to integrate a language model into a chatbot architecture via APIs and enrich it with external functions so that it performs actions independently?”
Kelly: “We wanted to determine to what extent a chatbot responds more intelligently thanks to artificial intelligence. Many companies today have a chatbot, but few use AI. As a result, conversations with those bots often remain awkward, and the bots rarely manage to carry out the right task for you. This field of research is in its infancy.”
The user asks the chatbot in natural language to book a vacation. It asks for confirmation and completes the task flawlessly.
Kelly: “The chatbots we know so far are rule-based. They respond to keywords entered by the user. As a developer, you have to program all possible variations of those words, or the chatbot won't understand you. Think of synonyms, spelling mistakes, colloquial language, and so on.
“LLMs understand the context and meaning of language in an unparalleled way. That is a major advantage in a chatbot. We wanted to test how this works in practice in combination with tool calling, where the language model automatically calls the correct function in the software.”
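To make the tool-calling idea concrete, here is a minimal sketch of how a function can be exposed to GPT-4o through the OpenAI Chat Completions API. The function name `book_leave` and its parameters are illustrative assumptions, not the actual Buddy integration; the point is that the model either fills in the arguments itself or asks the user for whatever is still missing.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool definition: "book_leave" is an illustrative name,
# not the real function from the Buddy integration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "book_leave",
            "description": "Book a day of leave for the current employee.",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "string",
                        "description": "Requested day, ISO 8601 (YYYY-MM-DD).",
                    },
                    "leave_type": {
                        "type": "string",
                        "enum": ["vacation", "sick", "unpaid"],
                    },
                },
                "required": ["date", "leave_type"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "I'd like to take the 14th off."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The model decided it has enough information and picked a function.
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    # The model asks a follow-up question instead (e.g. which month or leave type).
    print(message.content)
```

In this setup the developer never parses keywords: the model reads the request, matches it to the declared function, and produces the arguments as structured JSON.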
Mathijs: “For this research, we focused on the Buddy software that Teal Partners develops for SD Worx. We concentrated on the core HR part of the software, specifically the user portal for employees. An employee enters their leave or expenses, adjusts their data, or manages their benefits. They navigate through the screens themselves, but they could also ask a chatbot for help. That's where our research starts.”
Kelly: “For our experiment, we chose OpenAI's strongest language model. GPT-4o is the most advanced and integrates best with Microsoft technology. We did not consider other models.”
If the month is not explicitly mentioned when requesting a day off, the system is smart enough to choose the current month.
Mathijs: “It works surprisingly well. The language model independently searches for context, makes connections, and retrieves data. As a developer, you need to ensure that the model can assess whether it has enough information to take action or whether it should first ask the user for additional input. We even tested multi-layered functions, such as adding a dependent child or complex expenses with VAT splits. That also worked fine. I set up an evaluation workflow with numerical metrics to measure the model's accuracy. The results were impressive.”
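For multi-layered functions like the ones Mathijs mentions, nested JSON schemas are one way to describe the structure to the model. The sketch below is an assumed `submit_expense` tool with one entry per VAT rate; the real Buddy functions and field names are not described in the article, so everything here is illustrative.

```python
# Hypothetical schema for a multi-layered tool: an expense claim made up of
# several VAT lines. All names are illustrative, not the real Buddy functions.
submit_expense_tool = {
    "type": "function",
    "function": {
        "name": "submit_expense",
        "description": "Submit an expense claim, split into one or more VAT lines.",
        "parameters": {
            "type": "object",
            "properties": {
                "description": {"type": "string"},
                "date": {
                    "type": "string",
                    "description": "ISO 8601 date (YYYY-MM-DD).",
                },
                "lines": {
                    "type": "array",
                    "description": "One entry per VAT rate on the receipt.",
                    "items": {
                        "type": "object",
                        "properties": {
                            "amount": {
                                "type": "number",
                                "description": "Amount excluding VAT.",
                            },
                            "vat_rate": {
                                "type": "number",
                                "description": "VAT percentage, e.g. 6 or 21.",
                            },
                        },
                        "required": ["amount", "vat_rate"],
                    },
                },
            },
            "required": ["description", "date", "lines"],
        },
    },
}
```

Because `lines` is marked as required, a model that only hears “I paid 50 euros for a restaurant receipt” has to ask how the amount splits across VAT rates before it can call the function, which is exactly the “ask for additional input” behaviour described above.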
The user asks the chatbot to register his daughter in the system. The bot independently requests additional information it needs. The task is successfully completed.
Kelly: “We had our colleagues at Teal Partners test the chatbot. They were also impressed by the quality of the responses and the tasks performed. The chatbot was able to book leave, enter expenses, and modify personal data upon request. Language models are getting better at reasoning and successfully executing the correct functions. However, it does require a lot of computing power.”
Mathijs: “Some actions, such as adding expenses with multiple elements, required a lot of context. This sometimes resulted in longer response times. Keeping data formats consistent, such as date notations, also required attention. And, of course, there is always a risk of ‘hallucinations’ – situations where the model invents something. That's why we built in a security layer: functions with a higher risk require explicit user approval.”
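The approval layer Mathijs describes can be as simple as a list of sensitive function names that is checked before a tool call is executed. The sketch below is a minimal interpretation of that idea; the tool names, the `registry` mapping, and the use of a console prompt are assumptions, since the article does not specify how Buddy collects the confirmation.

```python
import json

# Hypothetical risk classification; the real list of sensitive functions
# would come from the Buddy integration, not from this sketch.
HIGH_RISK_TOOLS = {"submit_expense", "update_bank_account"}


def execute_tool_call(tool_call, registry):
    """Run a tool call, but gate high-risk functions behind explicit approval."""
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if name in HIGH_RISK_TOOLS:
        # In a real chatbot this would be a confirmation message in the
        # conversation rather than a console prompt.
        answer = input(f"The assistant wants to call {name} with {args}. Confirm? [y/N] ")
        if answer.strip().lower() != "y":
            return {"status": "cancelled", "reason": "user declined"}

    # 'registry' maps tool names to the functions that actually perform the work.
    return registry[name](**args)
```

Gating the call rather than the model's text keeps the safeguard deterministic: even if the model hallucinates arguments, nothing irreversible happens without the user's explicit yes.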
Kelly: “Not so fast. This research taught us that tool calling via a language model works well for our specific application. But there's so much more involved.”
Kelly: “Every question to a language model costs money. As a company, you have to determine whether it's worth the cost. The more complex the question, the more expensive it becomes. We focused on the application for employees. But employers also use the Buddy software. They perform different, more complex tasks. That requires more data, and thus more computing power. We completely disregarded the commercial aspect. The same goes for privacy, ethics, and security.”
Kelly: “Absolutely. As for Mathijs's internship: that task has been completed and the results exceeded our expectations. Now we will see how we can build on these results within Teal Partners.”