Structured MCP Server evaluation from a live sandbox — 60% timeout rate on semantic search

April 30, 2026
4 replies
150 views

vegard.ofstaas
Contributor II

We've run a structured, hands-on evaluation of the Docebo MCP server beta against our Visma Learning Universe sandbox instance. Our platform serves product and partner training for Payroll, Visma Net, and Business NXT, primarily in Norwegian — so this is a real-world enterprise test with non-English content.

How we tested: We used Claude (Anthropic) in Cowork mode as the MCP client, which gave us a realistic end-to-end view of how the server performs when consumed by an actual AI assistant. We tested all four tools with domain-specific queries, cross-verified results against the Docebo UI, and ran security probes including prompt injection and SQL injection attempts.

What impressed us: The permission model is excellent — properly user-scoped, read-only, no admin data leakage. The semantic search returns genuinely useful step-by-step answers from our training material when it works. Cross-language search (English query returning Norwegian content) is a nice bonus.

What blocks us from going further:

ask_learning_content timeout rate: 60% — 3 of 5 semantic queries timed out at 180s in our second test round. The queries that failed were core use cases for our learners ("Hvordan kjører jeg lønn i Payroll?"). This is the single biggest blocker.
No pagination on get_my_learning_enrollments — our users have 200+ enrollments, producing 150K+ character responses that overflow LLM context windows.
UTF-8 encoding issues in RAG chunks — Norwegian characters (åøæ) render as mojibake in retrieved content.
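For anyone who wants to reproduce the timeout figure in the first item, the rate is simply timed-out calls divided by total calls (3 of 5 = 60%). Below is a minimal sketch of how such a tally could be scripted. The `call` argument is a hypothetical stand-in for whatever your MCP client exposes to invoke a tool; it is not part of the Docebo server or of any specific client library. The 180-second default only mirrors the limit we observed.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical signature: an async function that sends one query to a tool.
ToolCall = Callable[[str], Awaitable[object]]


async def timed_out(call: ToolCall, query: str, timeout_s: float) -> bool:
    """Return True if call(query) does not finish within timeout_s seconds."""
    try:
        await asyncio.wait_for(call(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        return True
    return False  # any other exception raised by the call propagates


async def timeout_rate(call: ToolCall, queries: Sequence[str],
                       timeout_s: float = 180.0) -> float:
    """Run the queries one at a time and return the fraction that timed out."""
    if not queries:
        raise ValueError("need at least one query")
    flags = [await timed_out(call, q, timeout_s) for q in queries]
    return sum(flags) / len(flags)
```

Running the queries sequentially keeps the load comparable to a single assistant session. A concurrent run would measure something different.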

The full report (attached) includes a complete test appendix with every query, its result, and a cross-verification table against the sandbox UI. Happy to run additional tests or jump on a call if the team wants to dig into any of the findings.

Vegard Øfstaas Visma Software

luca.latini
Docebian
April 30, 2026

Hi @vegard.ofstaas,

Thanks for this. It's exactly the kind of real-world testing that helps us improve ahead of the general availability release!

Here's my initial response. We'll dig into the logs and I will come back with a more detailed follow-up.

ask_learning_content timeouts: This is not expected behavior and we haven't seen timeout rates like this before. It may be a transient issue in your sandbox environment, but we can't rule out other causes without a closer look. We'll investigate and get back to you.
Pagination on get_my_learning_enrollments: This is a known limitation of the current implementation. The scenario you described (200+ enrollments) is exactly the kind of case we're working to handle better. It's on our radar.
UTF-8 encoding: First report we've had on this. We'll dig in.

Thanks again, the quality of your report is genuinely helpful.

Sahra
Newcomer
July 9, 2026

Hi @vegard.ofstaas, thanks for sharing this! What format are your training materials in for the semantic queries you tested? The “ask_learning_content” tool has the most value for our org.

vegard.ofstaas
Author
Contributor II
July 9, 2026

Our courses are built in Articulate Rise and published as SCORM packages to Docebo. The content is primarily text-rich interactive modules with embedded images — Rise's typical output of scrollable, block-based lessons.

For the ask_learning_content testing, the RAG system returned plain text chunks that appear to be extracted from within the Rise/SCORM packages — essentially the text content from the lesson blocks. The chunks come back with just title, chunk (the text), and url, with no metadata about source format.
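To make that shape concrete, here is a small sketch of how a client could model one returned chunk. The three field names (`title`, `chunk`, `url`) are the ones we saw in the responses. Everything else is an assumption for illustration: that each chunk arrives as a JSON object, that all values are strings, and the `ContentChunk` and `parse_chunk` names.

```python
from dataclasses import dataclass
from typing import Any, Mapping

REQUIRED_FIELDS = ("title", "chunk", "url")


@dataclass(frozen=True)
class ContentChunk:
    title: str  # title of the source learning item
    chunk: str  # extracted plain text
    url: str    # link back to the content


def parse_chunk(raw: Mapping[str, Any]) -> ContentChunk:
    """Validate one raw chunk object and convert it to a ContentChunk."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"chunk is missing fields: {missing}")
    return ContentChunk(
        title=str(raw["title"]),
        chunk=str(raw["chunk"]),
        url=str(raw["url"]),
    )
```

Because there is no source-format field, a client cannot tell from the chunk alone whether the text came from a Rise block, a PDF, or something else.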

A few observations that might be useful:

The tool indexes the text layer from Rise content reasonably well — when it works. Our main challenge was reliability: we saw a ~60% timeout rate on ask_learning_content calls, which was the most critical finding. When it did return results, the text was recognizable and relevant to the query.

We also hit encoding issues (mojibake) — our content is in Norwegian, and characters like å, ø, æ came back garbled in the chunks. Worth flagging if your content uses non-ASCII characters.
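If you want to check your own content for this, the sketch below reproduces the most common form of mojibake: UTF-8 bytes decoded as Latin-1, which turns "ø" into "Ã¸". Whether that is the actual cause inside the Docebo pipeline is an assumption on my part, since we only saw the garbled output. The `repair_mojibake` helper is a hypothetical client-side workaround, not a fix for the underlying issue.

```python
def garble(text: str) -> str:
    """Simulate the suspected fault: UTF-8 bytes misread as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; return text unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or its bytes are not
        # valid UTF-8, so it was most likely fine to begin with.
        return text


original = "Hvordan kjører jeg lønn i Payroll?"
garbled = garble(original)  # contains "kjÃ¸rer" and "lÃ¸nn"
assert repair_mojibake(garbled) == original
assert repair_mojibake(original) == original  # correct text is left alone
```

The round trip only works when the whole string was garbled the same way. A chunk that mixes garbled and correct non-ASCII characters will be returned unchanged.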

One open question I have: Rise courses can contain video blocks, labeled graphics, accordion components, etc. We didn't specifically test how well the RAG handles those non-text elements vs. pure text blocks. If your Rise courses are heavy on interactive components with embedded text, that could affect retrieval quality.

luca.latini
Docebian
July 9, 2026

Hi @Sahra @vegard.ofstaas, thanks again for taking the time to share your evaluation. Feedback like yours has been invaluable in helping shape the product over the past couple of months.

I also wanted to follow up with an update on the improvements we've recently released in sandbox, which are coming to production soon as of this post.

@vegard.ofstaas some of the areas you highlighted have already been addressed in our latest sandbox release:

Scalable learner lists: get_my_learning_enrollments (and certifications) now support scalable retrieval, allowing AI assistants to navigate large enrollment sets without overwhelming the model context.
Improved content retrieval: We've evolved ask_learning_content into fetch_learning_item, which lets AI assistants retrieve and reason over the content of specific learning items using the same parsing engine that powers Docebo Harmony. This provides a more deterministic and reliable retrieval flow for content-based questions.
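As a rough illustration of what navigating a large enrollment set page by page can look like on the client side, here is a generic sketch. It does not describe the actual tool interface: the `fetch_page` callable, the `items` and `next_cursor` keys, and the character budget are all hypothetical placeholders, so check the tool schema in your sandbox for the real parameter names.

```python
import json
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"items": [...], "next_cursor": "<token>" or None}
FetchPage = Callable[[Optional[str]], dict]


def iter_enrollments(fetch_page: FetchPage, max_chars: int) -> Iterator[dict]:
    """Yield enrollments page by page, stopping once a character budget is spent."""
    cursor: Optional[str] = None
    used = 0
    while True:
        page = fetch_page(cursor)
        for item in page["items"]:
            # Count the serialized size, since that is what lands in the model context.
            used += len(json.dumps(item, ensure_ascii=False))
            if used > max_chars:
                return
            yield item
        cursor = page.get("next_cursor")
        if cursor is None:
            return
```

The budget check is the part that matters for the context-overflow problem reported earlier in this thread: the loop stops requesting pages as soon as the collected items would exceed what the assistant can hold.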

Please keep the feedback coming as you test newer builds!
