
Getting started with SCORM Intelligence

Learn how to use the SCORM intelligence feature to get the most out of your SCORM files

Most organisations don't have a learning content problem; they have a visibility problem. The content exists. It's just buried inside thousands of SCORM packages that nobody can search, tag, or make sense of without enormous manual effort. SCORM Intelligence solves that. This article explains what it does, how to use it, and what you get out the other end.

The Problem It Solves

Large enterprises typically manage around 5,000 SCORM packages. The majority are poorly tagged, duplicative, or simply obsolete, and yet someone has to manage them. That burden typically costs around 2.5 FTE, or roughly £250k per year, before you've improved a single learner's experience.

The issue isn't that the content is bad. It's that nobody has a clear view of what they have, where it overlaps, or what's worth keeping. SCORM packages are opaque by design: everything is locked inside the package and invisible to your LMS or anyone trying to audit the library.

SCORM Intelligence opens them up.

How It Works: Four Steps

SCORM Intelligence is a self-hosted AI pipeline. Your content never leaves your environment. Here's what happens:

1. Upload: Connect your source LMS and ingest SCORM packages directly. SCORM 1.2, SCORM 2004, and legacy formats are all supported. No prep work is needed.

2. Transcribe: Self-hosted AI transcribes every lesson, slide, and interaction inside each package. Because this runs on our internally hosted AI within your own infrastructure, nothing is transmitted to a third party.

3. Analyse: Relevance scoring, skills mapping, and similarity detection are applied across your full library. This is where you start to see which content overlaps, where gaps exist, and what's worth keeping.

4. Deliver: Modules are tagged to your skills framework, exported clean and structured, and ready for LMS import or search. The key outcome: SCORM unbundling at 10× lower cost than manual consultants.
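In code terms, the four steps compose like a simple pipeline. The sketch below is purely illustrative: the function names and record fields are hypothetical, not the product's actual API.

```python
# Illustrative four-step pipeline. All names and fields are hypothetical;
# the real product exposes these steps through its own interface.

def upload(package_paths):
    """Step 1: ingest SCORM packages directly from the source LMS."""
    return [{"path": p} for p in package_paths]

def transcribe(packages):
    """Step 2: self-hosted AI transcribes every lesson and interaction."""
    for pkg in packages:
        pkg["transcript"] = f"(transcript of {pkg['path']})"  # placeholder
    return packages

def analyse(packages):
    """Step 3: relevance scoring, skills mapping, similarity detection."""
    for pkg in packages:
        pkg["relevance"] = 0.0   # placeholder score
        pkg["skills"] = []       # placeholder tags
    return packages

def deliver(packages):
    """Step 4: export clean, skills-tagged modules ready for LMS import."""
    return [{"module": pkg["path"], "skills": pkg["skills"]} for pkg in packages]

modules = deliver(analyse(transcribe(upload(["intro_to_data_science.zip"]))))
```

The point of the composition is that each step only depends on the output of the previous one, which is why no prep work or metadata is needed up front.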

What SCORM Intelligence Produces

Taking a SCORM package through the pipeline gives you:

  • Module-level breakdown - individual lessons and modules extracted from each package, not just the course as a whole
  • Skills tags - content tagged to relevant skills at a granular level, generated automatically
  • Assessment detection - assessments identified and mapped to the modules they belong to
  • AI-generated transcripts - full transcripts for each module, ready for search and AI integration
  • Similarity scores - a "Transcript Similarity" view that flags where content across your library meaningfully overlaps, enabling deduplication and gap analysis at scale

For example: a single "Introduction to Data Science" SCORM package processed through the pipeline might yield 3 modules, 8 skills tags, and 1 detected assessment all mapped and searchable, where before there was a single opaque file.
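A record like the one below shows the shape of that output. The field names and module contents are hypothetical, not the product's export schema.

```python
# Hypothetical unbundled output for the "Introduction to Data Science"
# example above: 3 modules, 8 unique skills tags, 1 detected assessment.
package_output = {
    "source": "Introduction to Data Science.zip",
    "modules": [
        {"title": "What is Data Science?",
         "skills": ["Data Literacy", "Statistics", "Critical Thinking"],
         "transcript": "(full AI-generated transcript)"},
        {"title": "Working with Data",
         "skills": ["Data Cleaning", "Python", "Pandas", "Visualisation"],
         "transcript": "(full AI-generated transcript)"},
        {"title": "Final Assessment",
         "skills": ["Data Literacy", "Machine Learning"],
         "is_assessment": True,
         "transcript": "(full AI-generated transcript)"},
    ],
}

module_count = len(package_output["modules"])
skill_count = len({s for m in package_output["modules"] for s in m["skills"]})
assessment_count = sum(1 for m in package_output["modules"] if m.get("is_assessment"))
```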

Efficiency Gains

The time savings are significant once you stop doing this manually.

Task | Before SCORM Intelligence | With SCORM Intelligence
Analysis to prioritise updates | 2–3 weeks | 20–30 minutes
Identify and review content needing change | 2–3 months | 60–90 minutes
Implement updates | Same | Same
SCORM metadata searchable by all users | No | Yes

Manual curation of 5,000 packages, at 15 minutes per package, takes roughly 1,250 hours of staff time. SCORM Intelligence processes 10,000 packages in minutes.
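The arithmetic behind that manual-curation figure is straightforward:

```python
# Back-of-envelope check of the manual-curation figure quoted above.
packages = 5_000
minutes_per_package = 15
manual_hours = packages * minutes_per_package / 60  # 1,250 hours of staff time
```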

Why Teams Use It

Visibility of skills overlap: You can see which skills each lesson covers and where content overlaps across your library. Gaps and duplicates become obvious immediately, rather than after months of spreadsheet work.

Microlesson-level personalisation: A 1-hour monolithic course becomes 20 × 3-minute modules. Learners can be pointed at exactly what they need rather than completing something they only needed 10% of.

Turnkey RAG solution: Extracted data feeds via API directly into AI chatbots. Employees query your content in plain language and get the exact module they need, not a link to a 1-hour course.

Works with terrible data: Just 0–25 words of metadata are needed to start. Zero training data required. The pipeline is designed to handle the kind of inconsistent, poorly documented libraries that most organisations actually have.
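A minimal sketch of the retrieval idea behind the turnkey RAG pattern, assuming unbundled modules are available as plain records. The naive keyword scoring here stands in for real vector search, and the module records are invented for illustration.

```python
# Illustrative retrieval step behind "ask a question, get a module".
# A production setup would use vector embeddings, not keyword overlap.
modules = [
    {"title": "Expense Claims", "transcript": "how to file an expense claim ..."},
    {"title": "Travel Policy", "transcript": "booking travel and hotels ..."},
]

def best_module(question, modules):
    """Rank modules by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    def score(m):
        return len(q_words & set(m["transcript"].lower().split()))
    return max(modules, key=score)

answer_source = best_module("How do I file an expense claim?", modules)
```

The learner gets `answer_source` (the specific module), not a link to the hour-long course it came from.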

Manual Curation vs. SCORM Intelligence

Category | Manual Curation | SCORM Intelligence
Time required | 15 min/pkg = 1,250 hrs for 5k packages | 10,000 packages in minutes
Cost | £100k+ in staff time, ongoing burden | 5x–10x cost saving, automated updates
Metadata requirements | Needs complete info, a bottleneck at scale | No metadata needed
AI integration | Manual, requires extra development | Turnkey RAG, API-ready for chatbots

Where Unbundled Content Goes

Once SCORM Intelligence has processed your library, the output feeds into three places:

Back into your LMS: Every unbundled module is imported clean, skills-tagged, and structured. Learners track progress at module level from day one, not just at course level.

Into your AI and search tools: Unbundled modules feed your AI chatbot via API. Instead of being directed to a 1-hour course, employees get the specific module that answers their question.

Into L&D decision-making: Relevance scores surface what to keep, what to update, and what to retire, backed by data rather than instinct. The skills overlap view shows exactly where you have duplication and where gaps exist across your library.

Using SCORM Intelligence in a Migration

SCORM Intelligence fits naturally into an LMS migration, and it delivers value before, during, and after go-live.

Phase 1 (Pre-Migration): Upload and transcribe your SCORM library. Make keep, retire, and unbundle decisions based on actual data. Skills tags are auto-generated at scale. You'll typically reduce your data volume by 30–50% before you move anything.

Phase 2 (LMS Migration): Export clean, tagged data directly to your new LMS. Your skills framework is active from day one. Modular tracking is enabled at launch. Reduced volume means lower vendor costs.

Phase 3 (Ongoing Value): New content is auto-ingested and tagged as it enters the library. AI search and RAG are powered by clean, structured data. The library stays current without manual administration.

A Note on Security

SCORM Intelligence runs on self-hosted infrastructure. The AI that transcribes and processes your content operates within your own environment; nothing is sent to a third-party AI provider. Tags are produced by Filtered's proprietary machine learning algorithm. Content reasoning uses an open Anthropic model running on Filtered's own Amazon servers.

The ROI Case

Your organisation currently dedicates roughly 2.5 FTE to managing SCORM content. With SCORM Intelligence, that burden disappears. Content becomes discoverable, skills gaps become visible, and your L&D team can focus on strategy rather than spreadsheets. The ROI is immediate: 5x–10x cost savings from day one.

The content you've already paid to create shouldn't be sitting in an unreadable black box. SCORM Intelligence makes it usable.

FAQs

  1. Can we see the end-to-end solution?

Please see our full video walkthrough here: https://share.descript.com/view/JJsmESBXpuo. This covers the complete pipeline from upload through to skills tagging, AI-assisted search, and the learner response layer.

  2. Can you walk us through parsing of various other formats like PDF, and share a roadmap and any early findings?

The Q2 release will add native parsing for PDF, video, audio, and PPT/PPTX. None of these present significant technical challenges: robust Python libraries already handle each format well, so extending coverage is straightforward. If you'd like to send sample content in any of these formats rather than video, we'll run it through the pipeline and include the output in your preview.

  3. Parsing seems easier said than done. PPTs are unstructured data and require a lot of context to process correctly, versus simple text in a PDF. Are there consolidated learnings about content formats and their implications for content processing?

The concern about PPT being unstructured doesn't apply here: PPTX is an XML-based format and is therefore highly structured and machine-readable. Our approach takes a screenshot of each slide and uses computer vision to cross-validate the text extracted from the XML; this is current best practice and our tests confirm it works reliably across all real-world PPT styles. Please send your sample files and we'll demonstrate directly against your content.
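For context on why PPTX is machine-readable: a .pptx file is a zip archive of XML, and slide text lives in DrawingML `<a:t>` elements. This stdlib-only sketch parses a minimal slide fragment; the screenshot cross-validation step is omitted, and the fragment itself is invented for illustration.

```python
# PPTX slides are XML; visible text lives in <a:t> elements in the
# DrawingML namespace. Parsed here with the standard library only.
import xml.etree.ElementTree as ET

DRAWINGML = "http://schemas.openxmlformats.org/drawingml/2006/main"

slide_xml = f"""
<p:sp xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
      xmlns:a="{DRAWINGML}">
  <p:txBody>
    <a:p><a:r><a:t>Introduction to Data Science</a:t></a:r></a:p>
    <a:p><a:r><a:t>Module 1: What is data?</a:t></a:r></a:p>
  </p:txBody>
</p:sp>
"""

def slide_text(xml_string):
    """Collect all <a:t> text runs from a slide's XML, in document order."""
    root = ET.fromstring(xml_string)
    return [t.text for t in root.iter(f"{{{DRAWINGML}}}t")]

texts = slide_text(slide_xml)
```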

  4. In addition to the content processing side, could you share how answers are served to learners?

The learner response layer is best seen in action. This demonstration of the extracted SCORM in Teams provides a sense of how that works: https://share.descript.com/view/nExbJ759WHb

  5. We'd like to see the end response against a learner's query; for example, simple text, text plus links, or text plus a marker for further learning. Are there implications for each type, for example does text plus further learning require a very different content processing method?

See the video above. Output format is entirely controlled by how the LLM consuming our API is prompted, which gives you full flexibility. Our standard pattern is a narrative response accompanied by direct links to the relevant source content for further learning. The only variable across response types is LLM token cost (longer outputs cost proportionally more); there is no difference in the underlying content processing method regardless of output format.
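To illustrate how prompting alone controls the response type, here is a hypothetical prompt template. The wording and field names are ours for illustration, not the product's actual prompt.

```python
# Illustrative prompt template: the same retrieved module can be served as
# plain text, text plus links, or text plus further-learning markers purely
# by changing the instructions given to the consuming LLM.
RESPONSE_STYLES = {
    "text": "Answer in plain prose only.",
    "text_plus_links": "Answer in prose, then list direct links to the source modules.",
    "text_plus_further_learning": (
        "Answer in prose, link the source modules, and flag related modules "
        "as 'Further learning'."
    ),
}

def build_prompt(question, module, style):
    """Combine retrieved module content with a style instruction."""
    return (
        f"{RESPONSE_STYLES[style]}\n\n"
        f"Source module: {module['title']}\n"
        f"Transcript: {module['transcript']}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "How do I claim expenses?",
    {"title": "Expense Claims", "transcript": "(transcript text)"},
    "text_plus_links",
)
```

The content processing upstream is identical in every case; only the instruction string changes.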

  6. What will the similarity score look like in the product?

[Screenshot: the Transcript Similarity view in the product]
  7. Are you using the SCORM internal bookmarking to separate out sections within an asset?

We don't rely on SCORM's internal bookmarking flags, which are inconsistently implemented across authoring tools. Instead, we infer section boundaries from the course structure itself using the reasoning capability of the LLM. This is a contextually intelligent interpretation rather than a hard rule, and it produces accurate segmentation across a wide range of content architectures.
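For reference, the course structure in question comes from the package's imsmanifest.xml, whose organization/item tree lists the sections. This simplified fragment (namespaces omitted for brevity, content invented) shows the structural signal the pipeline reasons over:

```python
# Every SCORM package ships an imsmanifest.xml; its <organization>/<item>
# tree encodes the course structure. Simplified fragment, stdlib only.
import xml.etree.ElementTree as ET

manifest = """
<manifest>
  <organizations>
    <organization>
      <title>Introduction to Data Science</title>
      <item><title>What is Data Science?</title></item>
      <item><title>Working with Data</title></item>
      <item><title>Final Assessment</title></item>
    </organization>
  </organizations>
</manifest>
"""

root = ET.fromstring(manifest)
sections = [item.findtext("title") for item in root.iter("item")]
```

The LLM interprets this structure contextually rather than applying a hard rule, which is why inconsistent bookmarking flags don't matter.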

  8. With the demo, it looks like the tool is breaking the SCORM package into sections and showing metadata. How much of that metadata is extracted directly from the asset and how much is generated?

We prioritise transcribed data throughout; it is the highest-value signal for search and skills tagging. The LLM is deployed as a reasoning tool to accurately locate and surface real transcript sources; we never surface generated transcript data or treat it as authoritative. All module titles are transcribed. We do generate metadata and descriptions to support the analysis layer, but generated content is not surfaced to the user at all. We also use AI to infer module length and to tag the content, which adds metadata without generating new written content.

  9. If metadata is generated, what are you using to create it - an open model or a proprietary one?

Tags are produced by Filtered's proprietary machine learning algorithm for search and indexing. To generate content and reason, we use a recent open Anthropic model, currently Claude Opus 4.6, running on self-hosted infrastructure. The model weights run on our own Amazon servers, meaning no data is transmitted to Anthropic or any third-party AI provider.

  10. It doesn't appear that the output is structured in a way an LLM could consume directly. Is that correct?

No, the output is fully LLM-consumable. Our Teams integration demo and the in-Filtered AI-assisted Search both take the extracted SCORM data and feed it directly to an LLM to generate answers in real time. You can also download the full dataset as a CSV and integrate it into any RAG architecture via vector embeddings, or, where infrastructure supports it, use it to fine-tune a custom model at the weight level.
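As a sketch of the CSV-to-RAG path, the snippet below loads a hypothetical export and ranks modules against a query. The column names are invented, and the bag-of-words "embedding" is a stand-in for a real embedding model and vector store.

```python
# Sketch of loading the exported CSV and preparing it for a RAG store.
# Toy embedding only; swap in a real embedding model in production.
import csv
import io
import math
from collections import Counter

exported_csv = io.StringIO(
    "module_title,transcript\n"
    "Expense Claims,how to file an expense claim\n"
    "Travel Policy,booking travel and hotels\n"
)

def toy_embed(text):
    """Bag-of-words vector as a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

rows = list(csv.DictReader(exported_csv))
index = [(row["module_title"], toy_embed(row["transcript"])) for row in rows]

query = toy_embed("file an expense claim")
best_title = max(index, key=lambda item: cosine(query, item[1]))[0]
```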

  11. The screenshot shows a "Transcript Similarity" section that isn't in the video. Is this part of the forthcoming functionality, alongside PPT and PDF support?

Transcript Similarity is part of the Q1 release. It is the analytical engine behind the module similarity assessment, identifying where content across your SCORM library meaningfully overlaps at the transcript level. This powers deduplication, content gap analysis, and library rationalisation at scale.
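As a toy illustration of the idea (not the product's actual scoring method), transcript overlap can be measured with something as simple as Jaccard similarity over word sets:

```python
# Toy transcript-similarity measure: Jaccard overlap of unique words.
# Illustrative only; the production scoring is more sophisticated.
def jaccard(a, b):
    """Share of unique words two transcripts have in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

t1 = "introduction to data protection and handling personal data"
t2 = "handling personal data and data protection basics"
t3 = "fire safety procedures for office buildings"

sim_overlap = jaccard(t1, t2)    # high overlap: deduplication candidates
sim_unrelated = jaccard(t1, t3)  # no overlap: distinct content
```

Pairs scoring above a chosen threshold are flagged for review, which is what powers deduplication and gap analysis at library scale.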

  12. What taxonomy are you using for skills and topics? Would we have control over it, or is it a shared or fixed taxonomy?

The taxonomy is entirely custom to your organisation. You define your own skills, skill labels, proficiency levels, and job roles, aligned to your existing competency framework or built from scratch. You can upload your framework directly into Filtered, or we can generate a recommended scale as a starting point for you to refine.
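As an illustration of what such an upload might contain (the field names and entries are hypothetical, not a required schema), a custom taxonomy could be expressed as:

```python
# Hypothetical shape of a custom skills taxonomy. Align field names and
# entries to your own competency framework.
taxonomy = {
    "skills": [
        {"label": "Data Literacy",
         "proficiency_levels": ["Aware", "Practitioner", "Expert"]},
        {"label": "Python",
         "proficiency_levels": ["Aware", "Practitioner", "Expert"]},
    ],
    "job_roles": [
        {"role": "Data Analyst",
         "required_skills": ["Data Literacy", "Python"]},
    ],
}

# Quick consistency check: every skill a role requires must be defined.
defined = {s["label"] for s in taxonomy["skills"]}
missing = [
    skill
    for role in taxonomy["job_roles"]
    for skill in role["required_skills"]
    if skill not in defined
]
```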

Do you have questions? Contact success@filtered.com
