Can You Use an LLM to Write an OCA Schema?

July 4, 2025

Short answer: Not really — but also, kind of.

Why you can’t just use an LLM to write a schema

At first glance, writing an OCA (Overlays Capture Architecture) schema might seem simple. After all, it’s just JSON, and tools like ChatGPT or Microsoft Copilot are great at generating structured text. But when it comes to OCA schemas, large language models (LLMs) run into two big limitations:

1. LLMs struggle with exact syntax. LLMs don’t truly “understand” JSON or schema structures; they generate text by predicting what comes next based on patterns. This means their output might look right but contain subtle errors like missing brackets, incorrect fields, or made-up syntax. Fixing these issues often requires manual correction.
2. LLMs can’t calculate digests. OCA schemas use cryptographic digests: unique strings calculated from the exact contents of the schema. If the schema changes, even slightly, the digest must be recalculated. LLMs can’t compute these digests; that requires running separate code. Without the correct digests, an OCA schema isn’t valid.
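
Both limitations can be illustrated with a few lines of Python. This is a minimal sketch, not how the Semantic Engine works: the schema fragment and its field names are made up for the example, and SHA-256 is used only as a stand-in to show how a digest depends on the exact text. OCA's actual digest scheme is not reproduced here.

```python
import hashlib
import json
from typing import Optional


def check_json_syntax(text: str) -> Optional[str]:
    """Return None if the text parses as JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


def illustrative_digest(text: str) -> str:
    """SHA-256 hex digest of the exact bytes of the text.

    Illustration only: this is an assumption for the example,
    not the digest algorithm or encoding that OCA tooling uses.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical schema fragment; the structure and field names are invented.
good = '{"attributes": {"sample_id": "Text", "weight_kg": "Numeric"}}'
# The same text with the final closing brace missing, a typical "looks right" error.
broken = '{"attributes": {"sample_id": "Text", "weight_kg": "Numeric"}'

print(check_json_syntax(good))    # None means the text parsed cleanly
print(check_json_syntax(broken))  # describes where parsing failed

# A small edit to the content yields a completely different digest.
edited = good.replace("weight_kg", "weight_g")
print(illustrative_digest(good))
print(illustrative_digest(edited))
```

A parser catches the first kind of problem mechanically, and the second shows why a digest has to be computed by code from the final text rather than predicted by a language model.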

Why you kind of can

That said, LLMs can still play a useful role in the schema-writing process.

With the right prompt, an LLM can generate a nearly correct OCA JSON schema package. It won’t include valid digests, and it may need a few syntax fixes before the Semantic Engine will recognize it. After that, the Semantic Engine can import this “almost right” schema and help correct the remaining errors. Once the schema is inside the Semantic Engine, the tool can calculate the proper digests and export a valid OCA schema package.

This approach is especially helpful if you already have schema information in a structured format — like an Excel table — and want to save time converting it into JSON.
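
As a rough sketch of that starting point, a table exported from Excel as CSV can be turned into plain text that is easy to paste into a prompt. The column names (`name`, `type`, `label`, `unit`) and the rows below are assumptions for illustration, and the output is only a rendering of the table, not an OCA schema.

```python
import csv
import io
import json

# Hypothetical CSV export of a schema table; columns and values are invented.
table_csv = """name,type,label,unit
sample_id,Text,Sample ID,
weight_kg,Numeric,Weight,kg
collected_on,DateTime,Collection date,
"""


def table_to_prompt_text(csv_text: str) -> str:
    """Read each table row as a dict and render the list as indented JSON text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Text describing the attributes, ready to be pasted after the prompt instructions.
attributes_text = table_to_prompt_text(table_csv)
print(attributes_text)
```

Handing the LLM one record per attribute keeps the input unambiguous, which leaves less room for it to invent or drop fields.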

What does a prompt look like?

Here’s an example of a prompt that works well with LLMs to create OCA schema packages. You may need to adjust it for your specific case, but if you’ve got structured schema data, it can be a great starting point for working with the Semantic Engine.

Figure: Webpage containing the LLM prompt, to be copied in two parts.

In short, while you can’t use an LLM to generate a fully valid OCA schema on its own, you can use it to speed up the process, as long as you’re ready to do a bit of post-processing: use a tool such as a JSON formatter to validate and fix the syntax, then use the Semantic Engine to fill in the gaps.

Written by Carly Huitema