Global recognition of SAIDs
Content-Derived Identifiers in the Semantic Engine
The Semantic Engine has built-in support for a particular kind of identifier called a SAID (Self-Addressing Identifier). Unlike traditional identifiers, which are assigned to a resource, SAIDs are derived directly from the content itself. They are computed with a cryptographic hash, so the identifier is intrinsically bound to the exact bytes of the resource it represents.
These identifiers are not designed to be human-friendly. They are long, opaque strings. But that trade-off enables something more important for research and data systems: verification. If a resource is referenced by a SAID, you can independently confirm that what you have is exactly what was intended. If the content changes, the identifier no longer matches. In that sense, SAIDs are tamper-evident and self-authenticating.
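To make the tamper-evident property concrete, here is a minimal sketch in Python. It is not the actual SAID algorithm or encoding: it assumes, purely for illustration, that the identifier is the URL-safe Base64 form of a SHA-256 digest, and the names derive_identifier and matches are hypothetical.

```python
import base64
import hashlib
import hmac


def derive_identifier(content: bytes) -> str:
    """Illustrative only: URL-safe Base64 of the SHA-256 digest, padding removed."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def matches(content: bytes, identifier: str) -> bool:
    """Recompute the identifier from the content and compare it to the claimed one."""
    return hmac.compare_digest(derive_identifier(content), identifier)


# Hypothetical sample content standing in for a schema or dataset.
original = b'{"name": "soil_moisture", "unit": "percent"}'
identifier = derive_identifier(original)

# Changing even one word of the content changes the derived identifier.
tampered = original.replace(b"percent", b"fraction")

print(matches(original, identifier))  # True
print(matches(tampered, identifier))  # False
```

Nothing in the check depends on who supplied the content or the identifier; anyone with the bytes can repeat it.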
Why Identifier Types Matter in Standards
Many specifications—particularly in research data and interoperability frameworks—depend on identifiers and are explicit about what types are allowed. This ensures consistency, portability, and long-term usability across systems.
One commonly accepted class is the URN (Uniform Resource Name). Because URNs are standardized and designed for persistence, they are frequently permitted in specifications where long-lived, location-independent identifiers are required.
IANA and Global Recognition
The Internet Assigned Numbers Authority (IANA) coordinates key elements of the internet’s infrastructure, including the registry of URN namespaces. Once a namespace is registered with IANA, any system or standard that works with URNs can recognize it.
SAIDs have now been formally registered with IANA as a new URN namespace: urn:said. This elevates them from an ecosystem-specific mechanism to a globally recognized identifier scheme.
URNs vs URLs
A URN identifies what something is, while a URL (Uniform Resource Locator) identifies where something is located.
URNs are not inherently resolvable—you cannot simply use one to retrieve a resource without additional infrastructure. Instead, they are designed to be persistent names that systems can interpret.
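The difference shows up as soon as the two kinds of identifier are parsed. The sketch below uses Python's standard urllib.parse module; the helper split_urn and the example identifier strings are hypothetical, and the helper only handles the basic urn:<namespace>:<name> shape rather than the full URN syntax.

```python
from urllib.parse import urlsplit


def split_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (namespace identifier, namespace-specific string)."""
    scheme, nid, nss = urn.split(":", 2)  # raises ValueError if a part is missing
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # The 'urn' prefix and the namespace identifier are case-insensitive.
    return nid.lower(), nss


url = urlsplit("https://example.org/schemas/soil.json")
urn = urlsplit("urn:said:EXAMPLE-IDENTIFIER")

print(url.netloc)  # 'example.org': a URL names a host to contact
print(urn.netloc)  # '': a URN carries no location at all
print(split_urn("urn:said:EXAMPLE-IDENTIFIER"))  # ('said', 'EXAMPLE-IDENTIFIER')
```

Because the URN has no host component, something outside the identifier has to know where to look for the named resource.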
SAIDs fit naturally into this model but add an important property: because they are content-derived, they can be independently verified. Anyone can build a resolver that retrieves content and checks whether it matches the SAID. Trust does not depend on the resolver—it depends on the content itself.
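A resolver built on this idea might look roughly like the following sketch. It rests on several assumptions made only for illustration: that a URN takes the form urn:said:<identifier>, that the identifier is the URL-safe Base64 form of a SHA-256 digest (the real SAID encoding differs), and that a resolver is any function mapping a URN to bytes. The names derive_identifier, resolve_verified, honest and dishonest are hypothetical.

```python
import base64
import hashlib
from typing import Callable, Optional

# A resolver is any lookup from a URN to bytes; it does not need to be trusted.
Resolver = Callable[[str], Optional[bytes]]


def derive_identifier(content: bytes) -> str:
    """Illustrative stand-in for a SAID: URL-safe Base64 of the SHA-256 digest."""
    digest = hashlib.sha256(content).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def resolve_verified(urn: str, resolver: Resolver) -> bytes:
    """Fetch content through the resolver and accept it only if it matches the URN."""
    prefix = "urn:said:"  # assumed URN shape for this sketch
    if not urn.lower().startswith(prefix):
        raise ValueError(f"unsupported identifier: {urn!r}")
    claimed = urn[len(prefix):]
    content = resolver(urn)
    if content is None:
        raise LookupError(f"resolver has no content for {urn!r}")
    if derive_identifier(content) != claimed:
        raise ValueError("content does not match identifier")
    return content


# Hypothetical sample data and two in-memory resolvers.
data = b"temperature,humidity\n21.5,40\n"
urn = "urn:said:" + derive_identifier(data)
honest = {urn: data}.get
dishonest = lambda _urn: b"temperature,humidity\n99.9,40\n"

print(resolve_verified(urn, honest) == data)  # True
try:
    resolve_verified(urn, dishonest)
except ValueError as err:
    print("rejected:", err)  # rejected: content does not match identifier
```

The caller treats both resolvers identically; the altered content is rejected by the check on the bytes, not by any judgment about the resolver.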
Implications for Research Data Infrastructure
The registration of urn:said means that SAIDs can now be used anywhere URNs are accepted. This has direct implications for research data standards and infrastructure.
The Semantic Engine already uses SAIDs to generate secure, tamper-evident identifiers. With official URN recognition, those identifiers can now integrate cleanly into broader ecosystems—supporting interoperability across repositories, metadata standards, and distributed workflows.
This represents a shift in how identifiers function within research systems. Instead of relying solely on assigned names backed by registries, systems can incorporate identifiers that are self-verifying by design. For research data—where integrity, provenance, and reproducibility are central concerns—this provides a stronger and more flexible foundation.
– Written by Carly Huitema
