Statement Mapping
By default, we perform steps to help map Schema-related data into their conventional types. Given the wide range of data sources and generation methods, these mappings establish a minimal baseline on which other, application-specific assumptions can build. The following steps are performed on a best-effort basis; any unrecognized syntax, types, or values are ignored.
- Canonicalize IRIs: e.g. http://schema.org/ into https://schema.org/
- Drop Empty Literals: e.g. empty string dropped
- Map Enumerations: e.g. InStock string into schema:InStock IRI
- Cast Data Types: e.g. 2020-04-08 string into schema:Date datatype
Once processed, the resulting graph may be further analyzed for errors through the validation step. Together, mapping and validation try to follow Schema recommendations fairly strictly to help publish compatible data for consumers. Additional transformations are available through the normalization step, which makes further corrections and restructures some data into more consistent structures for consumers.
Steps
Canonicalize IRIs
The Schema.org ontology is often used with one of several different base IRIs. Using one consistently throughout a data platform matters most, but we have chosen to use and recommend the secure, apex form. Predicates, object IRIs, and literal datatypes that have any of the following prefixes will be converted to https://schema.org/.
Alternate Base IRI |
---|
http://schema.org/ |
http://www.schema.org/ |
https://www.schema.org/ |
Extension domains, such as pending.schema.org, are currently ignored.
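The prefix rewrite described above can be sketched as a simple string operation. This is a minimal illustration, not the actual implementation: the function name `canonicalize_iri` and the constants are hypothetical, and case-sensitive prefix matching is an assumption.

```python
# Canonical base and the alternate bases listed in the table above.
CANONICAL_BASE = "https://schema.org/"
ALTERNATE_BASES = (
    "http://schema.org/",
    "http://www.schema.org/",
    "https://www.schema.org/",
)


def canonicalize_iri(iri: str) -> str:
    """Rewrite a known alternate Schema.org base IRI to the canonical base.

    Hypothetical helper: IRIs with any other prefix, including extension
    domains such as pending.schema.org, are returned unchanged.
    """
    for base in ALTERNATE_BASES:
        if iri.startswith(base):
            return CANONICAL_BASE + iri[len(base):]
    return iri


if __name__ == "__main__":
    for example in (
        "http://schema.org/Person",
        "https://www.schema.org/name",
        "https://pending.schema.org/Example",
    ):
        print(example, "->", canonicalize_iri(example))
```

In a real graph, the same function would be applied to each predicate, object IRI, and literal datatype.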
Drop Empty Literals
By convention, empty literal statements will be dropped from the graph if:
- The datatype is Schema-related or an XSD primitive; and
- The lexical form is empty after collapsing any white space.
The following table shows some examples of how objects will be evaluated.
Input | Behavior | Note |
---|---|---|
"" | Dropped | |
" \t " | Dropped | Only white space |
""^^schema:Text | Dropped | |
""^^example:Type | No change | Unrelated datatype |
"0" | No change | Not empty |
<> | No change | IRI, non-literal |
example: | No change | Prefixed name IRI, non-literal |
This has the notable effect that a Schema-related resource cannot have an empty-valued property. However, this helps handle the common practice of publishers including empty property values simply because it's the easier method to generate templated, structured data.
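The two conditions for dropping a literal can be sketched as a small predicate. This is an illustration under stated assumptions: the name `should_drop_literal` is hypothetical, a literal without an explicit datatype is treated as `xsd:string`, and any datatype in the XSD namespace is treated as qualifying (a simplification of "XSD primitive").

```python
from typing import Optional

SCHEMA = "https://schema.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_STRING = XSD + "string"


def should_drop_literal(lexical: str, datatype: Optional[str] = None) -> bool:
    """Return True when a literal would be dropped as empty.

    Hypothetical helper. Assumptions: plain literals default to xsd:string,
    and every datatype in the XSD namespace counts as recognized.
    """
    datatype = datatype or XSD_STRING
    recognized = datatype.startswith(SCHEMA) or datatype.startswith(XSD)
    # A lexical form made up only of white space is empty once collapsed.
    return recognized and lexical.strip() == ""


if __name__ == "__main__":
    print(should_drop_literal(""))                               # plain empty string
    print(should_drop_literal(" \t "))                           # only white space
    print(should_drop_literal("", SCHEMA + "Text"))              # Schema datatype
    print(should_drop_literal("", "http://example.com/Type"))    # unrelated datatype
    print(should_drop_literal("0"))                              # not empty
```

IRIs never reach this check because only literal objects are evaluated.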
Map Enumerations
For every property that supports an enumeration as its statement object, the object is evaluated and converted to its canonical IRI form where possible. The following table shows examples of evaluating the object of a schema:availability property.
Input | Output | Note |
---|---|---|
"InStock" | schema:InStock | |
"https://schema.org/InStock" | schema:InStock | String to IRI |
"http://www.schema.org/InStock" | schema:InStock | String with Alternate Base IRI to IRI |
"ExampleUnknown" | "ExampleUnknown" | Unknown enumeration path, no change |
"http://example.com/InStock" | "http://example.com/InStock" | Unknown enumeration, no change |
schema:InStock | schema:InStock | IRI, no change |
schema:False | schema:False | IRI, no change |
schema:ExampleUnknown | schema:ExampleUnknown | IRI, no change |
As a reminder, this step does not generate any errors for unrecognized enumeration values — this may be handled by the Validator.
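The string rows of the table above can be reproduced with a short lookup. This is a sketch only: `map_enumeration` is a hypothetical name, the member set is a small illustrative subset rather than the full enumeration, and returning `None` to signal "leave the literal unchanged" is a design choice made for this example.

```python
from typing import FrozenSet, Optional

CANONICAL_BASE = "https://schema.org/"
KNOWN_BASES = (
    CANONICAL_BASE,
    "http://schema.org/",
    "http://www.schema.org/",
    "https://www.schema.org/",
)

# Illustrative subset of members accepted by schema:availability.
AVAILABILITY_MEMBERS: FrozenSet[str] = frozenset({"InStock", "OutOfStock", "PreOrder"})


def map_enumeration(value: str, members: FrozenSet[str]) -> Optional[str]:
    """Map a string literal to a canonical enumeration IRI.

    Hypothetical helper: returns None when the value is not recognized,
    in which case the original literal would be kept as-is.
    """
    name = value
    for base in KNOWN_BASES:
        if value.startswith(base):
            name = value[len(base):]  # strip a recognized Schema.org base
            break
    return CANONICAL_BASE + name if name in members else None


if __name__ == "__main__":
    for literal in (
        "InStock",
        "http://www.schema.org/InStock",
        "ExampleUnknown",
        "http://example.com/InStock",
    ):
        print(repr(literal), "->", map_enumeration(literal, AVAILABILITY_MEMBERS))
```

Objects that are already IRIs are not passed through this function, which matches the "IRI, no change" rows.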
Cast Data Types
All objects of Schema properties will be evaluated against their expected data types and updated to their Schema-specific data type, as appropriate.
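As a rough illustration of the date example from the overview (a 2020-04-08 string becoming a schema:Date), the cast can be sketched as below. Everything beyond that single example is an assumption: the name `cast_date`, the restriction to plain `xsd:string` input, and the accepted lexical form (a strict YYYY-MM-DD calendar date). The caller is assumed to have already determined that the property expects a date.

```python
import re
from datetime import date
from typing import Tuple

SCHEMA_DATE = "https://schema.org/Date"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Assumed lexical form: strict YYYY-MM-DD with ASCII digits.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def cast_date(lexical: str, datatype: str = XSD_STRING) -> Tuple[str, str]:
    """Return (lexical, datatype), re-tagged as schema:Date when valid.

    Hypothetical helper: values that are not plain strings or that do not
    form a real calendar date are returned unchanged (best effort).
    """
    if datatype != XSD_STRING or not _DATE_PATTERN.fullmatch(lexical):
        return lexical, datatype
    try:
        date.fromisoformat(lexical)  # rejects impossible dates such as month 13
    except ValueError:
        return lexical, datatype
    return lexical, SCHEMA_DATE


if __name__ == "__main__":
    print(cast_date("2020-04-08"))
    print(cast_date("2020-13-40"))
    print(cast_date("April 8"))
```

Leaving unparseable values untouched keeps the step consistent with the best-effort behavior described at the top of this page.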