EDML - Electronic Data Markup Language Specification
Revision History
Date: 19/1/98
Version: 0.6
Comments: first full draft; for comment
© I A Galbraith and D W Galbraith 1998. EDML is a trademark of OMS Services Ltd. Permission is given for this document to be copied or distributed in whole or in part subject to explicit acknowledgement being given in every case to the authors as the source and copyright owners of the material.
2. EDML Definitions
Syntactic definitions are provided (in Augmented BNF) in section 3, EDML Syntax.
2.1 Data Item
A Data Item is an HTML NAME/VALUE pair, where NAME
is EDML compatible, and NAME and VALUE are HTML compatible character strings
(CDATA).
2.2 EDML Type
An EDML Type is an application of EDML which defines
message templates, coding structures and codes for creating EDML Messages and
interpreting data transferred by these Messages. EDML Types may be based
upon existing Source Standards or may be independent of any existing
standard. An EDML Type which is based upon a Source Standard must allow
direct conversion of an EDML Message conforming to that EDML Type Definition to
a message in the Source Standard format via software, without any table look-up
being required.
2.3 EDML Type Definition
An EDML Type Definition includes the
specification of the Message syntax for that EDML Type, a set of Message
Templates, and a set of directories for all Entities included in that EDML Type.
The directories should include full text descriptions of every item which may be
included in an EDML Message. In the case of EDML Types produced from
Source Standards, these descriptions will normally be the same as (or similar
to) those used within the Source Standard documentation.
2.4 Entity
An EDML Entity is any part of a Message which may be
separately identified within an EDML Type. For example, within EDIFACT,
Groups, Segments, Composite Data Elements, Simple Data Elements, and Component
Data Elements are all ‘Entities’ in the EDML sense; in HL7, Segments, Fields,
Components, Subcomponents become entities in EDML; in GEDCOM, Records,
Substructures and Primitive Elements become EDML entities.
Entities may be repeated, eg a list of names and addresses would contain repeated occurrences of the entity specifying ‘name & address’, in which case each occurrence will have a unique identifier within EDML (see Instance Code).
Entities may be nested. Nested entities may be of the same or different types: eg the same entity type may be used to specify data relating to a container, crate, box, but these entities may be nested to express the nested relationship container - crate - box; different entity types might be used to specify data referring to the nested relationship chapter - paragraph - sentence.
2.5 Entity Code
An EDML Entity Code is the code which uniquely
identifies an Entity within an Entity Group and allows semantic interpretation
of the data associated with that entity (ie the Value in the Name/Value
pair). Every EDML Entity has an Entity Code. EDML Entity Codes are
usually numeric (except in the EDML Header and Trailer Records), to allow
determination of the correct ordering of Data Items, and to facilitate indexing
into directory lists of Entity definitions.
Since the same Entity Type may be used in multiple locations within a message, an Entity Code itself is not necessarily unique within a message; uniqueness is conveyed by including the Entity Code with other identifying information in the Name component of the Name/Value pair (see Name). The same Entity Code may be used to refer to different Entities within a different Entity Group (qv). Conversely, the same Entity Type used within different Entity Groups may have different Entity Codes. In practice, this is not a problem, since an Entity Code is usually significant only in combination with other (higher level) Entity Codes.
2.6 Entity Delimiter; Entity Group Delimiter
The Entity Delimiter
is a character used to separate Entity IDs in an Entity Group ID, to indicate
nesting, and allow correct parsing of Entity Group IDs. The number of
Entity Delimiters in an Entity Group ID specifies the level of nesting of the
last Entity identified in that Entity Group with respect to the first Entity in
that Entity Group. The total number of Entity Delimiters in a Name
specifies the level of nesting of the last Entity identified in that Name (and
to which the Value is attached). The default Entity Delimiter is "-".
The Entity Group Delimiter is a character used to separate Entity Group IDs in an EDML Name, to allow correct parsing of the Name; the default Entity Group Delimiter is ".".
In contrast to the Entity Delimiter, the Entity Group Delimiter does not imply nesting. For example, the following EDML Name strings each imply the same nesting level (level 1, numbering nesting levels from 0) for the final Entity identified by the string.
1-2-3
2-3.4-1.2.1
In the first example, reading the string from left to right, the highest level Entity identified by "1" has an Entity "2" nested within it; the Entity Code "2" signifies the second nested Entity. The string "1-2" is now the Entity Group ID for Entity "2" nested within "1". The Entity Code "3" indicates that this Entity is the third Entity which is nested within the Entity identified by "1-2". The string "1-2-3" is an Entity Group ID identifying the third Entity. Note that it is the complete string "1-2-3" which is the Entity identifier; the string "2-2-3" would identify a quite different Entity.
In the second example, the first Entity Group ID "2-3" identifies the third Entity nested within the Entity with the Entity Code "2". The "." Entity Group Delimiter following this string implies that the next Entity (with Entity Code "4") is not nested, ie that the "3" identifies an Entity Group, the "4" implies the fourth Entity within that Entity Group; "4-1" identifies the first Entity nested within the previous Entity, the period indicating that this Entity is actually an Entity Group. The next Entity has the code "2", and is followed by a period, indicating that this is another Entity Group, and the final "1" indicates the first Entity in this Group. This explanation may appear complex in the abstract context, but the EDML coding structure maps quite simply on to ‘real’ message structure: see, for example, Appendix A.3.1 EDML/GEDCOM.
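As a rough illustration of the parsing rules above, the following Python sketch splits a Name into Entity Group IDs and Entity IDs using the default delimiters, and counts the Entity Delimiters. The function names are hypothetical and not part of EDML; the sketch assumes that neither delimiter has been redefined within the message and that no Entity Tag contains a delimiter character.

```python
ENTITY_DELIMITER = "-"        # default Entity Delimiter
ENTITY_GROUP_DELIMITER = "."  # default Entity Group Delimiter


def split_name(name):
    """Split an EDML Name into Entity Group IDs, each a list of Entity IDs."""
    return [group.split(ENTITY_DELIMITER)
            for group in name.split(ENTITY_GROUP_DELIMITER)]


def count_entity_delimiters(name):
    """Count Entity Delimiters; Entity Group Delimiters imply no nesting."""
    return name.count(ENTITY_DELIMITER)


print(split_name("1-2-3"))        # [['1', '2', '3']]
print(split_name("2-3.4-1.2.1"))  # [['2', '3'], ['4', '1'], ['2'], ['1']]

# Both example Names contain the same number of Entity Delimiters.
print(count_entity_delimiters("1-2-3") == count_entity_delimiters("2-3.4-1.2.1"))
```

Because only the Entity Delimiter is counted, the two example Names are treated alike even though the second contains several Entity Groups.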
Since EDML is designed for use with HTTP, to avoid the need in CGI scripts to interpret ‘escaped’ characters the default delimiters used in EDML are not ‘escaped’ in HTTP transfers. The option is provided to define different delimiters within an EDML message.
2.7 Entity Group
An EDML Entity Group is a set of Entities defined
within a single structure, each of which is nested within the higher level
Entity. A set of nested Entities within an Entity Group is identified by
an Entity Group ID (qv).
All Entities within an Entity Group except the last Entity must be of the same type. If the last Entity in the Group is of a different type, and defines a substructure, the Entities within that substructure must form another Entity Group. For example, in EDIFACT, Segments and Segment Groups may be nested within an EDML Entity Group; the Data Elements within an EDIFACT Segment will form another Entity Group. Data Elements are not nested within a Segment: a Segment is comprised of Data Elements.
What constitutes an Entity Group is application dependent, and may be self-defining (ie already defined in a source standard) or may be defined on pragmatic grounds. For example, since an EDIFACT Segment Group may itself contain nested Groups, such an EDIFACT Segment Group would appropriately be defined as an Entity Group in EDML. The nested Entities specifying data relating to container-crate-box, in the example above, may be defined as an Entity Group; equally the nested entities specifying data relating to container-crate may be considered an Entity Group. Considering the relationship container->crate->crate_dimensions, container-crate would be defined as an Entity Group linked to a crate_dimensions Entity Group; although crate is nested within container, crate_dimensions is not nested within crate, but linked directly to it.
Entity Groups may repeat and may themselves be nested.
The same Entity Group definition may be used in multiple locations within an EDML message; each will be given uniqueness by the EDML naming convention (eg an Entity Group nested within another Entity Group is intrinsically different from the ‘same’ Entity Group defined elsewhere in a message).
2.8 Entity Group ID
An EDML Entity Group ID is effectively the
concatenation of the Entity IDs of each nested Entity in the Group. An
Entity Group ID is unique within a higher level Entity within which it is
nested; it need not be unique within a message.
An Entity Group ID serves to identify sets of data items.
2.9 Entity ID
An EDML Entity ID is a string which uniquely
identifies an occurrence (or instance) of an Entity where there are multiple
appearances of an Entity at a particular point (ie at the same nesting level) in
an EDML message: eg repeated order lines in a purchase order. The Entity
ID is composed of the Entity Code plus an Instance Code (qv), and optionally an
Entity Tag (qv).
An Entity ID is not necessarily unique within a message; uniqueness is conveyed by the concatenation with the Entity IDs of the higher level Entities in a nested structure, to form the Name of the Data Item.
2.10 Entity Tag
An EDML Entity Tag is a string which may be
included with an Entity ID to allow conversion of an EDML Type message to a
Source Standard message without table look-up being required. The Tag is a
name used within the Source Standard as an identifier for the Entity Type: eg in
EDIFACT each EDIFACT Segment type has a three-alpha tag.
An Entity Tag is not strictly required for identification purposes in EDML - it is an artefact of the Source Standard - but should be included unless users are certain that the EDML messages will not require to be converted to Source Standard format. (Although if table look-up were used in the conversion process the Entity Tag need not appear, EDML has been specifically designed to avoid table look-up.)
2.11 Entity Type
An EDML Entity Type is a particular type of
Entity defined in an EDML Type or a Source Standard: eg in EDML/EDIFACT, Entity
Types include Segment Group, Segment, Composite Data Element, etc; in EDML/HL7
Entity Types include Segment Group, Segment, Field, etc.
The term Entity Type is used primarily for clarity and is essentially interchangeable with the term Entity. There is no essential grammatical difference between the two terms.
2.12 Header
An EDML Header specifies data necessary for
interpretation of a Message, including specification of the EDML Type. The
Header consists of a set of Data Items whose Names all commence with the string
"EDML.HEAD".
2.13 Instances
Repeated occurrences, Instances, of an Entity are
indicated by an Instance Code - normally a sequential number, starting from
1. The Instance Code is separated from the Entity Code by the Instance
Delimiter (default "@").
If an Entity occurs only once at any point within a message sub-structure, the Instance Code may be left out. If an Entity Tag has to be included in an Entity ID, it is separated from the Instance Code by the Instance Delimiter; thus if the Instance Code is not used, the Entity Tag must be separated from the Entity Code by a pair of Instance Delimiters.
Example: 1@2-3
The Entity ID "3" is nested within the second occurrence of the Entity with the Entity Code "1", ie the Entity with the Entity ID "1@2".
Example: 2-3@4.1
The Entity ID "1" is the first component of the fourth occurrence of the substructure Entity with the Entity Code "3" (it must be a substructure because of the period delimiter), and so of the Entity with the Entity ID "3@4". This Entity is in turn nested within the Entity with the Entity Code "2"; there is only one occurrence of the latter Entity.
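The composition of an Entity ID from Entity Code, Instance Code and Entity Tag can be sketched in Python as follows. This is only an illustration under stated assumptions: the names EntityID, parse_entity_id and format_entity_id are hypothetical, the default Instance Delimiter is assumed, the Instance Code is kept as a string, and the tag "SEX" is used purely as sample data.

```python
from typing import NamedTuple, Optional

INSTANCE_DELIMITER = "@"  # default Instance Delimiter


class EntityID(NamedTuple):
    code: str                # Entity Code
    instance: Optional[str]  # Instance Code, or None if left out
    tag: Optional[str]       # Entity Tag, or None if absent


def parse_entity_id(text):
    """Split 'code', 'code@instance' or 'code@instance@tag' into its parts."""
    parts = text.split(INSTANCE_DELIMITER, 2)
    code = parts[0]
    instance = parts[1] if len(parts) > 1 and parts[1] else None
    tag = parts[2] if len(parts) > 2 and parts[2] else None
    return EntityID(code, instance, tag)


def format_entity_id(entity):
    """Rebuild the string; an omitted Instance Code leaves '@@' before a tag."""
    text = entity.code
    if entity.instance is not None or entity.tag is not None:
        text += INSTANCE_DELIMITER + (entity.instance or "")
    if entity.tag is not None:
        text += INSTANCE_DELIMITER + entity.tag
    return text


print(parse_entity_id("1@2"))     # code '1', instance '2', no tag
print(parse_entity_id("3@@SEX"))  # code '3', no instance, tag 'SEX'
print(format_entity_id(EntityID("3", None, "SEX")))
```

The pair of Instance Delimiters in the last case arises naturally: the empty Instance Code still occupies its position between the two delimiters.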
2.14 Instance Delimiter
See Instances.
2.15 Message
An EDML Message is any set of Data Items which
conform to the EDML Message syntax. This set of Data Items may be
encapsulated within an HTML Web form, within a data file, or within an email
message. The encapsulation is outside EDML.
Generally, in an EDML Type, a Message will consist of a set of Data Items comprising a Header, one or more sets of Data Items comprising the Body, and a set of Data Items comprising a Trailer.
2.16 Message Body
An EDML Message Body comprises all Data Items in
a Message except Header and Trailer Data Items.
2.17 Message Component
An EDML Message Component is a Data Item,
an Entity or an Entity Group within an EDML message.
2.18 Message Template
An EDML Message Template is a set of
information which allows the creation of a Web form for that Message type.
This information contains:
· the identifying codes used to construct the Names of all possible Data Items which may be included in the Message,
· whether a Data Item is mandatory or conditional/optional,
· the maximum allowable repeats of a Data Item,
· the format of the data which may be input via the Value component of the Data Item,
· nesting relationships of Data Items to other Data Items,
· whether a Data Item is coded, and the address of the code list (which may be displayed automatically to the user of a form, so that the selected item may be checked),
· default text descriptions for each Data Item (which may be displayed within the form; they may be changed within the actual form).
The identifying codes are generally index values into EDML directories derived from Source Standard directories, and which define the various Entities in Source Standard terms.
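One possible in-memory shape for a single entry of a Message Template is sketched below in Python. EDML does not prescribe any such representation; the class name, field names and sample values are all assumptions made for illustration, with one field per item of information listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TemplateEntry:
    codes: str                               # identifying codes used to build the Name
    mandatory: bool                          # False means conditional/optional
    max_repeats: int                         # maximum allowable repeats
    value_format: str                        # format of the data entered as the Value
    parent_codes: Optional[str] = None       # nesting relationship to another Data Item
    code_list_address: Optional[str] = None  # set only if the Data Item is coded
    description: str = ""                    # default text shown in the form


def repeats_allowed(entry, occurrences):
    """Check a number of occurrences against the template entry."""
    return 1 <= occurrences <= entry.max_repeats


# Illustrative values only, not taken from any EDML Type Definition.
entry = TemplateEntry(codes="2-3", mandatory=False, max_repeats=1,
                      value_format="text", parent_codes="2",
                      description="Sex of individual")
print(repeats_allowed(entry, 1), repeats_allowed(entry, 2))
```

A form generator could walk a list of such entries to emit one input field per Data Item, using the description as the default label.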
2.19 Name
An EDML Name provides a unique identifier for a Data
Item in an EDML Message. It specifies the relationship between that Data
Item and any higher level Data Items within which it is nested or to which it
relates. Multiple instances of the same Entity are separately and
unambiguously identified in the Name structure.
The Name consists effectively of the concatenation of the Entity Codes for each linked Entity in a set of related Entities, plus Instance Codes.
In an EDML Name every Entity Code after the first refers back to the previous Entity Code, so that the Name defines the complete structural context of the Value datum. For example, consider a line in an address. In itself, this has little meaning until we establish the full context: the line may refer to street; the street to a complete address; the complete address to an individual; the individual to a buyer; the buyer to a purchase order message. An address line in another context has no relation to this address line except that it may share the same formatting rules. Thus simply to identify a datum as being a constituent of an ‘address’ is of little or no value; it must be unambiguous what this address relates to; EDML Names provide just such unambiguous relationship information.
2.20 Record
An EDML Record is a set of Data Items. Since
EDML syntax deals only with Data Items, a Record is a semantic - not a syntactic
- structure in EDML. Thus in EDML messages the presence of Records must be
inferred from the Data Item Names.
An EDML Message (qv) is also defined as a set of Data Items, but a Message is delimited in EDML by the method used to transfer the Data Items. In the case where Data Items are transferred by Web forms (the usual manner), a Message is delimited by the number of Web pages which the form consists of.
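Since Records are inferred from Data Item Names rather than marked syntactically, an application has to choose its own grouping rule. The Python sketch below shows one such rule, grouping Data Items by the leading Entity ID of their Names. This rule is an assumption made for illustration, not something EDML defines; the function name and the sample Names and Values are likewise hypothetical, and the default delimiters are assumed.

```python
import re
from collections import defaultdict


def group_by_leading_entity(data_items):
    """Group (name, value) pairs by the first Entity ID in each Name."""
    records = defaultdict(list)
    for name, value in data_items:
        # Everything before the first "-" or "." is the leading Entity ID.
        leading = re.split(r"[-.]", name, maxsplit=1)[0]
        records[leading].append((name, value))
    return dict(records)


items = [("2@1-1", "first item"), ("2@1-3", "second item"), ("2@2-3", "third item")]
records = group_by_leading_entity(items)
print(sorted(records))  # ['2@1', '2@2']
```

Under this rule, each instance of the top-level Entity yields one inferred Record; a different EDML Type might call for grouping at a deeper level.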
2.21 Reduced Name
In EDML, a Name typically contains Instance
Codes and Entity Tags. The Instance Codes serve only to distinguish
multiple occurrences of the same Entity. Entity Tags are used only to
allow direct conversion of EDML messages to Source Standard format; they are
otherwise redundant in EDML. A Reduced Name is an EDML Name with all
Instance Codes and Entity Tags removed, together with associated Instance
Delimiters. A Reduced Name is a unique identifier for a specific type of data.
For example, in EDML/GEDCOM, the Reduced Name "2-3" always identifies a data
item specifying the sex of an individual, within a GEDCOM
INDIVIDUAL_RECORD. The Name "2@1-3@@SEX" will actually appear, specifying
that the sex is of the individual in the first ("@1") INDIVIDUAL_RECORD; "@@SEX"
appears in order to allow direct conversion to GEDCOM format.
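The reduction described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the default delimiters and that Instance Codes and Entity Tags never contain "-" or ".".

```python
import re


def reduced_name(name):
    """Strip Instance Codes, Entity Tags and their Instance Delimiters."""
    # Within each Entity ID, drop everything from the first "@" up to the
    # next Entity Delimiter or Entity Group Delimiter (or the end of the Name).
    return re.sub(r"@[^-.]*", "", name)


print(reduced_name("2@1-3@@SEX"))  # 2-3
print(reduced_name("2-3@4.1"))     # 2-3.1
```

A Reduced Name computed this way can serve as a lookup key into the directories of an EDML Type Definition, since it no longer varies with the instance.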
2.22 Source Standard
A Source Standard is an existing standard
which is used to create an EDML Type. An EDML Type based upon a Source
Standard makes use of message types, coding structures and codes defined in that
standard. Source Standards from which EDML Types can be produced include
UN/EDIFACT, ANSI X12, HL7, GEDCOM, ISO 10303.
2.23 Trailer
The Trailer of an EDML Message is a generic Data Item
used to indicate the end of any EDML Message. It consists of a single Data
Item with the Name "EDML.END".
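Taken together, the Header, Message Body and Trailer definitions suggest a simple way of partitioning a received set of Data Items, sketched here in Python. The function name is hypothetical, and the Header Name "EDML.HEAD.TYPE" and all Values are invented sample data; only the "EDML.HEAD" prefix and the "EDML.END" Name come from the definitions in this section.

```python
HEADER_PREFIX = "EDML.HEAD"  # all Header Names commence with this string
TRAILER_NAME = "EDML.END"    # Name of the single Trailer Data Item


def split_message(data_items):
    """Partition (name, value) pairs into header, body and trailer."""
    header = [(n, v) for n, v in data_items if n.startswith(HEADER_PREFIX)]
    trailer = [(n, v) for n, v in data_items if n == TRAILER_NAME]
    body = [(n, v) for n, v in data_items
            if not n.startswith(HEADER_PREFIX) and n != TRAILER_NAME]
    if len(trailer) != 1:
        raise ValueError("expected exactly one EDML.END Data Item")
    return header, body, trailer[0]


items = [
    ("EDML.HEAD.TYPE", "GEDCOM"),  # hypothetical Header Data Item
    ("2@1-3@@SEX", "M"),           # Body Data Item with a sample Value
    ("EDML.END", ""),
]
header, body, trailer = split_message(items)
print(len(header), len(body), trailer[0])  # 1 1 EDML.END
```

The sketch does not check the order of the Data Items; an application that relies on the Trailer arriving last would need to add that check.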