EDML - Electronic Data Markup Language Specification
Revision History
Date: 19/1/98
Version: 0.6
Comments: first full draft; for comment
© I A Galbraith and D W Galbraith 1998. EDML is a trademark of OMS Services Ltd. Permission is given for this document to be copied or distributed in whole or in part subject to explicit acknowledgement being given in every case to the authors as the source and copyright owners of the material.
1. INTRODUCTION
The Electronic Data Markup Language (EDML) is a metadata coding system for use in defining the NAME component of name/value pairs in an HTML form. EDML was initially conceived for Web forms-based EDI, but is a generic standard applicable to any application involving the transfer of data via Web forms or otherwise, and specifically including database applications. EDML exploits codes, coding structures, and message/file structures defined in existing standards - and allows conversion between EDML representations and such standards - but is not restricted to such use.
EDML is not intended as a competitor to XML; it can be used as a standalone, HTML-based metadata standard or alongside XML. EDML is designed for use within existing versions of HTML - and so with existing Web browsers - and with data representations and structures which already exist; EDML does not require any new mark-up tags.
1.1 Scope
This specification defines the objectives, general structure, syntax and semantics of EDML, and the means by which EDML data structures may be defined for use within HTML forms. EDML is consistent with HTML version 2.0 and later, in particular as regards character set representation.
EDML subsets (‘EDML Types’) may be defined to ensure compatibility between EDML-based forms and other data transfer standards (‘Source Standards’). Sample EDML Types have so far been defined to provide compatibility with UN/EDIFACT messages, HL7 and the GEDCOM standard for genealogical data; examples showing compatibility between EDML and these standards are included as Appendices. EDML Types for ANSI X12 and ISO 10303-21 are currently being defined.
EDML Types are defined via a conceptually similar mechanism to the use of DTDs within SGML; ie each EDML Type requires a formal definition of an EDML syntax which is a subset of the generic EDML syntax. An EDML Type which is compatible with an existing standard is based upon the code sets, code structures and message/file definitions of that standard.
EDML Types may be defined which are independent of existing standards. In such cases, appropriate entity directories must be defined. Example applications include standards for Web information of a particular type, such as online classified adverts, job listings and news. EDML implementation will allow the creation of standardised tools for building Web sites and Web databases, and cross-searching across multiple distributed sites, including multilingual access and standardised thesauri for 'fuzzy' searching.
Although EDML is intended for use in a message-oriented environment (cf EDIFACT), it is strictly a standard relating to data structures, ie relating to the contents of a message (cf the contents of an envelope) rather than to the means of transport or encapsulation (including encryption) of the message. In the EDIFACT (and other EDI standards) context, a hybrid message structure is used, such that transport related information is dealt with within the same syntactic structure as the message contents. Thus an EDML message must also be able to handle message-transport related information. However, this does not mean that EDML is ipso facto concerned with message encapsulation or transportation: an EDML ‘message’ comprises a set of data, encapsulated within an appropriate message container; the encapsulated data may include transport related data for use if the EDML message is converted to, for example, EDIFACT.
It is envisaged that EDML will mainly be used within an HTTP over SSL (HTTPS) environment, but other transport mechanisms could be used, such as FTP, secure FTP or S/MIME. (NB EDML does not require a new EDI MIME type.)
We have used the term ‘EDML message’ throughout, but this is intended to encompass any usage of EDML, including an e-mail message, a Web form, or a data file.
1.2 Objectives
The principal objectives of EDML are:
· to allow unique identification of any item of data within an HTML Web form;
· to allow standardised interpretation of data;
· to allow unambiguous determination of the relationships between data items;
· to allow unambiguous mapping between EDML representations of data sets and other data set representations.
EDML defines the 'meaning' of information accessed via the Web. By allowing each component of a message to be uniquely identified, it allows the 'style' and format of a message to be treated entirely separately from its content.
1.2.1 Unique Identification of Data Items
Since Web form data is
transferred to the server as NAME/VALUE pairs, it is convenient to treat the
NAME component as an identifier to the data contained within the VALUE
component; this is the basis of EDML. If the NAME component is unique
within a form, then it does not matter how the form is presented within the
browser: the server will always be able to determine the ‘meaning’ of the
data. For example, if NAME = "customer telephone number", then the server
software can ‘know’ that the corresponding VALUE must be a customer telephone
number, wherever it appears in the form. The NAME "customer telephone
number" may be built up from entity identifiers for "customer" and for
"telephone number", such that we could also have EDML constructs "customer name"
and "supplier telephone number" in the same Web form; the entity "telephone
number" would have the same definition. This means that the layout of a
Web form can be independent of how the entered data is subsequently
processed. The Web form designer need be concerned solely with the
creation of the form, while at the same time being able to link simple entities
together to form more complex entities, each of which can be uniquely - and
automatically - identified. The identifiers and definitions for the
various entities would be pre-defined in directories for the particular EDML
Type being used.
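As a minimal sketch of this idea, the following Python fragment decodes two submissions of the same data whose fields arrive in different orders, and looks each VALUE up by its NAME. The NAME strings and the '.' used to join entity identifiers are illustrative assumptions only; real identifiers would come from the directories of the EDML Type in use.

```python
from urllib.parse import parse_qsl

# Hypothetical composite NAMEs built from entity identifiers (not real EDML codes).
CUSTOMER_PHONE = "customer.telephone_number"
SUPPLIER_PHONE = "supplier.telephone_number"


def decode_form(body: str) -> dict:
    """Decode a URL-encoded form body into a NAME -> VALUE mapping."""
    return dict(parse_qsl(body, keep_blank_values=True))


# The same data as submitted by two differently laid-out forms.
layout_a = "customer.telephone_number=01234+567890&supplier.telephone_number=09876+543210"
layout_b = "supplier.telephone_number=09876+543210&customer.telephone_number=01234+567890"

form_a = decode_form(layout_a)
form_b = decode_form(layout_b)
```

Because each value is retrieved by NAME rather than by position, both layouts give the server the same mapping.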
1.2.2 Standardised Interpretation of Data
If the meanings of the
NAME components are agreed between parties who may wish to use EDML for data
transfer, by use of the same EDML Type, then it does not matter how a form is
structured: the ‘sending’ and ‘receiving’ parties will have the same
understanding of each VALUE component of the transferred data, independently of
the location of that data item within the form. Because EDML allows
unambiguous interpretation of every data value in a form, EDML is very suitable
for transfer of data items destined for incorporation within or for updating
databases.
EDML goes beyond the capabilities of EDIFACT or ANSI X12, where the interpretation of a data item depends on its position within a message.
1.2.3 Relationships between Data Items
In EDML a data item is
defined as a NAME/VALUE pair, ie a piece of data (which may be null) plus
an identifier for that data.
EDML deals with the following relationships between data items: repetition, nesting, ordering.
Repetition:
EDML allows explicit and unambiguous identification of
repeated occurrences of the same type of data (eg persons’ names, product codes)
which have their own values.
Nesting:
An entity may have a set of ‘lower level’ entities
associated with it. For example, a container may contain a number of
cartons, each of which may contain a number of boxes in each of which there may
be a number of product items; a city may contain a number of streets, which may
contain a number of buildings, which may be divided into different
apartments. Thus to identify data relating to an attribute of a unique
entity instance requires that the NAME indicate the specific instance of each
entity in a nested set of entities.
· For example, the entity identification for a specific box must include identification of the specific carton which contains the box and of the container which contains the carton; eg the fifth product in the third box in the fourth carton in the second container.
The EDML data item naming convention allows explicit incorporation of both nesting and occurrence information into the EDML NAME for each data item.
EDML imposes no intrinsic limit on the complexity of nesting, though particular EDML Types may impose limits.
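The container example can be sketched in Python as follows. The syntax used here ('entity:occurrence' levels joined by '/', followed by an attribute name) is a hypothetical stand-in chosen for illustration; it is not the EDML naming convention itself, which is defined by the EDML Type in use.

```python
from typing import List, Tuple

# One (entity, occurrence) pair per nesting level, outermost first.
Path = List[Tuple[str, int]]


def build_name(path: Path, attribute: str) -> str:
    """Encode the nesting path and occurrence numbers into a single NAME."""
    levels = "/".join(f"{entity}:{n}" for entity, n in path)
    return f"{levels}/{attribute}"


def parse_name(name: str) -> Tuple[Path, str]:
    """Recover the nesting path and attribute from a NAME built by build_name."""
    *levels, attribute = name.split("/")
    path = []
    for level in levels:
        entity, _, n = level.partition(":")
        path.append((entity, int(n)))
    return path, attribute


# The fifth product in the third box in the fourth carton in the second container.
name = build_name(
    [("container", 2), ("carton", 4), ("box", 3), ("product", 5)], "code"
)
```

Here `name` is the string "container:2/carton:4/box:3/product:5/code", and `parse_name` recovers the full nesting path from it without reference to any other field in the form.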
Ordering:
With repeated data items, an occurrence number (or
‘repeat count’) allows sequential ordering of the data items. With
different data items where the ordering is significant, the EDML naming
convention allows use of numeric coding of individual data items to ensure
correct ordering of data. EDML normally handles repetition via a
sequential numbering scheme, which means that ordering is implicit in EDML
naming, but other schemes may be used.
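A short Python sketch of ordering by occurrence number is given below. It assumes, purely for illustration, that the occurrence number is the final ':'-separated part of the NAME; the names and values are invented. The point it shows is that the occurrence number must be compared numerically, since a plain string sort would place occurrence 10 before occurrence 2.

```python
def occurrence(name: str) -> int:
    """Return the trailing occurrence number of a NAME such as 'product_code:10'."""
    return int(name.rpartition(":")[2])


# Repeated data items as they might arrive from a form, in arbitrary order.
pairs = [
    ("product_code:10", "C"),
    ("product_code:2", "B"),
    ("product_code:1", "A"),
]

# Sort on the numeric occurrence number, not on the NAME string.
ordered = sorted(pairs, key=lambda pair: occurrence(pair[0]))
values = [value for _, value in ordered]
```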
1.2.4 Conversion between EDML and other Data Representations
With
suitable definitions of data item names within an EDML Type, data items may
easily be converted to other data transfer representations. In cases, such
as EDIFACT, where the interpretation of a data value depends on the position of the
data item within the message, the EDML NAME for each data item provides
relationship information between data items to allow determination of the
equivalent EDIFACT position.
Normally, EDML Types will be based upon a ‘Source Standard’, allowing direct conversion of EDML messages to Source Standard messages without any need for mapping or reference to a message template or DTD (cf XML); all that is required for conversion is a relatively simple piece of software. Currently, EDML Types may be defined which allow direct conversion to UN/EDIFACT, ANSI X12 and HL7 messages, and to GEDCOM and ISO 10303 ‘EXPRESS’ language files. The great variations in syntax and semantics between these standards show the power of EDML as a generic metadata standard. EDML Types for EDIFACT, GEDCOM and HL7 are defined in the Appendices to this document.
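The kind of 'relatively simple piece of software' envisaged might look like the Python sketch below. Everything specific in it is an assumption made for illustration: the NAME layout ('sequence.tag.position'), the segment tags HDR and PTY, and the values are invented, and the output merely resembles a positional, delimiter-separated layout; it is not a compliant EDIFACT message.

```python
def to_positional(pairs):
    """Place each VALUE at the position encoded in its NAME.

    Each NAME is assumed to be 'sequence.tag.position' (hypothetical layout):
    the segment's order in the message, its tag, and the element position.
    """
    segments = {}
    for name, value in pairs:
        seq, tag, pos = name.split(".")
        segments.setdefault((int(seq), tag), {})[int(pos)] = value

    lines = []
    for (seq, tag), elements in sorted(segments.items()):
        width = max(elements)
        # Positions with no data item become empty fields.
        fields = [elements.get(i, "") for i in range(1, width + 1)]
        lines.append("+".join([tag] + fields) + "'")
    return lines


# NAME/VALUE pairs in arbitrary order, as they might be received from a form.
pairs = [
    ("2.PTY.2", "ACME"),
    ("1.HDR.1", "220"),
    ("2.PTY.1", "BY"),
    ("1.HDR.3", "9"),
]
message = to_positional(pairs)
```

With these inputs `message` is ["HDR+220++9'", "PTY+BY+ACME'"]: the order of the incoming pairs is irrelevant, because each NAME carries the information needed to place its value.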
1.3 Utilisation
EDML is used within HTML Web forms to generate messages as NAME/VALUE pair datasets. Note that EDML is used strictly to facilitate construction of the contents of messages; it does not deal with the ‘envelope’ which may be used to provide security, authentication or non-repudiation. Such requirements are dealt with by other appropriate means (eg use of SSL for secure transfer, X.509 certificates for authentication). Nor does EDML prescribe how forms are presented, except that the definitions of VALUES within EDML directories will typically specify formatting restrictions on how users enter data into an EDML-based form.
Though the generating form may be used by the addressee to view a received
message, this is not necessary. For example, suppose a user wishes to send
an order to a supplier. The order form used could have been generated by
either party, buyer or vendor, and may reside at a Web server belonging to
either of these parties or to a third party (ie a service provider). After
a buyer has retrieved an order form, completed it, and submitted it to the
server, there are several possibilities, for example:
· The EDML NAME/VALUE pairs may be stored in a separate file, for processing by the vendor’s system.
· The EDML message may be converted into another EDI format, eg EDIFACT.
1.3.1 Generation of EDML Forms and Messages
An EDML form is simply an HTML form where the NAME in each NAME/VALUE pair conforms to an EDML Type definition.
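As a rough illustration, the Python sketch below collects the NAME attributes from an HTML form and reports any that are not in a set of permitted names. The permitted names, the form and the check itself are assumptions made for the example; an actual EDML Type would define conformance through its own syntax and directories rather than a simple list.

```python
from html.parser import HTMLParser

# Hypothetical NAMEs permitted by some EDML Type (illustrative only).
ALLOWED = {"customer.name", "customer.telephone_number"}


class FieldNameCollector(HTMLParser):
    """Collect the name attribute of every data-bearing form control."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


FORM = """<form method="post" action="/order">
<input type="text" name="customer.name">
<input type="text" name="customer.telephone_number">
<input type="text" name="comments">
<input type="submit" value="Send">
</form>"""

collector = FieldNameCollector()
collector.feed(FORM)
unknown = [name for name in collector.names if name not in ALLOWED]
```

In this example `unknown` contains only "comments", the one field whose NAME is not defined in the assumed set.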
The EDML Type defined for EDIFACT (version 97A) incorporates all EDIFACT segment and data element definitions, and allows the creation of the complete set of EDIFACT messages (nearly 150 message types) in EDML format. A set of EDML form templates for creation of EDIFACT-compatible messages is being prepared. Each template incorporates the definitions of the EDIFACT message in terms of segment and segment group structure, including nesting structure, repeatability, and data formats (which affect the VALUE components of the form). The templates allow considerable latitude in form design, while ensuring that the NAME/VALUE pairs output via CGI from any form based upon the template may be translated to an EDIFACT-compliant message.
Similar sets of codes and form templates are being prepared for ANSI X12, HL7 and GEDCOM.
NOTE: EDML does not dictate how a particular structure in a Source Standard should be represented in EDML; there may be multiple possible representations. The EDML Types defined in Appendix A could be defined in other ways, with different definitions for Entity Groups.