EDML - Electronic Data Markup Language

EDML - Electronic Data Markup Language Specification

Revision History
Date:  19/1/98
Version:   0.6
Comments:  first full draft; for comment

© I A Galbraith and D W Galbraith 1998. EDML is a trademark of OMS Services Ltd. Permission is given for this document to be copied or distributed in whole or in part subject to explicit acknowledgement being given in every case to the authors as the source and copyright owners of the material.

1. INTRODUCTION

The Electronic Data Markup Language (EDML) is a metadata coding system for use in defining the NAME component of name/value pairs in an HTML form.  EDML was initially conceived for Web forms-based EDI, but is a generic standard applicable to any application involving the transfer of data via Web forms or otherwise, and specifically including database applications.  EDML exploits codes, coding structures, and message/file structures defined in existing standards - and allows conversion between EDML representations and such standards - but is not restricted to such use.

EDML is not intended as a competitor to XML; it can be used as a standalone, HTML-based metadata standard or alongside XML. EDML is designed for use within existing versions of HTML - and so with existing Web browsers - and with data representations and structures which already exist; EDML does not require any new mark-up tags.

1.1 Scope

This specification defines the objectives, general structure, syntax and semantics of EDML, and the means by which EDML data structures may be defined for use within HTML forms.  EDML is consistent with HTML version 2.0 and later, in particular as regards character set representation.

EDML subsets (‘EDML Types’) may be defined to ensure compatibility between EDML based forms and other data transfer standards (‘Source Standards’).  Sample EDML Types have so far been defined to provide compatibility with UN/EDIFACT messages, HL7 and the GEDCOM standard for genealogical data; examples showing compatibility between EDML and these standards are included as Appendices.  EDML Types for ANSI X12 and ISO 10303-21 are currently being defined.

EDML Types are defined via a conceptually similar mechanism to the use of DTDs within SGML; ie each EDML Type requires a formal definition of an EDML syntax which is a subset of the generic EDML syntax.  An EDML Type which is compatible with an existing standard is based upon the code sets, code structures and message/file definitions of that standard.

EDML Types may be defined which are independent of existing standards.  In such cases, appropriate entity directories must be defined. Example applications here may include standards for web information of a particular type, such as online classified adverts, job listings, news etc. EDML implementation will allow the creation of standardised tools for building web sites and web databases, and cross-searching across multiple distributed sites, including multilingual access and standardised thesauri for 'fuzzy' searching.

Although EDML is intended for use in a message-oriented environment (cf EDIFACT), it is strictly a standard relating to data structures, ie relating to the contents of a message (cf the contents of an envelope) rather than to the means of transport or encapsulation (including encryption) of the message.  In the EDIFACT (and other EDI standards) context, a hybrid message structure is used, such that transport related information is dealt with within the same syntactic structure as the message contents.  Thus an EDML message must also be able to handle message-transport related information.  However, this does not mean that EDML is ipso facto concerned with message encapsulation or transportation: an EDML ‘message’ comprises a set of data, encapsulated within an appropriate message container; the encapsulated data may include transport related data for use if the EDML message is converted to, for example, EDIFACT.

It is envisaged that EDML will mainly be used within an HTTP over SSL (HTTPS) environment, but other transport mechanisms could be used, such as FTP, secure FTP or S/MIME.  (NB EDML does not require a new EDI MIME type.)

We have used the term ‘EDML message’ throughout, but this is intended to encompass any usage of EDML, including an e-mail message, a Web form, or a data file.

1.2 Objectives

The principal objectives of EDML are:

· to allow unique identification of any item of data within an HTML Web form;

· to allow standardised interpretation of data;

· to allow unambiguous determination of the relationships between data items;

· to allow unambiguous mapping between EDML representations of data sets and other data set representations.

EDML defines the 'meaning' of information accessed via the web. By allowing each component of a message to be uniquely identified, it allows the 'style' and format of a message to be treated entirely separately from its content.

1.2.1 Unique Identification of Data Items
Since Web form data is transferred to the server as NAME/VALUE pairs, it is convenient to treat the NAME component as an identifier to the data contained within the VALUE component; this is the basis of EDML.  If the NAME component is unique within a form, then it does not matter how the form is presented within the browser: the server will always be able to determine the ‘meaning’ of the data.  For example, if NAME = "customer telephone number", then the server software can ‘know’ that the corresponding VALUE must be a customer telephone number, wherever it appears in the form.  The NAME "customer telephone number" may be built up from entity identifiers for "customer" and for "telephone number", such that we could also have EDML constructs "customer name" and "supplier telephone number" in the same Web form; the entity "telephone number" would have the same definition.  This means that the layout of a Web form can be independent of how the entered data is subsequently processed.  The Web form designer need be concerned solely with the creation of the form, while at the same time being able to link simple entities together to form more complex entities, each of which can be uniquely - and automatically - identified.  The identifiers and definitions for the various entities would be pre-defined in directories for the particular EDML Type being used.
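
As a minimal sketch of this idea in Python: the "." separator and the entity identifiers "CUST", "SUPP", "TEL" and "NAME" used here are hypothetical, chosen only for illustration; real identifiers and syntax would come from the directories of the EDML Type in use. The sketch shows that the meaning of each VALUE can be recovered from its NAME alone, whatever order the form fields arrive in.

```python
from urllib.parse import parse_qsl

# Hypothetical entity directory; a real EDML Type would define these identifiers.
ENTITY_DIRECTORY = {
    "CUST": "customer",
    "SUPP": "supplier",
    "TEL": "telephone number",
    "NAME": "name",
}


def describe(name: str, separator: str = ".") -> str:
    """Expand a composite NAME such as 'CUST.TEL' into its meaning."""
    return " ".join(ENTITY_DIRECTORY[part] for part in name.split(separator))


def interpret(form_body: str) -> dict:
    """Map the meaning of each NAME to its VALUE for a URL-encoded form body."""
    pairs = parse_qsl(form_body, keep_blank_values=True)
    return {describe(name): value for name, value in pairs}
```

Because "TEL" has a single definition, "CUST.TEL" and "SUPP.TEL" can appear in the same form without ambiguity, and reordering the fields in the submitted data does not change the result of interpret().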

1.2.2 Standardised Interpretation of Data
If the meanings of the NAME components are agreed between parties who may wish to use EDML for data transfer, by use of the same EDML Type, then it does not matter how a form is structured: the ‘sending’ and ‘receiving’ parties will have the same understanding of each VALUE component of the transferred data, independently of the location of that data item within the form.  Because EDML allows unambiguous interpretation of every data value in a form, EDML is very suitable for transfer of data items destined for incorporation within or for updating databases.

EDML goes beyond the capabilities of EDIFACT or ANSI X12, where the interpretation of a data item depends on its position within a message.

1.2.3 Relationships between Data Items
In EDML a data item is defined as a NAME/VALUE pair, ie a piece of data (which may be null) plus an identifier for that data.

EDML deals with the following relationships between data items: repetition, nesting, ordering.

Repetition:
EDML allows explicit and unambiguous identification of repeated occurrences of the same type of data (eg persons’ names, product codes) which have their own values.

Nesting:
An entity may have a set of ‘lower level’ entities associated with it.  For example, a container may contain a number of cartons, each of which may contain a number of boxes in each of which there may be a number of product items; a city may contain a number of streets, which may contain a number of buildings, which may be divided into different apartments.  Thus to identify data relating to an attribute of a unique entity instance requires that the NAME indicate the specific instance of each entity in a nested set of entities.

· For example, the entity identification for a specific box must include identification of the specific carton which contains the box and of the container which contains the carton; eg the fifth product in the third box in the fourth carton in the second container.

The EDML data item naming convention allows explicit incorporation of both nesting and occurrence information into the EDML NAME for each data item.

EDML imposes no intrinsic limit on the complexity of nesting, though particular EDML Types may impose limits.
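
The container example above can be sketched in Python as follows. The notation used here (entity and occurrence number joined by "-", nesting levels joined by ".") is an assumption made purely for illustration and is not the EDML syntax, which is defined by each EDML Type.

```python
import re

# Hypothetical level syntax: entity name, "-", occurrence number.
LEVEL = re.compile(r"([A-Za-z]+)-(\d+)")


def build_name(path):
    """Build a NAME from (entity, occurrence) pairs, outermost level first."""
    return ".".join(f"{entity}-{occurrence}" for entity, occurrence in path)


def parse_name(name):
    """Recover the (entity, occurrence) pairs encoded in a NAME."""
    path = []
    for level in name.split("."):
        match = LEVEL.fullmatch(level)
        if match is None:
            raise ValueError(f"malformed level {level!r} in NAME {name!r}")
        path.append((match.group(1), int(match.group(2))))
    return path
```

Under these assumptions, the fifth product in the third box in the fourth carton in the second container would be named "container-2.carton-4.box-3.product-5", and parse_name() recovers every enclosing instance from that string alone. Nothing in the sketch limits the number of levels.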

Ordering:
With repeated data items, an occurrence number (or ‘repeat count’) allows sequential ordering of the data items.  With different data items where the ordering is significant, the EDML naming convention allows use of numeric coding of individual data items to ensure correct ordering of data.  EDML normally handles repetition via a sequential numbering scheme, which means that ordering is implicit in EDML naming, but other schemes may be used.
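
A short Python sketch of ordering by repeat count follows. It assumes, hypothetically, that the occurrence number is the final "-"-separated part of the NAME; the point is only that the number is compared numerically, so the sequence survives any reordering of the NAME/VALUE pairs in transit.

```python
def occurrence(name: str) -> int:
    """Return the trailing repeat count of a NAME such as 'ITEM-10'."""
    return int(name.rsplit("-", 1)[1])


def in_sequence(pairs):
    """Order NAME/VALUE pairs by repeat count rather than by arrival order."""
    return sorted(pairs, key=lambda pair: occurrence(pair[0]))
```

Comparing the NAMEs as plain strings would place "ITEM-10" before "ITEM-2", which is why the count is converted to an integer first.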

1.2.4 Conversion between EDML and other Data Representations
With suitable definitions of data item names within an EDML Type, data items may easily be converted to other data transfer representations.  In cases, such as EDIFACT, where the interpretation of a data value depends on the position of the data item within the message, the EDML NAME for each data item provides relationship information between data items to allow determination of the equivalent EDIFACT position.

Normally, EDML Types will be based upon a ‘Source Standard’, allowing direct conversion of EDML messages to Source Standard messages without any need for mapping or reference to a message template or DTD (cf XML); all that is required for conversion is a relatively simple piece of software.  Currently, EDML Types may be defined which allow direct conversion to UN/EDIFACT, ANSI X12, HL7 messages and to GEDCOM and ISO 10303 ‘EXPRESS’ language files.  The great variations in syntax and semantics between these standards show the power of EDML as a generic metadata standard.  EDML Types for EDIFACT, GEDCOM and HL7 are defined in the Appendices to this document.
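
To illustrate conversion to a positional representation, the Python sketch below assumes hypothetical NAMEs of the form "TAG.position" and emits simplified EDIFACT-style segments, using "+" between data elements and "'" as the segment terminator. The tags "ABC" and "XYZ" are invented, and composite elements and release characters are deliberately ignored; this is a sketch of the principle, not a conforming EDIFACT translator.

```python
from collections import defaultdict


def to_segments(pairs):
    """Turn NAME/VALUE pairs into simplified EDIFACT-style segment strings.

    Assumes hypothetical NAMEs of the form 'TAG.position', e.g. 'ABC.2'.
    Composite elements and release characters are not handled.
    """
    segments = defaultdict(dict)
    for name, value in pairs:
        tag, position = name.split(".")
        segments[tag][int(position)] = value

    lines = []
    for tag, elements in segments.items():
        # Fill skipped positions with empty elements so that each value
        # lands at the position its NAME specifies.
        values = [elements.get(i, "") for i in range(1, max(elements) + 1)]
        lines.append("+".join([tag, *values]) + "'")
    return lines
```

The position of each value in the output is taken entirely from its NAME, so no separate message template is consulted and the pairs may arrive in any order.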

1.3 Utilisation

EDML is used within HTML Web forms to generate messages as NAME/VALUE pair datasets.  Note that EDML is used strictly to facilitate construction of the contents of messages; it does not deal with the ‘envelope’ which may be used to provide security, authentication or non-repudiation. Such requirements are dealt with by other appropriate means (eg use of SSL for secure transfer, X.509 certificates for authentication).  Nor does EDML prescribe how forms are presented, except that the definitions of VALUES within EDML directories will typically specify formatting restrictions on how users enter data into an EDML-based form.

Though the generating form may be used by the addressee to view a received message, this is not necessary.  For example, suppose a user wishes to send an order to a supplier.  The order form used could have been generated by either party, buyer or vendor, and may reside at a Web server belonging to either of these parties or to a third party (ie a service provider).  After a buyer has retrieved an order form, completed it, and submitted it to the server there are several possibilities, for example:
 

We anticipate that EDML will be used within a ‘third party’ communications structure, such that the EDML transfers are made via a service company which would provide various value-added services.

1.3.1 Generation of EDML Forms and Messages
An EDML form is simply an HTML form where the NAME in each NAME/VALUE pair conforms to an EDML Type definition.
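
As a rough illustration of this definition, the Python sketch below collects the NAME of every form control in an HTML page and reports any that a given EDML Type does not define. Representing an EDML Type as a plain set of permitted NAMEs is a simplifying assumption; an actual Type definition would be a formal syntax, and the example NAMEs are hypothetical.

```python
from html.parser import HTMLParser


class FieldNames(HTMLParser):
    """Collect the NAME attribute of every form control in an HTML page."""

    CONTROLS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so upper-case
        # markup such as <INPUT NAME="..."> is matched as well.
        if tag in self.CONTROLS:
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


def nonconforming_names(html, allowed_names):
    """Return the NAMEs in the page that the EDML Type does not define."""
    parser = FieldNames()
    parser.feed(html)
    return [name for name in parser.names if name not in allowed_names]
```

A page for which nonconforming_names() returns an empty list would count as an EDML form in the sense above; controls without a NAME, such as a submit button, are ignored because they contribute no NAME/VALUE pair to check.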

The EDML Type defined for EDIFACT (version 97A) incorporates all EDIFACT segment and data element definitions, and allows the creation of the complete set of EDIFACT messages (nearly 150 message types) in EDML format.  A set of EDML form templates for creation of EDIFACT-compatible messages is being prepared.  Each template incorporates the definitions of the EDIFACT message in terms of segment and segment group structure, including nesting structure, repeatability, and data formats (which affect the VALUE components of the form).  The templates allow considerable latitude in form design, while ensuring that the NAME/VALUE pairs output via CGI from any form based upon the template may be translated to an EDIFACT-compliant message.

Similar sets of codes and form templates are being prepared for ANSI X12, HL7 and for GEDCOM.

NOTE: EDML does not dictate how a particular structure in a Source Standard should be represented in EDML; there may be multiple possible representations.  The EDML Types defined in Appendix A could be defined in other ways, with different definitions for Entity Groups.