8 April 1998
Acknowledgements
This document has been produced by editing together parts of documents by David Arctur, Grant Ruwoldt and all others who contributed to the email discussions on persistent OIDs in 1997.
Brief summary of the motivations for needing feature OIDs
It is asserted that the following practical issues can best be tackled by creating appropriate mechanisms based on persistent Feature OIDs.
GeoInformation production and controlled Dissemination
- Historical tracking (data lineage)
- Versioning features and collections
- Collaborative work environments
- Distributed, replicated data
- Temporal movement (feature location editing)
- Identity testing
- Incremental updates
- Value-adding information
- Cataloging (and metadata for finding geodata)
These requirements all arise from the administrative and practical mechanisms required by teams of people creating, editing, assembling and publishing geodata over an extended period of time.
Controlling a collaborative work environment means managing multiple sets of different versions of the same data. This requires that the same feature be identifiable across versions, replicas and alternative copies even when some of its attribute values have changed, including the case where the feature’s location and geometry have changed over time.
Incremental updates, or third-parties wanting to add value to existing data, must have some way of identifying only those features which need to be updated.
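As a minimal sketch of why persistent OIDs make incremental updates tractable, the following compares two versions of a dataset keyed by OID and reports only the features that were added, removed or changed. The function name `diff_by_oid`, the dictionary layout and the sample features are all assumptions made for illustration; they do not come from the Abstract Specification.

```python
def diff_by_oid(old, new):
    """Compare two dataset versions, each a dict mapping OID -> attribute dict.

    Because the OID persists across versions, a feature whose geometry or
    attributes were edited is reported as "changed", not as a delete plus an add.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(oid for oid in old.keys() & new.keys() if old[oid] != new[oid])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data: two versions of the same small dataset.
v1 = {
    101: {"type": "road", "geom": [(0, 0), (1, 1)]},
    102: {"type": "river", "geom": [(2, 2), (3, 3)]},
    103: {"type": "building", "geom": [(5, 5)]},
}
v2 = {
    101: {"type": "road", "geom": [(0, 0), (1, 2)]},   # geometry edited, same OID
    102: {"type": "river", "geom": [(2, 2), (3, 3)]},  # unchanged
    104: {"type": "building", "geom": [(7, 7)]},       # newly created feature
}

update = diff_by_oid(v1, v2)
print(update)
```

Without a persistent OID, the edited road could only be matched between versions by comparing geometry or attributes, which is exactly what has changed.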
It is important to be able to trace where data has come from for understanding its reliability, and how it fits into the set being collaboratively managed, but also as a means of cataloging it.
Motivation as a technical aid to other requirements
We believe that feature OIDs are essential for these technical requirements, which have other user-led motivations (not discussed further here).
Relationship marking
- Relationships will not be discussed here in any detail. Persistent OIDs are required for their underlying implementations, but whether the OIDs used should be a “visible” attribute value domain (as specified in AS 3.2.1.4) is another matter.
Compound features
- There are many uses to which persistent aggregations of features could be put. Aggregations can be thought of as an encapsulation or as just another type of relationship between features.
Issues arising to be addressed in an RFP
Once it is agreed that persistent Feature OIDs are necessary to meet the requirements and motivations listed above, the following issues are expected to arise from their use and will need practical resolution:
- scope of domain for uniqueness (session, database, all geospatial db’s, etc.)
- temporal scope of uniqueness
- persistence (storage issue)
- permanence (across session, database, enterprise, etc.)
- handling copies (non-synchronized/replicated)
- how to find the feature from the ID
- how to find feature’s complete implementation
- registry vs. algorithm-based ID; URL/URN/CORBA trader service
- digital vs. real-world feature identity
- type and size of ID content; “well known feature ID” structure?
- feature relationship IDs: reifying relationships
Concepts from the Abstract Specification
The Abstract Specification (98-105r1a) defines and/or uses a number of terms which will be used in the discussion.
- Geospatial Information Community (GIC)
- Project World
- Project Schema
- Domain of reachability (3.2.1.2)
- Naming context (3.2.1.5)
- Feature Schema (3.2.1.1)
- Feature Types (2.10.1.1, 3.1.1, 3.2)
- Type substitutability (3.2.1.6)
- Attribute Schema (2.10.1.2)
- Role name (3.1.2.1)
- Feature Collection (2.11, 3.3)
We must use these concepts in the discussion as some of them are currently ill-defined and the discussion will make them clearer.
Feature OIDs’ Domain of Reachability
The abstract specification uses the phrase “domain of reachability” to mean the scope in time and space of a feature OID but does not define it. Assuming that we do in fact need feature OIDs, let us consider an extreme but not unreasonable scenario:
A project team all working on the same data must be in the same scope of feature OIDs over space and over time if they are to manage versions and updates properly. This will include perhaps several databases comprising several datasets each of which could be a tree of versions and clones. There will also be isolated files sent to remote offices and individual workspaces (dataset segments) being worked on concurrently.
A necessary part of the system is a way of associating lineage metadata with geodata ‘chunks’ (datasets) so that version check-in/out and updates are synchronized properly.
Such a team could be producing a number of different products from the same core data; some of these products will be converted to existing standard export formats. If the format does not provide the capability, another file will have to contain the lineage information so that corrections fed back from these copies can be properly integrated back into the central streams. This lineage information would also contain the metadata to be used for cataloging.
What else is common to this same collection of datasets? This collection of datasets must have the same (evolving) Feature Schema comprising feature type definitions, type substitutability statements, attribute schema and role names. (If the feature schema is stored in objects which themselves have feature OIDs then this definition becomes somewhat circular.)
Can we define different feature OID scopes (domains of reachability) that use the same Feature Schema? Yes. A Feature Schema designed for a particular purpose, e.g. Hungarian land ownership, could be part of a package sold to 5,000 Hungarian town offices each of which could manage their own data with their own set of feature OIDs.
A more useful definition for feature collections is now possible: they could be defined to be strict subsets of the scope of feature OIDs (domain of reachability) since the individual features would otherwise have incompatibly defined OIDs.
Conclusion
The connecting conceptual thread that joins datasets which share a feature OID scope is lineage and this thread has a physical existence: either as explicit metadata catalogs or as long transaction logs within a version-controlled GIS installation. The persistence or permanence of a feature OID is linked to the way in which the lineage metadata is stored.
Extreme case: minimum persistence
The feature OIDs could be different for each “session” of clients reading a shared dataset, but the lineage metadata would then need to keep a complete record of all OIDs used (issued) during that session, and of how they matched OIDs used in all preceding sessions back to some common starting session. The starting session would be provided (published) by another organisation, who would keep records of OID equivalences up to the point of publishing. Such a ...
This clarifies an issue previously discussed: how long should a feature OID be? The answer is that it needs to be just long enough to provide uniqueness within the scope of a particular chunk of lineage metadata; thereafter, any correspondence with feature OIDs in other datasets can be resolved through the lineage metadata. The “Well Known” OID structure must be partly a dataset-specific number (64 bit?) and partly a metadata lineage description in whatever formal grammar is available for metadata.
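The session-to-session matching in the minimum persistence case can be sketched as a chain of lookup tables, one per session, each mapping the OIDs issued in that session to the OIDs of the preceding session. The table layout, the session numbering (0 for the published starting session) and the names `session_maps`, `resolve_to_start` and `same_feature` are assumptions for illustration only. For simplicity the sketch also assumes that every feature already existed in the starting session.

```python
# Lineage metadata, one table per session (hypothetical sample values).
# Each table maps an OID issued in that session to its OID in the previous one.
session_maps = {
    1: {"a1": "p7", "a2": "p9"},  # session 1 OID -> session 0 (published) OID
    2: {"b5": "a1", "b6": "a2"},  # session 2 OID -> session 1 OID
}


def resolve_to_start(oid, session, session_maps):
    """Follow the lineage records back from `session` to session 0."""
    while session > 0:
        # A KeyError here means the lineage record is incomplete.
        oid = session_maps[session][oid]
        session -= 1
    return oid


def same_feature(oid_a, session_a, oid_b, session_b, session_maps):
    """Two session-local OIDs denote the same feature if they share an origin."""
    return (resolve_to_start(oid_a, session_a, session_maps)
            == resolve_to_start(oid_b, session_b, session_maps))


print(resolve_to_start("b5", 2, session_maps))
print(same_feature("b6", 2, "a2", 1, session_maps))
```

Every identity test walks the full chain of sessions, which illustrates why the volume of lineage metadata in this extreme case grows with every session.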
Extreme case: maximum persistence
The feature OIDs are persistent and permanent for the life of the dataset. This makes handling long transactions, replication and incremental update much easier.
Persistent OIDs raise the following minor problems:
- how are the OIDs created (issued) to ensure uniqueness in distributed workgroups?
- in the case where it is impossible to ensure uniqueness on creation, how are two features with the same OID reassigned different OIDs?
- how do we handle the case where two features with different OIDs are discovered to be, or reclassified to be, the same feature?
All these can be handled by the same mechanism described above: retain tables of OID correspondences in the lineage metadata. In this case, however, the amount of data is tiny compared with the minimum persistence case, and the approach becomes reasonable to implement. (In the case of version-controlled GISs, this metadata is stored in binary form within the software, rather than being put in a metadata file.)
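One way such a correspondence table might look is sketched below. It covers two of the problems listed above: reassigning an OID that was issued twice, and recording that two OIDs turned out to denote the same feature. The class `OidCorrespondences`, its method names and the idea of keying reassignments by an "origin" workgroup label are all hypothetical choices, not a format defined by any specification.

```python
class OidCorrespondences:
    """A small table of OID corrections kept alongside the lineage metadata."""

    def __init__(self):
        self._merged = {}      # retired OID -> OID that now stands for it
        self._reassigned = {}  # (origin, clashing OID) -> replacement OID

    def merge(self, retired, kept):
        """Record that `retired` and `kept` are the same feature."""
        retired, kept = self.resolve(retired), self.resolve(kept)
        if retired != kept:
            self._merged[retired] = kept

    def resolve(self, oid):
        """Return the OID that currently stands for `oid` after any merges."""
        while oid in self._merged:
            oid = self._merged[oid]
        return oid

    def reassign(self, origin, clashing_oid, new_oid):
        """Record that `origin` issued an OID already in use elsewhere."""
        self._reassigned[(origin, clashing_oid)] = new_oid

    def current(self, origin, oid):
        """Translate an OID as received from `origin` into its current form."""
        oid = self._reassigned.get((origin, oid), oid)
        return self.resolve(oid)


table = OidCorrespondences()
table.reassign("office-B", 5, 1005)  # office B reused OID 5 for another feature
table.merge(17, 42)                  # 17 and 42 found to be the same feature
table.merge(42, 99)                  # later, 42 and 99 as well
print(table.resolve(17), table.current("office-B", 5), table.current("office-A", 5))
```

The table holds one entry per correction, not one per feature per session, which is the sense in which the data volume stays small.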
Real world correspondence
Some GICs would like to define actual concrete OIDs to correspond to real world objects, e.g. gazetteers of streets and town names. What format would such a list take, and how would a geographic dataset state that it conformed to such a naming scheme? One answer is that the lineage metadata for a published dataset already has to state the scope and origin of its OIDs, so the only issue is to define a universal naming convention for standard naming schemes. The URL/URN research at W3C would seem appropriate for this.
Finding a feature from an OID
Given that a user has been given some feature data, including its OID, how can the user find that feature? The OID only has meaning within a scope, and that scope must therefore accompany the OID. The scope must match some lineage metadata, so an appropriate format for the scope information would be as a query designed to find datasets matching that metadata. (A query written in OGC catalog/metadata standard form.)
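The idea that an OID travels together with a scope, and that the scope acts as a query against lineage metadata, can be sketched as follows. The `ScopedOid` type, the representation of the scope as key/value pairs, and the in-memory `catalog` are assumptions for illustration; a real scope would be a query in whatever OGC catalog/metadata form is adopted, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedOid:
    local_id: int  # unique only within the scope
    scope: tuple   # hypothetical scope query: (key, value) pairs to match


def matches(scope, lineage):
    """True if every (key, value) pair in the scope appears in the lineage."""
    return all(lineage.get(key) == value for key, value in scope)


def find_feature(oid, catalog):
    """Return (dataset name, feature) pairs for datasets in the OID's scope."""
    hits = []
    for entry in catalog:
        if matches(oid.scope, entry["lineage"]) and oid.local_id in entry["features"]:
            hits.append((entry["name"], entry["features"][oid.local_id]))
    return hits


# Hypothetical catalog: two datasets that both happen to use local ID 12.
catalog = [
    {"name": "roads-v3",
     "lineage": {"publisher": "ExampleMapping", "series": "roads"},
     "features": {12: {"type": "road"}}},
    {"name": "parcels-v1",
     "lineage": {"publisher": "OtherAgency", "series": "parcels"},
     "features": {12: {"type": "parcel"}}},
]

oid = ScopedOid(12, (("publisher", "ExampleMapping"), ("series", "roads")))
hits = find_feature(oid, catalog)
print(hits)
```

The bare number 12 is ambiguous across the two datasets; only the accompanying scope selects the intended feature.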
Finding a feature’s type
Several different datasets may share the same feature OID scope, and share the same Feature Schema, but differ in the detail of that Feature Schema. We must consider the possibility that a central data repository retains much more detailed classification information than is actually published in some products. Thus, given a feature OID (and lineage metadata) from a published dataset, a user may wish to find more sophisticated repositories covering the same OID. Finding merely compatible datasets reduces to the same metadata query described above; finding more sophisticated repositories implies that the lineage metadata should always contain some explicit information about the sophistication/simplification history of the feature schema.
Registry vs. algorithm-based ID
What does this mean? Has it been covered already?
This work was performed at the European Commission Joint Research Centre.