Pathfinder: Building the Enterprise Data Map
Abstract
Pathfinder is a novel way of discovering, curating, and disseminating heterogeneous metadata from multiple distributed systems at the scale of an entire enterprise. Managing metadata at scale resembles big data processing in that both must handle high volumes of fast-changing data derived from highly varied sources. Pathfinder applies well-known techniques from the data processing world, such as schema-on-read, immutable logs, eventing, and loose coupling, that are not yet widely used in metadata systems. At the core of Pathfinder is a Write-Ahead Log (WAL) that records the entire set of metadata changes in an enterprise. This log contains a graph of the enterprise's data ecosystem, which we call the Enterprise Data Map (EDM). Multiple independent processing systems consume the EDM to extract actionable information from the gathered metadata.
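To make the core idea concrete, the following minimal Python sketch shows an append-only log of metadata change events that two independent consumers replay, one to materialize a graph and one to compute a simple statistic. It is an illustration under stated assumptions, not Pathfinder's implementation: the names `ChangeEvent`, `MetadataLog`, `build_graph`, and `count_by_source`, the `"node"`/`"edge"` event kinds, and the payload fields are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """One immutable metadata change (hypothetical shape).

    The payload is stored uninterpreted; each consumer decides how to
    read it, which is the schema-on-read idea.
    """
    seq: int
    kind: str      # "node" or "edge" (assumed event kinds)
    payload: dict


class MetadataLog:
    """In-memory stand-in for an append-only log of metadata changes."""

    def __init__(self):
        self._events = []

    def append(self, kind, payload):
        # Events are only ever appended, never updated or deleted.
        event = ChangeEvent(len(self._events), kind, dict(payload))
        self._events.append(event)
        return event

    def read(self, from_seq=0):
        # Consumers replay from any offset, independently of each other.
        return tuple(self._events[from_seq:])


def build_graph(log):
    """Consumer 1: replay the log into a node/edge view of the data map."""
    nodes, edges = {}, set()
    for event in log.read():
        if event.kind == "node":
            nodes.setdefault(event.payload["id"], {}).update(event.payload)
        elif event.kind == "edge":
            edges.add((event.payload["src"], event.payload["dst"]))
    return nodes, edges


def count_by_source(log):
    """Consumer 2: an unrelated view computed from the same log."""
    counts = {}
    for event in log.read():
        source = event.payload.get("source", "unknown")
        counts[source] = counts.get(source, 0) + 1
    return counts


log = MetadataLog()
log.append("node", {"id": "orders_db.orders", "source": "rdbms"})
log.append("node", {"id": "warehouse.orders", "source": "warehouse"})
log.append("edge", {"src": "orders_db.orders",
                    "dst": "warehouse.orders", "source": "etl"})

nodes, edges = build_graph(log)
```

Because both consumers only read the log, either can be added, removed, or rebuilt without coordinating with the other, which is the loose coupling the abstract refers to.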