Master Data Management (MDM)
Introduction
Record linkage (RL), also known as data matching, entity resolution, and many other terms, is the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, and databases). Record linkage is necessary when joining data sets based on entities that may or may not share a common identifier (e.g., database key, URI, national identification number), which may be due to differences in record form, storage location, or retention style or preference. A data set that has undergone RL-oriented reconciliation can be called cross-linked. Record linkage is called data linkage in many jurisdictions; the two terms refer to the same process.
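As a rough illustration of linking records that share no common identifier, the following Python sketch compares names with the standard library's `difflib.SequenceMatcher` and only considers pairs whose birth year agrees. The field names (`id`, `name`, `birth_year`), the sample records, and the 0.85 threshold are assumptions made for this example, not part of any particular record linkage system.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """For each record in `left`, return its best match in `right` as
    (left_id, right_id, score), provided the score reaches `threshold`."""
    links = []
    for left_rec in left:
        best = None
        for right_rec in right:
            # Hypothetical rule: only compare records with the same birth year.
            if left_rec["birth_year"] != right_rec["birth_year"]:
                continue
            score = similarity(left_rec["name"], right_rec["name"])
            if best is None or score > best[2]:
                best = (left_rec["id"], right_rec["id"], score)
        if best is not None and best[2] >= threshold:
            links.append(best)
    return links


# Two illustrative sources with different keys and inconsistent formatting.
hospital = [
    {"id": "H1", "name": "Jonathan Smith", "birth_year": 1980},
    {"id": "H2", "name": "Maria Garcia", "birth_year": 1975},
]
registry = [
    {"id": "R7", "name": "jonathan  smith", "birth_year": 1980},
    {"id": "R8", "name": "Mario Garcia", "birth_year": 1991},
    {"id": "R9", "name": "Jon Smyth", "birth_year": 1980},
]

for left_id, right_id, score in link_records(hospital, registry):
    print(left_id, right_id, round(score, 2))
```

With this sample data, `H1` is linked to `R7` because the names are identical after normalization, while `H2` stays unlinked because no registry record shares its birth year. The threshold trades missed links against false links and would need tuning on real data.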
Naming convention
"Record linkage" is the term used by statisticians, epidemiologists, and historians, among others, to describe the process of joining records from one data source with records from another that describe the same entity. Many other terms are used for the same process, however, and this profusion of terminology has led to few cross-references between these research communities.[1][2]
Computer scientists often refer to this as "data matching" or the "object identity problem." Commercial mail and database applications call it "merge/purge processing" or "list scrubbing." Other names used to describe the same concept are: "coreference/entity/identity/name/record resolution", "entity disambiguation/linking", "fuzzy matching", "duplicate detection", "deduplication", "record matching", "(reference) reconciliation", "object identification", "data/information integration", and "conflation".
Despite their similar names, record linkage and Linked Data are two different approaches to processing and structuring data. Both involve identifying matching entities across different data sets, but record linkage typically equates "entities" with human individuals, whereas Linked Data is based on the possibility of interlinking any web resource across data sets, using a correspondingly broader concept of identifier, namely a URI.