Many companies have thousands of assets in their data estate. Imagine you are a project lead in such a company and need to find the data assets, spread across different sources, that relate to one project. Looking through every source and finding them one by one is slow and inefficient. The data catalog in Purview helps you deal with this problem: you can set up glossary terms for the project and find the data you need in a minute! The data catalog is used for data discovery and glossary management. It shows the metadata collected from your sources and the lineage between data assets. The Purview data catalog has three parts: browsing data assets, lineage, and the business glossary.
Browse and search assets
When key users are looking for a set of target tables but don't know the structure of the whole data model, they can browse assets to find the data. Users can browse assets by collection or by source type. Browsing by collection lists all the sources and the hierarchy of the collection. When searching for a certain table by collection, you can filter the assets by classification, glossary term, label, and so on. When a collection holds a massive number of assets, these filters are a fast and simple way to narrow the results. Results are sorted by relevance. After finding the table, you can click it to see more detailed information such as its classifications, lineage, and schema.
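The collection filter can be pictured as a facet match over asset metadata. The sketch below is a hypothetical, in-memory model and not Purview's API: the `Asset` fields, collection names, and sample values are invented for illustration. An asset is kept only if it sits in the chosen collection and carries every facet value the user selected.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


# Hypothetical, simplified asset record; Purview's real metadata model is richer.
@dataclass
class Asset:
    name: str
    collection: str
    classifications: Set[str] = field(default_factory=set)
    glossary_terms: Set[str] = field(default_factory=set)
    labels: Set[str] = field(default_factory=set)


def filter_assets(
    assets: List[Asset],
    collection: str,
    classification: Optional[str] = None,
    term: Optional[str] = None,
    label: Optional[str] = None,
) -> List[Asset]:
    """Keep assets in `collection` that match every facet the user selected."""
    result = []
    for asset in assets:
        if asset.collection != collection:
            continue
        if classification is not None and classification not in asset.classifications:
            continue
        if term is not None and term not in asset.glossary_terms:
            continue
        if label is not None and label not in asset.labels:
            continue
        result.append(asset)
    return result


# Invented sample data spread over two hypothetical collections.
assets = [
    Asset("customer_master.csv", "Sales", {"Email Address"}, {"Customer"}),
    Asset("orders.csv", "Sales", set(), {"Order"}, {"Confidential"}),
    Asset("employees.csv", "HR", {"Email Address"}),
]

for asset in filter_assets(assets, "Sales", classification="Email Address"):
    print(asset.name)  # customer_master.csv
```

Each extra facet only removes assets, which is why stacking filters is such a quick way to shrink a large collection.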
If we know which kind of source the table comes from, we can also find it by source type. In this view, all the assets are listed in a hierarchy. After we click a source type, we see a list of all the resources (for example, databases or storage accounts) of that type. For example, if we choose a storage account, all of its containers are shown in the left panel, and the child assets of a container are shown on the right side.
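Browsing by source type is essentially a walk down a tree, one click per level. As a minimal sketch, assuming the hierarchy is held as nested dictionaries (source type, then resource, then container, then child assets), the hypothetical `browse` helper below returns what would be listed at each level. All source, account, and file names are invented.

```python
# Hypothetical browse tree: source type -> resource -> container -> child assets.
# All names are invented; Purview builds this hierarchy from scanned metadata.
browse_tree = {
    "Azure Blob Storage": {
        "salesstorage": {
            "raw": ["customer_master.csv", "orders.csv"],
            "curated": ["customer.parquet"],
        },
        "hrstorage": {"raw": ["employees.csv"]},
    },
    "Azure SQL Database": {
        "sales-db": {"dbo": ["Customer", "Order"]},
    },
}


def browse(tree, *path):
    """Follow `path` down the tree and list what is shown at that level."""
    node = tree
    for step in path:
        node = node[step]  # raises KeyError if the path does not exist
    # Dicts are further levels to click into; lists are leaf assets.
    return sorted(node) if isinstance(node, dict) else list(node)


print(browse(browse_tree))  # ['Azure Blob Storage', 'Azure SQL Database']
print(browse(browse_tree, "Azure Blob Storage"))  # ['hrstorage', 'salesstorage']
print(browse(browse_tree, "Azure Blob Storage", "salesstorage"))  # ['curated', 'raw']
print(browse(browse_tree, "Azure Blob Storage", "salesstorage", "raw"))  # ['customer_master.csv', 'orders.csv']
```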
To speed up finding a certain table, we can also use the search bar in the data catalog directly. Purview shows relevant results based on the keywords users enter. A keyword can be the classification, glossary term, or data type of the asset.
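To illustrate how one keyword can match several kinds of metadata and still produce a ranked list, here is a toy scorer. It is only a sketch: the field weights are an assumption made up for this example, since Purview does not publish its relevance formula, and the assets are invented. Each field that contains the keyword adds its weight, and results are sorted by total score with ties broken by name.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, simplified asset record for the search example.
@dataclass
class Asset:
    name: str
    data_type: str
    classifications: List[str] = field(default_factory=list)
    glossary_terms: List[str] = field(default_factory=list)


# Assumed field weights; these are illustrative, not Purview's ranking.
WEIGHTS = {"name": 3, "glossary_terms": 2, "classifications": 2, "data_type": 1}


def score(asset: Asset, keyword: str) -> int:
    """Add up the weights of every field that contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    total = 0
    if kw in asset.name.lower():
        total += WEIGHTS["name"]
    if any(kw in term.lower() for term in asset.glossary_terms):
        total += WEIGHTS["glossary_terms"]
    if any(kw in c.lower() for c in asset.classifications):
        total += WEIGHTS["classifications"]
    if kw in asset.data_type.lower():
        total += WEIGHTS["data_type"]
    return total


def search(assets: List[Asset], keyword: str) -> List[str]:
    """Return names of matching assets, most relevant first, ties broken by name."""
    scored = [(score(a, keyword), a.name) for a in assets]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for s, name in scored if s > 0]


assets = [
    Asset("customer_master.csv", "Azure Blob", ["Email Address"], ["Customer"]),
    Asset("dim_client", "Azure SQL Table", [], ["Customer"]),
    Asset("orders", "Azure SQL Table", [], ["Order"]),
]

print(search(assets, "customer"))  # ['customer_master.csv', 'dim_client']
print(search(assets, "sql table"))  # ['dim_client', 'orders']
```

Note that `dim_client` is found by "customer" only through its glossary term, which is the point of tagging assets with terms.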
Lineage
Lineage is one of the most important features that Purview provides. It shows the processes that connect data assets. Sources such as Data Factory and Power BI can capture these processes and provide a visual trace of how the data moves. After we scan Power BI, or after a pipeline is triggered in Data Factory, the lineage of the related assets and processes can be found in the catalog.
Inside the lineage view for an asset, the asset's schema is also shown in the left panel. We can click a column name to find out how this column is derived from the previous steps. In other words, lineage can track data not only at the table level but also at the column level. For example, we can see in the picture below that four columns in the customer_master.csv file generate the column 'customer_id' after the data flow activity in Data Factory. If you want to check another asset in this lineage, just click it and switch to that asset to see its detailed information.
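Column-level lineage can be thought of as a graph whose nodes are (asset, column) pairs and whose edges are labeled with the process that connects them. The sketch below is a hypothetical model, not Purview's lineage API: the target asset `customer_curated.parquet`, the four source column names, the upstream `crm_customers` table, and both process names are invented, because the actual names in the example are only visible in the picture. A breadth-first walk from one column collects every column it is derived from, across several hops.

```python
from collections import deque
from typing import Dict, List, Tuple

Column = Tuple[str, str]     # (asset, column)
Edge = Tuple[str, str, str]  # (source asset, source column, process)

# Hypothetical column-level lineage: target column -> the columns that feed it.
# Asset, column, and process names are invented for illustration.
LINEAGE: Dict[Column, List[Edge]] = {
    ("customer_curated.parquet", "customer_id"): [
        ("customer_master.csv", "country", "dataflow_build_customer"),
        ("customer_master.csv", "region", "dataflow_build_customer"),
        ("customer_master.csv", "branch", "dataflow_build_customer"),
        ("customer_master.csv", "local_id", "dataflow_build_customer"),
    ],
    ("customer_master.csv", "region"): [
        ("crm_customers", "Region", "copy_crm_to_blob"),
    ],
}


def upstream_columns(lineage: Dict[Column, List[Edge]], asset: str, column: str) -> List[Edge]:
    """Breadth-first walk from one column back to every column it derives from."""
    start = (asset, column)
    visited = {start}
    queue = deque([start])
    found: List[Edge] = []
    while queue:
        current = queue.popleft()
        for src_asset, src_column, process in lineage.get(current, []):
            key = (src_asset, src_column)
            if key not in visited:  # guards against cycles and shared ancestors
                visited.add(key)
                found.append((src_asset, src_column, process))
                queue.append(key)
    return found


for src_asset, src_column, process in upstream_columns(
    LINEAGE, "customer_curated.parquet", "customer_id"
):
    print(f"{src_asset}.{src_column} via {process}")
```

The walk returns the four direct inputs first and then the second-hop `crm_customers.Region`, mirroring how you would click back through the lineage view one step at a time.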
Business glossary
Azure Purview allows users to create business glossary terms to enrich their data. A glossary categorizes business terms and helps key users understand what these terms mean in different situations and contexts. Terms can be mapped to sources, tables, and columns. Terms can also be created in a hierarchy, which gives the data estate a better-structured business glossary.
When we add a new term to the glossary, we can use the system default template or create a new one. In the default template, Name is required, while Definition, Data stewards, Data experts, Parent, Acronym, Synonyms, Related terms, and Resources are optional. In a custom term template, we can add attributes of type date, text, single choice, or multiple choice as needed, and each attribute can be marked as required. After a term is created, the responsible person can review its content and approve the glossary term.
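A custom template is effectively a schema that each new term is checked against. As a minimal sketch, assuming a term is a plain dictionary and a template is a list of attribute definitions, the hypothetical `validate_term` below enforces the required Name field, required custom attributes, and the four attribute kinds described above. The attribute names and choices are invented, and the approval step is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Dict, List, Tuple


# Hypothetical model of a custom term template attribute.
@dataclass
class AttributeDef:
    name: str
    kind: str  # "text", "date", "single_choice" or "multi_choice"
    required: bool = False
    choices: Tuple[str, ...] = ()


def validate_term(term: Dict[str, Any], template: List[AttributeDef]) -> List[str]:
    """Return a list of problems; an empty list means the term fits the template."""
    errors = []
    if not term.get("Name"):  # Name is required in every template
        errors.append("Name is required")
    for attr in template:
        value = term.get(attr.name)
        if value is None:
            if attr.required:
                errors.append(f"{attr.name} is required")
            continue
        if attr.kind == "text" and not isinstance(value, str):
            errors.append(f"{attr.name} must be text")
        elif attr.kind == "date" and not isinstance(value, date):
            errors.append(f"{attr.name} must be a date")
        elif attr.kind == "single_choice" and value not in attr.choices:
            errors.append(f"{attr.name} must be one of {', '.join(attr.choices)}")
        elif attr.kind == "multi_choice" and not (
            isinstance(value, (list, tuple, set)) and set(value) <= set(attr.choices)
        ):
            errors.append(f"{attr.name} must be a subset of {', '.join(attr.choices)}")
    return errors


# Invented custom template: one required choice, two optional attributes.
template = [
    AttributeDef("Project", "single_choice", required=True, choices=("Apollo", "Gemini")),
    AttributeDef("Review date", "date"),
    AttributeDef("Regions", "multi_choice", choices=("EU", "US", "APAC")),
]

print(validate_term({"Name": "Customer", "Project": "Apollo"}, template))  # []
print(validate_term({"Definition": "A buyer"}, template))  # ['Name is required', 'Project is required']
```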
All glossary terms can be shown in a hierarchy view, which gives key users a good understanding of the glossary structure. Inside a term, you can find information such as the parent term, definition, contacts, and attributes. By clicking View assets, you can find all the assets assigned to this term.
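The hierarchy view can be rebuilt from nothing more than each term's parent link. In this sketch the glossary, its parent links, and the term-to-asset assignments are all hypothetical; `hierarchy_lines` groups terms by parent and renders an indented tree, and `view_assets` returns the assets assigned directly to one term, roughly what View assets lists.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical glossary: each term points to its parent (None for a root term).
PARENTS: Dict[str, Optional[str]] = {
    "Project Apollo": None,
    "Customer": "Project Apollo",
    "Customer ID": "Customer",
    "Order": "Project Apollo",
}

# Hypothetical term-to-asset assignments (a term can also map to a column).
TERM_ASSETS: Dict[str, List[str]] = {
    "Customer": ["customer_master.csv", "dim_client"],
    "Customer ID": ["customer_master.csv [customer_id]"],
}


def hierarchy_lines(parents: Dict[str, Optional[str]]) -> List[str]:
    """Render the parent links as an indented tree, children sorted by name."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for term, parent in parents.items():
        children[parent].append(term)

    lines: List[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for term in sorted(children[parent]):
            lines.append("  " * depth + term)
            walk(term, depth + 1)

    walk(None, 0)  # start from the root terms
    return lines


def view_assets(term: str) -> List[str]:
    """Assets assigned directly to `term`."""
    return TERM_ASSETS.get(term, [])


print("\n".join(hierarchy_lines(PARENTS)))
print(view_assets("Customer"))  # ['customer_master.csv', 'dim_client']
```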
In an organization, multiple terms can represent the same object from different perspectives, so terms can have relationships with each other. Conversely, the same term can also represent more than one object. We can use Synonyms to connect terms that have a similar definition, and Related terms to bridge terms with different definitions, for example, terms used by groups in different departments.
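The difference between the two relationship types can be sketched as two separate, symmetric link sets on a small hypothetical `Glossary` class. One design choice here is an assumption, not documented Purview behavior: synonyms are treated as transitive (if Customer ~ Client and Client ~ Buyer, all three count as equivalent), while related terms are kept one hop away and never merged. The term names are invented.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Glossary:
    """Hypothetical glossary with two kinds of term-to-term links."""

    def __init__(self) -> None:
        self.synonyms: DefaultDict[str, Set[str]] = defaultdict(set)
        self.related: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_synonym(self, a: str, b: str) -> None:
        # Synonyms share a similar definition; the link goes both ways.
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)

    def add_related(self, a: str, b: str) -> None:
        # Related terms have different definitions but belong together.
        self.related[a].add(b)
        self.related[b].add(a)

    def equivalent_terms(self, term: str) -> Set[str]:
        """The term plus every synonym reachable through synonym links (assumed transitive)."""
        seen, stack = {term}, [term]
        while stack:
            for other in self.synonyms[stack.pop()]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return seen


g = Glossary()
g.add_synonym("Customer", "Client")  # e.g. Sales vs. Support wording
g.add_synonym("Client", "Buyer")
g.add_related("Customer", "Account")  # different definition, same business area

print(sorted(g.equivalent_terms("Customer")))  # ['Buyer', 'Client', 'Customer']
print(sorted(g.related["Customer"]))  # ['Account']
```

Keeping the two link types apart means a search can safely expand across synonyms without pulling in terms that only happen to be related.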
The Purview data catalog provides information about all the assets in the data estate and enables key users to explore valuable datasets. Users can browse assets, view lineage, and use the business glossary to gain a deep understanding of the data model and to get in touch with the other departments that are responsible for parts of it. Now, start your Purview journey and get to know your data estate quickly and easily!