10 Tips for Building Self-Service Datasets

Going beyond a single-use dataset to a reusable asset.


Creating datasets for reuse adds a whole new dimension to the challenge: instead of solving one problem, the dataset might be used to solve many, some of which are not yet known. Consider the longevity of the dataset and how it will be maintained.


Understand your data

Before you start building your dataset, make sure you understand the data you are working with. This includes the structure of the data, the relationships between different tables, and any data quality issues that need to be addressed.

Get to know your power users

It's a recurring challenge to bridge the gap between data experts and business experts. The more closely the two work together, the more effective a self-service dataset will be.

Power users are a great resource for this task. They are the primary user group for self-service data, so they can help shape the offering.

Clean your data

Data quality is critical for accurate analysis. Make sure your data is clean, consistent, and free of errors. You can use Power Query to clean and transform your data.
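Power Query is the tool for this inside Power BI. Purely to illustrate the kind of steps involved (trimming text, standardising casing, dropping blank keys and removing duplicates), here is a minimal sketch in Python. The column names `customer` and `country` and the sample rows are hypothetical.

```python
def clean_rows(rows):
    """Trim text, standardise casing, drop blank keys and duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        customer = (row.get("customer") or "").strip()
        country = (row.get("country") or "").strip().upper()
        if not customer:
            continue  # a row without a customer name cannot be analysed reliably
        key = (customer.lower(), country)
        if key in seen:
            continue  # same customer and country already kept
        seen.add(key)
        cleaned.append({"customer": customer, "country": country})
    return cleaned


# Hypothetical sample data with typical quality problems
raw = [
    {"customer": " Contoso ", "country": "gb"},
    {"customer": "contoso", "country": "GB "},
    {"customer": "", "country": "US"},
    {"customer": "Fabrikam", "country": "us"},
]
cleaned = clean_rows(raw)
```

The same ideas map onto Power Query's trim, format, remove-blank and remove-duplicate transformations.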

Consider the structure of your dataset

A data model is a collection of tables, columns, and relationships that define how your data is organized. Spend time designing a data model that is easy to understand and use.

The Kimball approach (dimensional modelling, with fact tables surrounded by dimension tables) is an excellent way of ensuring you produce a dataset that can handle a variety of use cases, even those that have yet to be thought of.
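As a rough illustration of the dimensional idea, the sketch below splits a flat extract into one dimension table and one fact table linked by a surrogate key. It is a simplified Python example, not a full Kimball implementation, and the table and column names (`product_key`, `category`, `amount`) are assumptions made for the example.

```python
def build_star(flat_rows):
    """Split flat rows into a product dimension and a sales fact table."""
    product_dim = {}  # product name -> dimension row
    fact_sales = []
    for row in flat_rows:
        name = row["product"]
        if name not in product_dim:
            # Assign the next surrogate key the first time a product appears
            product_dim[name] = {
                "product_key": len(product_dim) + 1,
                "product": name,
                "category": row["category"],
            }
        # The fact row keeps only the key and the measures
        fact_sales.append({
            "product_key": product_dim[name]["product_key"],
            "order_date": row["order_date"],
            "amount": row["amount"],
        })
    return list(product_dim.values()), fact_sales


# Hypothetical flat extract
flat = [
    {"order_date": "2024-01-05", "product": "Bike", "category": "Outdoor", "amount": 250},
    {"order_date": "2024-01-06", "product": "Helmet", "category": "Outdoor", "amount": 40},
    {"order_date": "2024-01-07", "product": "Bike", "category": "Outdoor", "amount": 250},
]
dim_product, fact_sales = build_star(flat)
```

Descriptive attributes live once in the dimension, so a new question about products (for example, sales by category) needs no change to the fact table.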

Use calculated columns

Calculated columns allow you to create new columns based on existing data. Use calculated columns to create new metrics or dimensions that are not available in your source data.
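In Power BI a calculated column is written in DAX. The Python sketch below only illustrates the concept of a value derived row by row from existing columns. The `margin` and `margin_band` columns and the threshold of 100 are hypothetical.

```python
def add_calculated_columns(rows):
    """Return new rows with columns derived from existing ones."""
    result = []
    for row in rows:
        margin = row["revenue"] - row["cost"]  # a new metric
        result.append({
            **row,
            "margin": margin,
            # a new attribute that report authors can group or filter by
            "margin_band": "High" if margin >= 100 else "Low",
        })
    return result


# Hypothetical sample data
sales = [
    {"product": "Bike", "revenue": 250, "cost": 120},
    {"product": "Helmet", "revenue": 40, "cost": 25},
]
with_columns = add_calculated_columns(sales)
```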

Use data categories

Data categories allow you to specify what the data in each column represents, for example a city, a country, or a web URL. Use data categories to enable features like map visualizations and date hierarchies automatically for your report authors. They will thank you for making their lives easier, trust us!

Set up version control

A self-service dataset will be around for a long time and will need to evolve. These changes should be tracked and shared with report authors.

One approach is to embed a table in the dataset to track version history, giving report authors the opportunity to include that history in reports and pass the information on to report viewers.
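A version history table can be very small. The sketch below shows one possible shape for it, plus a helper that picks the latest entry for display on a report page. It is written in Python for illustration only. The column names, version numbers, dates and change notes are all made up.

```python
# Hypothetical version history rows; in practice these would live in a
# table inside the dataset.
version_history = [
    {"version": "1.0.0", "released": "2024-01-15", "change": "Initial release"},
    {"version": "1.1.0", "released": "2024-03-02", "change": "Added Region dimension"},
    {"version": "1.1.1", "released": "2024-03-20", "change": "Fixed currency rounding"},
]


def version_key(entry):
    """Turn '1.10.0' into (1, 10, 0) so versions sort numerically."""
    return tuple(int(part) for part in entry["version"].split("."))


def latest_version(history):
    """Return the entry with the highest version number."""
    return max(history, key=version_key)


current = latest_version(version_history)
label = f"Dataset v{current['version']} ({current['released']})"
```

A label like this can sit in a report footer so viewers can see which version of the dataset they are looking at.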

Use data refresh

Data refresh ensures that your data is up to date. Self-service datasets can be heavily utilised. With that in mind, it is best to avoid DirectQuery unless the underlying source can handle the unpredictable load. Consider using Import mode instead, which can be combined with incremental refresh to reduce the refresh load for large datasets.
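The idea behind incremental refresh is that only recent partitions of the data are reloaded, while older partitions are left as they are. The Python sketch below illustrates that idea with monthly partitions. It is a conceptual model, not how Power BI implements the feature, and the function names, the two-month window and the sample rows are assumptions.

```python
from datetime import date


def partitions_to_refresh(today, refresh_months):
    """Return (year, month) keys for the most recent refresh_months months."""
    keys = []
    year, month = today.year, today.month
    for _ in range(refresh_months):
        keys.append((year, month))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return keys


def incremental_refresh(stored, source_rows, today, refresh_months=2):
    """Reload only recent partitions; older partitions stay untouched."""
    recent = set(partitions_to_refresh(today, refresh_months))
    # Keep every stored partition that falls outside the refresh window
    refreshed = {k: v for k, v in stored.items() if k not in recent}
    # Rebuild the recent partitions from the source
    for row in source_rows:
        key = (row["order_date"].year, row["order_date"].month)
        if key in recent:
            refreshed.setdefault(key, []).append(row)
    return refreshed


# Hypothetical data: December is outside the window, January is stale
stored = {
    (2023, 12): [{"order_date": date(2023, 12, 3), "amount": 10}],
    (2024, 1): [{"order_date": date(2024, 1, 8), "amount": 20}],
}
source = [
    {"order_date": date(2023, 12, 3), "amount": 99},
    {"order_date": date(2024, 1, 8), "amount": 25},
    {"order_date": date(2024, 2, 1), "amount": 30},
]
result = incremental_refresh(stored, source, today=date(2024, 2, 10))
```

Only the January and February partitions are read from the source here, which is what keeps the load on a large source system low.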

Document your dataset

Documenting your dataset is critical for ensuring that others can understand and use your work. Use the description field to provide context and explanations for your dataset. Our toolkit can help with this process and with sharing documentation about your dataset, so that people know what is available.

Let everyone know the self-serve dataset exists

Promoting Power BI datasets within your organization can be done in several ways.

One way is to use endorsement. Endorsement is a way to promote or certify content to make it easier for users to find and to let them know that it is a trustworthy source of data. Endorsed content is clearly labeled, both in Power BI and in other places where users look for Power BI content (such as Excel). It is also given priority in some searches, and you can sort for it in some lists.

There are two kinds of endorsement: promotion and certification.

Promotion enables users to highlight content that they think is valuable, worthwhile, and ready for others to use. It encourages the collaborative spread of content within the organization. Any content owner, or any member with write permissions on the workspace where the content is located, can simply promote the content when they think it’s good enough for sharing.

Certification means that the content meets the organization’s quality standards and can be regarded as reliable, authoritative content that is ready for use across the organization. Only a select group of reviewers (defined by the Power BI administrator) is authorized to certify content. Content owners who wish to see their content certified and are not authorized to certify it themselves need to follow their organization’s guidelines about getting their content certified. Certification is available only if a Power BI administrator has enabled and configured it for your organization.