
The Clearly Podcast
How Many Tables Are Enough?
Summary
The discussion centers on the temptation to import all data into a system during a data project, highlighting several key issues with this approach. Importing everything can create clutter, making it difficult to find relevant information, especially in systems like Dynamics 365 F&O, which has thousands of tables. This complexity can lead to inefficiencies and challenges in data retrieval.
Using all available tables can slow down data refresh processes, particularly when clients require frequent updates. Incremental refreshes mitigate this somewhat but still involve complex change-detection calculations. Exposing users to too much data can also overwhelm them, making it harder for them to build their own reports effectively.
Field naming conventions can add to the confusion, especially when fields are coded or not clearly named, making it challenging for untrained users to navigate. Additionally, error handling becomes more difficult with more data, as spotting and fixing errors in numerous fields is more time-consuming.
There is also an environmental consideration: storing and processing unnecessary data consumes significant energy and hardware resources. Future data migrations also become more complicated when unnecessary data is stored, making the process more time-consuming and costly.
The top reasons to avoid importing all data include ensuring data relevance and focus, reducing implementation time and costs, managing processing costs, enhancing security, simplifying error handling, and mitigating environmental impact. The recommendation is to start with the end in mind: identify the reports that are needed and work backward to determine the essential data, striking a balance and avoiding extremes in data importation.
You can download Power BI Desktop from here.
If you already use Power BI, or are considering it, we strongly recommend you join your local Power BI user group here.
Transcript
Andy: Today's show is all about how much data is enough. Tom's our expert on this one. Tom, can you explain what we mean by "Chuck it all in"?
Tom: Sure. The idea of "Chuck it all in" refers to the temptation to import all possible data into a system when starting a data project. This means pulling data from various sources and loading it into a data warehouse or data lake without being selective. The thought is that by having all the data, you won't need to gather more later for reporting purposes. However, this approach can lead to complications.
Andy: Devil's advocate here, but why not import everything? Storage is relatively cheap these days, and we have the expertise to build out our data lake or warehouse.
Tom: Let's take an example. Dynamics 365 F&O has around 7,000 tables out of the box, and most implementations don't use all of them. If you're not a manufacturing business, for instance, you won't touch the production order tables. Importing everything creates clutter and makes it hard to find relevant data: with thousands of tables, you might only use a small fraction, which leads to inefficiencies and complications in data retrieval.
Andy: And finding the right tables can be tricky, right?
Tom: Absolutely. Tables can have similar names, making it challenging to identify the correct one. This increases the time and effort needed to find relevant data. Some tables only support application functionality and don't hold any data you want to analyze.
Shailan: Another issue is refresh rates. If you import all tables, your data refresh process will be slower and more complex. Many clients now want hourly updates instead of traditional overnight runs. If you have thousands of tables, the refresh process can take a long time, especially with high transactional volumes.
Andy: That makes sense. What about incremental refreshes?
Shailan: Incremental refreshes can still be complex as they involve calculations to determine what data is new. It's better to import only the necessary tables and fields to keep things manageable.
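The change detection Shailan describes can be sketched as a high-water-mark pattern: remember the latest modification timestamp you have seen, and on each refresh pull only rows changed since then. This is a minimal Python sketch of the concept; the row shapes, values, and timestamps are hypothetical, not from any specific product.

```python
from datetime import datetime

# Hypothetical source rows: (id, modified_at, amount) tuples standing in for a table.
rows = [
    (1, datetime(2024, 1, 1), 100),
    (2, datetime(2024, 1, 5), 250),
    (3, datetime(2024, 1, 9), 75),
]

def incremental_refresh(source_rows, watermark):
    """Return only rows changed since the last refresh, plus the new watermark."""
    fresh = [r for r in source_rows if r[1] > watermark]
    new_watermark = max((r[1] for r in fresh), default=watermark)
    return fresh, new_watermark

# Last refresh ran on 2 Jan, so only the 5 Jan and 9 Jan rows come back.
fresh, wm = incremental_refresh(rows, datetime(2024, 1, 2))
```

Even in this toy form, the refresh has to scan every candidate row to decide what is new, which is why importing thousands of tables still makes incremental refreshes expensive.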
Tom: Exactly. Only importing what you need helps users create their own reports more effectively. Too much data overwhelms users and makes it harder for them to find what they need.
Andy: There are also issues with field naming, right?
Tom: Yes, some applications have fields with codes instead of clear names, making it difficult for untrained users to understand the data. This complexity can deter users from effectively using the data.
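One common mitigation for the coded field names Tom mentions is a rename-and-filter step during import: map each coded field to a friendly label and drop everything without one. A minimal Python sketch, with made-up field codes chosen to resemble ERP naming; the mapping itself is hypothetical.

```python
# Hypothetical mapping from coded source field names to user-friendly labels.
FIELD_LABELS = {
    "CUSTACCOUNT": "Customer Account",
    "INVOICEDATE": "Invoice Date",
    "AMOUNTCUR": "Amount (Transaction Currency)",
}

def relabel(record):
    """Rename coded fields to friendly labels; drop fields without a known label."""
    return {FIELD_LABELS[k]: v for k, v in record.items() if k in FIELD_LABELS}

# The internal RECID field has no friendly label, so it is filtered out.
row = {"CUSTACCOUNT": "C-0042", "INVOICEDATE": "2024-01-05",
       "AMOUNTCUR": 99.5, "RECID": 123456}
clean = relabel(row)
```

Doing the renaming once at import time means every downstream report sees readable names, instead of each user decoding the field codes themselves.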
Shailan: Another consideration is error handling. With fewer fields, it's easier to spot errors. If you import everything, identifying and fixing errors becomes much more challenging.
Andy: And there's an environmental impact as well, right?
Tom: Yes, storing and processing unnecessary data consumes resources. Data centers require energy and generate heat, contributing to environmental concerns. Being selective with data storage can help mitigate this impact.
Shailan: Also, when we conduct health checks on environments, we often find performance issues due to unnecessary data being stored and processed. This slows down data retrieval and processing, causing users to become frustrated.
Tom: Another issue is future migrations. If you want to move your data to a different system, having unnecessary data complicates the process. It's better to start with what you need and add more as required.
Andy: So, in summary, what are the top reasons to avoid importing all data?
Tom:
Focus and relevance: Curating data ensures you can find what you need.
Implementation time and cost: Importing everything is time-consuming and expensive.
Processing costs: Storage might be cheap, but processing large volumes of data is not.
Shailan: And also:
Security: More data means more layers to secure.
Error handling: Fewer fields make error detection easier.
Environmental impact: Storing unnecessary data consumes resources.
Andy: Great points. If someone insists on importing everything, we should guide them to start with the end in mind. Look at the reports they need and work backward to determine the necessary data.
Tom: Exactly. It's about finding a balance and avoiding the extremes of importing too much or too little data.
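The "start with the end in mind" approach can be sketched as a simple set calculation: list the tables each required report needs, and the minimal import plan is their union. The report names and table names below are hypothetical placeholders, not a real D365 schema.

```python
# Hypothetical report requirements: each report lists the source tables it needs.
reports = {
    "Sales by Region": {"CustTable", "SalesLine", "Region"},
    "Aged Debt": {"CustTable", "CustTrans"},
}

def tables_to_import(report_requirements):
    """Work backward from the reports to the minimal set of tables to import."""
    needed = set()
    for tables in report_requirements.values():
        needed |= tables
    return sorted(needed)

plan = tables_to_import(reports)
```

Here two reports drive an import plan of just four tables, rather than the thousands available in the source system; new reports simply extend the plan as required.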
Andy: Thanks, everyone. Next week, we'll dive into Fabric.
Tom & Shailan: Looking forward to it. See you next week.