The Clearly Podcast

Machine Learning Pt.2 - Are You Ready?

Summary

Our highly anticipated Part Two of our Season Two Grand Finale on Machine Learning!

This week we discuss how to decide whether machine learning is the right choice for you. Do you have enough data? Are you prepared to be transparent about how you analyse customer data? Will your data create an AI monster?!

Next week, in the last of our mini-series, we will look at how you can use Machine Learning inside Power BI and other tools.

Artificial intelligence in a more general sense is also the topic of this year's Reith Lectures, which just proves how important and topical The Clearly Podcast really is.

If you already use Power BI, or are considering it, we strongly recommend joining your local Power BI user group.

Transcript

Andy
Shailan Chudasama is back on the podcast today, making it better than ever.

Shailan
Was I missed?

Andy
Well, umm...

Shailan
Oh no.

Andy
Anyway, let's talk about the singularity and when it's going to take over humanity. Isn't that our topic?

Shailan
That's it.

Tom
Yeah, but didn't we conclude that we couldn't cover that due to our lack of real intelligence? We've got plenty of artificial intelligence, though.

Andy
Good point. So instead, we'll discuss, "Are you ready for machine learning?" We'll share our experiences and the considerations we guide our customers through when they decide to undertake machine learning projects. No Cyberdyne Systems or Terminators today, as much as I'd love to.

Shailan
This is the second part of our ML series. Next week, we'll talk about ML and Power BI.

Andy
To kick things off, how do conversations with customers about machine learning usually start, Tom?

Tom
Customers often want to explore machine learning because it's a buzzword. Like cubes were 15 years ago, machine learning is now the trendy technology. The main things to consider are whether you have enough data to build a meaningful model and whether that data is good enough. You need a dataset big enough to split into training, validation, and test sets. Data quality is crucial; feeding bad data into a model leads to bad decisions.
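
Tom's point about splitting a dataset into training, validation, and test sets can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `split_dataset`, the 70/15/15 proportions, and the fixed seed are our own assumptions, not something prescribed in the episode.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    rows: Sequence[T],
    train_frac: float = 0.70,
    val_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows, then split them into training, validation and test sets.

    The 70/15/15 default is an illustrative assumption. Whatever remains
    after the training and validation slices becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


rows = list(range(10_000))
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))  # 7000 1500 1500
```

Shuffling before slicing matters: if the rows are sorted by date or region, an unshuffled split would put whole scenarios into only one of the three sets.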

Andy
What size of dataset are we talking about, and how do you ensure there's no bias when cleaning the data?

Tom
The size depends on the problem. Small datasets can skew the results. Typically, you need at least tens of thousands of rows, and sometimes hundreds of thousands, depending on the features. Data quality is an ongoing process, and understanding what good data looks like is essential. Bias can be an issue, so you must make sure your data covers the full range of scenarios you want to test.

Andy
With hundreds of thousands of transactions, it's hard to introduce bias, right?

Tom
It’s possible; we've seen it with algorithms like Twitter's picture recognition. Bias can still sneak in.

Shailan
We once ran a geospatial analysis on millions of records to predict route movement during military field training. The more data you have, the better.

Tom
Absolutely. Use as much data as you can. The more you feed the model, the better it will be.

Andy
Does this mean ML isn't suitable for industries with fewer transactions, like petrochemicals?

Shailan
It depends on the use case. Smaller datasets can still be meaningful, but you must manage outliers. Larger datasets help reduce noise from outliers.
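
One common way to "manage outliers" in a small dataset is Tukey's interquartile-range (IQR) rule. The sketch below uses Python's standard `statistics.quantiles`; the function name `remove_outliers_iqr`, the conventional `k = 1.5` multiplier, and the sample readings are illustrative assumptions, and simply dropping values is only one option (capping them or investigating them may suit a given use case better).

```python
import statistics
from typing import List, Sequence


def remove_outliers_iqr(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep only values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate quartiles")
    q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
cleaned = remove_outliers_iqr(readings)
print(cleaned)
```

With only a handful of rows, a single spike like `55.0` would drag a mean far from typical values; quartiles are much less sensitive to it, which is why this rule is a reasonable first pass on smaller datasets.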

Tom
Also, for industries like oil and gas, you might need external datasets, like weather data, to make meaningful predictions. It’s about finding the right data sources.

Andy
So, what comes next after assessing the dataset size and quality?

Tom
Governance is crucial. You need to be transparent with your customers about how their data is used. Ethical concerns and potential backlash need to be managed carefully.

Andy
There's a lot of unethical tracking happening, like targeted advertising using shadow profiles. It’s important to anonymize personal data.

Tom
Exactly. Transparency and governance are key to maintaining trust.

Shailan
Often, customers see ML and AI as nice-to-haves. It's important to have a clear scenario and purpose for using ML. We once used ML for sentiment analysis on survey data, which provided new insights and was more efficient than manual processing.

Andy
Did the customer accept the ML results?

Shailan
Initially, no. But once we drilled down into the data, they accepted it. Benchmarking against their previous methods helped them understand the new insights.
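
Benchmarking a model against a previous manual process can be as simple as measuring how often the two agree, then counting each (manual, model) pair so the disagreements can be drilled into. The sketch below is a hypothetical illustration: the function name `benchmark_labels` and the sentiment labels are assumptions, not the method used on the project described in the episode.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def benchmark_labels(
    model_labels: Sequence[str], manual_labels: Sequence[str]
) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Compare model labels with labels from the previous (manual) process.

    Returns the overall agreement rate and a count of every
    (manual, model) label pair, so disagreements can be inspected.
    """
    if len(model_labels) != len(manual_labels) or not model_labels:
        raise ValueError("label lists must be non-empty and the same length")
    pairs = Counter(zip(manual_labels, model_labels))
    agreed = sum(n for (manual, model), n in pairs.items() if manual == model)
    return agreed / len(model_labels), dict(pairs)


# Hypothetical survey responses labelled by hand and by the model.
manual = ["positive", "negative", "neutral", "positive", "negative"]
model = ["positive", "negative", "positive", "positive", "neutral"]
rate, pairs = benchmark_labels(model, manual)
print(f"agreement: {rate:.0%}")
for (m, p), n in sorted(pairs.items()):
    if m != p:
        print(f"manual={m!r} model={p!r}: {n}")
```

The pair counts are what make the "drill down" possible: instead of a single accuracy figure, stakeholders can see exactly which kinds of responses the model and the old process disagree on.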

Andy
People often try to open large datasets in Excel, which isn’t feasible. How do you handle that?

Shailan
People need to trust the ML engine to handle big datasets. Opening millions of records in Excel isn’t practical.

Tom
There's a challenge with ML and AI because people want to understand the decisions made by algorithms. But with large datasets, it’s not always possible. Setting expectations upfront is crucial.

Andy
What are the best practices for engaging in an ML project?

Tom

  1. Consider bias in training datasets.

  2. Ensure transparency with data usage.

  3. Establish governance over the ML process.

Shailan

  1. Clearly understand the use case.

  2. Benchmark against previous methods.

  3. Ensure the data is meaningful and accurate.

Andy
When should you steer someone away from ML and towards traditional analytics?

Tom
When dealing with smaller datasets or when human judgment is necessary for decision-making.

Andy
Anything else to add?

Tom
We'll be back, as we’ve made several Terminator references today.

Andy
Hasta la vista, baby.