PODCAST

Data and AI with David Colls


One of the ways organisations are looking to unlock more avenues for business and operational growth is through data. Couple it with AI, and the possibilities are almost endless.

In this round of cocktails, ThoughtWorks Australia’s Director of Data and AI Practice, David Colls, talks to us about data and AI capabilities and how they can be tapped for product development and business strategy. He also discusses the ethical and societal implications of the growth of big data and AI, and what our responsibilities are in the process.

Transcript

David Brown

Welcome to Episode 77 of the Coding Over Cocktails podcast, my name is David Brown.

Our guest for today is an innovative technology leader with over 20 years’ experience leading the strategic and technical delivery of data and AI, digital strategy and change solutions. He combines diverse experience delivering complex technology solutions with a passion for customer outcomes to develop high-performing teams capable of solving clients’ complex problems.

Currently, he works as Director of Data and AI Practice at ThoughtWorks Australia, where he oversees all facets of developing data and AI capabilities and service offerings.

Joining us for a round of cocktails is David Colls! Hi David, great to have you on the show.

David Colls

Hi David, thank you for having me! Thank you for the intro.

David Brown

Absolutely. Our pleasure. Can you run us through your role at ThoughtWorks to get us started?

David Colls

Yeah, sure. You provided a good overview there, and while I described my role as “overseeing all aspects of data and AI capability,” as our team has grown, my role has become more about planting seeds and pruning what germinates from them. We're working with the specialists in our team across the range of services we provide.

So, if I think about the dimensions of the role, one of them is the different services that we provide: data strategy, data engineering and platform development, data governance, data mesh, intelligent products and AI solutions.

And so, one dimension of the role is being across what we do in those different services and how those services complement one another, and also complement the broader offerings of ThoughtWorks as a digital delivery and advisory consultancy.

David Brown

We’ve had a number of interesting guests from ThoughtWorks, and you know, when you say specialists, you guys really are, and you tend to be thought leaders in a number of spaces. You're a fellow Australian based in Melbourne, so I'd be interested in your perspective on the Australian landscape versus the international landscape: how we're doing here as Australian companies in terms of embracing digital transformation initiatives and data and AI.

David Colls

That’s a great question and it's one that comes up as we collaborate globally within ThoughtWorks around our data and AI services. What we do find is that the market in Australia is a bit different in terms of the composition of our economy and industries compared to, say, the global north markets like North America and Europe. But we find as much innovation in Australia as in those other markets. 

And so, you know, we see a lot of innovation in our digital-native, homegrown exports. Atlassian, MYOB, Xero and REA Group are organisations that we work with, and we see that they're leading the way in data engineering and ML practices as much as other organisations globally, albeit with a different industry focus here.

David Brown

Interesting. That's good to hear. Let's talk about data capabilities within organisations. It's a phrase that you've used yourself. What do you encompass in data capabilities? It sounds very broad.

David Colls 

It is very broad indeed. I think it comes down to the effective use of data to achieve an organisation's objectives; being able to do that efficiently at scale, so as to make it economical; and also being able to do that safely. So, understanding the safety aspects of applications, but also the safety aspects of handling data.

And so, I think that all of those come together to create an organisational capability to work with data. If you're looking at developing that capability, then you're looking at developing those axes together. Often there's a lot of focus on the tooling, the data platform tooling that can make data use efficient at scale. But in our experience, this also has to be driven by the effective application of data to solve business problems. And that's not just a technology problem. There are people, process, organisational design and culture problems to address as well.

David Brown

We’ve talked about that extensively, the cultural aspect of these initiatives. And I know you like to embrace a lot of machine learning and AI in your data initiatives and data capabilities. We are going to talk about that, but before we get into it, I'd like to talk to you about data mesh. You have recently been talking a lot about data mesh. In fact, we've talked about data mesh on this podcast, and I've had some leading experts on, so we don't need to go too much into what a data mesh is. But I would be interested in getting your perspective on how it's being used. What are some of the practical applications of data mesh?

David Colls

Going back to that description of organisational data capability, what we're finding is that being able to build effective data solutions, and to do so in a timely manner, is a key driver for organisations looking to pick up data mesh. Often they have large data sources internally and some technology to manage them, but when it comes to bringing all of those capabilities to bear to deliver new data initiatives in short time frames, we're finding organisations limited by their existing infrastructure, and that's where they're looking towards data mesh to support those use cases.

David Brown

When you say limited by the infrastructure, what sort of limitations are you referring to?

David Colls

In the digital world, I think over the last decade we've more or less solved the problem of rapidly bringing digital products to market. Now, you know, we have a range of approaches to identify customer needs and to rapidly test and refine products, and we have continuous delivery practices and architectures that support rapid evolution and scale. But in the data world, we're often coupled to more legacy, centralised data infrastructure that makes it hard for teams to autonomously make the small changes they need, with short feedback cycles, to deliver data products or digital products driven by data features.

David Brown

So, could you give me a practical example? You've talked about the airline industry, for instance, where they have these enormous data silos, the practical benefits of keeping data close to the owners of those silos, and how data mesh can overcome some of those challenges. Can you run me through a practical implementation?

David Colls

So often, with a centralised data program or data infrastructure, there are multiple hops for a team to go through to be able to deliver a piece of functionality. And each of those brings some delay. And so the first hop might even be identifying who owns the data internally and who can authorise its use for another purpose. And you know, that can take weeks or months which is the type of delay that can severely compromise product development activities. 

And then, even when the data is identified, there might be an additional process of building the pipelines to get the data from A to B, which adds more delay. Then, when the data is available in an environment to build a product from it, there's often another team involved to actually build algorithms and serve the data into a consuming application. So in that paradigm, we see delays at every step of the way, and the silos in the airline industry would be an example of that.

And it's a gradual transition to bring ownership of those data sets, and autonomy over how they're published, closer to the source; to be able to rapidly validate that consumers are getting value from the data; and to be able to do that in a decentralised way, by building governance into data products and into the infrastructure as well.

And so these are questions like: does the data product publish the information that consumers need? Can we see its consumption patterns? Does it conform to standards within the organisation? And inevitably, when you decentralise, there'll be competing objectives as well. So how do we work through those competing objectives and resolve them?
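The three governance questions above can be framed as automated checks that run against a data product. The sketch below is a minimal illustration, not a ThoughtWorks method: the `DataProduct` descriptor, the `governance_report` function, the airline flight-delay data and the snake_case naming standard are all hypothetical assumptions.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical organisational standard: published field names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


@dataclass
class DataProduct:
    name: str
    owner: str  # accountable domain team, close to the source of the data
    published_fields: set[str]
    access_log: list[str] = field(default_factory=list)  # one consumer id per read


def governance_report(product: DataProduct, consumer_needs: dict[str, set[str]]) -> dict:
    # Does it publish the information consumers need? List any unmet fields per consumer.
    missing = {
        consumer: sorted(needs - product.published_fields)
        for consumer, needs in consumer_needs.items()
        if not needs <= product.published_fields
    }
    # Can we see its consumption patterns? Count reads per consumer.
    reads = dict(Counter(product.access_log))
    # Does it conform to standards? (hypothetical rules: named owner, snake_case fields)
    conforms = bool(product.owner) and all(
        SNAKE_CASE.match(f) for f in product.published_fields
    )
    return {"missing_fields": missing, "reads_by_consumer": reads, "conforms": conforms}


# Illustrative usage with made-up airline data.
flights = DataProduct(
    name="flight_delays",
    owner="flight-operations",
    published_fields={"flight_id", "scheduled_departure", "delay_minutes"},
    access_log=["crew_planning", "crew_planning", "customer_app"],
)
report = governance_report(flights, {
    "crew_planning": {"flight_id", "delay_minutes"},
    "customer_app": {"flight_id", "gate"},
})
print(report)
```

Here the report would flag that the customer app needs a `gate` field the product doesn't publish. In a decentralised setup, checks like these would typically live in the shared platform so each domain team gets them without building its own.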

David Brown

And is it data mesh versus data warehouse? Or is it horses for courses because both architectures solve different problems?

David Colls

Yeah, I think it is horses for courses to an extent. I think that we see that like, a data warehouse might become a part of a data mesh, so it might be a solution for a particular domain where there is a need for that type of data modelling that a data warehouse supports. But it also depends a bit on the size of the organisation as well, so smaller organisations and organisations that are building their digital maturity might find that the overhead of running a data mesh doesn't provide the additional organisational agility that they're looking for.

But in terms of that digital maturity, I think the tooling will improve over time and we'll see data mesh becoming easier and easier to stand up.

David Brown

It reminds me of the arguments for and against monolithic architectures versus microservices. When you're starting out, a monolithic architecture is actually easier and a perfectly viable architecture. But when you get to scale, perhaps a microservices architecture can offer you the benefits when you're at that stage.

David Colls

Yeah, I think there's a lot of parallels to it.

David Brown

Well, let's talk about AI. I know you're very interested in AI, in the use of data, and in creating product opportunities out of AI. AI has traditionally been used for analysing historical data to improve the customer experience, but in the last few years you've been promoting the concept of using AI for business planning or product development.

These areas have normally been the domain of a business manager, or someone with expertise or accumulated product or business knowledge, to make those strategic decisions. They're using that accumulated knowledge plus some intuition about the market and customer expectations. How is AI changing this approach?

David Colls

Yeah. It's a great area to address, especially now, as business conditions are so dynamic. Business strategy and product development have been addressed differently in different industries over time. Prior to ThoughtWorks, I worked in the automotive design space, where with a new vehicle design it was possible to generate thousands of variants and simulate their handling, their aerodynamics and their crash performance, all inside a computer and all in a matter of days, to get very early feedback on product performance and what good product designs might look like.

And so, you know, that sort of virtual prototyping approach has been around in some industries for decades. But what we're seeing now is that more powerful, more flexible AI tools, and a greater ecosystem of open source solutions and data, mean that this approach to product development, and to business strategy as well, can be applied in a much wider range of circumstances, which almost any business can take advantage of.

And so we've seen it used in designing call centre systems, in optimising airport operations, even in developing new blends of whiskey, and in sustainable supply chains as well. These are all opportunities to apply AI techniques in product development and business strategy. I think the key to using them is not to outsource all the design and decision-making to AIs, but to do it in a collaborative fashion with human expertise. So, we don't lose the human expertise; we just formalise it. What we see is that using AI techniques allows us to explore more designs or more possible features that we wouldn't have considered otherwise. It also potentially gives us a faster feedback loop than we would have otherwise, as in the vehicle design example.

David Brown

So, obviously there's going to be lots of inputs of data to this process. So, if you're designing a whiskey using AI, what would be the inputs in that process?

David Colls

So, basically all the decisions that go into making a product are in scope for this type of approach. But as you work through iteratively, you'll pick thin slices and identify the areas where you can have the most impact on the decision-making process. For the whiskey example, it could be the raw ingredients and where they're sourced from. It can be all of the brewing, distillation and ageing processes as well: the time and the process parameters, such as pressure and temperature. Everything that goes into actually producing the final product can have an impact on its performance in the market, and performance in this case comes down to taste.

And so, we were able to establish a relationship between some of these design parameters and the expected taste, across all the dimensions of the whiskey's tasting notes. We could then provide a model which could generate new inputs, even to the degree of choosing between single malt whiskeys to create a blend with appropriate tasting properties, and then provide that tooling to the master distillers, who are ultimately responsible for producing a product that they think the market is going to enjoy.

So, all of those inputs can be brought together and presented to the human experts responsible for these product design decisions.

David Brown

It's a really interesting use case. I'm interested to hear about that call centre example as well, but first I'm curious about developing whiskey using AI. You mentioned a number of inputs, such as ingredients, distillation methods and presumably customer preferences in terms of taste. What kind of algorithm are you using for that? Are you sourcing one off the shelf? Are you writing one from scratch? Where do you start?

David Colls

In this case, it was a general set of techniques: a generative adversarial network was applied. But depending on the product and on how you want to assess it, there is a range of different techniques you might use. The call centre, for example, used an agent-based simulation. Each caller was modelled as an individual agent who had some patience to wait on hold, some reason for calling, and some tolerance for being transferred from agent to agent until the query was resolved, and as a result generated an NPS score for that interaction.

And similarly, on the organisational side, there were agents representing the real contact centre agents, who had similar behaviours that could be simulated. These were added to an actual model of the planned production systems that would determine which calls were sent to which queues, serviced by which agents. So in this case, you actually have a custom simulation model based on custom agent behaviours and custom routing systems, and that's a model that can be run with different parameters so we can establish the trade-off between cost to serve and customer satisfaction with various designs.
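To make the agent-based approach more concrete, here is a heavily simplified sketch. It is not the model described above: the skills, the routing rule (first free agent, with one transfer to a skilled agent), the 0-10 satisfaction formula and the hourly cost are all hypothetical assumptions. It does show the shape of the idea: callers with patience, a reason for calling and a transfer tolerance; contact-centre agents with skills; and a model that can be rerun with different staffing parameters to compare cost against NPS.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Caller:
    arrival: float            # minutes after the centre opens
    reason: str               # why they are calling: "billing" or "technical"
    patience: float           # minutes they will wait on hold before hanging up
    transfer_tolerance: int   # transfers accepted before they become a detractor


@dataclass
class Agent:
    skills: set[str]
    free_at: float = 0.0      # time at which this agent next becomes available


def satisfaction(caller: Caller, wait: float, transfers: int, resolved: bool) -> int:
    # Hypothetical 0-10 survey score; a real model would be calibrated on data.
    if not resolved or transfers > caller.transfer_tolerance:
        return 0
    return max(0, round(10 - wait / 2 - 2 * transfers))


def nps(scores: list[int]) -> float:
    # Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)


def simulate(n_generalists: int, n_specialists: int, n_callers: int = 400,
             handle: float = 6.0, seed: int = 7) -> tuple[float, float]:
    rng = random.Random(seed)
    agents = [Agent({"billing"}) for _ in range(n_generalists)]
    agents += [Agent({"billing", "technical"}) for _ in range(n_specialists)]
    callers = sorted(
        (Caller(rng.uniform(0, 480), rng.choice(["billing", "technical"]),
                rng.uniform(2, 15), rng.randint(0, 2)) for _ in range(n_callers)),
        key=lambda c: c.arrival,
    )
    scores = []
    for c in callers:
        t, wait, transfers = c.arrival, 0.0, 0
        # Hypothetical routing: the call goes to whichever agent is free first.
        agent = min(agents, key=lambda a: a.free_at)
        while True:
            start = max(t, agent.free_at)
            wait += start - t
            if wait > c.patience:              # caller abandons while on hold
                scores.append(satisfaction(c, wait, transfers, resolved=False))
                break
            if c.reason in agent.skills:       # this agent resolves the query
                agent.free_at = start + handle
                scores.append(satisfaction(c, wait, transfers, resolved=True))
                break
            agent.free_at = start + 1.0        # brief triage, then transfer
            skilled = [a for a in agents if c.reason in a.skills]
            if not skilled:                    # nobody can resolve this query
                scores.append(satisfaction(c, wait, transfers, resolved=False))
                break
            transfers, t = transfers + 1, start + 1.0
            agent = min(skilled, key=lambda a: a.free_at)
    cost = 40.0 * 8 * len(agents)  # hypothetical $40/hour over an 8-hour day
    return cost, nps(scores)


# Rerun the same model with different staffing designs to compare trade-offs.
for generalists, specialists in [(4, 1), (3, 2), (2, 3)]:
    cost, score = simulate(generalists, specialists)
    print(f"{generalists} generalists, {specialists} specialists: "
          f"cost ${cost:.0f}, NPS {score:.1f}")
```

Because the random seed is fixed, every design is evaluated against the same simulated callers, which keeps the comparison between staffing options fair.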

David Brown

That customer service example, the call centre example: you're talking about a lot of historical data being used to create those models and train a machine learning algorithm. Typically you'd have a good set of historical data, and many organisations will have historical data which they can feed into the algorithm. But some organisations won't have that data. What are the options if you don't have an existing data set? Is AI still an option for your business?

David Colls

Yes, I would say it absolutely is. And we really encourage people not to be constrained by the existence of historical data sets. What we do find is that even having a historical data set is not a guarantee of success. There may be missing data. There may be quality problems. There may be a lot of feature engineering required to get a historical data set into the shape that's required for supervised machine learning. There may be biases in the historical data, and it may no longer be relevant, especially [because] we've seen over the last three years COVID changing consumer behaviour again and again. Old, historical data has become less relevant for making predictions.

And we also find that once you start running a solution, the data will evolve over time as well. So we always encourage people to think in terms of an ongoing curation process for a data set, rather than mining an existing vein of historical data. It's a process that will be actively ongoing and evolving.

David Brown

So, what is that curation process you're talking about? Developing data through experimentation or…

David Colls

Yeah, depending on the application. Or it might be a process of starting to collect data in a form that's amenable to how you want to use it. For instance, you might have a system that users are interacting with, to which you wish to add some AI capability or machine learning features.

Then you start thinking about how those users can provide implicit or explicit guidance as to the right decisions to be made. There's a range of ways to build that kind of labelling interface. For instance, we can start a recommendation system without any historical data by using a reinforcement learning approach, where the solution runs its own experiments and learns what works and what doesn't.

David Brown

So I think some people wouldn't be familiar with the term “reinforcement learning.” Run me through that.

David Colls

We describe three major classes of machine learning solutions. The first is unsupervised or self-supervised learning, which allows us to identify clusters. For instance, it might be used to segment a customer base by identifying different patterns of behaviour.

The second is supervised learning, which is what a lot of people think of when they think of machine learning, which is, we provide a labelled set of examples of what good looks like. So this transaction is fraud, this transaction is not fraud, this customer is like literature and this customer is not like literature and then a model can learn how to predict those labels or other features on new data that wasn't part of the training set. 

And then the third paradigm is called reinforcement learning. In that model we treat the system as an agent that tries to figure out how to get the best reward in an environment, so the agent has to run some experiments and see what happens as a result. Say we have a video streaming service, and there's a preview image for each piece of media. A reinforcement learning agent might have half a dozen different choices of preview image for each piece of media, and it would learn, based on some attributes of the user, what they're most likely to respond to. Over time, through trialling different preview images, it would eventually select the one that leads to the highest engagement with the content.
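The preview-image example is essentially a contextual bandit, one of the simplest forms of reinforcement learning. The sketch below assumes an epsilon-greedy strategy per user segment, with three candidate images rather than six for brevity; the segment names, image names and “true” click-through rates are invented for illustration, and a production system would likely use richer user attributes and a more sophisticated exploration strategy.

```python
import random
from collections import defaultdict


class EpsilonGreedyPreviewPicker:
    """Per-segment epsilon-greedy bandit over candidate preview images."""

    def __init__(self, images, epsilon=0.1, seed=0):
        self.images = list(images)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # (segment, image) -> [times shown, times clicked]
        self.stats = defaultdict(lambda: [0, 0])

    def click_rate(self, segment, image):
        shown, clicked = self.stats[(segment, image)]
        # Optimistic initial value: an untried image looks perfect, so each gets tried.
        return clicked / shown if shown else 1.0

    def choose(self, segment):
        # Explore a random image with probability epsilon, otherwise exploit the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.images)
        return max(self.images, key=lambda img: self.click_rate(segment, img))

    def update(self, segment, image, clicked):
        record = self.stats[(segment, image)]
        record[0] += 1
        record[1] += int(clicked)


# Hypothetical "true" click-through rates, unknown to the picker.
TRUE_CTR = {
    "family": {"poster": 0.02, "action_still": 0.03, "character_closeup": 0.15},
    "thriller_fan": {"poster": 0.03, "action_still": 0.18, "character_closeup": 0.04},
}

picker = EpsilonGreedyPreviewPicker(["poster", "action_still", "character_closeup"])
users = random.Random(42)  # simulates users arriving and clicking (or not)
for _ in range(30_000):
    segment = users.choice(list(TRUE_CTR))
    image = picker.choose(segment)
    clicked = users.random() < TRUE_CTR[segment][image]
    picker.update(segment, image, clicked)  # the label is generated on the fly

best = {seg: max(picker.images, key=lambda img: picker.click_rate(seg, img))
        for seg in TRUE_CTR}
print(best)
```

With these made-up click rates, the picker should settle on the character close-up for the family segment and the action still for thriller fans, starting from no historical data at all. The epsilon value controls how much traffic keeps being spent on experiments once a good option has been found.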

David Brown

Yeah, it's basically creating that labelling process on the fly. 

David Colls

That’s a good way to think about it.

David Brown

Okay, so interesting. There's so much opportunity. In terms of product development and business planning through AI, is this something that you see occurring a lot in organisations or is this still in its infancy where you're educating organisations on this?

David Colls

Yeah, I think there are pockets where the maturity is quite high and these techniques are being explored but across the breadth of organisations, there are still a lot of opportunities to adopt these techniques.

David Brown

ThoughtWorks recently published a playbook called the Modern Data Engineering Playbook. It aims to help businesses get greater value from data engineering projects, whatever the industry. Can you give us a run-through of the themes in the Modern Data Engineering Playbook?

David Colls

Yeah, it's about a technology-agnostic approach to data engineering. We focus on product management for data, delivery practices, technical practices, and team organisation for data initiatives. We wanted to structure it this way because in our client work we deal with a range of technology stacks, but we find the same delivery practices to be effective regardless of the technology stack.

David Brown

That makes sense. Is this Playbook available to everyone or is this exclusively available to ThoughtWorks clients?

David Colls

No, it's available to everyone. It's published on our website. We've released the first five of seven chapters, with the final two coming as fast followers. It was a team effort.

David Brown

I'm looking at it. It was a team effort you say?

David Colls

Yes. So lots of different authors and a collaborative review process. It really was bringing that thinking from different people and different engagements together into a playbook.

David Brown

It seems quite comprehensive. I've got it open in front of me right now. So, let’s just run through those seven chapters. You've got “Treating Data as a Product” which is already published. “Data Engineering Practices” is also published, “Getting The Most Value from Your Team,” “Data Delivery Principles”, “Data Quality” are all published. The ones coming soon are “Project Architecture” and “Security and Privacy.” 

I would also like to ask you about the ethical and societal implications of the growth of big data, which has been a big driver of machine learning practices and of the practical use cases of AI. We've seen things like… I forget the terminology, but in China they have this social record or social status, which basically analyses big data and applies it through algorithms. What are some of the concerns and implications here? I also know that data privacy legislation in the EU has potential implications for how you can use AI, and you have to explicitly advise people how their data is being used.

David Colls

Yeah, I think there are a lot of ethical implications of using data and, you know, they should be front of mind for organisations. This is the safety element of building data capability, and it can help to think it through at different levels or layers. But I think the general theme is that poorly governed automated decision-making can disproportionately affect vulnerable people, who are least able to redress its impacts.

If we keep that frame in mind, we can look at it at different levels. First, there's the structural, societal level: do we want to make these decisions with automated systems at all? That leads into questions of ubiquitous surveillance. Is that something that we want to accept as a society? How do we protect workers in an environment where a lot of their agency might be controlled by automated decision-making? That's the first layer.

And then we come to the question of data security and privacy, as you said. If an organisation is a custodian of its customers’ data or other people's data, how is it a responsible custodian? How does it ensure that it's taking the right steps to preserve the security and privacy of that data?

David Brown

It's not just an ethical consideration either. There are legal implications, such as GDPR in the EU and the like, which are actually expanding in scope to cover these types of things. Is that right?

David Colls

Yeah, absolutely. I think the legal implications are almost a formalised form of ethics that provide really clear guidance about how you avoid doing harm to stakeholders with data. And then we can go beyond that to the ethics of algorithmic decision-making. So, you know, we consider what decisions we think we should be making, and having considered that, [we can] manage the handling of the data.

Then we've got concerns around garbage-in, garbage-out with algorithmic decision-making. We've got concerns around historical biases being reflected, perpetuated and even exacerbated by algorithmic decision-making. And then there's the case that algorithms fail as well. They're not perfect, and at scale those failures may not be insignificant.

David Brown

So what role or who in the organisation should be responsible for that ethical oversight?

David Colls

Yeah, it's a great question. I think, as technologists, there’s an element of all of us being responsible technologists and understanding the impact of our actions. There's a role for government regulation and frameworks as well. And then there's a role for tools and processes to support us in those objectives, such as bias testing tools.

But throughout, we see it as an activity that's built in across the delivery cycle. So it's not a sandwiched activity where we set the ethical path up front and then check for compliance at the end of the project; it's an ongoing process. Everyone involved in delivery has a role to play in that, and in our experience, as research also shows, diverse teams have better outcomes in that regard, because they're able to consider more scenarios.
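As an illustration of the kind of check that bias testing tools automate, the sketch below computes per-group selection rates and the disparate impact ratio for a model's binary decisions. The 0.8 threshold follows the commonly cited “four-fifths rule” of thumb; the function names, group labels and decisions are hypothetical, and a real fairness review would look at more than one metric.

```python
from collections import Counter


def selection_rates(decisions, groups):
    """Share of positive (e.g. approved) decisions within each group."""
    totals = Counter(groups)
    positives = Counter(g for d, g in zip(decisions, groups) if d)
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(decisions, groups):
    """Lowest group selection rate divided by the highest; 1.0 means parity."""
    rates = selection_rates(decisions, groups)
    highest = max(rates.values())
    return 1.0 if highest == 0 else min(rates.values()) / highest


def passes_four_fifths_rule(decisions, groups, threshold=0.8):
    """False means the model should be flagged for human review."""
    return disparate_impact_ratio(decisions, groups) >= threshold


# Made-up model outputs for two groups, as they might be checked in a test suite.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(selection_rates(decisions, groups))          # {'A': 0.75, 'B': 0.25}
print(disparate_impact_ratio(decisions, groups))   # roughly 0.333
print(passes_four_fifths_rule(decisions, groups))  # False
```

Running a check like this automatically on every model change is one way to make it part of the delivery cycle rather than a one-off sign-off at the end of a project.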

David Brown

We've talked about that on our podcast actually. It's quite interesting, the benefits of diversity in that regard. Are you actually seeing like an ethics officer to oversee such projects as well?

David Colls

Yeah, I know we've worked with organisations that have that function. And I can see it becoming a bigger part of delivery as more AI gets built into more services.

David Brown

David Colls, it was really interesting to talk to you today. Where can our listeners follow you on social media?

David Colls

Great! I'm most active on LinkedIn. You can find me there.

David Brown

And your handle is David Colls.

David Colls

 That's correct.

David Brown

And of course you've also written some blogs on ThoughtWorks.com as well.

David Colls

Yes, that's correct. You can find them under our Data and AI Practice. We have a series of short blogs that we aim to publish quite frequently. I can provide those links if that helps.

David Brown

Terrific. Thank you very much for your time today, David! 

David Colls

Thanks, David!

