The article is about Extreme Data-Science, but let me start with a story in which there is no machine learning or data science.
Once upon a time, a company had a massive project that aimed to replace a mess of several payroll applications with a single system, powerful and efficient enough to support all payroll processing for tens of thousands of employees.
But three years later, the system had not printed a single paycheck, and no software release had been made since the beginning of the project.
So, at this point, there was clearly a need for a hero to break the curse of this project, and the company hired a hero. He analyzed the situation, and he noticed some bewitched links in the chain of the whole process.
The first one was a habit where a client gives the specifications of the whole product, and then leaves the development team alone until the final demo.
Everyone works under the illusion that if one spends a lot of time in the beginning writing specifications, they will be detailed enough to eliminate any questions during the development phase. Obviously, this utopian thinking has never worked. Without a customer on-site, the development team resolved any questions that arose on their own, which was never what the client expected. Moreover, because the final users were never involved in the process, the chances that the product would be used and embraced by the targeted public were very low.
The hero also noticed the team members worked in bubbles created by the Witch of Specialization.
For example, developers created pieces of software adhering to the specification, making guesses to address the fuzzy points in the absence of the client. They then gave these pieces of software to QA people to make sure that they would be ready to go into production. When a piece was not good enough for release, it went back and forth between developers and QA, so the whole process was extremely long, messy, and completely inefficient. Nobody felt responsible for the quality of the software; developers blamed the QA people, QA people blamed the developers, and, in a rare show of unity, everyone blamed the customer.
The quality of the product was evaluated only with external properties, tests, documentation, etc., but nobody paid a lot of attention to the quality of the code itself. And, in the short term, that was no problem, but in the long run, poor code slowed the evolution of the software. Every new feature or change took much more time to implement, so the curve depicting the dependency between time and cost of change grew exponentially.
This was difficult to recognize from outside the code, as other parties were not familiar with software development, and the curse of specialization didn’t help. Some developers were dedicated to specific parts of the software, so the bus factor, the risk of losing a key developer (by being hit by a bus!), only worsened the growth of this curve.
All these bewitched elements created a situation where the product was never ready for the user, and neither the customer nor the project team had visibility on the ultimate project delivery date.
So what the hero proposed in this catastrophic situation was to throw out the existing code and rewrite the software from scratch following his magical approach. A year later, they celebrated their first delivery.
The magic approach was Extreme Programming, the hero was Kent Beck, and the company was Chrysler Corporation.
So, what is that magical approach, Extreme Programming (XP)?
XP defines several practices applied on different levels of organization: practices related to programming are depicted in the center of the circle, practices related to internal collaboration of the team lie on the middle circle, and practices related to project management and relationships with customers lie on the most external circle.
These practices were not invented by Kent Beck, but he suggested pushing them to their extreme in order to make the most of them.
We know that it’s important to write automated tests for our code, but instead of waiting for days while QA writes and runs tests, developers should assume responsibility for the quality of their code and write the tests even before the code is developed. This approach first helps developers better understand the problem and, second, gradually builds a battery of tests that will help avoid regressions later.
We also recognize that code review is a good practice, so let’s push it to its extreme too, by coding in pairs. This will enhance another XP practice, collective code ownership, thereby reducing the bus factor and raising the quality of code.
To give better visibility to customers, we understand that we should keep them informed about the project’s progress. To do that, the team delivers small functional releases on a regular and frequent basis, and each release is discussed with the customer, who suggests enhancements and directions for the next release based on the previous one.
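As a minimal sketch of what writing the tests first can look like, here is a small Python example in the spirit of the payroll story. The function name gross_pay and the overtime rule (more than 40 hours, paid at 1.5 times the rate) are hypothetical and chosen only for illustration; they are not taken from the Chrysler project.

```python
import unittest


# Step 1: the tests come first and describe the expected behaviour.
class GrossPayTest(unittest.TestCase):
    def test_regular_hours_are_paid_at_base_rate(self):
        self.assertEqual(gross_pay(hours=40, hourly_rate=20.0), 800.0)

    def test_overtime_is_paid_at_a_higher_rate(self):
        # 40 regular hours plus 5 overtime hours at 1.5 x 20.0
        self.assertEqual(gross_pay(hours=45, hourly_rate=20.0), 950.0)

    def test_negative_hours_are_rejected(self):
        with self.assertRaises(ValueError):
            gross_pay(hours=-1, hourly_rate=20.0)


# Step 2: only then, the simplest implementation that makes them pass.
OVERTIME_THRESHOLD = 40    # hypothetical business rule
OVERTIME_MULTIPLIER = 1.5  # hypothetical business rule


def gross_pay(hours: float, hourly_rate: float) -> float:
    """Return the pay for one week, with overtime above the threshold."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    regular = min(hours, OVERTIME_THRESHOLD)
    overtime = max(hours - OVERTIME_THRESHOLD, 0)
    return regular * hourly_rate + overtime * hourly_rate * OVERTIME_MULTIPLIER
```

Saved in a file, the tests can be run with Python's standard unittest runner. Each test that is added this way stays in the suite and guards against regressions when the code changes later.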
It is possible to do each of these practices in isolation, but some exercises reinforce and encourage the others, so they are more powerful when done in conjunction.
And what about Data Science? Are Data Science projects different from the traditional software projects?
The most important difference arises from the fact that traditional software relies on deterministic processes, and machine learning software on probabilistic ones.
In traditional software, we hard code the rules that process the input data; in that way, we can anticipate different data inputs so the rules of the software will not be influenced by changes in the data. But in Machine Learning, as you all know, the rules are created by data. We collect the input and output examples for our training datasets, and with different machine learning algorithms, we create the rules, called models. Without close attention to the engineering quality of the whole system throughout the development process, data-driven rules can become a source of future bugs and regressions that are tricky to find and fix. The experimental approach, typical of the data science workflow, doesn’t make the task easier, generating dead experimental paths and pipeline jungles that are difficult and costly to manage.
Machine learning systems are often evaluated by external factors like the accuracy of the predictions or the speed of inference or training, but rarely by the maintainability and evolvability of the code behind the model. This complexity, coupled with the lack of attention to internal quality, generates technical debt that is much more challenging to deal with than the technical debt of traditional software.
While creating or improving machine learning systems, data scientists spend a lot of time prototyping ML solutions; they research and benchmark different algorithms, their hyperparameters, etc. It can take an unbounded amount of time for the sake of a 0.001 percent improvement that ultimately might not even be useful for the final user. Moreover, it is one thing to prototype an ML solution and another to put it into production. According to a VentureBeat article, 87% of Data Science projects never make it into production. So, under these circumstances, it’s very difficult, or sometimes even impossible, to say when a machine learning system will be released.
Part of the problem resides in the isolation of data scientists. The lack of collaboration between different team members and stakeholders results in silos of data, knowledge, and skills. This gap ends up with the data and code going back and forth between data engineers and data scientists, and between data scientists and software engineers, with everyone speaking a different language. We can witness here the same problem of eternal buck-passing and blaming we saw at Chrysler Corporation.
The customer is not only excluded from the development process, but very often doesn’t understand what data scientists are doing and how it can impact the business. Since AI became a buzzword that every modern product must carry to be credible and marketable, business people often think that investing in AI is enough to miraculously transform the data they have into valuable products.
We can see that all the bewitched links that Kent Beck noticed at Chrysler are present in machine learning projects, and even worsened by their data-driven nature.
But don’t worry, Extreme Data Science, inspired by Extreme Programming, can change the game.
From the point of view of Extreme Data Science, as well as Extreme Programming, the success of the project depends on collective effort, and successful collaboration plays a key role in winning the game.
First of all, to be able to collaborate efficiently, we have to speak the same language. Our first XP practice suggests using metaphors instead of special technical terms, so you can explain your system to all stakeholders of the project.
At TabMo, we used neural networks for click prediction, but most non-data scientists had no idea what a neural network was and what we, as data scientists, were really doing.
We organized a workshop where the data scientists worked together with front-end developers to create the posters that demystify machine learning by using metaphors.
It turns out it was much clearer than the presentations or posters prepared exclusively by data scientists. It can be very challenging to avoid technical terms; the concepts that are obvious for data scientists often break the minds of non-data scientists, so the metaphor created by a whole team turned out to be a really powerful tool to understand each other.
The metaphors drive us to another XP practice, User Stories.
User stories should fully describe the functionality of the product but stay understandable for the customer. The important point, especially for data science projects, is that such stories can be used to describe a wide variety of requirements, not only functional but also technical ones.
This is convenient because, as we saw previously, the customer often doesn’t precisely define the problem to solve with machine learning. But if the customer must state the problem as an epic, a large body of work, and then break it down into simpler problems described by user stories, it wipes away a lot of uncertainty and helps data scientists start working efficiently.
The problem statement that is broken into smaller user stories helps to prioritize the work, and this brings us to another XP practice, frequent releases.
Frequent small releases consist of releasing the minimum viable product (MVP) quickly and then enriching it with more functionality through small, incremental updates.
It allows us to give better visibility to customers, receive their feedback, monitor how the solution works in production, and detect bugs earlier and more easily.
The necessity to release an MVP quickly will inevitably limit the choice of solutions we want to start with. And that is rather good news, because instead of choosing exciting but very complex and resource-consuming solutions like deep neural networks, we will start with simpler ones like linear regression or moving averages. These could be useful immediately for the end-user and give us valuable feedback and ideas for further directions and research.
It goes along with another important XP practice, simple design. The best solution is the simplest one that works. Moreover, we should create code only for the user stories that we are currently implementing, without over-engineering it. For example, suppose your company creates electronic components for different electrical equipment. You have an epic that tells you to detect anomalies in the consumption of hot water tanks. But you think it’s a good idea to make the solution generic and suitable for any kind of device, not only hot water tanks.
In this case, you run a great risk of spending a lot of time trying to create a generic solution when there is no need for it at all. In the meantime, you lose your customers, who will leave you for a competitor who is less ambitious but does exactly what the customer needs.
The refactoring practice states that developers should not be afraid to remove code or technical solutions that no longer suit a new context or new customer requirements. It gives you an opportunity to release functionality quickly without compromising quality, because you can always come back to your previous design and remove the outdated or non-working parts without any regret or risk.
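To make the idea of a simple first release concrete, here is a minimal sketch of a moving-average baseline using only the Python standard library. The function names moving_average_forecast and backtest and the default window of 3 are assumptions made for this illustration, not part of any project described here.

```python
from statistics import mean
from typing import Sequence


def moving_average_forecast(history: Sequence[float], window: int = 3) -> float:
    """Predict the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the requested window")
    return mean(history[-window:])


def backtest(series: Sequence[float], window: int = 3) -> float:
    """Mean absolute error of one-step-ahead forecasts over the series."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    errors = [
        abs(series[i] - moving_average_forecast(series[:i], window))
        for i in range(window, len(series))
    ]
    return mean(errors)
```

The error returned by backtest gives a reference number: a more complex model proposed in a later release has to beat this baseline by a margin that matters to the user before it earns its extra cost.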
At a different time, I organized a workshop at Comwatt to implement the Whole Team practice. The workshop consisted of three steps. During the first step, I used metaphors to explain the different data science fields to my entire Comwatt team, consisting of developers, product people, and energy engineers. For the second step, I left them a poster that had three columns, corresponding to the three data science fields I had presented. I asked everyone to find use-cases related to our company’s business that could be solved by the methods of the respective data science fields. First, it gave me feedback on how well my colleagues understood the metaphors, and second, it gave me the material to move on to the third step. During the third step, we evaluated the use-cases against technical complexity and product value criteria, and then we had a long discussion where we analyzed the result in order to separate science fiction from reality. It helped us understand the business value of different use-cases, and at the same time, it gave everyone a better understanding of machine learning opportunities and limitations.
This kind of collaboration, involving all the stakeholders, is needed on a regular basis. It’s very important that data scientists work closely with the customer and end-users to understand the business requirements clearly. Every time data scientists want to improve a machine learning metric, they should relate it to business value.
At my previous start-up, we started a project where our customers, advertisement ops, wanted a tool to find the best configuration of an ad campaign. At first, we thought of a quite complex algorithmic solution, and, using metaphors, we presented the benefits of this solution to our end-users.
We were a little bit disappointed because it turned out that they didn’t care at all about the metrics and benefits we thought would be important for them. Finally, we abandoned our first complex solution and came up with a very simple SQL query that resolved their problem and brought them the value they needed. This workshop saved an enormous amount of time and resources that otherwise would have been spent on a solution that would certainly be interesting to work on but not useful at all for our users.
Another XP practice that accompanies the Whole Team approach is Pair Programming.
Pair programming consists of two colleagues sharing a single workstation, one typing the code and the other giving directions, with the roles switching frequently during the session.
Pair programming that involves a developer and a data scientist is even more valuable, as each one has a different background and approach to solving the problem, so they focus on different goals. This exercise eases the transfer of skills and dramatically increases the quality of code. Pair programming relies on another XP practice, coding standards. If developers and data scientists agree on a set of rules for code structure, naming conventions, error handling, and formatting, the pair can focus on problem-solving during the session without compromising the consistent style of the code, which makes it easier for all team members to understand and maintain.
The whole team must understand and maintain the code base, as stated by the collective code ownership practice. Developers can and actually should review and update code created by data scientists, and vice versa. Collective code ownership avoids blaming one another, eliminating scenarios like “if this module is not working, it’s because of data scientists.” Instead, it encourages cooperation, empathy, and new ideas from different perspectives.
As I said in the beginning, the fundamental difference between data science software and traditional software lies in the fact that in data science, changes in the input data can impact the rules, and the rules, in turn, impact the outputs. For that reason, it can be challenging to write the tests before writing the code of some parts of a machine learning system, but it’s still mandatory whenever it’s possible. Nevertheless, to avoid bugs and regressions, it’s no longer sufficient to test only the code; we also have to test the properties of the input data and the performance of the model.
It’s especially important because machine learning bugs can be silent. The user may not even realize there is a problem with the predictions. Even the smallest change in data, code, or configuration could cause a regression, so we have to integrate the tests of model performance and data properties, along with the usual unit tests of code, into the continuous integration pipeline, which is another XP practice.
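As a rough sketch of what such checks could look like, the following Python example tests input data properties and model performance with the standard library only. The field names impressions and clicks, the function names, and the accuracy threshold are all hypothetical, picked to echo the click-prediction example; a real project would choose its own properties and metrics.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

Row = Mapping[str, float]


def validate_rows(rows: Sequence[Row]) -> List[str]:
    """Return a description of every input row that violates a data property."""
    problems = []
    for index, row in enumerate(rows):
        impressions = row.get("impressions")
        clicks = row.get("clicks")
        if impressions is None or clicks is None:
            problems.append(f"row {index}: missing value")
        elif impressions < 0 or clicks < 0:
            problems.append(f"row {index}: negative count")
        elif clicks > impressions:
            problems.append(f"row {index}: more clicks than impressions")
    return problems


def accuracy(model: Callable[[float], bool],
             examples: Sequence[Tuple[float, bool]]) -> float:
    """Share of held-out examples for which the model predicts the label."""
    if not examples:
        raise ValueError("at least one example is required")
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def assert_no_regression(model: Callable[[float], bool],
                         examples: Sequence[Tuple[float, bool]],
                         minimum_accuracy: float) -> float:
    """Fail loudly when the model drops below the agreed minimum accuracy."""
    score = accuracy(model, examples)
    if score < minimum_accuracy:
        raise AssertionError(
            f"accuracy {score:.3f} is below the minimum {minimum_accuracy:.3f}"
        )
    return score
```

Run alongside the unit tests on every change, checks of this kind turn a silent drop in data quality or model performance into a failed build that the team sees immediately.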
I find it especially useful given the complexity of machine learning systems. Imagine you have a team of several people, with each one working on a different part of a machine learning system. For example, one is working on the feature-engineering pipeline, another is working on neural network architecture, and the third one is working on optimizing the speed of inference. Integrating their work continuously ensures that everyone is working toward the same goal, and the work of one person is not invalidating the work of others.
These practices were not intended to be applied in a mechanical manner, because they are not an end in themselves. But they help improve a team’s efficiency with respect to the values of Extreme Programming: Communication, Simplicity, Feedback, and Courage.
Communication recalls the importance of capitalizing on knowledge and using data and skills efficiently across the whole team, avoiding the creation of dangerous silos.
Simplicity guarantees the productivity of a team, saving time and effort without sacrificing customer satisfaction.
Feedback helps to reduce multiple risks, becoming a factor of technical and functional quality. It also helps to keep a constant speed of development because data scientists always know that they have mechanisms to make sure they move in the right direction.
And the last value, Courage, is the most important one, because you need courage to take responsibility for code you have not written, and you need courage to expose your vulnerabilities and the gaps in your knowledge during pair-programming sessions. You need courage to choose a less interesting and challenging solution that ensures an early release, and you need courage to keep the process fully transparent by frequently communicating with users and customers.
In my opinion, the effort is worth it, and I hope I gave you a good idea of how useful Extreme Data Science could be and why XP practices are even more critical to apply in the data science context.
Many thanks to Marina Bobyreva for watercolor illustrations in this publication.