
In this series on DevOps for Data Science, I’m covering the Team Data Science Process, a definition of DevOps, and why Data Scientists need to think about DevOps. It’s interesting that most definitions of DevOps deal more with what it isn’t than what it is. Mine, as you recall, is quite simple:


 


DevOps means having all parties involved in getting an application deployed and maintained think about all the phases that follow and precede their part of the solution.


 


Now, to do that, there are defined processes, technologies, and professionals involved – something DevOps calls “People, Process and Products”. And there are a LOT of products to choose from, for each phase of the software development life-cycle (SDLC). 


 


But some folks tend to focus too much on the technologies – referred to as the DevOps “Toolchain”. Understanding each of these technologies is indeed useful, although there are a LOT of them out there from software vendors (including Microsoft), and of course the playing field for Open Source Software (OSS) is at least as large and contains multiple forks.


 


While knowing a set of technologies is important, it’s not the primary issue. I tend to focus on what I need to do first, then on how I might accomplish it. I let the problem select the technology, and then I go off and learn that as well as I need to so that I can get my work done. I try not to get too focused on a given technology stack – I grab what I need, whether that’s Microsoft or OSS. I choose the requirements and constraints for my solution, and pick the best fit. Sometimes one of those constraints is that everything needs to work together well, so I may stay in a “family” of technologies for a given area. In any case, it’s the problem that we are trying to solve, not the choice of tech.


 


That being said, knowing the tech is a very good thing. It will help you “shift left” as you work through the process – even as a Data Scientist. It wouldn’t be a bad idea to work through some of these technologies to learn the process.


 


I have a handy learning plan you can use to get started: https://github.com/BuckWoody/LearningPaths/blob/master/IT%20Architect/Learning%20Path%20-%20Devops%20for%20Data%20Science.md (Careful – working through all the references I have there could take a while, but it’s a good list.) There are some other more compact references at the end of this article.


 


See you in the next installment of the DevOps for Data Science series.



 


For Data Science, I find this progression works best, taking these one step at a time and building on the previous step. The entire series is here:


 



  1. Infrastructure as Code (IaC)

  2. Continuous Integration (CI) and Automated Testing

  3. Continuous Delivery (CD)

  4. Release Management (RM)

  5. Application Performance Monitoring

  6. Load Testing and Auto-Scale


In the articles that follow in this series, I’ll help you implement each of these in turn.


 


(If you’d like to implement DevOps, Microsoft has a site to assist. You can even get a free offering for Open-Source and other projects: https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/)




