
The Data Science Workforce - Steady State

  • Writer: Joe
  • Jan 26, 2020
  • 7 min read

In the digital age, data has become king. In every industry, statistical learning methods have begun to take hold as important aspects of business. The world’s analysis is becoming increasingly quantitative, and as these methods continue to grow, the potential locked in companies’ data has wildly increased both the demand for data services and the speculation about what that data can deliver.

With the change promised by data science, there will inevitably be a change in the way industries run. We’ve seen this story before with things like automated manufacturing and telecommunications. And as technology develops, so does the employment landscape. Once-profitable careers disappear, replaced by a mix of machines and the new jobs required to develop and service them.

There are really two possible ways to explore this transformation. The first is imagining the “steady state,” or a point “far” in the future when the world’s approach to these technologies is more or less constant. The second exploration is of the transient dynamics that take us from where we are now to that eventual steady state.

Both of these explorations are borrowed from engineering, where they are used to characterize the response of systems. The analogue in the social sciences is System Dynamics, a field that applies concepts from control theory (dynamic systems) to a wide variety of problems.

In the spring semester, I’ll be taking a research seminar, where I hope to use System Dynamics to develop a dynamic model for this process. This post, then, is going to be significantly more qualitative and, in a word, imaginative. As a result of that, and because predicting the future is pretty hard, the hypotheses I lay out here should be taken with a grain of salt. Finally, because this is such a rich topic, and in the interest of keeping this post digestible, I’ll stick to the steady-state version of this system for now and explore the transient dynamics later. Through the semester, I'll use this string of blog posts as a chance to debrief on my research progress and report some intermediate results. This post is both an introduction and a sort of kick-off to the project.

Let’s dive in. What will the future look like?

With questions like these, it’s usually best to start asking some smaller questions. I’ll start with this one: What exactly will we be able to do with data? This is a question that people have already begun to answer.

In a few words, data makes it possible to make better predictions. In industries today, this is often a prediction of demand. With better demand predictions, companies can make decisions that reduce waste, cut costs, and increase consumer satisfaction. These methods are also becoming increasingly available and easy to use.
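
To make this concrete, here’s a minimal sketch of what a demand forecast can look like at its simplest: fit a trend to historical sales and extrapolate one period ahead. The library choice and every number below are hypothetical, for illustration only.

```python
# A minimal demand-forecasting sketch: fit a linear trend to
# (invented) monthly sales and predict the next month.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(12).reshape(-1, 1)             # months 0..11
sales = np.array([110, 115, 123, 130, 128, 140,
                  145, 151, 160, 158, 170, 176])  # hypothetical units sold

model = LinearRegression()
model.fit(months, sales)

# The forecast for month 12 can then guide ordering and staffing decisions.
forecast = model.predict([[12]])
print(f"Forecast for next month: {forecast[0]:.0f} units")
```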

Beyond this, the deep learning revolution is extending data methods into new kinds of tools. A simple example is healthcare, where convolutional neural networks are used to help doctors classify images of tissue samples. Closer to home, classification methods have gotten better and better at determining whether an email is spam.
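
As a toy illustration of just how accessible these classification methods have become, here’s a spam filter in a dozen lines of scikit-learn. The handful of “emails” is invented, and a real filter would need far more data; this is a sketch, not a working product.

```python
# A toy spam classifier: bag-of-words features plus naive Bayes.
# The training "emails" below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer click here",
    "meeting moved to 3pm", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(emails, labels)

print(classifier.predict(["free offer click now"]))        # likely ['spam']
print(classifier.predict(["notes from today's meeting"]))  # likely ['ham']
```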

I think these methods will reach a little farther than we’re currently giving them credit for.

It’s not crazy to think that, in the next decade or so, our insurance companies will require us to do a digital pre-screening before we go to the doctor’s office for a checkup. As natural language processing advances, it’s not hard to imagine a future where lawyers doing case research are assisted by machines.

An example of where this technology has already reached that kind of steady state is with social media filtering. Advertising now is almost entirely driven by AI, not by humans. And for companies like Google, so are things like censorship and identifying hate speech.

In a way, these methods are geared toward “recognizing” phenomena. In the lingo of Daniel Kahneman, these methods are being directed to “System 1” thinking, where access to vast amounts of data can augment or replace the intuition of an “expert” with limited experience and a closet full of cognitive biases.

Here’s the punchline. Data methods aren’t going to fully replace employees or render them obsolete. But if the economy is to be flown by analysis, machines will shift into the pilot’s seat, while humans slide over to the co-pilot’s. In these scenarios, humans will be responsible for watching over machine-driven decisions, understanding them, and communicating those processes to other parties.

An interesting distinction between this shift and what we’ve seen in the past is the kind of work that’s going to be adjusted. In the past, it seemed as if automation had the largest impact on unskilled labor. Robots can exceed the physical capacity of humans, especially in dangerous or difficult environments, and often for a smaller long-term price.

But these data methods promise this radical change for a different subset of the population. Many of these jobs will be the kinds of analysis roles that require a college degree to apply for. Instead of replacing unskilled labor, these methods will vastly change the way educated workers do their jobs.

As AI replaces, augments, or radically changes skilled work, the workforce and its education will have to adjust accordingly.

The workforce will have a very different education from today’s. If you look around, programming is a universally valuable but not universally developed skill. Beyond that, understanding analytics methods currently puts you at the innovative frontier of the workforce, where in the future it will put you in the middle of its ranks. I expect to see a radical shift in education in response to the demands of the 21st-century workforce.

There’s also a lot of structure that has to be built to support the use of these methods. Our increased reliance on them will place increasing value on the process of developing and continually testing these methods. All of this will require an explosion of computer-related infrastructure, from IT services to cloud computing. In turn, these will create physical demands that our current infrastructure may not be set up to handle. As the world’s businesses become more data-driven, the services that make this happen will grow accordingly.

Finally, companies will have to radically change their workflows. Today, there are relatively fixed and well-defined productivity estimates for an hour of work. Output is essentially proportional to the time invested: working for 20 hours will give you about half as much as working 40. In more precise terms, the fixed time investment is low, while the marginal benefit of time investment is constant.

With data-driven methods, development typically requires a very large up-front time investment. During this time, there’s not a lot of “output.” But once a tool is fully developed, the marginal cost of using it is incredibly small. This, of course, assumes the tool has been correctly developed. Correcting mistakes often brings a team back to the “initial investment” phase while they debug and fix the issue. The key to an organization’s ability to unlock the power of AI, then, is to develop processes that minimize that initial time investment without sacrificing the quality of the end product.
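
A back-of-the-envelope sketch of these two output curves makes the contrast clear. The constants below are made up; only the shapes matter: traditional work pays off linearly from hour one, while the data tool produces almost nothing until the up-front investment is sunk, then scales cheaply.

```python
# Toy output curves: traditional work vs. building a data-driven tool.
# All constants are invented; the shapes are the point.

def traditional_output(hours, rate=1.0):
    # Low fixed investment, constant marginal benefit per hour.
    return rate * hours

def data_tool_output(hours, upfront=200, rate=10.0):
    # Almost no visible output during development; once the tool
    # exists, each additional hour of use is cheap and productive.
    return 0.0 if hours < upfront else rate * (hours - upfront)

for h in (50, 150, 250, 400):
    print(f"{h:>3} hours -> traditional: {traditional_output(h):6.1f}, "
          f"data tool: {data_tool_output(h):6.1f}")
```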

Beyond that, these methods rest on assumptions that require constant stress-testing. A sort of implicit but ubiquitous assumption of data-driven methods is that the phenomena are “i.i.d.,” or independent and identically distributed. A simpler way of putting this is that the reality described by the data must be the same as the reality we’re making decisions in. While I argued before [https://josephzaghrini.wixsite.com/website/blog/ten-minute-talks-roundabouts-and-reality] that this process of justifying assumptions is important in any area, it’s especially vital with data. Small shifts in reality can totally change the results and efficacy of a model. By leaving these assumptions unchecked, companies open themselves up to the risk of making completely misguided decisions. And beyond that, being just a step behind may force companies into a costly rebuild of their models, causing dangerous analysis downtime.
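
Stress-testing that assumption doesn’t have to be exotic. One simple starting point, sketched below, is to routinely compare the distribution of incoming data against the data the model was trained on, for example with a two-sample Kolmogorov–Smirnov test. The synthetic data, threshold, and alert logic here are all placeholders:

```python
# A minimal drift check: compare a model's training feature
# distribution against live data with a two-sample KS test.
# The synthetic data and the 0.01 threshold are placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training-time data
live_feature = rng.normal(loc=0.4, scale=1.0, size=1000)   # reality has shifted

result = ks_2samp(train_feature, live_feature)
if result.pvalue < 0.01:
    print(f"Possible distribution shift (p = {result.pvalue:.2e}): review or retrain.")
else:
    print("No significant drift detected.")
```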

A good analogy for this analytics development process might be manufacturing, but perhaps I’ll save that for a future post.

This digital revolution may still leave some particular jobs untouched. It’s not a secret that there’s currently a shortage of trade workers. This is largely due to the attractiveness of a university education. As machines replace some of these jobs and radically change others, a college education in many areas will likely be worth less, pushing people into trade schools as an alternative.

Beyond that, trade work is very difficult to replace with a machine. For one, there isn’t a wealth of detailed data available for these kinds of problems. Often, the most skill-intensive part of the process is diagnosing a problem, which requires trade workers to reason through the symptoms of an issue, trace them back, and eventually search for a flaw. Often, it also demands careful visual inspection and attention to detail, especially given the varied nature of these problems.

In other words, there are many aspects of these problems that would be very difficult for a machine to learn just to diagnose what the problem is. And even after the diagnosis, technicians still have to fix the problem.

Perhaps an easy comparison is with driving, where machines are on the verge of driving as well as (or better than) humans. In countries like America, driving is a very predictable process. There are strict, narrow rules that define the realm of possibility. Beyond that, driving in different cities or areas follows the same set of rules, so it’s possible to train a single machine to work in a lot of different situations.

None of these things are true for something like plumbing. And beyond that, it’s not as easy to bring your robot along on a plumbing call as it is for a ride-along. Perhaps the proof is in the requirements for humans to do each of these tasks. To become a driver takes a few months of practice and passing a simple test. To become skilled at a trade requires specialized education and years of experience.

The bottom line, then, is that the demand for trade workers isn’t going anywhere. As the population gets further and further from hands-on work, the need for this labor will only increase. Machines won’t be filling in anytime soon.

To sum up, the future will be heavily influenced by the adoption and proliferation of data science methods. Business analysis will be supplemented (and supplanted) by the methods being developed today. The infrastructure around these data methods will grow with their applications, creating new demands on the companies that employ them.

 
 
 
