
What I’m Up To - Finishing Grad School

  • Writer: Joe
  • Jul 10, 2020
  • 6 min read

My blog is many things. Regular is not one of them. It’s been almost three months since I’ve posted, and that’s because I’ve had a lot going on. On the fun side, I’ve been spending time with family, playing Fortnite, and listening to a lot of Weezer. Family has been more available now that I’m back in Texas. I was never big on video games before April, but playing online with friends has elements of real-time teamwork that I’ve missed during this basketball-less time. Finally, I can’t relate to pretty much any of Weezer’s lyrics but I just can’t resist the drum lines and guitar riffs.


Along with these newfound hobbies, I'm also working.

The first thing that comes up in conversations about my summer is my capstone project with General Motors. I describe it differently every time I mention it, but the most accurate description is a real-world thesis. My capstone partner Tim and I are working with General Motors on a real project with real data, and by the end of it we’ll produce a nice poster and paper for course credit.

Our project this year has to do with Zero Emissions, one of the three Zero Initiatives announced by GM's CEO. To narrow it down, we're focusing on using GM's vast collection of connected-car data to develop routing that minimizes fuel consumption over a trip.


There are essentially two parts to the project. The first part is predictive: take some information about the car and trip and predict fuel consumption along the route. The second part uses that model to intelligently (and hopefully efficiently) construct the most fuel-efficient route from the origin to the destination.
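To make that structure concrete, here's a minimal sketch of the idea in Python. Everything in it is invented for illustration (a toy three-node network, made-up features and fuel numbers); it isn't our actual pipeline, just the shape of "predict per segment, then route on the predictions."

```python
# Sketch only: a regression model predicts fuel used on each road segment,
# and those predictions become edge weights for a standard shortest-path
# search. Features, fuel values, and the tiny graph are all hypothetical.
import networkx as nx
from sklearn.linear_model import LinearRegression

# Part 1: fit a simple fuel-consumption model on (made-up) historical segments.
# Features: segment length (km) and average speed (km/h); target: fuel used (L).
X_train = [[1.2, 45], [0.8, 30], [2.5, 90], [1.0, 60]]
y_train = [0.10, 0.08, 0.22, 0.07]
fuel_model = LinearRegression().fit(X_train, y_train)

# Part 2: weight each edge of a toy road network by its predicted fuel use,
# then run a shortest-path search to get the most fuel-efficient route.
road_network = nx.DiGraph()
segments = [
    ("A", "B", [1.2, 50]),
    ("B", "C", [0.9, 40]),
    ("A", "C", [2.4, 95]),
]
for origin, dest, features in segments:
    predicted_fuel = float(fuel_model.predict([features])[0])
    road_network.add_edge(origin, dest, fuel=predicted_fuel)

best_route = nx.shortest_path(road_network, "A", "C", weight="fuel")
print(best_route)
```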


It’s already been a learning experience in many ways. In an academic sense, it offers the opportunity to bring together different models in ways that we haven’t seen in the classroom. When we learned about predictive modeling, we just learned about predictive modeling. And when we learned about graph-based methods (routing), we just learned about graph-based methods. Putting these together into a solution framework has already been a novel and exciting process for us.


Because we're doing this project on real data, there are plenty of other ways this differs from a classroom experience. Our dataset is enormous and largely unexplored; that means we've spent a significant amount of time digging through it. What kinds of cars are represented? Which data is missing? Is it missing at random, or is there a pattern?

Beyond that, there’s a significant amount of infrastructure that we have to develop for this to work. Typically, routing is based on a “network” of intersections and roads. Performing this on the data we’re given requires us to either create our own network and match our empirical car data to it, or match that data to an existing map. In either case, there’s a lot of data engineering between us and our goal.
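As a rough illustration of the second option, here's a deliberately naive sketch of the matching step: snap each GPS ping to its nearest network node. The coordinates and node IDs are made up, and real map matching has to respect road geometry and trajectory continuity, but it shows where the data engineering starts.

```python
# Naive map matching sketch: assign each GPS reading to the closest node of a
# road network using a KD-tree. All coordinates below are invented.
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical network nodes: (longitude, latitude) of each intersection.
node_ids = ["n1", "n2", "n3", "n4"]
node_coords = np.array([
    [-95.36, 29.76],
    [-95.35, 29.77],
    [-95.37, 29.75],
    [-95.34, 29.76],
])

# Hypothetical raw GPS pings from a connected-car trace.
gps_points = np.array([
    [-95.358, 29.761],
    [-95.352, 29.768],
])

tree = cKDTree(node_coords)
_, nearest = tree.query(gps_points)   # index of the closest node per ping
matched = [node_ids[i] for i in nearest]
print(matched)                        # e.g. ['n1', 'n2']
```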


All of this is wrapped in the additional difficulty of having a LOT of data. Having so much data definitely opens up possibilities for more powerful methods and analysis. At the same time, it gets pretty cumbersome. Even the simplest operations take significant forethought so they don't take hours or days to run. Data at this scale also means working in Spark, which handles the volume at the cost of user-friendliness compared to the typical Pandas/SKLearn data science stack.
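To give a flavor of that tradeoff, here's a minimal, hedged sketch of the same aggregation in both stacks. The file name and columns (vehicle_id, fuel_used) are invented for illustration, not the actual schema we work with.

```python
# Same question in two stacks: average fuel used per vehicle.
import pandas as pd
from pyspark.sql import SparkSession, functions as F

# Pandas: load everything into memory, aggregate in one line.
trips_pd = pd.read_csv("trips.csv")
fuel_by_vehicle_pd = trips_pd.groupby("vehicle_id")["fuel_used"].mean()

# Spark: the same logic, but lazy and distributed; nothing runs until an action.
spark = SparkSession.builder.appName("fuel-summary").getOrCreate()
trips = spark.read.csv("trips.csv", header=True, inferSchema=True)
fuel_by_vehicle = (
    trips.groupBy("vehicle_id")
         .agg(F.avg("fuel_used").alias("avg_fuel_used"))
)
fuel_by_vehicle.show()  # the action that actually triggers the computation
```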


I’m obviously learning a ton from this internship. On a shallow level, I’m developing experience with new computational tools. But on a deeper level, I’m gaining more experience learning on the fly. That’s been something I’ve had to do since the very beginning of college. So while the technical tools I am learning are new and not intuitive, there’s a comforting familiarity to learning them as I go.


Along with my internship with General Motors, I'm working (remotely) as a Teaching Assistant for an Executive MBA course at Sloan. To say the least, being a teaching assistant for 100 executives introduces a weird dynamic - I'm a 22-year-old kid who's never hauled in a full-time paycheck, helping teach mid-career executives how to manage their business operations. What reconciles this dynamic is our different sets of experience. While I don't have a lot of experience in the workforce, I do have experience tackling a wide variety of modeling problems, both in purely academic settings and in more applied ones. My role as a TA is to use my own experience to "fill in the gaps" for my students. An added perk is that I get to learn from their experience along the way.


Being on the teaching side of a course has also taught me plenty of things. As it turns out, grading is far more time-consuming than I expected. It's pretty nerve-wracking when you get unexpected questions in office hours. The transition from physical to online classes creates a TON of headaches, and demands forethought for a lot of tasks that are normally trivial.

In a more substantive way, it’s also driving home the danger of homework problems. When you’re assigned homework for a class, you can guess with pretty high certainty that the solution will involve the most recent concepts you’ve discussed. A course about queuing is probably not going to throw a novel machine learning problem your way.


Real-life problems are a lot murkier. Deciding which model or method to use is a function of what assumptions you make. And if you've been to the blog before, you'll hopefully agree that the assumption-making process is underestimated in both importance and difficulty. If you apply the same sort of thought process to real problems that you do to a course's homework, you'll search for the problem that fits your desired solution, when it should be the other way around.


Finally, I’ll share a bit of advice I’ve been leaving in assignment comments. An essential part of modeling a real process is making good recommendations. This recommendation has to accomplish two things: connect to the math behind the model, and suggest a specific, achievable course of action.


Often, you’ll see people do only one of these. An example is “automate the process using ML/AI.” Despite its vagueness, this is actually pretty actionable - a trained data scientist will be able to connect the available data to possible analyses and improvements. What it isn’t, however, is connected to the root cause of the problem at hand. The proof? I just made that recommendation without mentioning the problem I have in mind. If that connection between the model and the recommendation isn’t explicit, you have no guarantee that the recommendation is a result of the analysis as opposed to a consequence of intuition. If you’ve been here before, you know that intuition can be misleading.


The other side of this is making a recommendation based on the model that, in practice, means nothing. An example here, for a simple Newsvendor problem, is "we need a better prediction so we can stock fewer unnecessary products." How can we make a better prediction? Do we incorporate different datasets? Increase the speed of our data pipeline? Use more powerful predictive methods? It's true that a "better prediction" will result in a better outcome. The issue is that it's not clear how to accomplish it.
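To show what closing that gap looks like, here's a small sketch of the standard Newsvendor critical-fractile calculation with invented numbers. It makes the connection explicit: a "better prediction" means a smaller forecast standard deviation, which shrinks the safety stock and the expected leftovers.

```python
# Newsvendor sketch with made-up prices and demand. The optimal order quantity
# is the critical fractile of the demand forecast, so the quality of the
# prediction enters directly through the forecast's standard deviation.
from scipy.stats import norm

price, cost, salvage = 10.0, 4.0, 1.0
underage = price - cost        # margin lost per unit of unmet demand
overage = cost - salvage       # loss per unit left unsold
critical_ratio = underage / (underage + overage)   # 2/3 here

mu, sigma = 500, 120           # demand forecast: mean and standard deviation
q_star = mu + sigma * norm.ppf(critical_ratio)
print(round(q_star))           # ~552 units with the noisier forecast

# A sharper forecast (same mean, half the sigma) pulls the order closer to mu,
# stocking fewer "just in case" units. That's the specific, model-connected
# lever hiding behind the vague phrase "better prediction".
q_sharp = mu + 60 * norm.ppf(critical_ratio)
print(round(q_sharp))          # ~526 units
```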


Finally, I'm working on getting a job. This job search comes on the heels of a sort of vocational adjustment from the spring. I've always found the practice of connecting a mathematical model to "reality" to be an exciting and rewarding process. The act of defining model boundaries, evaluating tradeoffs in making assumptions, and putting together a variety of tools to solve complicated problems is what excites me most.

I've recently concluded that I'd mis-defined the boundaries of this style of analysis. Before, I took this natural tendency to "connect the dots" to be a method for technical and quantitative problems. But I missed something. In qualitative analyses, the act of identifying principles and making assumptions is more important, not less. And while I had hints of this understanding before, it wasn't fully clear to me until I took my "softer" Spring electives, when class discussions worked this same mental muscle. That clarity triggered a process of reflection.


This shift in professional self-image naturally brings with it a shift in what I’m looking for. I know that my personal, professional, and academic experience have contributed to my strategic thinking, communication, and leadership skills. These are skills that I'm excited to apply to my career. So, in short, I’m not looking for a job where I code all day. I want to be in a role that requires me to think critically in several domains, quantitative and otherwise, to solve interesting and valuable problems. Maybe the best example of this is the mixed bag of thoughts I collect in this blog.

Unfortunately, this exciting new self-image isn’t the only thing that developed in the spring. The job market has gotten tougher with COVID-19. I’ve already had several application processes end with a hiring freeze. It’s not a new story. Needless to say, I’m grateful for any support in that realm.

All in all, this summer is an enormous learning experience. It’s not all rosy. There’s still a lot of uncertainty in the world and in my life. Despite that, I’m confident that by the end I will have grown professionally, personally, and intellectually.

 
 
 
