Tim Hoolihan: Navigating the World of Data Science

Tim Hoolihan lives in Northeast Ohio. He went to Kent State University and graduated with a degree in Management & Information Systems, although he also took a number of courses from the Computer Science program. In his 15-year career since college, Tim has been a software developer, architect, and CTO, and has held various management roles. He has worked in a variety of languages, including Ruby, Java, C#, Python, PHP, Objective-C, HTML, JavaScript, C/C++, and more. Tim is currently the Senior Director of Data Science at DialogTech, a marketing technology firm focused on voice interactions.

As Senior Director of Data Science at DialogTech, what is your day-to-day work like?
My job has a few components. First, it’s my job to advise on how we invest in and grow our data science practice across the organization. Where do we fit in projects, research, services, and other relevant areas? Put another way, how do we interface with other groups in the company? How do we promote data literacy?

The second component is managing the work our group does. It’s a small group, and folks in our area tend to be senior, so this is a light component of my job compared to other managerial roles in the software world. Finally, I have research and implementation work I actually get to do. It’s one of the components I love most about my job: I get to keep my hands on the keyboard.

How did you get your first programming job?
In college, I had an internship with a local staffing & consulting company. I worked on websites, internal applications, and a variety of programs. Near the end of my last year of college, I started interviewing and was offered a job with FirstEnergy, a utility company based out of Akron, Ohio. I started out in a group that built tools that supported data analysis, and eventually migrated into a group that ran software that monitored and balanced demand across generation plants. Along the way, I worked on a coal pricing application and other software projects.

What technologies are you excited about? If you were a new aspiring developer what technology would you learn and why?
There is a lot to be excited about these days. For me, I have spent the last five-plus years transitioning from a career in software development to a career in data science. I really enjoy prediction, probability, machine learning, etc. If you are interested in that, I would suggest making sure you have a solid mathematical background and learning Python or R. More than the language, I would focus on the principles of understanding analysis, correlation, and visualization. Kaggle.com is a great place to find example problems and tutorials.

If data science is not your thing, there are plenty of other areas to get into. The Makerspace is very interesting right now. Containers, virtualization, and other automation skills are in high demand.

Otherwise, make sure you understand sound programming, general API principles, testing, and source control skills (preferably git). I think that will earn you a seat on the majority of software teams you run into.

For someone trying to get into data science, what are the important differences between Python and R? How would you go about choosing between the two?
They are both great languages for data science work, and people often end up learning both. R is more focused on statistics and scientific work. You won’t see much general-purpose programming done in R. It has a very strong set of packages that support a variety of domains in science, research, etc. Often, these libraries are written by practitioners who are not necessarily formally trained as programmers. So the trade-off ends up being that sometimes the tooling can lag a bit, but most algorithms or concepts in data science already have a package in R.

Python is a more general-purpose programming language. It has great data science libraries, but not to the depth that R does. However, your Python knowledge will also be applicable to other software tasks, like application development. Python currently has a better story in deep learning. Libraries like TensorFlow and Theano support machine learning on the GPU, which speeds up more complex neural networks. R is playing catch-up a bit in this area.

As for choosing, I think it depends on the learner's background and the use case they have in mind. If you could benefit from learning a general-purpose scripting language for other purposes (web development, etc.), start with Python. If your background is primarily computer science, start with Python. If your background is more scientific or statistical, start with R. Eventually, you should probably learn both, but I think trying to learn both at once is inadvisable. Learn the concepts and techniques in data science using one language, then worry about translation. Put another way, any linguist will tell you that picking up additional languages is easier once you truly grasp grammar and language theory.

What is data science? Is it a more math or computer science heavy field?
Data science is a broad term, and interpretations vary. However, I will share my interpretation. For a variety of reasons, many fields have been converging and collaborating in recent years, including statistics, computer science, information theory, natural language processing, forecasting, and artificial intelligence. As the tooling support grew, data capture grew, demand grew, and the boundaries blurred, we needed a new term. Data science became that term. It’s broad enough that some areas are more computer science heavy, while others are more domain specific. Because of the variety of fields involved, it has some natural barriers to entry. These are complex fields that require study and experience, and you need strength in at least a few of them to work on most projects in the data science arena. In addition, the field is evolving rapidly.

This has a couple of effects worth understanding. First, it is an attractive field for people who truly enjoy lifelong learning. Formal training helps, but you will always be reading and studying in this field. Second, no one can know it all. This can drive unnecessary insecurity at times. I run a user group for R practitioners. Often, speakers who are scientists worry that their code will look silly to developers. Likewise, programmers worry they will present some theory or mathematical concept incorrectly to someone with stronger domain knowledge. In reality, most people in the field are understanding, and this isn’t an issue.

What do you do in your free time?
There are a variety of things I do that are related to my career, but not explicitly required by my job. For example, I regularly take courses on Coursera. I'm currently midway through Ohio State's Calculus 2 course, after finishing Calculus 1 in the fall. I'm the organizer of the Cleveland R User Group, which involves arranging for speakers, etc. I'm also a pretty avid reader. I'm currently reading Nassim Taleb's The Black Swan. I'm a big Nate Silver fan, and Taleb is one of his biggest critics, so I wanted to get a different perspective.

Outside of programming & data science altogether, I like to run. I enjoy playing video games. And most important, my wife and I have three young children who keep me very busy.

You contribute to open-source. How do you recommend new software developers get involved in open-source projects?
I have contributed some to open source, but would really like to contribute more. If I had my career to do over, I would get involved with open source projects earlier. Too many people are intimidated by these projects, assuming they will get critiqued for their code quality. You will, but you should use that as a learning experience.

First, read any guidelines that a team has taken the time to write. They are telling you how to avoid common (and annoying) mistakes, so take the time to digest that information. Second, start small. Many projects need help with documentation or other such tasks that can provide a friendly entrance point. Finally, be up front if you are a novice, and be friendly. Most people will respond to that in a friendly manner, even if your work needs fixing.

As someone who does a significant amount of data analysis on a daily basis, how do developers help figure out what is relevant and what is trivial?
There are volumes of books just on this topic, so I will struggle to summarize here. That said, the number one tip I can give is this: be humble in your assumptions about what you know. People who come into a project with some knowledge of data science and start making assumptions often do the most damage. The people I have come to respect most enter the conversation with an open mind, assuming they don't know everything. Assumptions cause all kinds of mistakes, such as confusing correlation with causation, over-generalizing principles, etc.

Neil deGrasse Tyson put this principle well when he summed it up with the question "Are you wired for doubt?" I would amend that question by adding "and are you wired to start that doubt with yourself?"

What are some of the challenges of managing a technical team?
If you are learning to manage technical teams, I would start with some of the more popular books in the area (Smart and Gets Things Done, The Mythical Man-Month, etc.). However, I will highlight a couple of the challenges that stand out to me.

First, technical fields are by nature intellectually demanding, and these fields tend to weed out anyone who doesn't enjoy constant learning. Therefore, you have to understand that most people you work with are among the smartest people in their social groups. That usually means everyone has a pretty healthy ego. It takes time for people to adjust to that, and learn that they are now among equals. Managers do better when they understand this and try to facilitate peer respect and collaboration.

Second, understand that managing people is not a linear job. Team members eventually move on, new ones join, and challenges come in cycles. Management is never "done". The analogy I often use is the job of a minister (priest, rabbi, etc.). He or she will certainly grow in their career over time, but they will be doing weddings and funerals their first week on the job, and weddings and funerals their last week on the job. When you are talking about people, things are cyclical and never ending.

If you could go back 20 years and give yourself advice, what would you say?
I could write a novel on just this. But I’ll try to narrow it to just a few key bits. First, you should focus your time on learning concepts over tools. Tools change, but concepts endure. Let’s pick data visualization. You can focus on becoming an expert in a reporting tool (e.g. learning Tableau), or you can focus on being great at data visualization (e.g. reading an Edward Tufte book). There is value in both, but I would suggest spending more time on the conceptual part. The value is longer lasting. Tools get replaced. And besides, when you really understand an area and its terminology, you can usually pick up new tools faster.

Second, invest your time in materials that really have substance. It’s easy to be informed about so-called trends in data science without really understanding them. Or to get caught up in industry gossip articles about valuations and acquisitions. You will usually learn more from one good academic paper than you will from one hundred TechCrunch articles.

Lastly, pay attention to detail, particularly in how you use terminology and make assertions. Words have to have meaning and carry weight. Use them sloppily, and you can quickly get yourself into trouble or cause confusion. At the very least, you will look silly. This is true in any field, but it is multiplied in data science by the number of fields it combines.

Follow Tim on Twitter @thoolihan