I received my Graduate Certificate in Data Science from Harvard Extension School in January of 2020. It took me two years, and this is the summary of my experience with the program.
Note: This is not a review of the edX Harvard Data Science Professional Certificate, which is a set of introductory online classes (no live teaching or grading) on how to use R for analysis.
What is the Graduate Certificate in Data Science from Harvard?
It is a four-course graduate program consisting of one core course in Statistics, one core course in Data Science, and two electives. All of the classes are standard graduate-level, one-semester courses offered by Harvard, some by the Extension School (part of the Faculty of Arts and Sciences) and some by the Harvard School of Engineering and Applied Sciences.
Requirements
You need to have an undergraduate degree from an accredited school, but you don’t have to submit transcripts or proof of your degree or pre-qualification courses. There is no formal admission process into the certificate program.
All classes are graduate level and require knowledge of calculus, linear algebra, probability theory, and introductory statistics. For most courses, you will also need a working knowledge of the Python programming language, though some classes require R.
International students from non-English speaking countries have to provide proof of their English ability.
Cost
Around $11,500 if you start in September of 2020. You will earn four graduate credit hours per course, a total of 16 graduate credit hours.
Other Commitments
You should expect to spend between 10 and 15 hours a week on a single regular (non-summer) class. Your time commitment depends on how familiar you are with the programming language used in the class. I jumped in with both feet after learning Python on YouTube, so it was an interesting experience. I did just fine, by the way.
You should also budget for a good computer, preferably with a GPU card, especially if you plan to take machine learning classes that use TensorFlow/Keras as electives.
Overview
This is a good program for those who want to work on data projects for a living. It is also good for those, like me, who have long worked with data, and would like to learn new methods that have been developed while we were in the workforce.
The certificate is popular among programmers and developers who want to expand their knowledge and add value for their employers.
This is not a program where you will learn how to create a Pyplot chart or import your data into a Pandas dataframe. You should be able to do that before starting or at least be able to figure it out as you go. In many classes, you will be working with real life data, and this data can be messy. This is particularly true for final projects.
Format
All classes can be taken remotely, with some having an onsite option. I would highly recommend being onsite for those who live in Boston, provided that it is safe (I am writing this as all schools switch to online learning due to COVID-19).
The grading for three out of four classes that I took was based on homework assignments and a final project. Homeworks were weekly, fairly extensive, and were submitted as code (Jupyter notebooks or R .Rmd files) and PDFs. The final project can be an individual or group project, and it generally uses real-world data that you can either find yourself or download from academic sources.
One of my classes had a midterm and a final exam, which were administered using Proctorio and were basically timed coding exercises.
Types of Courses
Fundamentally, the classes that you can take for this program are either Extension School classes or regular Harvard Computer Science or Math classes. You can tell what kind of class it is by looking at the description.
Some Extension classes are 100% online, and some have an onsite option. All Harvard Engineering School (in my case, Data Science 109A&B) classes were taught on campus live, and Extension students could watch the lectures in real time or recorded. On-campus classes might not be an option during COVID-19, but the distinction still stands.
- Non-Extension classes tend to be more challenging. You are taking those classes with seniors at Harvard College and graduate students from other Harvard schools. These are very solid courses with famous professors that you get to take by doing not much more than paying your $3,000. On the flip side, these classes were created for in-person participants, so as a remote student you will feel like you are peeking in through a window.
- Extension school classes offer a wide variety of subjects that cover all the fun things, from Deep Learning to Data Visualization. These are specifically formulated for online learning, so all of the classes and labs are a lot more interactive for you as a remote student. You will never feel like an afterthought in them, and you will be the focus of the teaching staff.
I recommend taking a mix of types. Starting with an Extension class is going to introduce you to the learning environment, and once you are comfortable, you can take more challenging classes.
If you live in Boston, you should ask if you can attend the non-Extension class on campus, and if you can, definitely go. Professors warned us not to share links to the live lectures with on-campus students, because they wanted to discourage those students from watching the stream instead of attending, so my take is that they encourage in-person participation.
Core Statistics
To obtain the certificate, you will have to take one core Data Science course and one core Statistics course. I have taken Statistics and Econometrics courses in the past, though many years ago, so I decided to hold off on the Statistics course until the very end, and I think it was a good decision for me.
I needed to know that I was sure about getting the certificate before spending money on the Stats refresher, but you might be in a different situation. In case you feel like you need to take Statistics first, you should do that, as it provides a solid basis for many methods used further on.
If you have good programming skills, but feel like a core Statistics class might be too hard, I would highly recommend taking the Introduction to Probability course by Joe Blitzstein, available for free from Harvard or on iTunesU. It’s a truly amazing course, and you will be glad you took it.
Math Prep
If you are worried about your math skills, here is the list of math topics you need to brush up on before starting this journey:
- Derivatives. Derivatives. Derivatives. Partial derivatives. Multivariable function derivatives. Point derivatives. How to use derivatives to find maximum or minimum of a non-linear function.
- Taylor decomposition.
- Matrix multiplication. Matrix rank. Eigenvalues and eigenvectors.
- Integration.
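To make the derivative items in this list concrete, here is a toy sketch that is not taken from any of the courses. It assumes a made-up two-variable function, f(x, y) = x^2 + xy + y^2 - 3x, works out its partial derivatives by hand, and uses them in plain gradient descent to find the minimum. The function, the learning rate, and the step count are all illustrative choices.

```python
def f(x, y):
    """A made-up non-linear function of two variables (illustrative only)."""
    return x**2 + x*y + y**2 - 3*x


def grad_f(x, y):
    """Partial derivatives of f, worked out by hand."""
    df_dx = 2*x + y - 3
    df_dy = x + 2*y
    return df_dx, df_dy


def gradient_descent(x, y, learning_rate=0.1, steps=500):
    """Repeatedly step against the gradient to approach a minimum."""
    for _ in range(steps):
        df_dx, df_dy = grad_f(x, y)
        x -= learning_rate * df_dx
        y -= learning_rate * df_dy
    return x, y


# Setting both partial derivatives to zero by hand gives x = 2, y = -1.
x_min, y_min = gradient_descent(0.0, 0.0)
print(round(x_min, 4), round(y_min, 4))
```

The matrix of second derivatives of this function is [[2, 1], [1, 2]], with eigenvalues 1 and 3. Both are positive, which is how you know the stationary point is a minimum and not a maximum or a saddle point.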
Class Reviews
Below are reviews of the four classes that I have taken. You need to get at least a B in every class for it to count toward the certificate.
CS-109A Introduction to Data Science
This is a core Data Science course taught at the School of Engineering by Pavlos Protopapas and Kevin Rader, and it was the most challenging class I took in the program. You are taking this class with other Harvard undergraduate and graduate students, and you will be graded on the same curve and by the same standard. Homework grading is also a bit unpredictable, as you will have points deducted spuriously for things you have done slightly differently. Don’t worry, other students are in the same position.
The class covers applications of statistics, with an emphasis on understanding all sorts of issues that can crop up in regression, and then goes over data science methods, such as decision trees/random forests, discriminant analysis, and support vector machines. It touches on neural networks, but very briefly.
The class uses the Python programming language extensively, and I almost got in trouble on the very first homework. The good thing is that homeworks can be done in pairs, which you should definitely do. Find yourself a few people you work well with so you can do your final project together.
What it takes to get an A. Being a statistician, I will be the first to admit that I got lucky with the two most important homeworks, and that’s how I got a good grade. That was on top of meticulously trying to dot every i and cross every t in the homeworks, which is really the basis of getting through this class.
I only got a perfect grade on a homework assignment once, and it was when I was incensed by the grading on an earlier assignment that asked for a corporate-style report (if there is anything I know how to do well, it’s corporate reports, but the graders had… different ideas). So when faced with another “tell us what you think about these methods” question, I went above and beyond to show that in real life simple approaches easily beat complicated methods we used “for pedagogical purposes.”
This time, it worked. In conclusion, don’t be afraid to voice your opinion even if it means critiquing the methods used in class; just make sure you raise it at the appropriate times.
The time commitment for this class was around 15-20 hours a week. I spent a lot of my time figuring out how to make Python behave, and I hope you have better luck than I did.
E-106 Data Modeling
This was a core Statistics course taught by Hakan Gogtas, and I took it for a refresher and to learn R. Since this was an online only Extension school class, and since I was familiar with the material, it was an easier class for me, but it was not an easy class by any stretch of the imagination.
The class goes into great detail over regression and all of the inference and confidence metrics associated with it. That means you will be calculating a lot of error terms for a lot of estimates.
You will also need to remember math to hand-calculate maximum likelihood estimates, and while at it, learn LaTeX, since those calculations look a lot better when printed nicely.
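For a sense of what such a hand calculation looks like once typeset, here is a standard textbook example, not one taken from the course: the maximum likelihood estimate of the rate of an exponential distribution from i.i.d. observations x_1, ..., x_n. The choice of distribution is only an assumption made for illustration, and the block assumes the amsmath package for the align* environment.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
            = \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr) \\
\ell(\lambda) &= \log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} x_i \\
\frac{d\ell}{d\lambda} &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} \\
\frac{d^{2}\ell}{d\lambda^{2}} &= -\frac{n}{\lambda^{2}} < 0
\end{align*}
```

The last line confirms that the stationary point is a maximum of the log-likelihood, so the estimate is one over the sample mean.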
This was the only class that had midterm and final exams in addition to homeworks. Those were live coding sessions in RStudio.
To the credit of the teaching staff, this course had the nicest TAs and professor. They were always happy to help and do reasonable investigations for you. I screwed up on occasion, and they helped me out every time.
What it takes to get an A. Submit your homeworks on time. Prepare for the midterm and final. Make sure you have your formulas handy and can do assignments fast. If issues arise, ask, and you will get help.
CS-109B Advanced Topics in Data Science
This class was taught by Pavlos Protopapas and Mark Glickman in the spring following 109A, and together the two classes were much like a one-year course in Data Science. To me this class felt a bit more relaxed than 109A, probably because everyone who takes this class has been battle-hardened and cleared in the prior four months.
The class follows the same format as 109A and goes over an array of methods such as Bayesian analysis, convolutional neural networks, reinforcement learning, and generative models.
The grading of weekly homeworks was just as nitpicky and spurious as in 109A, and my appeal to regrade in a situation where I did the same thing in a slightly different (and in my opinion, better) way yielded zero results, so keep that in mind.
The course has a final project, which in our case turned into a pretty big deal. I was fortunate that all of my wonderful teammates from the 109A course took 109B, so we teamed up together and did some great work on image classification models.
What it takes to get an A. Start your homeworks early, complete them well, make sure to comment on every single question in the assignment, and execute a solid final project. I did not have to do anything out of the ordinary to get an A in this class.
E-89 Deep Learning
This was the first class I took, and it totally got me hooked on the program. The class was taught by Zoran Djordjevic, and went over neural networks, how to calculate derivatives in backpropagation, how neurons fire in layers, how to make your GPU work with TensorFlow, and a lot more.
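To give a flavor of the derivative work in backpropagation, here is a minimal sketch that is not taken from the course materials. It assumes a single sigmoid neuron with a squared-error loss, applies the chain rule by hand, and compares the result against a finite-difference estimate. All names and numbers are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, target):
    """Squared error of a single sigmoid neuron."""
    prediction = sigmoid(w * x + b)
    return 0.5 * (prediction - target) ** 2


def backprop(w, b, x, target):
    """Chain rule: dL/dw = dL/dp * dp/dz * dz/dw, and likewise for b."""
    z = w * x + b
    p = sigmoid(z)
    dL_dp = p - target         # derivative of the squared error
    dp_dz = p * (1.0 - p)      # derivative of the sigmoid
    dL_dz = dL_dp * dp_dz
    return dL_dz * x, dL_dz    # dz/dw = x, dz/db = 1


def numerical_grad_w(w, b, x, target, eps=1e-6):
    """Central finite difference, used only to sanity-check backprop."""
    return (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)


# Illustrative values, not from any assignment.
w, b, x, target = 0.5, -0.2, 1.5, 1.0
dw, db = backprop(w, b, x, target)
print(dw, numerical_grad_w(w, b, x, target))
```

The two printed numbers should agree to several decimal places. Checking a hand-derived gradient against a numerical one like this is a cheap way to catch sign and indexing mistakes.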
There is a wide variety of models being used, and it was really fun to see how convolutional layers reinterpret an image and train a network capable of contextually translating from one human language into another in just 30 minutes.
You may need a computer with a GPU for this class; it will really help run some of the assignments faster.
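A quick way to check whether your setup is ready is to ask TensorFlow which GPUs it can see. This is a minimal sketch, assuming TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available. The helper name `visible_gpus` is made up for this example, and it returns an empty list if TensorFlow is not installed at all.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or [] if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = visible_gpus()
print(f"TensorFlow sees {len(gpus)} GPU(s)")
```

If this reports zero GPUs on a machine that has one, the usual suspects are the driver and CUDA library versions and not your model code.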
The final project in my class was individual, and you get to choose your own subject and data. My recommendation is to start thinking about it from the start. Is there something you would love to apply the knowledge you are getting to? If so, start collecting data, and you will be far ahead of the game. I collected and classified my final project data (images) manually, and it only took a few hours.
What it takes to get an A. Do your homeworks on time and review the best-practice examples that the teaching staff provides. Attend labs. Labs were great in general, but attending them live gave you opportunities to ask questions and have them answered. This is a fun class, and you will do great by engaging and enjoying it.
Conclusion
If you are an analyst who wants to learn about programming for data science or a developer who wants to learn about statistics and data, it’s a great certificate course for you. It does not hurt that Harvard is a name that looks good on a resume.