Our 2016 report, Growth of the Bootcamp Model, highlights growth in the field of data science and the steadily increasing number of programs turning out data scientists to fill the market gap. When it comes to this industry, there are a lot of terms thrown around: Data Science, Data Engineering, Data Analytics, bootcamps, and fellowships. The job titles are even more varied! With the help of NYC Data Science Academy, this guide will help you make sense of data science, the technologies and frameworks data scientists use, and the types of jobs available in the industry.
What is Data Science?
According to NYC Data Science Academy, data science is a multi-disciplinary field that combines computer science and statistics. The objective of data science is to pull insightful and useful knowledge out of datasets which, at times, can be too large for traditional statistics to analyze. This can include anything from analyzing complex genomic structures, to interpreting handwriting, to optimizing a marketing strategy.
Data Science Director Josh Wills offers this definition of a Data Scientist: a “person who is better at statistics than any software engineer and better at software engineering than any statistician.” Most data science bootcamps require an aptitude for math and statistics, and in some cases knowledge of a programming language, such as R or Python.
What are the best data science bootcamps?
- Data Science Bootcamp - Beginner and intermediate Data Science with Python and Hadoop, as well as the most popular R packages like Shiny, knitr, rCharts and more
- Hadoop & Spark Bootcamp - Using Python, Scala and Java, the course emphasizes the use of Hadoop tools to analyze large volumes of data
- Data Analytics Bootcamp - Use SQL, Excel, and Tableau to extract, analyze, and illustrate real‐world data
- Data Science Bootcamp - Use Python, SQL, UNIX and Git to mine datasets and predict patterns, build statistical models, and master the basics of machine learning
- Data Science Bootcamp - Students will learn cutting-edge technologies like the IPython environment, Machine Learning, D3 and other modern big data tools and architecture
- Data Science Bootcamp - Students gain experience across the data science stack: data munging, exploration, modeling, validation, visualization, and communication.
- Data Analytics - Using SQL, R and more, students gain foundational skills in analytics and work on integrated projects from real companies.
- Insight Data Science is an intensive 7-week fellowship intended as a post-doctoral bridge between academia and professional data science.
For more data science bootcamps, check out this list of 22 Data Science Bootcamps.
Data Science vs. Data Engineering vs. Data Analytics
Data Science bootcamp or Data Analytics bootcamp, what’s the difference? NYC Data Science Academy offers some insight into the different branches of Data Science and the technologies used in each field:
Data Science is a cross-disciplinary field requiring skills in Computer Science (Machine learning), Statistics and Mathematics. Typically, it requires candidates to have an advanced degree in a STEM field (e.g., Science, Technology, Engineering, Mathematics, Statistics) and a good understanding of the sophisticated concepts underlying modeling. Most Data Scientists use R and/or Python as their primary tools.
Data Engineering leans more towards software engineering and computer science, with just some knowledge of data science. It mainly covers Hadoop, Spark, Python, Java and Scala. It entails writing scripts and being familiar with tools to input and extract data from big data warehouses.
Data Analytics is considered more entry-level and focuses on BI (business intelligence). Its focus is to draw business insights from commonly seen data types. It includes data cleaning, data visualization and simple modeling including linear regression. Common Data Analytics tools are SQL and Excel.
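To make the "data cleaning and simple modeling" step concrete, here is a minimal sketch in plain Python. It drops incomplete records and then fits a straight line by ordinary least squares. The ad-spend figures and the names `raw_rows` and `fit_line` are made up for illustration; they do not come from any bootcamp curriculum.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: (monthly ad spend in $1,000s, units sold).
# None marks a missing value, as often seen in real exports.
raw_rows = [(1, 12), (2, 14), (None, 15), (3, 16), (4, None), (4, 18), (5, 20)]

# Data cleaning: keep only rows where both values are present
clean_rows = [(x, y) for x, y in raw_rows if x is not None and y is not None]
ad_spend = [x for x, _ in clean_rows]
units_sold = [y for _, y in clean_rows]

# Simple modeling: fit a line and use it for a rough estimate
slope, intercept = fit_line(ad_spend, units_sold)
predicted_at_6 = slope * 6 + intercept
print(f"units_sold = {slope:.2f} * ad_spend + {intercept:.2f}")
```

In practice an analyst would more likely reach for Excel, R, or a Python library for this, but the arithmetic underneath is the same.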
Data Science Bootcamps vs. Data Science Fellowships
There are significant differences between data science bootcamps and fellowships. Data Science bootcamps are intensive 3-6 month programs that prepare individuals with some knowledge but limited experience for roles as data scientists. For most bootcamps, a bachelor's degree and math aptitude are required. However, some schools, such as NYC Data Science Academy, require students to have a master's degree or Ph.D. Bootcamp tuition ranges from $6,500 to $21,000.
Unlike Data Science bootcamps, Data Science fellowships are generally free to the student (revenue is generated through hiring partnerships). Data Science fellowships generally require more experience than bootcamps. For example, the Data Incubator requires candidates to have a master's degree or Ph.D. in a social science or engineering field and relevant work experience. Data Science fellowships help academic data scientists prepare for work in a corporation or startup. According to a white paper by Insight Data Science, there are “400 Insight Fellows working as data scientists and engineers across the United States.”
What is “big data”?
Many Data Science courses use the term Big Data to describe their curriculum content. What exactly is “big data”? NYC Data Science Academy tells us below.
“Big data” is a term coined to describe datasets that are too large to be analyzed on one computer. With the advent of the internet, streaming data, wearables, etc., the amount of data being produced each day equals all the data ever created up to the year 2003. This data holds insights that can be useful for decision makers, but its sheer volume, together with the usual problems of corruption, incompatibility, and complex structure (often including natural language), makes it challenging to use. Sophisticated tools (e.g., Hadoop, Spark) that can employ multiple computers simultaneously are required to extract actionable knowledge from this data.
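The idea of employing multiple computers at once can be sketched on a single machine: split the data into chunks, process each chunk independently, then combine the partial results. The example below counts words this way. It is only an illustration of the pattern, not the Hadoop or Spark API, and the function names and sample lines are invented for this sketch.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, n_chunks):
    """Divide the lines into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """'Map' step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """'Reduce' step: combine two partial counts into one."""
    return left + right


# Hypothetical input; a real dataset would be far too large for one machine
lines = [
    "big data is big",
    "data science uses data",
    "spark and hadoop process big data",
]

chunks = split_into_chunks(lines, 2)
# On a cluster, each chunk would be handled by a different computer
partial_counts = [count_words(chunk) for chunk in chunks]
total = reduce(merge_counts, partial_counts, Counter())
print(total.most_common(2))
```

Because each chunk is processed without reference to the others, the work can be spread across as many machines as there are chunks, which is what makes this approach scale.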
Data Science Technologies
The technologies learned at a Data Science bootcamp often differ from what is taught at a traditional coding bootcamp. Here, NYC Data Science Academy breaks down common technologies used in the field and what they’re used for.
- SQL - SQL stands for Structured Query Language. In traditional database environments, industries rely on SQL to extract data for data analytics and reporting purposes. It is designed for managing data in relational database management systems.
- Hadoop - Hadoop is a suite of technologies for managing data and executing programs in a cluster (a collection of networked computers running in a data center). This includes a file system designed for the needs of large data, the MapReduce system for running programs in parallel, the SQL-like Hive database for querying data in a cluster, and many other components.
- Spark - Spark is a system for writing parallel programs to run in clusters. As a competitor to MapReduce, it has gained popularity for its higher efficiency on many problems. It also has a powerful machine learning library, MLlib, and can be used with R, which makes it especially popular among data scientists.
- Python vs R - Python and R are both standard languages that are used by data scientists. The Python vs R conversation reflects the fact that data science is a marriage between computer science, where Python is used, and statistics, where R is used. A complete data scientist will know both languages and leverage their different strengths.
- Machine Learning - Machine learning refers to a growing set of algorithms that are able to analyze large sets of data. Its popularity is due to the fact that these algorithms are able to make predictions about future events that exceed what traditional statistics is designed to do. It is called “machine learning” because many of these algorithms are built to use the results of their initial findings to feed better data into subsequent models. Thus the machine “learns” how to improve its predictive powers.
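As a small illustration of the SQL entry in the list above, the sketch below uses Python's built-in `sqlite3` module to load a few rows into an in-memory relational database and pull out a simple report. The `orders` table, its columns, and its values are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "East", 120.0),
        ("bob", "West", 80.0),
        ("carol", "East", 50.0),
        ("dave", "West", 30.0),
    ],
)

# A typical reporting query: order count and revenue per region
rows = conn.execute(
    "SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()
conn.close()

for region, n_orders, revenue in rows:
    print(f"{region}: {n_orders} orders, ${revenue:.2f}")
```

The same query shape (select, group, aggregate, sort) carries over with minor changes to larger systems such as MySQL, Postgres, or Hive.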
What kind of background should data scientists have?
While having math aptitude is important, Data Scientists come from a variety of educational and professional backgrounds. Check out some of our Q&As with Data Science bootcamp grads:
- Sumanth Reddy (Professional Poker Player) NYC Data Science Academy
- Emily (Art Major) & Itelina (Econ Major) Metis
- Jason Liu (Physics Ph.D.) NYC Data Science Academy
- Adam Hill (Astrophysics Ph.D.) Science to Data Science Fellowship
Data Science Jobs
As with most fields, Data Science job titles don’t always give you the nuts and bolts of what the job entails. Below are some common job titles you’ll come across when looking for jobs in data science and their average salaries.
Data Analysts are responsible for analyzing large datasets, whether for customer research, business intelligence or internal studies. Data Analysts start with a large dataset and are tasked with drawing actionable conclusions from this data. Data Analysts may work with engineers, UX Researchers and Sales staff to develop growth solutions. In addition to data science tools like SQL, Data Analysts should also have knowledge of statistics and concepts like A/B testing.
Data Scientists are responsible for determining the data necessary to answer a question, from designing a method for capturing data to gathering data, analyzing data and finally presenting the solution. Though similar to the Data Analyst’s role, the Data Scientist’s role is much larger in scope and requires careful planning and design of research from beginning to end. Data Scientists will use the full gamut of data science tools, including Python, MongoDB, Hadoop and more.
Database Administrators work with technologies such as MySQL, MongoDB, and Postgres to manage large datasets. Depending on the company and role, their duties may include investigating and solving database problems, repairing glitches and designing elements that improve the storage and maintenance of data.
Data Engineers are half software developer, half data scientist. Data Engineers use programming languages to write scripts that capture data. Data Engineers then analyze the data and make program or product recommendations based on their analyses.
Check out reviews of data science bootcamps to find out which one is best for you!