Syllabus

Learning Outcomes

The overall goal of the course is for students to acquire in-depth knowledge of, and skills in, the use and development of data science software, together with basic data science knowledge, i.e. an interdisciplinary approach to finding, extracting and discovering patterns in data using methods of analysis, domain competence and technology.

Knowledge and understanding
Upon completion of this course, students will be able to:

Explain the data science life cycle
Explain Big Data and data analysis
Explain methods for data preparation

Skills and abilities
Upon completion of this course, students will be able to:

Apply unsupervised and supervised machine learning algorithms for problem-solving
Apply fundamental concepts in statistics and probability theory, such as probability distributions, statistical significance, hypothesis testing and regression
Use programming languages for data science/data analysis
Perform data extraction from text
Use exploratory data analysis (EDA) to describe the data using summary statistics and visualisation techniques
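
As an illustration of the exploratory data analysis outcome above, the following is a minimal sketch of summary statistics and a crude text visualisation using only the Python standard library. The sample values and the function names summarize and text_histogram are invented for this example and are not part of the course material; in practice a dedicated data analysis library would likely be used.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": median,
        "stdev": statistics.stdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "iqr": q3 - q1,
    }


def text_histogram(values, bin_width):
    """Render a crude text histogram by counting values per bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return "\n".join(f"{start:>5}: {'#' * bins[start]}" for start in sorted(bins))


# Hypothetical sample, invented for illustration only.
sample = [12, 15, 15, 18, 21, 22, 22, 22, 30, 41]
summary = summarize(sample)
print(summary)
print(text_histogram(sample, bin_width=10))
```

The summary describes the centre and spread of the data, while the histogram gives a first impression of its shape, for example the single high value in the last bin.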

Values and attitudes
Upon completion of this course, students will be able to:

Interpret and analyse the results of a data extraction process, as well as evaluate the effects of choices made during the process

Course Content

The course covers the process of data science, i.e. a multidisciplinary approach to finding, extracting and discovering patterns in data through a fusion of analytical methods, domain expertise and technology. In this context, the fields of data mining, forecasting, machine learning, predictive analytics, statistics and text analytics are covered.

Within the iterative data science process, business understanding is covered through problem identification: specifying the key variables that are to serve as model goals and identifying relevant data sources. It also includes formulating questions that define the business goals and that can be quantified by computer science technicians.

To control data quality, the course includes the acquisition of raw data, data processing (ETL: extract, transform, load), examination of data and modeling. To facilitate the development of models and to find the model that best answers the initial questions, so-called feature engineering is used, in which raw data is extracted and distinctive features are created from it.
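
The transform step and the feature engineering described above can be sketched roughly as follows. This is a minimal illustration only: the raw records, the field names and the chosen features (log_amount, is_weekend, hour, note_word_count) are assumptions made for the example, not content taken from the course.

```python
import math
from datetime import datetime

# Hypothetical raw records, invented for illustration only.
RAW = [
    {"timestamp": "2024-03-01 09:15", "amount": "120.50", "note": "monthly subscription renewal"},
    {"timestamp": "2024-03-02 22:40", "amount": "", "note": "refund"},
    {"timestamp": "2024-03-03 14:05", "amount": "15.00", "note": "coffee and cake"},
]


def transform(records):
    """Transform step: drop rows with a missing amount and parse types."""
    cleaned = []
    for row in records:
        if not row["amount"]:
            continue  # simple data-quality rule: skip incomplete rows
        cleaned.append({
            "timestamp": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M"),
            "amount": float(row["amount"]),
            "note": row["note"],
        })
    return cleaned


def engineer_features(row):
    """Derive model-ready features from one cleaned record."""
    return {
        "log_amount": math.log1p(row["amount"]),        # compress skewed amounts
        "is_weekend": row["timestamp"].weekday() >= 5,  # Saturday or Sunday
        "hour": row["timestamp"].hour,
        "note_word_count": len(row["note"].split()),
    }


features = [engineer_features(row) for row in transform(RAW)]
for feature_row in features:
    print(feature_row)
```

The cleaning rule and the features are deliberately simple. Which features are actually distinctive depends on the questions formulated during business understanding.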

Finally, the evaluation of the modeling and analysis, the presentation of results and commissioning (putting the solution into operation) are discussed.

Assessment

Hand-in assignments (2.5 credits), laboratories (3 credits) and mini-tests (2 credits).

Forms of Study

Lectures, laboratory work, and assignments.

Grades

The Swedish grades U–VG.

The final grade is determined by weighting the results of the hand-in assignments and the mini-tests.

Prerequisites

Object-Oriented Programming, 7.5 credits, first cycle, or another course in fundamentals of programming
Statistical Analysis, 7.5 credits

Data Science & Machine Learning