Welcome to the Advanced Linear Models for Data Science Class 2: Statistical Linear Models. This class is an introduction to least squares from a linear algebraic and mathematical perspective. Before beginning the class, make sure that you have the following:
- A basic understanding of linear algebra and multivariate calculus.
- A basic understanding of statistics and regression models.
- At least a little familiarity with proof-based mathematics.
- Basic knowledge of the R programming language.
After taking this course, students will have a firm foundation in a linear algebraic treatment of regression modeling. This will greatly augment applied data scientists' general understanding of regression models.

The goal of this MOOC is to show that econometric methods are often needed to answer questions. A question comes first, then data are collected, and only then does the model or method come in. Depending on the data, however, methods may need to be adapted. For example, where we first look at two variables, later we may need to look at three or more. Or what should we do when data are missing? And if the data are counts, like the number of newspaper articles citing someone, matters may change too. But these modifications always come last and are considered only when relevant.
An important motivation for me to make this MOOC is to emphasize that econometric models and methods can also be applied in more unconventional settings, which are typically settings where practitioners have to collect their own data first. Such collection can be done by carefully combining existing databases, but also by holding surveys or running experiments. A byproduct of having to collect your own data is that it helps you choose among the many methods and techniques that are available.
If you are searching for a MOOC on econometrics that treats (mathematical and statistical) methods of econometrics and their applications, you may be interested in the Coursera course “Econometrics: Methods and Applications” that is also from Erasmus University Rotterdam.

Learn fundamental concepts in data analysis and statistical inference, focusing on one and two independent samples.

This course focuses on one of the most important tools in your data analysis arsenal: regression analysis. Using either SAS or Python, you will begin with linear regression and then learn how to adapt when two variables do not present a clear linear relationship. You will examine multiple predictors of your outcome and be able to identify confounding variables, which can tell a more compelling story about your results. You will learn the assumptions underlying regression analysis, how to interpret regression coefficients, and how to use regression diagnostic plots and other tools to evaluate the quality of your regression model. Throughout the course, you will share with others the regression models you have developed and the stories they tell you.

Many experiments involve factors whose levels are chosen at random. A well-known situation is the study of measurement systems to determine their capability. This course presents the design and analysis of these types of experiments, including modern methods for estimating the components of variability in these systems. The course also covers experiments with nested factors, and experiments with hard-to-change factors that require split-plot designs. We also provide an overview of designs for experiments with nonnormal response distributions and experiments with covariates.

This course offers a rigorous mathematical survey of advanced topics in causal inference at the Master’s level.
Inferences about causation are of great importance in science, medicine, policy, and business. This course provides an introduction to the statistical literature on causal inference that has emerged in the last 35-40 years and that has revolutionized the way in which statisticians and applied researchers in many disciplines use data to make inferences about causal relationships.
We will study advanced topics in causal inference, including mediation, principal stratification, longitudinal causal inference, regression discontinuity, interference, and fixed effects models.

This course aims to help you ask better statistical questions when performing empirical research. We will discuss how to design informative studies, both when your predictions are correct and when they are wrong. We will question norms and reflect on how we can improve research practices to ask more interesting questions. In practical, hands-on assignments you will learn techniques and tools that can be immediately implemented in your own research, such as thinking about the smallest effect size you are interested in, justifying your sample size, evaluating findings in the literature while taking publication bias into account, performing a meta-analysis, and making your analyses computationally reproducible.
If you have the time, it is recommended that you complete my course 'Improving Your Statistical Inferences' before enrolling in this course, although this course is completely self-contained.

Welcome to the Advanced Linear Models for Data Science Class 1: Least Squares. This class is an introduction to least squares from a linear algebraic and mathematical perspective. Before beginning the class, make sure that you have the following:
- A basic understanding of linear algebra and multivariate calculus.
- A basic understanding of statistics and regression models.
- At least a little familiarity with proof-based mathematics.
- Basic knowledge of the R programming language.
After taking this course, students will have a firm foundation in a linear algebraic treatment of regression modeling. This will greatly augment applied data scientists' general understanding of regression models.

Power and Sample Size for Longitudinal and Multilevel Study Designs, a five-week, fully online course, covers innovative, research-based power and sample size methods and software for multilevel and longitudinal studies. The power and sample size methods and software taught in this course can be used for any health-related or, more generally, social-science-related (e.g., educational research) application. All examples in the course videos are from real-world behavioral and social science studies employing multilevel and longitudinal designs. The course philosophy is to focus on the conceptual knowledge needed to conduct power and sample size analyses. The goal of the course is to teach and disseminate methods for accurate sample size choice and, ultimately, to have you create a power/sample size analysis for a relevant research study in your professional context.
Power and sample size selection is one of the most important ethical questions researchers face. Interventional studies that are too large expose human volunteer research participants to possible, and needless, harm from research. Interventional studies that are too small will fail to reach their scientific objective, again bringing possible harm to research participants without the possibility of concomitant gain from the increase in knowledge. For studies in which there are no possible harms to the participants, such as observational studies, proper power ensures good stewardship of both time and money.
Most National Institutes of Health (NIH) study sections will only fund a grant if the grantee has written a compelling and accurate power and sample size analysis. The Institute of Education Sciences (IES), the statistics, research, and evaluation arm of the U.S. Department of Education, also offers competitive grants requiring a compelling and accurate power and sample size analysis (Goal 3: Efficacy and Replication and Goal 4: Effectiveness/Scale-Up).
At the end of the online course, learners will be able to:
• Use a framework and strategy for study planning
• Write study aims as testable hypotheses
• Describe a longitudinal and multilevel study design
• Write a statistical analysis plan
• Plan a sampling design for subgroups, e.g., racial and ethnic subgroups
• Demonstrate the feasibility of recruitment
• Describe expected missing data and dropout
• Write a power and sample size analysis that is aligned with the planned statistical analysis
This is a five-week, intensive and interactive online course. We will use a mix of instructional videos, software demonstration videos, online discussion forums, online readings, quizzes, exercise assignments, and peer-review assignments. The final course project is a peer-reviewed research study that you design for a future power or sample size analysis.

The capstone project will be an analysis using R that answers a specific scientific/business question provided by the course team. A large and complex dataset will be provided to learners, and the analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling, as well as interpretation of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss, in the context of the data, how these two approaches are similar and different, and what these differences mean for the conclusions that can be drawn from the data.
A sampling of the final projects will be featured on the Duke Statistical Science department website.
Note: Only learners who have passed the four previous courses in the specialization are eligible to take the Capstone.