OS4106 Advanced Data Analysis

This course moves beyond the ordinary linear model to other types of statistical models that are appropriate in different circumstances. Students are first introduced to supervised models, including logistic regression and other generalized linear models (GLMs). The importance of complexity control and of dividing data into training and test sets is emphasized. Non-parametric models are introduced through classification and regression trees. Classification performance assessment is discussed. Unsupervised models, including clustering and principal components, are presented. Throughout the course, examples are drawn from practical experience conducting research and solving problems for Navy and DoD customers.

Prerequisite

OA3103 or an equivalent, such as an intermediate course on linear models, or the instructor's consent

Lecture Hours

3

Lab Hours

0

Course Learning Outcomes

Upon successful completion of this course, you should be able to:
• Distinguish between supervised and unsupervised methods
• Implement linear regression models
• Implement logistic regression models
• Implement random forest models
• Utilize regularization (ridge, lasso, elastic net)
• Distinguish between classification and regression
• Define and distinguish various classification metrics
• Utilize validation techniques to assess model performance and avoid overfitting
• Implement clustering models
• Reduce the dimensionality of your data by using principal component analysis
• Utilize exponential smoothing and ARIMA models for time series data