CY3650 Foundations in Data Science

This course surveys the use of information technologies and data analytics, with emphasis on case studies relevant to cyber operations and to the DoD. Topics include technologies and trends for big data management (e.g., distributed cloud file systems, NoSQL data stores); major themes and technologies in cloud computing (SaaS, PaaS, IaaS); distributed computation frameworks (e.g., MapReduce); and case studies focusing on how cloud infrastructure is used to enable services and analytics (e.g., mining, matching, filtering, and translating data).

Prerequisite

CS2020 or equivalent introductory programming experience.

Lecture Hours

3

Lab Hours

1

Course Learning Outcomes

This course covers a range of skills, including an understanding of fundamental concepts, knowledge of key terms, and the ability to frame the data management factors in real-world problems.

By the end of the course, students will be able to:

  • Explain the concepts associated with managing large data sets in a distributed computing environment.
  • Identify cloud computing architecture, design principles, and technologies.
  • Differentiate big data management trends and technologies (distributed cloud file systems, NoSQL data stores).
  • Describe distributed computation frameworks and analytics (MapReduce).
  • Recognize simple instances of cloud infrastructure applied to enable services and analytics (search, mine, match, filter, and translate data).
  • Define data science processes and products in the context of storage and compute infrastructure and analytic capability.