Instructor:
Data science encapsulates the interdisciplinary activities required to create data-centric products and applications that address specific scientific, socio-political, or business questions. It has drawn tremendous attention from both academia and industry and is making deep inroads in industry, government, health, and journalism.
This course focuses on (i) data management systems, (ii) exploratory and statistical data analysis, (iii) data and information visualization, and (iv) the presentation and communication of analysis results. It will be centered around case studies drawing extensively from applications, and will yield a publicly-available final project or tutorial that will strengthen course participants' data science portfolios.
The course will primarily consist of sets of self-contained lectures and assignments that leverage real-world data science platforms when needed. As such, there is no assigned textbook, but there will be some recommended ones. Many lectures will come with links to required readings, which should be completed before or after the lecture (as declared), and, when appropriate, a list of links to other web resources.
Note: This course outline is tentative and subject to modification to meet the specific needs and requirements of the students and the evolving field of data science.
Students enrolled in the course should be comfortable with programming (for those at UMD, passing CMSC216 will be sufficient) and have a reasonable level of mathematical maturity. The course will primarily use the Python programming language through Google Colab (or Jupyter Notebook if needed), with support from the Anaconda package manager. Primer lectures on Python for data science will be provided early in the semester, so there is no need to worry if you are new to Python. Later lectures will introduce statistics and machine learning, potentially incorporating basic calculus and linear algebra. A light level of mathematical maturity, roughly equivalent to that of a junior CS student, is preferred.
This course is aimed at junior- and senior-level Computer Science majors, but should be accessible to any student of life with some degree of mathematical and statistical maturity, reasonable experience with programming, and an interest in the topic area. If in doubt, e-mail me: fardina@umd.edu!
Introduction to Data Science, Experiment Design, Introduction to Python, Data Types, Data collection, Git, Pandas, Database, SQL, Probability, Summary Statistics, Hypothesis Testing, Data Visualization, Data Exploration, Introduction to Machine Learning, Classifications, Decision Trees, Regression, Feature Engineering, ML Evaluation, Neural Network, Image Processing, Natural Language Processing (NLP), Introduction to Graph(s) Theory, Recommender Systems, Data Science Ethics
At the completion of this course, students will be able to:
There will be five (to six) assignments, one semester-long final group project, two written midterm exams, and one cumulative written final exam.
Final grades will be calculated as:
| Component | Percentages |
|---|---|
| Assignments/Mini Projects | 35% |
| Mid Exam 01 | 15% |
| Mid Exam 02 | 15% |
| Final Project/Tutorial | 15% |
| Final Exam | 20% |
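As a rough illustration of how the weights in the table combine, here is a minimal Python sketch. The function name `final_grade`, the dictionary keys, and the example scores are hypothetical and chosen only for this example. It assumes each component is first expressed as a percentage between 0 and 100. The actual grade computation in ELMS may differ in rounding or other details.

```python
# Component weights taken from the grading table (fractions of the final grade).
WEIGHTS = {
    "assignments": 0.35,
    "mid_exam_1": 0.15,
    "mid_exam_2": 0.15,
    "final_project": 0.15,
    "final_exam": 0.20,
}


def final_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name in WEIGHTS to a percentage in [0, 100].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {
    "assignments": 90,
    "mid_exam_1": 80,
    "mid_exam_2": 85,
    "final_project": 95,
    "final_exam": 88,
}

print(round(final_grade(example_scores), 2))
```

Keeping the weights in a single dictionary makes it easy to confirm that they sum to 100% and to adjust them if the syllabus changes.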
Late work typically receives no credit, except as specified below.
Please submit homework and projects on time. Submissions made within 24 hours after the deadline incur a penalty: 15% for homework and 20% for project/tutorial checkpoints 1 and 2. After this 24-hour period, no submissions will be accepted. In ELMS/Gradescope (as instructed in the assignment or project), you can submit multiple times, and only the last submission will be graded. The penalty for late homework will be applied automatically; no request is necessary. This policy applies to both homework and projects, EXCEPT for the final submission (checkpoint 3) of the final project/tutorial and for any kind of BONUS work.
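The late policy can be sketched in Python as follows. This is only an illustration under stated assumptions: the penalty is read as a percentage of the earned score (the syllabus does not say whether it is relative or absolute), a submission exactly 24 hours late is treated as still within the window, and a submission that is not accepted is modeled as a score of zero. The function name `apply_late_policy` and the `kind` labels are hypothetical.

```python
from datetime import datetime, timedelta

LATE_WINDOW = timedelta(hours=24)

# Late penalties from the syllabus; "checkpoint" covers project checkpoints 1 and 2 only.
PENALTY = {"homework": 0.15, "checkpoint": 0.20}


def apply_late_policy(raw_score, kind, deadline, submitted):
    """Return the score after the late policy is applied.

    Assumption: the penalty is a fraction of the earned score, and work
    submitted after the 24-hour window receives zero because it is not accepted.
    """
    if kind not in PENALTY:
        raise ValueError(f"unknown submission kind: {kind!r}")
    if submitted <= deadline:
        return float(raw_score)          # on time: no penalty
    if submitted - deadline > LATE_WINDOW:
        return 0.0                       # past the window: not accepted
    return raw_score * (1 - PENALTY[kind])


# Hypothetical deadline and submission time, for illustration only.
deadline = datetime(2026, 2, 19, 23, 59)
one_hour_late = deadline + timedelta(hours=1)
print(apply_late_policy(90, "homework", deadline, one_hour_late))
```

The final project checkpoint and bonus work are deliberately left out of `PENALTY`, since the late window does not apply to them.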
See the next section about how to contact us in special circumstances. We aim to help everyone succeed.
We are going to use a combination of in-person and online office hours and the Piazza forum (sign up here) for Q&A. This means that it's appropriate to use Piazza for asynchronous communication with the course instructors and other students, and also for short, high-bandwidth discussions that could usually take place before/after class.
Note that Piazza is not appropriate for things like asking for accommodations, extensions, or other such issues/concerns. For any logistics-related help, such as extensions or grading issues, please fill out this form: CMSC320 Logistics Request Form. However, if you do not receive a response or if your issue is not resolved within 48 hours, then please email:
When sending an email, include relevant tags at the beginning of the subject line, such as [hw_extension/req], [project_extension/req], [gradeissue], etc. More tags will be added on the website.
If you have a request that fits into one of these categories and you don't email the address given above,
then your request may not reach the right person and may not be answered in a timely manner.
If your correspondence does not fit into either of those two categories, please email an instructor (professor) with
[CMSC320] in the email subject line. You may also go to an instructor's office hours at the times listed below.
Please note: if you don't include [CMSC320] in your subject line when emailing instructors, your email
may not be filtered correctly.
* All hours are EDT
Do keep an eye on Piazza, though; TAs will sometimes swap hours, shift hours, host hours on Zoom, and so on! Additionally, we have at least one TA explicitly covering Piazza on each weekday; all course staff will float around Piazza in general, too!
THERE ARE NO OFFICE HOURS FIRST WEEK (Jan 26 - Jan 30)
ALL THE LECTURE SLIDES WILL BE POSTED HERE
The schedule is subject to change as the semester progresses!
| # | Date | Topic | Reading | Slides | Notes |
|---|---|---|---|---|---|
| 1 | Jan 27, Tu | Introduction to Data Science (Chapter 1) and Data Type (Chapter 2) | Self-Study Slide: Git; Additional Reading/Helpful slide: Python Demo Python | Lec1.1Intro; Lec1.2DataTypes (Self-Study) | Sign up on Piazza! |
| 2 | Jan 29, Th | | | Lec2.BasicProbability (Self Study) | |
| 3 | Feb 3, Tu | Experimentation | Experimental Design | | |
| 4 | Feb 5, Th | | | | HW 1 Out; Final Project/Tutorial Instruction Out |
| 5 | Feb 10, Tu | | | | |
| 6 | Feb 12, Th | Probability cont. (part 2) | Central Limit Thm | | |
| 7 | Feb 17, Tu | Hypothesis Testing | Hypo Testing Steps and Examples; P-Value Explanation | | HW 2 Out (Feb 18, Wed) |
| 8 | Feb 19, Th | Data Visualization | | | HW 1 Due |
| 9 | Feb 24, Tu | Data Exploration | | | |
| 10 | Feb 26, Th | Data Cleaning | | | Project First Checkpoint Due; HW 3 Out |
| 11 | Mar 03, Tu | Data Cleaning | HandlingMissingData | | |
| 12 | Mar 05, Th | Introduction to Machine Learning | | | HW 2 Due |
| 13 | Mar 10, Tu | Mid Exam I | | | |
| 14 | Mar 12, Th | Feature Engineering | | | |
| 15 | Mar 17, Tu | Spring Break | | | (No Class) |
| 16 | Mar 19, Th | Spring Break | | | (No Class) |
| 17 | Mar 24, Tu | ML Evaluation | Confusion Matrix; CrossValidationVideo; KFoldVideo | | HW 3 Due |
| 18 | Mar 26, Th | Decision Tree | DecisionTree-Calculate-Entropy-InfoGain | | |
| 19 | Mar 31, Tu | Classifications | Reading materials are given at the end of the Classification lecture slides | | Project Second Checkpoint Due |
| 20 | Apr 2, Th | Regression (& Unsupervised Learning) | Simple Linear Regression | | HW 5 Out |
| 21 | Apr 7, Tu | Unsupervised Learning & Dimensionality Reduction | PCA_short_slides | | |
| 22 | Apr 9, Th | Introduction to Neural Network | | | HW 4 Due |
| 23 | Apr 14, Tu | Image Processing with CNN | | | |
| 24 | Apr 16, Th | CNN cont. and Intro to Natural Language Processing (NLP) | | | |
| 25 | Apr 21, Tu | Intro to Natural Language Processing (NLP) cont. | | | |
| 26 | Apr 23, Th | Mid Exam II | | | |
| 27 | Apr 28, Tu | Introduction to Graph Theory | NetworkX; Intro to GraphQL (these two materials are provided for self-exploration, for anyone interested) | | |
| 28 | Apr 30, Th | Introduction to Graph Theory cont. | ClassNote_Girvan-NewmanAlgorithm; Girvan & Newman, "Community structure in social and biological networks," PNAS-02 | | HW 5 Due |
| 29 | May 5, Tu | Recommender System | | | |
| 30 | May 7, Th | Recommender System cont. and Data Ethics (self reading) | | | Project Final Checkpoint Due (May 8, Fri) |
| | Final Exam Week | Final Exam | Tentative: Friday, May 15 (4 p.m. to 6 p.m.). Reference: https://registrar.umd.edu/registration/register-classes/final-exams/spring | | This date is subject to change based on the official UMD schedule. Until then, please treat this as the working date and plan your end-of-semester activities accordingly. |
All assignments will be posted on Piazza/ELMS. Instructions will appear over the course of the semester. Most assignments are released one or two days before the corresponding lecture material is presented and are due one or two weeks after that.
Weights: HW1 — 5%, HW2 — 5%, HW3 — 6%, HW4 — 7%, HW5 — 12% (Total 35%)
| # | Description | Date Released | Date Due |
|---|---|---|---|
| Homework 1 | Git, Pandas, and SQL | Feb 5, Th | Feb 19, Th |
| Homework 2 | Python & Statistics & Hypothesis Testing | Feb 18, Wed | Mar 5, Th |
| Homework 3 | Data Exploration & Data Cleaning & Missing Data | Feb 26, Th | Mar 24, Tu |
| Homework 4 | Machine Learning: Classification and Clustering | Mar 24, Tu | Apr 9, Th |
| Homework 5 | Regression, Gradient Descent, Neural Network, CNN & NLP | Apr 2, Th | Apr 30, Th |
There will be a group final project/tutorial with four to six (four recommended) persons in each group. Keep in mind that this is a semester-long group project/tutorial for this course, and you should strive to make it your best work. It will be graded to a higher standard than the rest of the homework, considering that you have had the opportunity to practice these skills beforehand. Project details will be posted on Piazza/ELMS. It will be your responsibility to select the project topic as well as your project partners.
| # | Description | Date Released | Date Due |
|---|---|---|---|
| Checkpoint 1 | Group Formation and Choosing Dataset | Feb 5, Th | Feb 26, Th |
| Checkpoint 2 | Data preprocessing and Exploration | Feb 5, Th | Mar 31, Tu |
| Final Checkpoint | Final deliverable of DS Project | Feb 5, Th | May 8, Fri (NO EXTENSION/LATE SUBMISSION IS ALLOWED) |
Note that academic dishonesty includes not only cheating, fabrication, and plagiarism but also helping other students commit acts of academic dishonesty by allowing them to obtain copies of your work. You are allowed to use the Web for reference purposes, but you may not copy code from any website or any other source. In short, all submitted work must be your own. Cases of academic dishonesty will be pursued to the fullest extent possible as stipulated by the Office of Student Conduct. Without exception, every case of suspected academic dishonesty will be referred to the Office. If the student is found to be responsible for academic dishonesty, the typical sanction results in a special grade "XF", indicating that the course was failed due to academic dishonesty. More serious instances can result in expulsion from the university. If you have any doubt as to whether an act of yours might constitute academic dishonesty, please contact your TA or the course coordinator.
The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student, you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism. For more information on the Code of Academic Integrity or the Student Honor Council, please visit http://www.shc.umd.edu.
Examples of Academic Integrity Violations:
The following are examples of academic integrity violations:
AI Tools Policy for this class:
The use of AI tools (e.g., ChatGPT, Copilot, Gemini, DALL·E, etc.) is strictly prohibited for any part of assignments, homework, quizzes, exams, or projects, including brainstorming, writing, coding, and editing. Exception: You may use AI tools only for brainstorming ideas for the semester-long final group project. If you do, you must clearly declare this use in your project checkpoint submissions and cite the tool as a reference. AI-generated content or code cannot appear in your final deliverable. This restriction also applies when using Google Colab or any other online coding platform: AI-assisted features must be turned off. Violating this policy will result in a score of zero on the assignment/project and may lead to an academic integrity referral. While AI tools will play a role in your future work, this policy is designed to help you develop original ideas and a unique voice, both essential skills for this course. If you have questions or would like to suggest potential exceptions, please email me at fardina@umd.edu, and I will be happy to discuss further.
Consequences for non-compliance: Failure to adhere to this policy may result in a zero on the particular coursework where the AI tool is used. In addition, the university honor code applies: violations will be treated as honor code violations, and appropriate action will be taken.
You are responsible for reading the class announcements that are posted on both the course webpage and ELMS. Please check them often (at least once a day). Important information about the course (e.g., deadlines, assignment updates, etc.) will be posted on the course webpage.
Policies relevant to Undergraduate Courses are found here: http://ugst.umd.edu/courserelatedpolicies.html. Topics that are addressed in these various policies include academic integrity, student and instructor conduct, accessibility and accommodations, attendance and excused absences, grades and appeals, copyright and intellectual property.
Projects/Labs: On any graded project or lab, you are NOT allowed to exchange code. We compare each student's code with every other student's code to check for similarities. Every semester, we catch an embarrassingly high number of students that engage in cheating and we have to take them to the Honor Council.
GroupMe/Other Group Chats: We encourage students to talk about course material and help each other out in group chats. However, this does NOT include graded assignments. There have been a couple instances in the past where students have posted pictures/source files of their code, or earlier sections have given away exam questions to later sections. Not only did this lower the curve for the earlier section because the later one will do better, the WHOLE group chat had to pay a visit to the Honor Council. It was an extremely ugly business. Remember that in a group of 200+, someone or the other will blow the whistle. If you happen to be an innocent person in an innocent groupchat and someone starts cheating out of the blue, leave it immediately (and better yet, say you are leaving and say you will report it).
GitHub: You may post your project code to private GitHub (or similar service) repos only. As a student, you can make a private repo for free. Just remember that your free premium subscription has an expiration date, and your code becomes public once it expires. The Honor Council can retroactively give an XF (even to students who have already graduated) if your code is then used by another student to cheat. So just be careful. Posting graded code to a public repo will give you a free ticket to the Honor Council (unless the instructor has given you permission with some strict conditions).
Missing an exam for reasons such as illness, religious observance, mandatory military obligation, physical or mental health conditions of the student or an immediate family member, death of someone close to the student, participation in university activities at the request of university authorities, or compelling circumstances beyond the student’s control (e.g., required court appearances) will be excused so long as the absence is requested in writing at least 2 days in advance and the student includes documentation that shows the absence qualifies as excused; a self-signed note is not sufficient as exams are Major Scheduled Grading Events. For this class, such events are the final project assessment and midterms, which will be due on the dates listed in the schedule above. The final exam is scheduled according to the University Registrar.
Absences stemming from work duties other than military obligation (e.g., unexpected changes in shift assignments) and traffic/transit problems do not typically qualify for excused absence.
For medical absences, you must furnish documentation from the health care professional who treated you. This documentation must verify dates of treatment and indicate the timeframe during which you were unable to meet academic responsibilities. In addition, it must contain the name and phone number of the medical service provider to be used if verification is needed. No diagnostic information will ever be requested. Note that simply being seen by a health care professional does not constitute an excused absence; it must be clear that you were unable to perform your academic duties.
It is the University's policy to provide accommodations for students with religious observances that conflict with exams, but it is your responsibility to inform the instructor in advance of intended observances. If you have a conflict with a planned exam, you must inform the instructor prior to the end of the first two weeks of the class.
The policies for excused absences do not apply to project assignments. Projects will be assigned with sufficient time to be completed by students who have a reasonable understanding of the necessary material and begin promptly. In cases of extremely serious documented illness of lengthy duration or other protracted, severe emergency situations, the instructor may consider extensions on project assignments, depending upon the specific circumstances.
One time per course per semester, students may provide a self-signed excuse as documentation of an absence from a single class (e.g., lecture, recitation, or laboratory session) that does not coincide with a major assessment or assignment due date.
For the full University of Maryland policy on excused absence, including definitions and procedures, see the official policy: University of Maryland Policy on Excused Absence.
If you experience difficulty during the semester keeping up with the academic demands of your courses, you may consider contacting the Learning Assistance Service in 2201 Shoemaker Building at (301) 314-7693. Their educational counselors can help with time management issues, reading, note-taking, and exam preparation skills.
Students can find information about official university closings and delays on the campus website or by contacting the weather emergency phone line at 301-405-7669. If any exam or class assignment is rescheduled or a class is canceled due to inclement weather, we will provide announcements via ELMS. We will follow the university's campus rules.
Although every effort has been made to be complete and accurate, unforeseen circumstances arising during the semester could require the adjustment of any material given here. Consequently, given due notice to students, the instructors reserve the right to change any information on this syllabus or in other course materials. Such changes will be announced and prominently displayed.
Course evaluations are important and the department and faculty take student feedback seriously. Near the end of the semester, students can go to http://www.courseevalum.umd.edu to complete their evaluations.
Textbook: We will use the free version of the "CMSC320 Textbook", written by Dr. Fardina Alam and Gavin Hung (along with the class lecture slides). The link will be provided in ELMS. As we go through the course, I will sometimes mention additional resources or next steps. None of this is required for the course, but students have asked me to keep a record of which texts/websites I mention.