Description

For this project you will be playing the role of data scientist(s) extraordinaire, tasked with finding a dataset, establishing a research question, and using your skills to create an interactive web application in R Shiny to assist your less technically inclined collaborators in exploring the data visually.

Even though this particular project is unpaid, you will still be held to the highest of professional standards. Consequently, it is not enough that your app works; it should also look really cool. In the document that follows, I provide a timeline, an outline of expectations, and a rubric that will be used to assess your project.

Timeline

More details on the individual components are given elsewhere in this description.

Working schedule

Important intermediate dates

App Expectations

Your final Shiny app is expected to include

To be clear, the expectations given here represent the minimum. Anything less than this will result in a grade lower than a C. A rubric is presented in the section on assessment.

Presentation Expectations

In conjunction with your app, you (and your partner) will be asked to give a short 5-7 minute presentation of your app. Your presentation should include:

Your target audience should be our class, so you may assume some working knowledge of R Shiny and various types of graphics/statistics; but you should not assume any familiarity with your data source or research question.

Groups

You may work either individually or with one classmate (of your choosing) on this project. This does not preclude you from working through the labs with a partner and being responsible for your own project.

Note: You may not work with the same person on both this project and the final project. So, if you really want to work with a certain classmate on the final you might choose to work with someone else (or independently) on this project.

Additionally, by choosing to work with someone you are consenting to receiving the same score on the project. If you are not comfortable receiving the same score as your partner, you might opt to work alone.

Proposal

By the end of the day Friday, October 27, one member from your group should email me a brief proposal that addresses the following:

  1. Where will your data come from? (see the Shiny Resources page)
  2. Who (if anyone) will you be partnering with?
  3. What are you seeking to explore with your app?

In consideration of (3.), a good proposal may be something like “I want users of the app to be able to explore whether there are spatial patterns in the incidences of different types of crimes that were reported in the city of Chicago”. A less good proposal might be something like “I want to display all crimes in Chicago on a map”. The first example is good because it involves something that is best achieved using Shiny (i.e., a user option to change or filter by type of crime), while the second is less good because Shiny isn’t necessary to make a map.
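To make the distinction concrete, here is a minimal sketch of the kind of user-driven filtering that the first proposal calls for. The `crimes` data frame, its column names, and the `filter_crimes()` helper are all made up for illustration; a real project would read and clean an actual dataset and would likely draw a proper map.

```r
library(shiny)
library(ggplot2)

# Hypothetical toy data standing in for a real crime dataset
crimes <- data.frame(
  type = c("Theft", "Theft", "Assault", "Burglary"),
  lon  = c(-87.63, -87.65, -87.62, -87.70),
  lat  = c(41.88, 41.90, 41.85, 41.95)
)

# Hypothetical helper: keep only the rows for the selected crime type
filter_crimes <- function(data, crime_type) {
  data[data$type == crime_type, , drop = FALSE]
}

ui <- fluidPage(
  selectInput("crime_type", "Type of crime",
              choices = sort(unique(crimes$type))),
  plotOutput("crime_plot")
)

server <- function(input, output, session) {
  # The plot re-renders whenever the user picks a different crime type
  output$crime_plot <- renderPlot({
    ggplot(filter_crimes(crimes, input$crime_type), aes(x = lon, y = lat)) +
      geom_point() +
      labs(x = "Longitude", y = "Latitude",
           title = paste("Reported incidents:", input$crime_type))
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```

The drop-down is what makes this a Shiny problem: without it, a single static plot would do the same job.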

Intermediate Progress

By the end of the day Wednesday, 11/1, you should have most of your data processing done and a rough idea of the kinds of graphics you intend to make. To make sure that you are progressing, I will ask that you send me via email an R script with your data cleaning code along with an R Markdown document with a rough sketch of the kind of plot(s) you intend to make. These do not have to be cleaned up (i.e., don’t worry about labels, themes, etc.). I will be quick in giving you feedback so that you are able to respond to any changes if needed.

Assessment (100 pts)

App Code – 10 pts

Aesthetics – 20 pts

Function – 30 pts

Presentation – 15 pts

Difficulty – 20 pts

Misc – 5 pts

Examples

In addition to the details here for the assessment, consider also the example projects for grades A, B, and C. Each one comes with a few comments, in no particular order, on things I would consider when grading.

Grade C

Example C

  • Merging data from different sources to create plot
  • Default theme on ggplot
  • Labels for plot are the default xy values from dataset
  • After “Create Map” has been clicked once, the button no longer does anything; changing the State from the drop-down immediately updates the map
  • You can select “Green” for both high and low values of gradient
  • Metric drop down uses default values from dataset
  • Creates error when Metric == “percasian”
  • The choropleth looks “stretched” and not proportional
  • No other tabs or information

Grade B

Example B

  • Merging data from different sources to create plot with additional data source for table
  • Metric values look better (though High School is written as “Highschool”)
  • The “Create Map” button works to update the state; however, changing the value of State changes the title of the plot without changing the plot itself (most likely an error in isolate())
  • The colors for gradient are better – if “Red” is selected for Color One, it cannot be selected for Color Two. However, the value for Color One changes without prompting if Color Two is changed
  • Still an issue with changing state and the title of the plot changing
  • Adds a table that subsets the college dataset to include only colleges located in the state
  • Has an input that updates based on another input, i.e., choosing Gradient or Viridis determines whether the user picks high/low colors or a scale
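The title/plot mismatch noted above usually means one output reads the input directly while another waits for the button. One way to keep them in sync, sketched here with made-up input names and a placeholder plot rather than the example app's actual code, is to take a single snapshot of the input with `eventReactive()` and have everything that should wait for the button read from that snapshot.

```r
library(shiny)

ui <- fluidPage(
  selectInput("state", "State", choices = c("Iowa", "Illinois", "Minnesota")),
  actionButton("create_map", "Create Map"),
  plotOutput("map")
)

server <- function(input, output, session) {
  # Snapshot of the selected state, refreshed only when the button is clicked
  shown_state <- eventReactive(input$create_map, input$state)

  output$map <- renderPlot({
    state <- shown_state()
    # Placeholder for the real choropleth; title and plot use the same snapshot
    plot(1:10, main = paste("Map for", state))
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```

Because both the title and the plot body depend only on `shown_state()`, changing the drop-down alone changes neither until the button is clicked again.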

Grade A

Example A

  • Many separate data sources combined in interesting ways for comprehensive analysis.
  • The colors associated with Gradient work as they should: the same color cannot be selected twice, and neither input updates unless it is explicitly changed. This likely took a bit of work and trial and error
  • Added several tabs including a constant row at the top for updating state in analysis
  • Title is correct for both choropleth and analysis tab
  • Plot isn’t “stretched” as in the previous examples
  • Has conditional UI inputs depending on gradient/viridis
  • Added functionality to restore default values for all of the inputs
  • The scale values for Viridis aren’t A-H but rather the names of the actual scales
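Several of the items above (conditional inputs, a reset option, and readable scale names) can be sketched together. This is only an illustration with made-up input IDs and defaults, not the code from the example app. It relies on the fact that the viridis color scales accept either a name or a letter from "A" to "H", so a named vector lets the user see the names while the app passes along the letter.

```r
library(shiny)

# Display names mapped to the single-letter codes accepted by viridis scales
viridis_options <- c(magma = "A", inferno = "B", plasma = "C", viridis = "D",
                     cividis = "E", rocket = "F", mako = "G", turbo = "H")

ui <- fluidPage(
  radioButtons("scale_type", "Color scale", choices = c("Gradient", "Viridis")),
  # Shown only when the user chooses Gradient
  conditionalPanel(
    condition = "input.scale_type == 'Gradient'",
    selectInput("low_color", "Low value color",
                choices = c("Blue", "Green", "Red"), selected = "Blue"),
    selectInput("high_color", "High value color",
                choices = c("Blue", "Green", "Red"), selected = "Red")
  ),
  # Shown only when the user chooses Viridis
  conditionalPanel(
    condition = "input.scale_type == 'Viridis'",
    selectInput("viridis_option", "Viridis scale",
                choices = viridis_options, selected = "D")
  ),
  actionButton("reset", "Restore defaults")
)

server <- function(input, output, session) {
  # Put every input back to its (assumed) default when the button is clicked
  observeEvent(input$reset, {
    updateRadioButtons(session, "scale_type", selected = "Gradient")
    updateSelectInput(session, "low_color", selected = "Blue")
    updateSelectInput(session, "high_color", selected = "Red")
    updateSelectInput(session, "viridis_option", selected = "D")
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```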

Level of Difficulty

One goal of this project is to give you the opportunity to find and work on a topic that you find interesting. For better or worse, real world projects have considerable variability in terms of the amount of work required to do anything interesting: for some projects, over 90% of your time may be spent on cleaning and organizing your data in a way that is useful to produce relatively simple yet insightful visualizations. Other times, the data may come relatively clean with the majority of time spent on preparing highly detailed visualizations.

The same can be said about Shiny apps themselves – while many aspects of an app may be straightforward, some aspects may involve crafting intricate logic to get the reactive aspects of an app to work as they should. Having the high/low color relationship from the “A” project above would be an example of this.
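As a rough illustration of that kind of logic, the sketch below keeps the two gradient colors from matching by rebuilding the choices of one input whenever the other changes. The input IDs, the palette, and the `available_colors()` helper are invented for this example, and the dependency runs in one direction only; the “A” project may well have handled it differently.

```r
library(shiny)

palette_choices <- c("Red", "Green", "Blue", "Yellow")

# Hypothetical helper: the options still available once one color is taken
available_colors <- function(all_colors, taken) {
  setdiff(all_colors, taken)
}

ui <- fluidPage(
  selectInput("low_color", "Low value color",
              choices = palette_choices, selected = "Blue"),
  selectInput("high_color", "High value color",
              choices = available_colors(palette_choices, "Blue"),
              selected = "Red")
)

server <- function(input, output, session) {
  # When the low color changes, remove it from the high color's choices
  observeEvent(input$low_color, {
    choices <- available_colors(palette_choices, input$low_color)
    current <- input$high_color
    # Keep the user's current high color if it is still allowed;
    # otherwise fall back to the first remaining option
    selected <- if (!is.null(current) && current %in% choices) current else choices[1]
    updateSelectInput(session, "high_color", choices = choices, selected = selected)
  })
}

app <- shinyApp(ui, server)  # launch with runApp(app)
```

With this arrangement the high color changes on its own only when the user picks that same color for the low value, and the low color never changes unless the user changes it.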

To address the levels of variability and difficulty in your projects, you will be asked to submit a written argument of less than one page detailing the level of difficulty in your project or pointing out aspects that you are particularly proud of having accomplished.

The hallmarks of an A-level project include things such as:

In short, an A level project will be one that involves some level of self study to present things that go beyond what has been covered in labs.

Finding Data

You are expected to use a challenging data source of your choosing. I encourage you to find something that aligns with your interests, and you are welcome to use data from other courses/internships/etc. provided it is sufficiently complex.

If you are having trouble finding data, check out the list on the Shiny Resources page. Additionally, you might check out Kaggle, which keeps an interesting repository of datasets used in machine learning competitions.

Comments

During the next two weeks, I am happy to meet with anyone who would like to discuss their options in more detail, plan a good research question, or work through parts of their Shiny code. If outside of typical office hours, please email me to schedule a time so that I can be sure that I have enough time allotted to work through whatever it is we need to.

I will also plan on hosting an informal session in our standard classroom on Sunday, Nov 5 from 5 to 6:30 so that you can get last-minute feedback on your projects.