While developing the Social Sciences Quantitative Laboratory, I have built a list of free online tools that can be used to interact with, be amused by, and engage with the theory behind data. This post summarizes some of these resources.

**Theory Development**

Theory development is the core of any data analysis project. It informs the questions we ask, the data we use to answer the questions, how we understand discrepancies in our data, when the evidence is sufficient to answer the question, and much more. Two resources I’ve found useful for explaining the importance of theory development are:

Spurious correlations: While the primary point of this site is to amuse the reader, it can also be used to explain the need for a strong theory. Many correlations arise by accident. Theory informs which correlations we think will be relevant. I ask students to work in groups to discuss one correlation of their choosing. The task is to (1) explain why the correlation is spurious and (2) come up with a theory that could explain the correlation. The goal is to engage the students in creative thinking, as that is the core of theory development. It also emphasizes that humans can come up with a theory for just about anything, so we need to be careful about defining our theories before we analyze our data. Otherwise, we’ll come up with ad hoc explanations for everything we see instead of rigorously testing our ideas.

Correlation is not causation: XKCD is an amusing webcomic that has a strong scientific grounding. I use it regularly to add dimension to many data analysis concepts. I particularly like the hover-over text on this comic: “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.”

**Regressions**

Regressions are the workhorse of many social science studies. These are some links that help explain the core components of a regression, both to students with a background in statistics and to those who have never encountered a regression.

Manipulate scatterplots to see how the regression line changes: This interactive site allows the user to see how the regression lines change based on the distribution of data. It can be used to explain the impact of outliers, but also some principles behind least squares.
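For readers who want to reproduce the outlier effect offline, here is a minimal sketch of an ordinary least-squares fit in plain Python. The data points and the `least_squares` helper are hypothetical illustrations and are not taken from the interactive site.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: five points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
clean_slope, clean_intercept = least_squares(xs, ys)

# The same data plus a single outlier far below the line.
xs_outlier = xs + [6]
ys_outlier = ys + [0]
outlier_slope, outlier_intercept = least_squares(xs_outlier, ys_outlier)

print(f"without outlier: slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with outlier:    slope={outlier_slope:.2f}, intercept={outlier_intercept:.2f}")
```

In this made-up example, one extra point pulls the slope from 2 down to roughly 0.14. Because least squares penalizes squared vertical distances, a single distant point can dominate the fit.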

Extrapolating data: Another XKCD comic, this one cautioning readers against extrapolating beyond the sample.

**P-Hacking and Omitted Variable Bias**

Interactive graphic with party control and economic power: The folks at fivethirtyeight.com do more than present statistical analysis of politics and sports. They developed an interactive, create-your-own-theory graphic that explains the phenomenon of p-hacking. It can also be used to explain why omitting variables can influence the conclusions reached by data analysts. The graphic accompanies an article that discusses scientific methodology.

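The importance of effect size: In 2015, the World Health Organization classified processed meats, including cured meats, as carcinogenic to humans. I use this tweet to highlight the importance of understanding the magnitude of the effect, since knowing that an effect exists says nothing about how large it is.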

Jelly beans (don’t) cause acne: This is another XKCD comic. It shows that if you run an experiment on 20 different groups and test each at the 0.05 significance level, you should expect to find a statistically significant result in about one of those groups purely by chance, even when there is no real effect. That is, the expected number of spurious results is 20 × 0.05 = 1.
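The arithmetic behind the comic can be sketched in a few lines of Python. This is an illustration under two stated assumptions: the 20 tests are independent, and no true effect exists in any group. The names `ALPHA`, `N_TESTS`, and `simulate_any_false_positive` are hypothetical and chosen only for this example.

```python
import random

ALPHA = 0.05   # significance threshold used for each test
N_TESTS = 20   # number of groups tested (20 jelly bean colors in the comic)

# Expected number of false positives across all tests.
expected_false_positives = N_TESTS * ALPHA

# Probability that at least one test is significant by chance alone,
# assuming the tests are independent and every null hypothesis is true.
prob_at_least_one = 1 - (1 - ALPHA) ** N_TESTS


def simulate_any_false_positive(n_trials=10_000, seed=0):
    """Estimate the same probability by simulation.

    Under a true null hypothesis a p-value is uniform on [0, 1],
    so each test is modeled as a single uniform random draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        if any(rng.random() < ALPHA for _ in range(N_TESTS)):
            hits += 1
    return hits / n_trials


simulated = simulate_any_false_positive()
print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one), formula: {prob_at_least_one:.3f}")
print(f"P(at least one), simulated: {simulated:.3f}")
```

Under these assumptions the chance of at least one spurious "discovery" is about 64%, which is why a single significant result out of 20 tests is weak evidence on its own.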

Publication bias towards significant results: I use this XKCD comic to explain how journals favor articles with statistically significant results, and the language researchers use to try to accommodate that standard.

**Data Visualization**

Visualizing climate change opinions

Beautiful graphs:

- https://www.reddit.com/r/dataisbeautiful/top/?sort=top&t=all
- https://www.datapine.com/blog/best-data-visualizations/

Misleading graphs:

- http://www.statisticshowto.com/misleading-graphs/
- https://www.reddit.com/r/dataisugly/top/?sort=top&t=all

**Programming Skills**

Two online resources provide a great introduction to programming skills. These are:

Both teach a variety of languages, including R, Python, and SQL. The courses are free and very well designed.

Here I switch focus from using digital tools to teach data skills to using human interaction to teach digital skills. Teaching digital skills is tricky, particularly when students are programming in real time.

- Hire a student to serve as a debugger. Programming languages are notorious for creating idiosyncratic errors. One teacher cannot effectively keep up with all the bugs that come up, and students will be put off if their code breaks and can’t be fixed.
- Emphasize good programming practices, including commenting, good variable names, and structuring code systematically. Many students will end up programming in a different language than the one you are teaching them. Teaching good programming practices will give them comfort in using new languages in the future.
- Discuss the use of a full program, as opposed to piecemeal commands.
- Explain the internal file structure of a computer.
- Discuss the frustrations inherent in programming.
- Help students learn how to find help, both online and using their network. Every programmer regularly asks for help on programming issues.

This list will be updated regularly.