class: middle
# Projects in Digital Composition:
# Communicating with Data
### Getting Data
Matthew J. Lavin
Clinical Assistant Professor of English and Director of Digital Media Lab
University of Pittsburgh
Spring 2019

---
class: middle
# What is Data Journalism?
### Let's Review!

---
class: middle
### What makes data journalism different to the rest of journalism? Perhaps it is the new possibilities that open up when you combine the traditional ‘nose for news’ and ability to tell a compelling story, with the sheer scale and range of digital information now available.
- Paul Bradshaw, Birmingham City University, BBC
---
class: middle
### Data can be the source of data journalism, or it can be the tool with which the story is told — or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.
- Paul Bradshaw, Birmingham City University, BBC
---
class: middle
### Journalists should see data as an opportunity. They can, for example, reveal how some abstract threat such as unemployment affects people based on their age, gender, education. Using data transforms something abstract into something everyone can understand and relate to.
- Mirko Lorenz, Deutsche Welle
---
class: middle
### Data journalism is an umbrella term that, to my mind, encompasses an ever-growing set of tools, techniques and approaches to storytelling. ... The unifying goal is a journalistic one: providing information and analysis to help inform us all about important issues of the day.
- Aron Pilhofer, New York Times
---
class: middle
# Examples
- https://www.propublica.org/article/message-machine-you-probably-dont-know-janet
- http://jonathanstray.com/a-full-text-visualization-of-the-iraq-war-logs
- http://graphics.wsj.com/hamilton/
- https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/
- https://www.nytimes.com/interactive/2017/08/09/upshot/game-of-thrones-chart.html
- https://archive.nytimes.com/www.nytimes.com/interactive/2010/02/01/us/budget.html?_r=0 (requires flash)

---
class: middle
# Getting Data
- datacatalogues.org
- http://www.guardian.co.uk/world-government-data
- http://thedatahub.org/
- http://data.worldbank.org/
- http://data.un.org/
- https://data-archive.ac.uk/
- http://getthedata.org/
- https://www.data.gov/

---
class: middle
# More Resources
- https://dataverse.harvard.edu/
- https://figshare.com/
- https://zenodo.org/
- https://github.com/fivethirtyeight/data
- https://www.propublica.org/datastore/datasets
- https://toolbox.google.com/datasetsearch
- https://archive.ics.uci.edu/ml/datasets.html
- https://www.census.gov/data.html
- https://ckan.org/about/instances/
- https://registry.opendata.aws/
- http://www.wprdc.org
- https://humanitiesdata.com
- https://www.kaggle.com/
- https://github.com

---
class: middle
# Requesting Data
---
class: middle
### Before you make a Freedom of Information (FOI) request you should check to see if the data you are looking for is already available — or has already been requested by others. The previous chapter has some suggestions for where you might look.
---
class: middle
### Before you start submitting a request, check the rules about fees for either submitting requests or receiving information. That way, if a public official suddenly asks you for money, you will know what your rights are.
---
class: middle
### Governments are not obliged to process data for you, but should give you all the data they have, and if it is data that they should have in order to perform their legal competencies, they should certainly produce it for you.
---
class: middle
# Other Methods
- #### What is Wobbing?
- #### APIs
- #### Webscraping
- #### Upcoding, Crowdsourcing, and Digitizing

---
class: middle
# What a Dataset Isn't
- A dataset is the computer-friendly or "machine readable" information that powers a visualization.
- Summary statistics: A dataset can be used to generate statistics, but summary statistics cannot be used to recreate a dataset.
- A Graphic: Many web-based graphics store their source data in an accessible way, but many do not.
- An online dashboard or data browser: the data can be viewed, but can it be downloaded? Would you have to scrape it to get it?

---
class: middle
# Activity: Get Data
#### In teams of 3-4 students, see if you can find _datasets_ that meet the criteria on the next slide. The goal of the activity is to practice finding datasets and to demonstrate how much easier it is to find non-sample data by topic than by granularity or type of data. When you are finished, add your dataset links to the following webform: https://goo.gl/forms/ybvDE7gTeHCrTfAd2
#### (For convenience, the list is also repeated on the online form)

---
class: middle
# Activity: Get Data
1. A geospatial dataset depicting a year in history before 1950
2. Network data about characters in a novel
3. A collection of the full text of movie dialogue
4. Los Angeles County Social Survey, 1992 (LACSS), version 4.0
5. A document corpus with author information suitable for authorship attribution analysis
6. A collection of English language words with a count of how many syllables each word has
7. A dataset that shows patterns of Internet users' web searches
8. A dataset suitable for predicting whether someone's income per year exceeds $50,000 (using any predictors)
9. A dataset of Jeopardy! questions and answers, with information on who rang in and how players' scores changed after each clue
10. A dataset depicting the total number of people incarcerated in the United States (can be specific to a day, month, year, or any other interval)
11. An interesting dataset fitting none of these descriptions that you found while doing this activity

---
class: middle
# Journalism in the Age of Data
- ### http://datajournalism.stanford.edu/noflash.html
- ### Clip 1: 4:52 - 6:37

---
class: middle
# Journalism in the Age of Data
- ### http://datajournalism.stanford.edu/noflash.html
- ### Clip 2: 27:15 - 29:41

---
class: middle
# Journalism in the Age of Data
- ### http://datajournalism.stanford.edu/noflash.html
- ### Clip 3: 39:18 - 40:43
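
---
class: middle
# Sketch: Summary Statistics vs. a Dataset

The "What a Dataset Isn't" slide says that a dataset can generate summary statistics, but summary statistics cannot be used to recreate a dataset. The following is a minimal sketch of that point in Python, not a definitive demonstration. The two lists are made-up illustrative values (not real data), and `summarize` is a hypothetical helper name introduced only for this example.

```python
from statistics import mean, median

# Two made-up datasets (illustrative values only, not real data)
dataset_a = [10, 20, 30, 40, 50]
dataset_b = [28, 29, 30, 31, 32]


def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


summary_a = summarize(dataset_a)
summary_b = summarize(dataset_b)

# Both datasets produce the same count, mean, and median,
# so the summary alone cannot tell us which rows produced it.
print("Summaries match:", summary_a == summary_b)
print("Datasets match: ", dataset_a == dataset_b)
```

If a source publishes only figures like these, the underlying rows still have to be found or requested elsewhere.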