Geog 251 Quantitative Geography -- Finding Data Guide

This guide helps you identify data and statistical resources that may be useful for your lab exercises, with worked examples from two sources: the OECD and Statistics Canada.

Basic concepts: statistics, data, and microdata

In everyday life, we may get away with using the word "data" ambiguously to refer to anything to do with numbers, but when you search for data for research, it's important to differentiate between a few key concepts: statistics, data, and microdata.

Statistics
Statistics are data that have already been aggregated. Tables of information (such as years of education vs. mean income), bar charts, and pie charts are all statistics. Put another way, data turns into statistics once some sort of analysis has occurred.

Data
Numeric data is the raw information used to create statistics. Data can take many forms, such as survey data and environmental data (e.g., temperature and rainfall information for a particular weather station).

Microdata
Microdata is the raw information produced by surveys or censuses. Surveys conducted by government organizations, private or not-for-profit organizations, or academic researchers are a common source of data for research. Each piece of data collected about a survey respondent is referred to as a variable, and each individual’s response is referred to as a case.

Microdata are composed of individual records containing information collected on persons, households, or other entities. The responses of each person to the different questions are recorded in separate variables. 

Compared with statistics, data and microdata give you more flexibility to generate your own statistics in whatever way you need, including individual-level multivariate analysis.
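To make the distinction concrete, here is a minimal Python sketch of microdata being turned into a statistic (mean income by years of education). The records, field names, and values are all hypothetical and exist only for illustration; they do not come from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical microdata: one record (case) per survey respondent.
microdata = [
    {"id": 1, "years_education": 12, "income": 40000},
    {"id": 2, "years_education": 12, "income": 44000},
    {"id": 3, "years_education": 16, "income": 60000},
    {"id": 4, "years_education": 16, "income": 70000},
]

# Group individual incomes by years of education.
incomes_by_education = defaultdict(list)
for case in microdata:
    incomes_by_education[case["years_education"]].append(case["income"])

# Aggregate each group: this table of means is a "statistic".
mean_income = {
    years: mean(values) for years, values in incomes_by_education.items()
}
print(mean_income)
```

With the individual records in hand you could just as easily group by a different variable or compute a different summary, which is not possible when you only have the published table.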

Before you begin

Data is available everywhere, but not all of it is accessible or usable. Ask yourself: What am I looking for, and who might publish it? Data is produced by governmental agencies, non-governmental agencies, researchers, associations, think tanks, and more.

Consider:

  • Source
    • Is the data from a reliable source?
    • Have they included the methodology and all relevant information you'll need to understand and use the data?
  • Scope
    • Are the variables you're looking for present? If not, can you work with what is available?
    • Does the geography fit your research?
    • Is the data current or outdated?
  • Accessibility
    • Is the data available in a format you can use?
    • Is the data available? What restrictions are placed on its use?
  • Presentation
    • Is the data accurate or is it misleading? Is there anything missing?

Selected statistics/data resources

Socio-economic

Statistics Canada's Public Use Microdata Files (PUMFs)
Microdata files for a range of Statistics Canada surveys on topics including the economy, income, health, travel, the impact of COVID-19, and more.

Abacus 
Another way to download the microdata files for Statistics Canada surveys is through Abacus (British Columbia Research Libraries' Data Services, Abacus Dataverse Network). Search by survey name, enclosed in quotation marks.

OECD 
Statistics published by the Organisation for Economic Co-operation and Development. Topics covered include agriculture, developing economies, education, employment, energy, environment, migration, social issues, and sustainable development. See the "Downloading datasets" section below for how to download OECD data.

WHO 
The World Health Organization manages and maintains a wide range of data collections related to global health and well-being, as mandated by its Member States. While most of the data you find there are statistics, a fraction of the files are microdata, for example those in the NCD Microdata Repository. After you choose a survey, click "Get microdata" to download the dataset, and "Documentation" and "Data Description" to view the metadata.

Physical geography and climate change 

Historical Canadian Climate Data 
Search by station name or province/territory. Historical data on maximum, minimum, and mean temperature, total rain, total snow, etc., by day.

ClimateData.ca 
Canadian historical data on temperature, precipitation, frost days, and more. A map-searching interface allows you to choose a place and download data in CSV.

Climate Change Knowledge Portal (CCKP)
Developed by the World Bank Group, the Climate Change Knowledge Portal (CCKP) provides global data on historical and future climate, vulnerabilities, and impacts.

Natural disasters

Earthquakes Canada
Search the earthquake database to find data on earthquakes in Canada. You can set a time period and geographic extent. 

National Forestry Database by Canadian Council of Forest Ministers 
Statistics on the number of fires by cause, area burned by cause, number of fires by month, number of fires by fire-size category, etc.

Multidisciplinary 

FRDR
Canada's Federated Research Data Repository, which collects research data from various Canadian universities.

Downloading datasets 

Example 1: OECD Statistics 

  • Click on Browse by Topic. There are dozens of categories, and each category includes many sub-topics. For example, the category “Society” includes these sub-topics: demography, inequality, migration, population by region, and social protection.
  • Here is an example of how to download an indicator’s data.
  • Choose Economy, then GDP and Spending, and then Gross Domestic Product (GDP).
  • You can further filter your data by country and time period. [Image: OECD interface]
  • Click on “Countries” to select one or more countries. Countries are listed by three-character codes, but you can search by keyword. For example, searching for “Canada” retrieves CAN.
  • You can also drag the time scale to select the time period you are interested in.
  • Then download your data by selecting “Selected data only” under the Download button.
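If you download more data than you need, the same country and time-period filtering can be done afterwards. The Python sketch below shows the idea on a tiny made-up extract. The column names (LOCATION, TIME, VALUE) and all the numbers are assumptions for illustration, so check the header row of your own download before adapting it.

```python
import csv
import io

# Hypothetical extract; column names and values are illustrative only.
# For a real file, use open("your_download.csv", newline="") instead.
downloaded = io.StringIO(
    "LOCATION,TIME,VALUE\n"
    "CAN,2019,100.0\n"
    "CAN,2020,95.0\n"
    "FRA,2019,80.0\n"
    "FRA,2020,78.0\n"
)

countries = {"CAN"}                  # three-character country codes to keep
first_year, last_year = 2019, 2020   # time period of interest

# Keep only rows for the chosen countries within the chosen years.
selected = [
    row
    for row in csv.DictReader(downloaded)
    if row["LOCATION"] in countries
    and first_year <= int(row["TIME"]) <= last_year
]

for row in selected:
    print(row["LOCATION"], row["TIME"], row["VALUE"])
```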

Example 2:  Statistics Canada's surveys 

This resource provides microdata.

Here are step-by-step instructions on how to find and download microdata from Statistics Canada. 

1. Visit the website 

Visit Statistics Canada's Public Use Microdata Files (PUMFs) portal. 

2. Select a survey 

There are 145 Public Use Microdata Files in total, from a range of surveys on topics including the economy, income, health, travel, the impact of COVID-19, and more. Browse these files to find one that interests you.

[Image: Statistics Canada microdata portal]

We'll use the Labour Force Survey (LFS) as an example to demonstrate how to find, download, and explore microdata from Statistics Canada.

It is worth reading more about the survey first. The link (highlighted in the image below) directs you to a page with more information about it.

[Image: Statistics Canada LFS link]

 

Information like that shown in the image below is called metadata. It usually describes the purpose for which the data was collected, who created the data, the time period and geographic area covered, and the methodology and processing.

[Image: Statistics Canada LFS metadata]

3.  Select a time period 

Many Statistics Canada surveys are repeated over time, which means you will find data from the same survey for different time periods.

Select a time period you are interested in for this survey.  

[Image: Statistics Canada LFS time series]

 

4. Download files 

The file download page also links to the description page for the survey. You may want to learn more about the questionnaires, data sources, and methods used in collecting and analyzing the data for the survey.

Download the data file in .csv or .txt format. Both formats can be opened in Excel.

The package you download includes a data file and documentation of the variables in the data file.

If you open the data file, you will see that each row represents a case and each column represents a variable. However, you won't understand what these variables mean or what the numbers represent, right? This is where the documentation of the variables comes in! Read the next section for details.

[Image: Statistics Canada LFS data file]
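If you would like to inspect the file outside Excel, the row-and-column structure can be sketched in Python as follows. This is only an illustration: the three sample rows and the variable names REC_NUM and AGE_GRP are made up, and a real data file will contain many more columns and rows.

```python
import csv
import io

# Hypothetical stand-in for a downloaded data file. For a real file,
# use open("your_file.csv", newline="") instead of io.StringIO.
sample_file = io.StringIO(
    "REC_NUM,PROV,AGE_GRP\n"
    "1,2,3\n"
    "2,1,5\n"
    "3,2,4\n"
)

reader = csv.DictReader(sample_file)
cases = list(reader)            # one dict per row, i.e. one per case
variables = reader.fieldnames   # column headers, i.e. the variable names

print(f"{len(cases)} cases, {len(variables)} variables: {variables}")
print(cases[0])                 # the first case, still as coded values
```

Note that the values are read as text and are still coded; the variable documentation tells you what each code means.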

5. Review the variables 

The variable documentation contains useful information about each variable. This document is part of the metadata described in step 2.

For example, below is information on the variable "Province" (named PROV in the data file). You can find which values correspond to which labels for this variable. This information will be useful when you run statistical analyses in Excel.

[Image: Statistics Canada LFS variables]
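The value-to-label lookup that the variable documentation provides can be sketched in Python as a small dictionary. The codes and labels below are deliberately made up (they are not the real PROV values), so you will still need to consult the documentation for the actual codes.

```python
# Hypothetical codebook entry: these codes and labels are invented for
# illustration and are NOT taken from the LFS documentation.
region_labels = {1: "Region A", 2: "Region B", 3: "Region C"}

# Coded values as they might appear in one column of the data file.
coded_values = [2, 1, 3, 2, 9]

# Translate each code to its label; flag any code missing from the codebook.
labelled = [
    region_labels.get(code, f"Unknown code {code}") for code in coded_values
]
print(labelled)
```

Flagging unknown codes rather than silently dropping them helps you notice values the documentation may define separately, such as "not applicable" categories.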

Quiz: what is the value corresponding to BC?  

6. Explore the data 

Now that you know where to find information about the variables, you are set to explore the data statistically in Excel.  Here are some ideas to get you started: 

Descriptive statistics 

Choose a variable. Can you generate its mean, standard deviation, or frequency distribution?
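In Excel, functions such as AVERAGE, STDEV.S, and COUNTIF cover these tasks. For comparison, the same calculations can be sketched in Python with the standard library. The values below are hypothetical and only show the mechanics.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical values of one numeric variable (e.g. weekly hours worked).
hours = [40, 35, 40, 20, 45, 40]

print("mean:", round(mean(hours), 2))
print("sample standard deviation:", round(stdev(hours), 2))

# A frequency count suits categorical (coded) variables better than a mean.
status_codes = [1, 1, 2, 3, 1, 2]   # hypothetical codes
frequency = Counter(status_codes)
print(frequency.most_common())      # codes ordered from most to least frequent
```

A mean is only meaningful for numeric variables. For coded categories such as province, use frequencies instead.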

Subset data

Can you filter your data to the respondents in British Columbia only?   
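In Excel this is a column filter on PROV. As a rough Python equivalent, the sketch below keeps only the cases whose PROV code matches a target value. The cases are hypothetical, and TARGET_PROV is a placeholder: replace it with the code that the variable documentation lists for British Columbia.

```python
# Hypothetical cases; PROV holds a province code from the codebook.
cases = [
    {"REC_NUM": 1, "PROV": 1},
    {"REC_NUM": 2, "PROV": 2},
    {"REC_NUM": 3, "PROV": 2},
    {"REC_NUM": 4, "PROV": 3},
]

# Placeholder value only: look up the real code for British Columbia
# in the variable documentation and substitute it here.
TARGET_PROV = 2

# Keep the cases whose province code matches the target.
subset = [case for case in cases if case["PROV"] == TARGET_PROV]

print(f"{len(subset)} of {len(cases)} cases kept")
```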

Cross-tabulation 

Can you generate a table showing the percentages of male and female respondents in each labour force status (employed, unemployed, etc.)?

Hint: use these two variables: "Sex of respondent" and  "Labour force status". 
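In Excel, a PivotTable is the usual way to build this table. The Python sketch below shows the underlying arithmetic on a handful of hypothetical, already-labelled cases: count each combination, then divide by the total for each sex to get row percentages.

```python
from collections import Counter

# Hypothetical cases as (sex, labour force status) pairs, already labelled.
cases = [
    ("Female", "Employed"), ("Female", "Employed"),
    ("Female", "Employed"), ("Female", "Unemployed"),
    ("Male", "Employed"), ("Male", "Employed"),
    ("Male", "Unemployed"), ("Male", "Unemployed"),
]

pair_counts = Counter(cases)                    # count per (sex, status) cell
sex_totals = Counter(sex for sex, _ in cases)   # row totals per sex

# Row percentages: the share of each status within each sex.
crosstab = {
    (sex, status): 100 * count / sex_totals[sex]
    for (sex, status), count in pair_counts.items()
}

for (sex, status), pct in sorted(crosstab.items()):
    print(f"{sex:<8}{status:<12}{pct:5.1f}%")
```

Dividing by the row total answers "what share of each sex is in each status". Dividing by the column total instead would answer "what share of each status is male or female", so decide which question you are asking before building the table.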

How to cite statistics/data 

Please refer to this guide: Citing guide for Statistics Canada, PCensus, & CHASS data.

Additional resources

Refer to the SFU Library's Guides on Data & Statistics.

Interested in learning more about the open government data movement? Read this article: Open government data and environmental science: a federal Canadian perspective.

Need more?

Contact Sarah (Tong) Zhang at tza68@sfu.ca or the Data Services Team at data-services@sfu.ca for further help.