On this page
- Basic concept: statistics, data, microdata
- Before you begin
- Selected statistics/data resources
- Socio-economic
- Physical geography and climate change
- Natural disasters
- Multidisciplinary
- Downloading datasets
- Example 1: OECD Statistics
- Example 2: Statistics Canada's surveys
- How to cite statistics/data
- Additional resources
- Need more?
This guide aims to help you identify data and statistical resources that can be useful for your lab exercises, with worked examples from two of those resources: OECD and Statistics Canada.
Basic concept: statistics, data, microdata
In everyday life, we may get away with using the word "data" loosely to refer to anything involving numbers, but when you need to search for data for research, it's important to differentiate between three concepts: statistics, data, and microdata.
Statistics
Statistics are data that have already been aggregated. Tables of information (such as years of education vs. mean income), bar charts, and pie charts are all statistics. Or, to put it a different way, data turns into statistics once some sort of analysis has occurred.
Data
Numeric data is the raw information used to create statistics. Data can take many forms, such as survey data and environmental data (e.g. temperature and rainfall information for a particular weather station).
Microdata
Microdata is the raw information produced by surveys or Censuses. Surveys conducted by government organizations, private or not-for-profit organizations, or academic researchers are a common source of data for research. Each piece of data collected about a survey respondent is referred to as a variable, and each individual’s response is referred to as a case.
Microdata are composed of individual records containing information collected on persons, households, or other entities. The responses of each person to the different questions are recorded in separate variables.
Compared to statistics, data or microdata gives you more flexibility to generate your own statistics in whatever way you need, including individual-level multivariate analysis.
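To make the distinction concrete, here is a minimal Python sketch that turns a few microdata-style records into a statistic (mean income by years of education). The records, field names, and values are invented for illustration and do not come from any real survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical microdata: one record (case) per respondent.
# Field names and values are made up for illustration only.
records = [
    {"respondent": 1, "years_of_education": 12, "income": 40000},
    {"respondent": 2, "years_of_education": 12, "income": 44000},
    {"respondent": 3, "years_of_education": 16, "income": 60000},
    {"respondent": 4, "years_of_education": 16, "income": 70000},
]

def mean_income_by_education(cases):
    """Aggregate individual cases into a statistic: mean income per education level."""
    incomes = defaultdict(list)
    for case in cases:
        incomes[case["years_of_education"]].append(case["income"])
    return {years: mean(values) for years, values in sorted(incomes.items())}

summary = mean_income_by_education(records)
for years, avg in summary.items():
    print(f"{years} years of education: mean income {avg}")
```

The list of records plays the role of microdata, while the small summary table is a statistic. You can always go from the first to the second, but not back again.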
Before you begin
Data is available everywhere but not all of it is accessible or usable. Ask yourself: What am I looking for and who may publish it? Data are produced by governmental agencies, non-governmental agencies, researchers, associations, think tanks and more.
Consider:
- Source
- Is the data from a reliable source?
- Have they included the methodology and all relevant information you'll need to understand and use the data?
- Scope
- Are the variables you're looking for present? If not, can you work with what is available?
- Does the geography fit your research?
- Is the data current or outdated?
- Accessibility
- Is the data available in a format you can use?
- Is the data available? What restrictions are placed on its use?
- Presentation
- Is the data accurate or is it misleading? Is there anything missing?
Selected statistics/data resources
Socio-economic
Statistics Canada's Public Use Microdata Files (PUMFs)
Microdata files for a range of Statistics Canada surveys on various topics covering economy, income, health, travel, impact of COVID-19, and more.
Abacus
Another way to download the microdata files for Statistics Canada surveys is through Abacus (British Columbia Research Libraries' Data Services, Abacus Dataverse Network). Search by survey name in quotation marks.
OECD
Statistics published by the Organisation for Economic Co-operation and Development. Topics covered include agriculture, developing economies, education, employment, energy, environment, migration, social issues, and sustainable development. See the section "Downloading datasets" below for download instructions.
WHO
The World Health Organization manages and maintains a wide range of data collections related to global health and well-being as mandated by its Member States. While most of the data you find there are statistics, a fraction of the files are microdata, for example, the NCD Microdata Repository (after you choose a survey, click "Get microdata" to download the dataset, and "Documentation" and "Data Description" to view the metadata).
Physical geography and climate change
Historical Canadian Climate Data
Search by station name or province/territory. Historical data on maximum, minimum, and mean temperature, total rain, total snow, etc., by day.
ClimateData.ca
Canadian historical data on temperature, precipitation, frost days, and more. A map-searching interface allows you to choose a place and download data in CSV.
Climate Change Knowledge Portal (CCKP)
Developed by the World Bank Group, the Climate Change Knowledge Portal (CCKP) provides global data on historical and future climate, vulnerabilities, and impacts.
Natural disasters
Earthquakes Canada
Search the earthquake database to find data on earthquakes in Canada. You can set a time period and geographic extent.
National Forestry Database by Canadian Council of Forest Ministers
Statistics on the number of fires by causes, area burned by causes, number of fires by month, number of fires by categories of fire size, etc.
Multidisciplinary
FRDR
Canada's Federated Research Data Repository, which collects research data from various Canadian universities.
Downloading datasets
Example 1: OECD Statistics
- Click on Browse by Topic. There are dozens of categories, and each category includes many sub-topics. For example, the category “Society” includes these sub-topics: demography, inequality, migration, population by region, and social protection.
- Here is an example of how to download an indicator’s data.
- Choose Economy, and then GDP and Spending, and then Gross Domestic Product (GDP)
- You can further filter your data by country and time period.
- Click on “Countries” to select one or more countries. Note that the countries appear as three-character codes, but you can search by keyword. For example, searching for “Canada” retrieves CAN.
- You can also drag the time scale to select a time period you are interested in.
- Then you can download your data by selecting “Selected data only” under the Download button.
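If you download the full dataset instead and want to apply the country and time-period filters afterwards, a minimal Python sketch could look like the one below. The column names (LOCATION, TIME, Value) and all values are assumptions made for illustration; check the header row of your own file before adapting it.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV file.
# The column names (LOCATION, TIME, Value) are assumptions; check your file's header.
SAMPLE_CSV = """LOCATION,TIME,Value
CAN,2018,100.0
CAN,2019,110.0
CAN,2020,105.0
USA,2019,200.0
"""

def filter_rows(csv_file, country_code, start_year, end_year):
    """Keep rows for one three-character country code within an inclusive year range."""
    selected = []
    for row in csv.DictReader(csv_file):
        year = int(row["TIME"])
        if row["LOCATION"] == country_code and start_year <= year <= end_year:
            selected.append((year, float(row["Value"])))
    return selected

canada = filter_rows(io.StringIO(SAMPLE_CSV), "CAN", 2019, 2020)
print(canada)
```

To use it on a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file, for example `open("your_file.csv", newline="")`, where the file name is a placeholder.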
Example 2: Statistics Canada's surveys
This resource provides microdata.
Here are step-by-step instructions on how to find and download microdata from Statistics Canada.
1. Visit the website
Visit Statistics Canada's Public Use Microdata Files (PUMFs) portal.
2. Select a survey
There are 145 Public Use Microdata Files in total, for a range of surveys on various topics covering economy, income, health, travel, impact of COVID-19, and more. Browse these files to find one that interests you.
We'll use Labour Force Survey (LFS) as an example to demonstrate how to find, download, and explore microdata from Statistics Canada.
You should try to read more information about the survey. The link (highlighted in the image below) will direct you to a page where you can find more information about the survey.
Information like that shown in the image below is called metadata. It usually covers the purpose for which the data was collected, who created the data, the time period and geographic area the data covers, and the methodology and processing.
3. Select a time period
Many Statistics Canada surveys produce time series, which means you will find data for the same survey at different times.
Select a time period you are interested in for this survey.
4. Download files
The file download page also links to the description page about the particular survey. You may want to learn more about the questionnaires, data sources and methods used in collecting and analyzing the data for the survey.
Download the data file in .csv or .txt format. Both formats can be opened in Excel.
The package you download includes a data file and documentation of the variables in the data file.
If you open the data file, you will see that each row represents a case while each column represents a variable. However, you won't understand what these variables mean or what the numbers represent, right? This is where the documentation of the variables comes in! Read the next section for details.
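If you would rather inspect the file with a script than in Excel, the row-and-column structure can be sketched in Python as follows. PROV is the variable name used later in this guide; VAR_A, VAR_B, and every value in the sample are hypothetical placeholders.

```python
import csv
import io

# Made-up stand-in for a downloaded data file. PROV is the variable name
# mentioned in this guide; VAR_A, VAR_B, and all values are hypothetical.
SAMPLE_CSV = """PROV,VAR_A,VAR_B
1,3,20
2,1,35
1,2,50
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
variables = reader.fieldnames  # column headers are the variables
cases = list(reader)           # each remaining row is one case

print("Variables:", variables)
print("Number of cases:", len(cases))
print("First case:", cases[0])
```

As in Excel, the values are bare codes at this point. They only become meaningful once you look them up in the variable documentation.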
5. Review the variables
The variables documentation contains useful information about each variable. This document is part of what's called metadata.
For example, below is information on the variable "Province" (in the data file it is named PROV). You can find which values correspond to which labels for this variable. This information will be useful when you are running statistical analysis in Excel.
Quiz: what is the value corresponding to BC?
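The value-to-label lookup described above can also be written as a small Python dictionary. The sketch below uses placeholder codes and labels for a hypothetical variable, so it does not reveal the quiz answer; fill in the real values from the documentation.

```python
# Hypothetical codebook entry: these codes and labels are placeholders,
# not real values from any Statistics Canada documentation.
EXAMPLE_LABELS = {"1": "Category A", "2": "Category B"}

def label_for(code, labels):
    """Return the label for a coded value, or flag it if the codebook has no entry."""
    return labels.get(code, f"Unknown code {code}")

coded_values = ["1", "2", "1", "9"]
print([label_for(code, EXAMPLE_LABELS) for code in coded_values])
```

Flagging unknown codes rather than silently dropping them makes it easier to spot values that the documentation treats specially, such as missing or not-applicable responses.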
6. Explore the data
Now that you know where to find information about the variables, you are set to explore the data statistically in Excel. Here are some ideas to get you started:
Descriptive statistics
Choose a variable. Can you generate the mean, standard deviation, or frequency of the variable?
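For comparison with Excel, the same three measures can be sketched with Python's standard library. The two lists below are invented values for a hypothetical numeric variable and a hypothetical coded variable.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical values of one numeric variable and one coded variable.
hours = [40, 35, 40, 20, 45]
status_codes = ["1", "1", "2", "1", "3"]

print("Mean:", mean(hours))
print("Sample standard deviation:", round(stdev(hours), 2))
print("Frequencies:", Counter(status_codes))
```

Note that `stdev` computes the sample standard deviation. Means and standard deviations only make sense for numeric variables; for coded categorical variables, frequencies are the appropriate summary.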
Subset data
Can you filter your data to the respondents in British Columbia only?
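Outside Excel, this kind of filter can be sketched in Python as below. The codes "X" and "Y" are placeholders rather than real PROV values; substitute the code you found in the variable documentation.

```python
def subset(cases, variable, code):
    """Return only the cases whose value for `variable` equals `code`."""
    return [case for case in cases if case[variable] == code]

# Made-up cases; "X" and "Y" are placeholder codes, not real PROV values.
cases = [
    {"PROV": "X", "VAR_A": "1"},
    {"PROV": "Y", "VAR_A": "2"},
    {"PROV": "X", "VAR_A": "2"},
]

# Replace "X" with the code listed in the variable documentation.
selected = subset(cases, "PROV", "X")
print(len(selected), "of", len(cases), "cases kept")
```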
Cross-tabulation
Can you generate a table showing the percentages of males and females in each labour force status (employed, unemployed, etc.)?
Hint: use these two variables: "Sex of respondent" and "Labour force status".
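One possible way to build such a table in Python is sketched below. It assumes the percentages are taken within each sex (each row group sums to 100%); the keys, labels, and respondents are invented, and the real file will use coded values that you need to look up in the documentation.

```python
from collections import Counter

# Made-up respondents; keys and labels are placeholders for the two variables.
cases = [
    {"sex": "Male", "status": "Employed"},
    {"sex": "Male", "status": "Unemployed"},
    {"sex": "Female", "status": "Employed"},
    {"sex": "Female", "status": "Employed"},
]

def crosstab_percent(cases, row_var, col_var):
    """Percentage of each row group that falls in each column category."""
    counts = Counter((c[row_var], c[col_var]) for c in cases)
    row_totals = Counter(c[row_var] for c in cases)
    return {
        (row, col): 100 * n / row_totals[row]
        for (row, col), n in counts.items()
    }

table = crosstab_percent(cases, "sex", "status")
for (sex, status), pct in sorted(table.items()):
    print(f"{sex:<8}{status:<12}{pct:.1f}%")
```

Swapping the two variable names in the call gives the other reading of the question: the share of males and females within each labour force status.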
How to cite statistics/data
Please refer to this guide: Citing guide for Statistics Canada, PCensus, & CHASS data.
Additional resources
Refer to the SFU Library's Guides on Data & Statistics.
Interested in learning more about the open government data movement? Read this article: Open government data and environmental science: a federal Canadian perspective.
Need more?
Contact Sarah (Tong) Zhang at tza68@sfu.ca or the Data Services Team at data-services@sfu.ca for further help.