Read the following instructions carefully.
functions, lists, dictionaries, csv, matplotlib, pytest. To learn these packages, you need to go through the examples via the links in this file.
Data can come in many formats (.xls, .json, .txt, ...). The .csv format is one of the simplest, most portable and most popular. .csv data files can be read easily using the csv module.
Data in CSV files can be loaded into your Python script as shown in the example from Lecture 4.
Other examples can be found in the official Python documentation.
The most straightforward way to import CSV data is to create a table (that is, a list of lists). However, this is often not the most convenient way to handle the data, because finding a specific element requires going through the whole table. Dictionaries are often a much better alternative.
Note: The first row of a CSV file usually contains the header (the name of each column), which needs to be treated separately.
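As a minimal sketch of the table approach and the header handling described above, the snippet below reads a small made-up CSV text from memory. The column names and values are invented for illustration, and io.StringIO stands in for a real file, which would typically be opened with open(filename, newline="").

```python
import csv
import io

# Hypothetical CSV content; a real script would use open(filename, newline="")
sample = io.StringIO("code,1960,1961\nAAA,1.5,2.0\nBBB,0.3,0.4\n")

reader = csv.reader(sample)
header = next(reader)   # first row: column names, kept apart from the data
table = list(reader)    # remaining rows as a list of lists of strings

print(header)
print(table)
```

Note that every value in table is still a string at this point.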
CO2Emissions_filtered.csv from the course website. If necessary, open it in a spreadsheet program (e.g. Excel, Numbers, Sheets, depending on the OS) to grasp how the data is structured. Write a function load_csv(filename) that takes a string filename as argument and returns a dictionary with the country code (in lowercase) as the key and the list of yearly CO2 emissions as the value. Tip: similar to list comprehensions, dict comprehensions can be used to neatly build dictionaries!
l = [
['a', '10', '9'],
['C', '-8', '0'],
['P', '4', '2']
]
d = {v[0]: v[1:] for v in l}
print(d)
> {'a': ['10', '9'], 'C': ['-8', '0'], 'P': ['4', '2']}
Remark 1: The data loaded from a CSV file are strings. It is better to convert the numbers to float and store them in the list for future use. An easy way to convert a list of strings to a list of floats is to use a list comprehension or the map() function, for example:
strList = ['321','5433', '11']
# Use list comprehension
floatListComp = [float(s) for s in strList]
# Use map() function
floatListMap = list(map(float, strList))
Remark 2: The country codes in CO2Emissions_filtered.csv are originally in uppercase. To use them with the other data in this mini project, you have to convert them to lowercase. You can do this inside the dict comprehension with the string method lower(); see the official documentation or search the internet to find out how to use it.
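A small sketch of the idea in Remark 2, on made-up rows rather than the course data: str.lower() is applied to the key inside the dict comprehension, while the values are left untouched.

```python
# Made-up rows: first element is an uppercase key, the rest are values
rows = [['A', '10', '9'], ['C', '-8', '0']]

# Lowercase the key while building the dictionary
d = {row[0].lower(): row[1:] for row in rows}
print(d)
```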
Optional Task: Try to do the type conversion while loading the data from the csv file into the dictionary. This can be done in one line of code. You might need to use both list- and dict-comprehensions.
Matplotlib is the most popular Python module for plotting. It offers many customization options and plotting functionality similar to MATLAB's.
You might need to install this module on your computer; the installation instructions are here. If there is any problem with the installation, do not hesitate to ask the teacher.
Official examples of matplotlib are available here and there.
Let us take a closer look at one of the examples provided in Lecture 4:
import matplotlib.pyplot as plt
from math import pi,sin
# Data for plotting
L = 100
time = list(range(L))
Voltage = [sin(2*pi*t/L) for t in time]
fig, ax = plt.subplots()
ax.plot(time, Voltage)
ax.set(xlabel='time (s)', ylabel='voltage (mV)', title='Example 1')
ax.grid()
fig.savefig("test.png")
plt.show()
import matplotlib.pyplot as plt imports the plotting functions from the matplotlib module under the shorter name plt.
fig, ax = plt.subplots() sets up the figure's frame.
ax.plot(time, Voltage) plots the data contained in time and Voltage. These are lists, but can also be arrays, which will be covered later in the course.
ax.set(xlabel='time (s)', ylabel='voltage (mV)', title='Example 1') sets the labels for each axis, as well as the title of the figure.
ax.grid() turns on the grid in the figure, which is useful when you need to read values from the curve.
fig.savefig("test.png") saves the figure to a .png file. Matplotlib supports many other formats. In particular, it supports vector graphics (.pdf, .eps, .svg), which are usually of better quality than bitmap formats (.png and .jpg).
plt.show() displays the figure in a new window.
Tip: The most common way to learn matplotlib is through sample code. It is important that you try some of the examples before you move on to your own project.
'dnk', 'fin', 'isl', 'nor', 'swe', correspondingly. You can define the time in a list as time = list(range(1960, 2015)). You should see some variations, or noise, in the data. smooth_a and smooth_b.
Use smooth_a and smooth_b to even out the data over a total period of 11 years, i.e. 5 data points on each side of a middle year. Illustrate the result of smooth_a with a solid curve, smooth_b with a dashed curve and the original data with a dotted curve, for all five countries. You will plot 15 datasets in total. Make sure that you don't duplicate code without good reason. Use one color per country to tell the countries apart, and let all curves for a single country use the same color. An example of what the result may look like is displayed below, but you may choose the exact look you prefer as long as the requirements stated above are satisfied.
%run plot2D.py
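The styling requirement above (one color per dataset group, one line style per variant) can be sketched on made-up data as follows. The series, the codes "aaa" and "bbb", and the helper shifted() are all hypothetical placeholders; shifted() merely offsets the values and is not a smoothing function. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

years = list(range(2000, 2006))
# Made-up series, not the course data
series = {"aaa": [1.0, 1.4, 1.1, 1.6, 1.3, 1.8],
          "bbb": [2.0, 1.7, 2.2, 1.9, 2.4, 2.1]}
colors = {"aaa": "tab:blue", "bbb": "tab:orange"}

def shifted(values, offset):
    """Hypothetical stand-in for a processed version of the data."""
    return [v + offset for v in values]

fig, ax = plt.subplots()
for code, values in series.items():
    # One (data, line style, label suffix) triple per variant
    variants = [(shifted(values, 0.2), "-", " (variant 1)"),
                (shifted(values, -0.2), "--", " (variant 2)"),
                (values, ":", " (original)")]
    for data, style, suffix in variants:
        ax.plot(years, data, linestyle=style, color=colors[code],
                label=code + suffix)
ax.legend()
```

Looping over (data, style) pairs keeps the plotting call in one place, which avoids duplicating code per curve. Call fig.savefig(...) or plt.show() afterwards to see the result.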
import matplotlib.pyplot as plt
import random
random.seed(20191126)  # fixed seed so the figure is reproducible
fig, ax = plt.subplots(figsize=(6, 6))
N = 50
# One group of N random points per color
for color in ['tab:blue', 'tab:orange', 'tab:green']:
    x = [200**random.random() for i in range(N)]
    y = [200**random.random() for i in range(N)]
    scale = [500*random.random() for i in range(N)]
    ax.scatter(x, y, s=scale, c=color, label=color, alpha=0.3, edgecolors='none')
ax.legend()
plt.show()
Here, the data is actually four-dimensional: coordinate x, coordinate y, color and scale.
The coordinates are random numbers between 1 and 200, obtained by raising 200 to a uniformly distributed random exponent between 0 and 1.
The size of each point is also set randomly, between 0 and 500.
ax.scatter(x, y, c=color, s=scale, label=color, alpha=0.3, edgecolors='none') is the most relevant part of this code. It takes the x and y coordinates of the data as first and second arguments, respectively.
It also takes the color (the optional argument c) and the size (the optional argument s, here given the list scale).
The argument alpha, between 0 and 1, sets the transparency level (0 is fully transparent, 1 is fully opaque).
population.csv. Import it with your load_csv() function and, as a sanity check, plot the population of Bolivia, Venezuela, Chile, Ecuador and Paraguay. Add x- and y-labels, a title, and a legend.
Data often comes from different sources and therefore has different formats or missing elements. When dealing with different sources, it is necessary to reformat the data. Here the two datasets CO2Emissions_filtered.csv and population.csv do not contain the same countries (some countries were removed from the first dataset because of missing data for the whole period 1960-2014). Write a function intersection(list_1, list_2) that, given the lists of country codes from the two datasets, returns the country codes that are in both list_1 and list_2. If necessary, write some tests with pytest to convince yourself that your function is working properly.
Example:
intersection(['fra', 'deu', 'ita', 'nld', 'lux'], ['bel', 'ita', 'fra', 'nld'])
> ['fra', 'ita', 'nld']
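To illustrate what such pytest tests can look like, here is a sketch built around one possible implementation of intersection. The implementation is an assumption, chosen so that the result keeps the order of list_1 as in the example above; other implementations are equally valid. With its default settings, pytest collects functions whose names start with test_ from files named test_*.py and reports every failing assert.

```python
def intersection(list_1, list_2):
    """One possible implementation: keep the order of list_1."""
    lookup = set(list_2)  # set membership tests are fast
    return [code for code in list_1 if code in lookup]

def test_intersection_example():
    result = intersection(['fra', 'deu', 'ita', 'nld', 'lux'],
                          ['bel', 'ita', 'fra', 'nld'])
    assert result == ['fra', 'ita', 'nld']

def test_intersection_disjoint():
    # No common codes: the result should be an empty list
    assert intersection(['fra'], ['bel']) == []
```

If this were saved as, say, test_intersection.py (a hypothetical file name), running pytest in the same folder would execute both tests.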
log-log scale to plot, add x- and y-labels and a title. Annotate every data point with its country code. An example of the figure is shown below.
%run plotScatter.py
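A minimal sketch of a log-log scatter plot with annotated points, using three made-up data points and invented codes rather than the course datasets. Axes.set_xscale("log") and Axes.set_yscale("log") switch the axes to logarithmic scale, and Axes.annotate places a text label at a data coordinate. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

# Made-up points with made-up codes
codes = ["aaa", "bbb", "ccc"]
x = [1e3, 1e5, 1e7]
y = [0.5, 20.0, 900.0]

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xscale("log")
ax.set_yscale("log")
# Label each point with its code at the point's coordinates
for code, xi, yi in zip(codes, x, y):
    ax.annotate(code, (xi, yi))
ax.set(xlabel="x (made-up units)", ylabel="y (made-up units)",
       title="Log-log sketch")
```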
country_continent.csv. An example of the figure is shown below.
%run plotScatterContinent.py
coolwarm as colormap). Because some countries (China, the US, and India, for instance) have increased their emissions far more than others, it is difficult to see differences among the remaining countries. scatter accepts two optional parameters, vmin and vmax, that set the range of the colormap. Adjust them to fix this issue. An example of the output is shown below.
%run plotScatterDiff.py
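The effect of vmin and vmax can be sketched on made-up values with one large outlier. The numbers and the chosen range of -2 to 2 are assumptions for illustration only. Values beyond the range are drawn with the colors at the ends of the colormap, so the remaining points spread over the full color range. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

# Made-up data: the last value is an outlier that would dominate the colormap
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
values = [-2.0, -0.5, 0.3, 1.0, 50.0]

fig, ax = plt.subplots()
# Restrict the color range; the outlier is clipped to the last color
sc = ax.scatter(x, y, c=values, cmap="coolwarm", vmin=-2.0, vmax=2.0)
fig.colorbar(sc, ax=ax, label="value (made-up units)")
```

A range that is symmetric around zero keeps the neutral middle color of coolwarm at zero change.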