Datasets¶

In this section, the package provides four example datasets: First two are used in large number of CNLS/StoNED liturature; the others are commonly used in the SFA liturature. In the Examples, our tutorials will resort to these example data.

Import internal data¶

Finnish electricity firm data

# import dataset module
from pystoned.dataset import load_Finnish_electricity_firm

# import all data (including the contextual varibale)
data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                        y_select=['TOTEX'],
                                        z_select=['PerUndGr'])
x, y, z = data.x, data.y, data.z

# print data
print(x)
print(y)
print(z)

# (OR) import data (only inputs and output)
data = load_Finnish_electricity_firm(x_select=['Energy', 'Length', 'Customers'],
                                        y_select=['TOTEX'])
x, y = data.x, data.y

# print data
print(x)
print(y)

import OECD GHG emissions data

# import dataset module
from pystoned.dataset import load_GHG_abatement_cost

# import all data
data = load_GHG_abatement_cost(x_select=['HRSN', 'CPNK'],
                                y_select=['VALK'],
                                b_select=['GHG'])
x, y, b = data.x, data.y, data.b

# print data
print(x)
print(y)
print(b)

import Tim Coelli’s Frontier 4.1 data

# import dataset module
from pystoned.dataset import load_Tim_Coelli_frontier

# import all data
data = load_Tim_Coelli_frontier(x_select=['capital', 'labour'],
                                    y_select=['output'])
x, y = data.x, data.y

# print data
print(x)
print(y)

import rice production data

# import dataset module
from pystoned.dataset import load_Philipines_rice_production

# import all data
data = load_Philipines_rice_production(x_select=['AREA', 'LABOR', 'NPK', 'OTHER', 'AREAP', 'LABORP', 'NPKP', 'OTHERP'],
                                            y_select=['PROD', 'PRICE'])
x, y = data.x, data.y

# print data
print(x)
print(y)

# (OR) import partial data (two input-one output)
data = load_Philipines_rice_production(x_select=['LABOR', 'NPK'],
                                            y_select=['PROD'])
x, y = data.x, data.y

# print data
print(x)
print(y)

Import external data¶

Assuming that we have a dataset like the following example in Book1.xlsx, we then use the Panda to read the Excel file and organize the data using the Numpy.

ID	output	input1	input2	input3	z_var
i1	120	10	55	103	0.8
i2	80	30	49	120	0.6
i3	90	25	72	150	0.3
i4	110	16	39	100	0.5
…	…	…	…	…	…

# import basic modules
import numpy as np
import pandas as pd

# import Excel data
df = pd.read_excel("Book1.xlsx")

# output: y
y = df['output']

# inputs: X
x1 = df['input1']
x1 = np.asmatrix(x1).T
x2 = df['input2']
x2 = np.asmatrix(x2).T
x3 = df['input3']
x3 = np.asmatrix(x3).T
x  = np.concatenate((x1, x2, x3), axis=1)

# contextual Variable: z
z = df['z_var']