Datasets
Created on Mon Feb 27 18:16:57 2023
@author: Tobias Westmeier CR/AMP4
The methods and classes defined here set the type of dataset used. They are sliding windows over the individual time series taken from pd.DataFrames.
- class softsensor.datasets.SlidingWindow(df, windowsize, output_columns, input_columns, Add_zeros=False, rnn_window=None, forecast=1, full_ds=True, pre_comp=None)[source]
Initialize a sliding window class from a pandas DataFrame in torch.utils.data.Dataset format, yielding a tuple of input and output data. The DataFrame is split into individual tensors with a length of windowsize and a width of len(input_columns) using a sliding window approach for the input data. If rnn_window is used, two tensors are generated for each time step, where the additional tensor holds the past output data with a length of rnn_window and a width of len(output_columns).
- Parameters:
df (pd.DataFrame) – pandas DataFrame with named columns and time dependent data.
windowsize (int) – sliding window length.
output_columns (list of str) – list of columns used for output.
input_columns (list of str) – list of columns used for input.
Add_zeros (bool, optional) – Adds zeros at the beginning of the time series so that the dataset keeps the length of the time series (see Notes). The default is False.
rnn_window (int, optional) – use an additional sliding window as input in which the past output values are saved. The default is None.
pre_comp (list of str, optional) – columns holding a precomputed solution for models with student forcing, used as the second input instead of the output columns. The default is None.
- Return type:
None
Notes
Using a recurrent time window (rnn_window) places zeros artificially at the beginning of the time series to keep the dimension of the time series constant in the output (SlidingWindow.__len__() returns the length of df). If no rnn_window is specified, the time series is shortened. If the output length is to be kept, Add_zeros=True must be used.
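The length behaviour described in these notes can be illustrated with a minimal pure-Python sketch. It is not the softsensor implementation: the helper name sliding_windows is hypothetical, and the padding width of windowsize - 1 zeros is an assumption chosen so that one window exists per time step.

```python
def sliding_windows(series, windowsize, add_zeros=False):
    """Return every window of length `windowsize` over `series`.

    With add_zeros=True, windowsize - 1 zeros are prepended first
    (an assumption of this sketch), so one window exists per time step.
    """
    values = list(series)
    if add_zeros:
        values = [0.0] * (windowsize - 1) + values
    return [values[i:i + windowsize]
            for i in range(len(values) - windowsize + 1)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
short = sliding_windows(series, 3)                  # no padding
full = sliding_windows(series, 3, add_zeros=True)   # zero-padded

print(len(short))  # 3 = 5 - 3 + 1, the series is shortened
print(len(full))   # 5, the length of the series is kept
print(full[0])     # [0.0, 0.0, 1.0]
```

With 101 time steps and a window of 10, the same arithmetic gives 101 - 10 + 1 = 92 samples without padding, which matches the length printed in the first example below.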
Examples
Example of a pure feed forward SlidingWindow dataset
>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
...      'out_col': np.linspace(100, 0, 101)}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 10, ['out_col'], ['in_col'])
>>> print(sw.__len__())
92
>>> print(sw.__getitem__(1))
(tensor([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]]), tensor([[91.]]))
Example of a SlidingWindow dataset with recurrent connection
>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
...      'in_col2': np.linspace(0, 100, 101),
...      'out_col': np.linspace(100, 0, 101)}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 3, ['out_col'], ['in_col'], rnn_window=3)
>>> print(sw.__len__())
101
>>> print(sw.__getitem__(2))
((tensor([[0., 1., 2.]]), tensor([[ 0., 100., 99.]])), tensor([[98.]]))
Example of a SlidingWindow dataset with recurrent connection and precomputed prediction
>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
...      'in_col2': np.linspace(0, 100, 101),
...      'out_col': np.linspace(100, 0, 101),
...      'out_col_precomp': np.linspace(100, 0, 101) + 100}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 3, ['out_col'], ['in_col'], rnn_window=3,
...                       pre_comp=['out_col_precomp'])
>>> print(sw.__len__())
101
>>> print(sw.__getitem__(2))
((tensor([[0., 1., 2.]]), tensor([[ 0., 200., 199.]])), tensor([[98.]]))
- __getitem__(index)[source]
Takes index and gives back time window for input and subsequent target output
- Parameters:
index (int) – the starting point for the time window
- Returns:
x (torch.Tensor or tuple of torch.Tensor) – if rnn_window is None: Tensor of shape [len(input_columns), windowsize]; if rnn_window is not None: tuple of Tensors with shapes ([len(input_columns), windowsize], [len(output_columns), rnn_window])
y (torch.Tensor) – Tensor of shape [output_channels, 1]
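How the recurrent item is assembled can be sketched in pure Python for a single input and a single output channel. This is an illustration inferred from the printed results of the recurrent examples above, not the softsensor code: the helper recurrent_item is hypothetical, it works on plain lists rather than tensors, and it assumes that both windows are left-padded with zeros.

```python
def recurrent_item(data_x, data_y, index, windowsize, rnn_window,
                   data_y_pre=None):
    """Build ((input window, past-output window), target) for one time step.

    Positions before the start of the series are filled with zeros.
    If data_y_pre is given, the past-output window is read from it
    instead of data_y.
    """
    past_source = data_y if data_y_pre is None else data_y_pre

    def window(values, stop, length):
        # Last `length` values before position `stop`, left-padded with zeros.
        start = stop - length
        pad = [0.0] * max(0, -start)
        return pad + list(values[max(0, start):stop])

    x = window(data_x, index + 1, windowsize)      # inputs up to and including index
    past = window(past_source, index, rnn_window)  # outputs strictly before index
    y = data_y[index]                              # target at index
    return (x, past), y


in_col = [float(v) for v in range(101)]
out_col = [100.0 - v for v in in_col]
precomp = [v + 100.0 for v in out_col]

print(recurrent_item(in_col, out_col, 2, 3, 3))
# (([0.0, 1.0, 2.0], [0.0, 100.0, 99.0]), 98.0)
print(recurrent_item(in_col, out_col, 2, 3, 3, data_y_pre=precomp))
# (([0.0, 1.0, 2.0], [0.0, 200.0, 199.0]), 98.0)
```

The two printed items carry the same values as the recurrent examples above: the past-output window starts with a padded zero because only two real outputs precede index 2.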
- class softsensor.datasets.batch_rec_SW(list_of_sw)[source]
Batching of multiple sliding window classes for parallelisation purposes
- Parameters:
list_of_sw (list of SlidingWindow) – list of individual SlidingWindow classes for batching
- Return type:
None
- __getitem__(index)[source]
Takes index and gives back time window for input and subsequent target output. Output is batched over the individual time series.
batch_size depends on the number of SlidingWindow datasets that have .__len__() >= index
- Parameters:
index (int) – the starting point for the time window
- Returns:
x ((torch.Tensor, torch.Tensor)) – tuple of Tensors with shapes ([batch_size, len(input_columns), windowsize], [batch_size, len(output_columns), rnn_window])
y (torch.Tensor) – Tensor of shape [batch_size, output_channels, 1]
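The shrinking batch size can be illustrated with a small pure-Python sketch. It is not the softsensor implementation: batch_item is a hypothetical helper, the datasets are stand-in lists of (x, y) pairs, and a dataset is treated as active when index < len(dataset), which is the condition under which indexing a Python sequence is valid; the library's exact boundary may differ.

```python
def batch_item(datasets, index):
    """Collect item `index` from every dataset long enough to provide it."""
    active = [ds for ds in datasets if index < len(ds)]
    xs = [ds[index][0] for ds in active]
    ys = [ds[index][1] for ds in active]
    return xs, ys


# Three stand-in datasets of (x, y) pairs with lengths 4, 2 and 3.
datasets = [
    [((k, i), i * 10) for i in range(n)]
    for k, n in enumerate([4, 2, 3])
]

for index in range(4):
    xs, ys = batch_item(datasets, index)
    print(index, len(xs))
# 0 3
# 1 3
# 2 2
# 3 1
```

Shorter time series drop out of the batch once the index passes their end, so the leading batch dimension of x and y shrinks over the course of the longest series.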
- class softsensor.datasets.ff_SlidingWindow(windowsize, data_x, data_y, Add_zeros, forecast, full_ds)[source]
Subclass of the SlidingWindow class (feed forward). One tensor is generated for each time step, defining the input x with a length of windowsize and a width of len(input_columns); no past output data is returned, since this class takes no rnn_window.
- Parameters:
- Return type:
None
- class softsensor.datasets.rec_SlidingWindow(windowsize, data_x, data_y, rnn_window, forecast, full_ds, data_y_pre=None, pre=False)[source]
Subclass of the SlidingWindow class. Two tensors are generated for each time step: one defines the input x and the second holds the past output data with a length of rnn_window and a width of len(output_columns).
- Parameters:
- Return type:
None
- __getitem__(index)[source]
Takes index and gives back time window for input and subsequent target output
- Parameters:
index (int) – the starting point for the time window
- Returns:
x ((torch.Tensor, torch.Tensor)) – tuple of Tensors with shapes ([len(input_columns), windowsize], [len(output_columns), rnn_window])
y (torch.Tensor) – Tensor of shape [output_channels, 1]