Datasets

Created on Mon Feb 27 18:16:57 2023

The classes in this module define the type of dataset used. They implement sliding windows over the individual time series stored in pd.DataFrames.

@author: Tobias Westmeier CR/AMP4

class softsensor.datasets.SlidingWindow(df, windowsize, output_columns, input_columns, Add_zeros=False, rnn_window=None, forecast=1, full_ds=True, pre_comp=None)[source]

Initialize a sliding window dataset from a pandas DataFrame in torch.utils.data.Dataset format, where each sample is a tuple of input and output data. Using a sliding window approach, the DataFrame is split into individual input tensors with a length of windowsize and a width of len(input_columns). If rnn_window is used, two tensors are generated for each time step, where the additional tensor holds the past output data with a length of rnn_window and a width of len(output_columns).

Parameters:
  • df (pd.DataFrame) – pandas DataFrame with named columns and time dependent data.

  • windowsize (int) – sliding window length.

  • output_columns (list of str) – list of columns used for output.

  • input_columns (list of str) – list of columns used for input.

  • Add_zeros (bool, optional) – Adds zeros at the beginning of the time series so that the dataset contains as many samples as the time series has time steps. The default is False.

  • rnn_window (int, optional) – use an additional sliding window as input in which the past output values are saved. The default is None.

  • pre_comp (list of str, optional) – precomputed solution for models with student forcing used for second input instead of output columns. The default is None.

Return type:

None

Notes

The use of a recurrent time window (rnn_window) ensures that zeros are artificially placed at the beginning of the time series to keep the length of the time series constant in the output (SlidingWindow.__len__() returns the length of df). If no rnn_window is specified, the time series is shortened. If the output length is to be kept in that case, Add_zeros=True must be used.
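
The length rule described in these Notes can be sketched in a few lines of plain Python. This is only an illustration inferred from the Notes and the examples in this section, not the library's implementation: n_samples is a hypothetical helper, and the forecast and full_ds arguments are ignored.

```python
def n_samples(n_rows, windowsize, rnn_window=None, add_zeros=False):
    """Number of samples a sliding window yields over n_rows time steps (sketch)."""
    # Zero padding at the start keeps one sample per row of the DataFrame.
    if rnn_window is not None or add_zeros:
        return n_rows
    # Without padding, the first windowsize - 1 rows cannot fill a window.
    return max(n_rows - windowsize + 1, 0)


print(n_samples(101, 10))                  # 92
print(n_samples(101, 3, rnn_window=3))     # 101
print(n_samples(101, 10, add_zeros=True))  # 101
```

The first two values agree with the lengths printed in the examples of this section for a DataFrame with 101 rows.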

Examples

Example of a pure feed forward SlidingWindow dataset

>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
         'out_col': np.linspace(100, 0, 101)}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 10, ['out_col'], ['in_col'])
>>> print(sw.__len__())
92
>>> print(sw.__getitem__(1))
(tensor([[0., 1., 2., 3., 4., 5., 6., 7., 8., 9.]]), tensor([[91.]]))

Example of a SlidingWindow dataset with recurrent connection

>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
         'in_col2': np.linspace(0, 100, 101),
         'out_col': np.linspace(100, 0, 101)}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 3, ['out_col'], ['in_col'], rnn_window=3)
>>> print(sw.__len__())
101
>>> print(sw.__getitem__(2))
((tensor([[0., 1., 2.]]), tensor([[  0., 100.,  99.]])), tensor([[98.]]))

Example of a SlidingWindow dataset with recurrent connection and precomputed prediction

>>> import softsensor.datasets as ds
>>> import pandas as pd
>>> import numpy as np
>>> d = {'in_col': np.linspace(0, 100, 101),
         'in_col2': np.linspace(0, 100, 101),
         'out_col': np.linspace(100, 0, 101),
         'out_col_precomp': np.linspace(100, 0, 101) + 100}
>>> df = pd.DataFrame(d)
>>> sw = ds.SlidingWindow(df, 3, ['out_col'], ['in_col'], rnn_window=3,
                          pre_comp=['out_col_precomp'])
>>> print(sw.__len__())
101
>>> print(sw.__getitem__(2))
((tensor([[0., 1., 2.]]), tensor([[  0., 200., 199.]])), tensor([[98.]]))
__getitem__(index)[source]

Takes index and gives back time window for input and subsequent target output

Parameters:

index (int) – the starting point for the time window

Returns:

  • x (torch.Tensor or tuple of torch.Tensor) – if rnn_window is None: Tensor of shape [len(input_columns), windowsize]; if rnn_window is not None: tuple of Tensors with shape ([len(input_columns), windowsize], [len(output_columns), rnn_window])

  • y (torch.Tensor) – Tensor of shape [output_channels, 1]
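
The recurrent case can be illustrated with nested lists instead of tensors. The following is a minimal sketch under stated assumptions, not the library's code: recurrent_sample and its precomputed argument are hypothetical names, channels are stored as lists of floats, and the window alignment is chosen so that the values match the two recurrent examples above.

```python
def recurrent_sample(inputs, outputs, index, windowsize, rnn_window, precomputed=None):
    """Build ((x, past_y), y) for one index; each argument holds one list per channel."""

    def window(series, end, length):
        # Values series[end - length:end], with zeros where the window
        # reaches back before the start of the series.
        return [series[i] if i >= 0 else 0.0 for i in range(end - length, end)]

    # Input window ends at (and includes) the current time step.
    x = [window(channel, index + 1, windowsize) for channel in inputs]
    # Past outputs end one step before the current time step. If a
    # precomputed solution is given, it replaces the true outputs here.
    source = outputs if precomputed is None else precomputed
    past_y = [window(channel, index, rnn_window) for channel in source]
    # Target is the output at the current time step.
    y = [[channel[index]] for channel in outputs]
    return (x, past_y), y


in_col = [float(i) for i in range(101)]
out_col = [100.0 - i for i in range(101)]
pre_col = [value + 100.0 for value in out_col]

print(recurrent_sample([in_col], [out_col], 2, 3, 3))
# (([[0.0, 1.0, 2.0]], [[0.0, 100.0, 99.0]]), [[98.0]])
print(recurrent_sample([in_col], [out_col], 2, 3, 3, precomputed=[pre_col]))
# (([[0.0, 1.0, 2.0]], [[0.0, 200.0, 199.0]]), [[98.0]])
```

The outer list of each returned block plays the role of the channel dimension, so x corresponds to the shape [len(input_columns), windowsize] and past_y to [len(output_columns), rnn_window].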

__len__()[source]
Returns:

number_of_samples – Samples in Dataset

Return type:

int

class softsensor.datasets.batch_rec_SW(list_of_sw)[source]

Batching of multiple sliding window classes for parallelisation purposes

Parameters:

list_of_sw (list of SlidingWindow) – list of individual SlidingWindow classes for batching

Return type:

None.

__getitem__(index)[source]

Takes index and gives back time window for input and subsequent target output. The output is batched over the individual time series.

batch_size depends on the number of SlidingWindow datasets that have .__len__() >= index

Parameters:

index (int) – the starting point for the time window

Returns:

  • x ((torch.Tensor, torch.Tensor)) – Tensor with shape ([batch_size, len(input_columns), windowsize], [batch_size, len(output_columns), rnn_window])

  • y (torch.Tensor) – Tensor of shape [batch_size, output_channels, 1]
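
Why batch_size can change with the index is easiest to see with time series of different lengths. The sketch below is an interpretation of the description above, not the library's implementation: batch_at is a hypothetical helper, plain lists stand in for SlidingWindow datasets, and it is assumed that a dataset contributes to the batch only while it still has a sample at the requested index.

```python
def batch_at(datasets, index):
    """Collect the sample at `index` from every dataset that is long enough."""
    # Assumption: a dataset drops out of the batch once index runs past its end.
    return [dataset[index] for dataset in datasets if index < len(dataset)]


series_a = ["a0", "a1", "a2", "a3"]
series_b = ["b0", "b1"]
series_c = ["c0", "c1", "c2"]
datasets = [series_a, series_b, series_c]

print(len(batch_at(datasets, 0)))  # 3
print(len(batch_at(datasets, 2)))  # 2
print(batch_at(datasets, 3))       # ['a3']
```

With tensors, the collected samples would be stacked along a leading batch dimension, which gives the shapes listed under Returns.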

__len__()[source]
Returns:

number_of_samples – Samples in Dataset

Return type:

int

permutation()[source]
Returns:

The original indices before permutation.

Return type:

list of int, dim: [len(list_of_sw)]
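
A list of original indices is typically used to bring per-series results back into the order in which the time series were passed in. The sketch below shows that step in plain Python. The documentation does not state how the datasets are permuted, so sorting by length (longest first) is an assumption made only for this illustration, and restore_order is a hypothetical helper.

```python
def restore_order(values, original_indices):
    """Put values back in the order they had before the permutation."""
    restored = [None] * len(values)
    for position, original in enumerate(original_indices):
        restored[original] = values[position]
    return restored


lengths = [40, 100, 70]
# Hypothetical permutation: longest series first, remembering original positions.
original_indices = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
permuted = [lengths[i] for i in original_indices]

print(original_indices)                           # [1, 2, 0]
print(permuted)                                   # [100, 70, 40]
print(restore_order(permuted, original_indices))  # [40, 100, 70]
```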

class softsensor.datasets.ff_SlidingWindow(windowsize, data_x, data_y, Add_zeros, forecast, full_ds)[source]

Subclass for the SlidingWindow class (feed forward). Two tensors are generated for each time step: the input window x with a length of windowsize and a width of len(input_columns), and the target output y.

Parameters:
  • windowsize (int) – sliding window length.

  • data_x (torch.tensor) – input data

  • data_y (torch.tensor) – output data

  • Add_zeros (bool) – Adds zeros at the beginning of the time series with length of windowsize-1.

Return type:

None
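
The effect of Add_zeros on a feed-forward window can be sketched with a single channel stored as a list. This is a minimal illustration of the padding described above, not the library's code: feed_forward_windows is a hypothetical helper that returns only the input windows and ignores forecast and full_ds.

```python
def feed_forward_windows(series, windowsize, add_zeros=False):
    """Return all input windows of length windowsize over one channel."""
    if add_zeros:
        # Prepend windowsize - 1 zeros so that every time step gets a window.
        series = [0.0] * (windowsize - 1) + list(series)
    return [series[i:i + windowsize] for i in range(len(series) - windowsize + 1)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
plain = feed_forward_windows(series, 3)
padded = feed_forward_windows(series, 3, add_zeros=True)

print(len(plain), plain[0])    # 3 [1.0, 2.0, 3.0]
print(len(padded), padded[0])  # 5 [0.0, 0.0, 1.0]
```

Without padding the dataset is shorter than the series by windowsize - 1 samples; with padding it has one window per time step.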

__getitem__(index)[source]

Takes index and gives back time window for input and subsequent target output

Parameters:

index (int) – the starting point for the time window

Returns:

  • x (torch.Tensor) – shape: [len(input_columns), windowsize]

  • y (torch.Tensor) – Tensor of shape [output_channels, 1]

__len__()[source]
Returns:

number_of_samples – Samples in Dataset

Return type:

int

class softsensor.datasets.rec_SlidingWindow(windowsize, data_x, data_y, rnn_window, forecast, full_ds, data_y_pre=None, pre=False)[source]

Subclass for the SlidingWindow class with a recurrent window. Two input tensors are generated for each time step: one tensor defines the input x, and the second one holds the past output data with a length of rnn_window and a width of len(output_columns).

Parameters:
  • windowsize (int) – sliding window length.

  • data_x (torch.tensor) – input data

  • data_y (torch.tensor) – output data as well as data that serves as second input

  • rnn_window (int) – length of the additional sliding window used as input, in which the past output values are saved.

Return type:

None

__getitem__(index)[source]

Takes index and gives back time window for input and subsequent target output

Parameters:

index (int) – the starting point for the time window

Returns:

  • x ((torch.Tensor, torch.Tensor)) – tuple of Tensors with shape ([len(input_columns), windowsize], [len(output_columns), rnn_window])

  • y (torch.Tensor) – Tensor of shape [output_channels, 1]

__len__()[source]
Returns:

number_of_samples – Samples in Dataset

Return type:

int