Bike Dataset

The bike dataset contains messy information about bike rentals, which makes it a useful dataset for testing model performance.

Supplied by the UC Irvine Machine Learning Repository [FanaeeT2013].

class vanguard.datasets.bike.BikeDataset(num_samples=None, training_proportion=0.9, significance=0.025, noise_scale=0.001, rng=None)[source]

Bases: Dataset

Comparison of bike rentals to weather information.

Contains the hourly count of rental bikes in the Capital Bikeshare system between 2011 and 2012, together with the corresponding weather and seasonal information. Supplied by the UC Irvine Machine Learning Repository [FanaeeT2013].

Parameters:
  • num_samples (Optional[int]) – The number of samples to use. If None, all samples will be used.

  • training_proportion (float) – The proportion of data used for training, defaults to 0.9.

  • significance (float) – The significance level recommended for confidence intervals on this dataset, defaults to 0.025.

  • noise_scale (float) – The standard deviation of a given vector v is taken to be noise_scale * np.abs(v).mean(). Defaults to 0.001.

  • rng (Optional[Generator]) – Generator instance used to generate random numbers.
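The effect of training_proportion and noise_scale can be illustrated with a minimal sketch. It uses plain Python rather than NumPy; the helper names split_index and noise_std are hypothetical and not part of vanguard, and the rounding rule used for the split is an assumption.

```python
def split_index(num_samples: int, training_proportion: float = 0.9) -> int:
    """Number of samples assigned to training (assumed rounding: truncation)."""
    return int(num_samples * training_proportion)


def noise_std(v: list[float], noise_scale: float = 0.001) -> float:
    """Pure-Python equivalent of noise_scale * np.abs(v).mean()."""
    return noise_scale * (sum(abs(x) for x in v) / len(v))


# Hypothetical values, for illustration only.
n_train = split_index(1000)
std = noise_std([-2.0, 4.0, 6.0])
```

Because the standard deviation scales with the mean absolute value of the vector, the default noise_scale of 0.001 gives noise of about 0.1% of the typical magnitude of the data.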

__init__(num_samples=None, training_proportion=0.9, significance=0.025, noise_scale=0.001, rng=None)[source]

Initialise self.

plot()[source]

Plot the data.

Return type:

None

plot_prediction(pred_y_mean, pred_y_lower, pred_y_upper, y_upper_bound=None, error_width=0.3)[source]

Plot a prediction using its confidence interval.

Parameters:
  • pred_y_mean (Tensor) – Array of prediction means.

  • pred_y_lower (Tensor) – Lower bound of predictions, e.g. from a prediction interval.

  • pred_y_upper (Tensor) – Upper bound of predictions, e.g. from a prediction interval.

  • y_upper_bound (Optional[Tensor]) – If provided, any points in the test set above this value will be discarded from plotting.

  • error_width (float) – Error bar line width.

Return type:

None
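As a rough illustration of the inputs plot_prediction expects, the sketch below builds lower and upper bounds from predictive means and standard deviations, then marks which test points would survive a y_upper_bound cutoff. It uses plain Python lists in place of Tensors. The two-sided normal interval, the helper name interval_bounds, and all numbers are assumptions made for illustration, not vanguard's implementation.

```python
from statistics import NormalDist


def interval_bounds(means, stds, significance=0.025):
    """Two-sided normal interval at the given significance (assumed form)."""
    z = NormalDist().inv_cdf(1 - significance / 2)
    lower = [m - z * s for m, s in zip(means, stds)]
    upper = [m + z * s for m, s in zip(means, stds)]
    return lower, upper


pred_y_mean = [1.0, 2.5, 4.0]  # made-up predictive means
pred_y_std = [0.2, 0.3, 0.5]   # made-up predictive standard deviations
pred_y_lower, pred_y_upper = interval_bounds(pred_y_mean, pred_y_std)

# Mimic y_upper_bound: keep only test points at or below the cutoff.
test_y = [1.1, 2.4, 9.0]       # made-up test targets
y_upper_bound = 5.0
keep = [y <= y_upper_bound for y in test_y]
```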

plot_y(start=0, stop=5, num_samples=1000)[source]

Visualise the target variable.

Parameters:
  • start (int) – The start of the y-values to be plotted, defaults to 0.

  • stop (int) – The end of the y-values to be plotted, defaults to 5.

  • num_samples (int) – The number of samples to be plotted, defaults to 1,000.

Return type:

None