Bike Rentals Prediction Case Study

Author

Wil Jones

Published

June 14, 2025

Project Summary

This project aimed to predict hourly bike rental demand using deep learning techniques applied to time-series data spanning 2011 through 2023. Key objectives included capturing behavioral and seasonal patterns, leveraging cyclical feature transformations, and testing generalization via a forward-chaining holdout strategy. The true holdout period for testing was the final two months of the dataset: November and December of 2023.

Tools & Technologies

  • Python for scripting and modeling
  • TensorFlow / Keras for deep learning model architecture and training
  • Pandas & NumPy for data wrangling and numerical operations
  • Scikit-learn for preprocessing pipelines and regression metrics
  • Matplotlib for exploratory data visualizations
  • Quarto for documentation and web rendering

Key Techniques Used

Feature Engineering

  • Created contextual and temporal encodings, including covid_phase, holiday, is_weekend, and cyclical encodings of hour, month, and day_of_year using sine and cosine transforms.
  • Dropped redundant variables to reduce multicollinearity and simplify the model’s feature space.

Neural Network Design & Training

  • Implemented a 3-layer feedforward neural network (Dense 128 → 64 → 16) with ReLU activations.
  • Incorporated dropout layers (0.1, 0.4, 0.4) and batch normalization to improve generalization and prevent overfitting.
  • Used an ExponentialDecay learning rate schedule, along with early stopping and model checkpointing (a model-definition sketch follows this list).
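
A minimal Keras sketch of this setup. The input width, decay parameters, batch size, and callback patience below are illustrative placeholders rather than the project's documented values, and the ordering of batch normalization relative to dropout is an assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES = 12  # placeholder; the project's actual input width is not documented here

# Learning-rate schedule (decay_steps / decay_rate are illustrative values).
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=10_000, decay_rate=0.9
)

# Dense 128 -> 64 -> 16 with ReLU, dropout rates (0.1, 0.4, 0.4) and batch
# normalization; the exact ordering of BatchNormalization vs. Dropout is assumed.
model = keras.Sequential([
    keras.Input(shape=(N_FEATURES,)),
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.1),
    layers.Dense(64, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.4),
    layers.Dense(1),  # regression output: predicted hourly rentals
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr_schedule),
              loss="mse", metrics=["mae"])

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                  restore_best_weights=True),
    keras.callbacks.ModelCheckpoint("bike_model_covid.keras",
                                    monitor="val_loss", save_best_only=True),
]
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, batch_size=64, callbacks=callbacks)
```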

Evaluation Protocol

  • Assessed performance using MAE, RMSE, Median AE, and R².
  • Stratified the evaluation into periods before and after December 20 to observe predictive drift around holiday anomalies (see the sketch after this list).
  • Supplemented quantitative metrics with residual plots and error band visualizations.
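
Under the assumption that predictions, actuals, and their timestamps are available as aligned array-likes, the headline metrics and the December 20 stratification can be computed roughly as follows:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             median_absolute_error, r2_score)

def report(y_true, y_pred, label=""):
    """Print the four headline metrics for one evaluation slice."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    print(f"{label} MAE={mean_absolute_error(y_true, y_pred):.2f} "
          f"RMSE={rmse:.2f} "
          f"MedAE={median_absolute_error(y_true, y_pred):.2f} "
          f"R2={r2_score(y_true, y_pred):.3f}")

def stratified_report(timestamps, y_true, y_pred):
    """Split the holdout at the end of December 20, 2023 to surface holiday drift."""
    cutoff = pd.Timestamp("2023-12-20 23:59:59")
    pre = pd.to_datetime(pd.Series(timestamps)).le(cutoff).to_numpy()
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    report(y_true[pre], y_pred[pre], label="Through Dec 20:")
    report(y_true[~pre], y_pred[~pre], label="After Dec 20:  ")
```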

Data Split Strategy

  • Training included all observations prior to October 31, 2023 (the split is sketched after this list).
  • November and December of 2023 were strictly withheld as a realistic holdout set.
  • This forward validation strategy simulates production deployment in time-sensitive forecasting scenarios.
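
A sketch of this split, assuming the hourly observations sit in a DataFrame with a timestamp column; the exact boundary handling around October 31 and any separate validation slice are simplified here:

```python
import pandas as pd

def forward_chaining_split(df: pd.DataFrame):
    """Chronological split: train on history, hold out Nov-Dec 2023 untouched.

    The project trained on observations prior to October 31, 2023; whether the
    remaining October days served as a validation slice is not specified, so
    this sketch simply splits at the start of the holdout window.
    """
    df = df.sort_values("timestamp")
    holdout_start = pd.Timestamp("2023-11-01")
    train = df[df["timestamp"] < holdout_start]
    holdout = df[df["timestamp"] >= holdout_start]
    return train, holdout

# train_df, holdout_df = forward_chaining_split(df)
```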

Key Insights

Redundant Feature Elimination


Figure 1. Correlation Heatmap — Dropping feels_like_c due to perfect correlation with temp_c

A heatmap of Pearson correlations revealed that feels_like_c provided no additional signal beyond temp_c, exhibiting a coefficient of 1.00. The variable was removed to streamline the model and mitigate overfitting risk from redundant predictors.
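
Assuming the weather columns are named as in Figure 1, the redundancy check and drop reduce to a few lines:

```python
import pandas as pd

def drop_redundant(df: pd.DataFrame, threshold: float = 0.99) -> pd.DataFrame:
    """Drop feels_like_c when it is (near-)perfectly correlated with temp_c."""
    corr = df[["temp_c", "feels_like_c"]].corr(method="pearson").iloc[0, 1]
    if abs(corr) >= threshold:
        df = df.drop(columns=["feels_like_c"])
    return df
```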

Importance of Cyclical Encoding

Time-based features exhibit clear periodicity (e.g., hourly rush patterns, seasonal transitions). Using sine and cosine encodings allowed the model to preserve cyclical continuity (e.g., 23:00 to 00:00) and improved its ability to generalize temporal transitions.
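
The continuity argument can be made concrete: on the raw scale, hours 23 and 0 are 23 units apart, but on the sine/cosine circle they sit next to each other. A small illustration:

```python
import numpy as np

def encode_hour(hour: int) -> np.ndarray:
    """Map an hour (0-23) onto the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Distance between 23:00 and 00:00 in encoded space vs. 00:00 and 12:00.
d_adjacent = np.linalg.norm(encode_hour(23) - encode_hour(0))   # ~0.26, neighbours
d_opposite = np.linalg.norm(encode_hour(0) - encode_hour(12))   # 2.0, opposite ends
print(d_adjacent, d_opposite)
```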

Architecture Optimization

The best-performing model consisted of three dense layers using ReLU activation. Dropout and batch normalization layers were added progressively to optimize the trade-off between learning capacity and regularization. The architecture was refined through a combination of manual tuning and validation loss tracking.

Strategic Holdout Period

Holdout data from November and December 2023 allowed for rigorous post-training evaluation. Notably, prediction accuracy declined slightly after December 20 due to holiday-related behavior shifts not captured during training. This reinforces the need for adaptive retraining when encountering high-season anomalies.

Operational Insight: Maintenance Window


Figure 2. Post-COVID Hourly Rentals — Optimal Maintenance Hours Between 2:00–5:00 AM


Figure 3. Weekday Demand Breakdown — Mondays Exhibit Consistently Lowest Usage

Behavioral trend analysis post-COVID highlighted minimal rider activity between 2:00 AM and 5:00 AM. This window, especially on Mondays, presents a valuable operational opportunity for off-peak maintenance and fleet rotation without interrupting user demand.
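
The aggregation behind Figures 2 and 3 is a simple group-by; the covid_phase label value and the count and timestamp column names below are assumptions:

```python
import pandas as pd

def usage_profiles(df: pd.DataFrame):
    """Average rentals by hour of day and by weekday for the post-COVID period."""
    post = df[df["covid_phase"] == "post"]  # assumed phase label
    hourly = post.groupby(post["timestamp"].dt.hour)["count"].mean()
    weekday = post.groupby(post["timestamp"].dt.day_name())["count"].mean()
    return hourly, weekday

# hourly, weekday = usage_profiles(df)
# In this dataset the minima fall in the 2:00-5:00 AM window and on Mondays
# (Figures 2 and 3).
```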

Results Summary

  • RMSE: 141.30
  • MAE: 97.10
  • Median AE: 64.59
  • R² (Full): 0.859
  • R² (through December 20): 0.894
  • R² (after December 20): 0.780
  • Predictions within 5%: 10.38%
  • Predictions within 10%: 21.09%
  • Predictions within 20%: 42.25%

The model achieved a strong balance between generalization and accuracy, particularly during stable demand periods. Although accuracy dipped during the holiday weeks, the model's learned temporal structure still supported reliable planning and infrastructure forecasting under normal conditions.
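
The "within X%" figures above are the share of holdout predictions whose absolute percentage error falls under each threshold. One plausible way to compute them, skipping hours with zero actual rentals to avoid division by zero:

```python
import numpy as np

def within_pct(y_true, y_pred, pct: float) -> float:
    """Share of predictions whose absolute percentage error is <= pct."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # skip hours with zero rentals
    rel_err = np.abs(y_pred[mask] - y_true[mask]) / np.abs(y_true[mask])
    return float((rel_err <= pct / 100).mean())

# Example: within_pct(y_holdout, y_hat, 5), within_pct(y_holdout, y_hat, 10), ...
```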

Visual Results


Figure 4. Actual vs. Predicted — Holdout Set (November-December 2023)


Figure 5. Residual Plot — Colored by Error Band

Supplementary Materials

Final Assets

  • Final model saved as: bike_model_covid.keras
  • Scaler object saved as: bike_scaler_covid.pkl
  • Notebook for full training/prediction pipeline: Google Colab Link

Next Steps

Future improvements could include:

  • Incorporating external data (e.g., public events and bike station availability)
  • Testing LSTM or Transformer architectures for sequential temporal modeling
  • Automating retraining pipelines for weekly data ingestion and redeployment
  • Improving prediction confidence intervals for stakeholder-facing dashboards