DeepaSaru/Unsupervised-Machine-Learning


The Online Shoppers Purchasing Intention Dataset comes from the UCI Machine Learning Repository and contains 12,330 session-level records collected from an e-commerce website over a one-year period. Each row represents one browsing session rather than an individual user, so the data describes behavioural patterns without directly identifying customers. The dataset has 10 numerical features (such as administrative duration, product-related duration, bounce rate, exit rate, and page value) and 8 categorical attributes (such as month, visitor type, operating system, browser, traffic type, and weekend indicator; the count includes the Revenue label). Together these variables describe user engagement, technical configuration, and navigation behaviour, along with signals related to conversion.

The target attribute in the original dataset is “Revenue”, which indicates whether a session resulted in a purchase. For this clustering task, however, the dataset is used in an unsupervised setting to uncover natural customer segments and behavioural groupings. Because the features have different scales and mixed data types, preprocessing is required before modelling: encoding the categorical variables, standardising the numerical ones, and reducing dimensionality with PCA. The dataset offers a realistic basis for studying online shopping behaviour and extracting actionable business insights.
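
The preprocessing steps named above can be sketched with scikit-learn as follows. This is a minimal illustration, not the repository's actual pipeline: the synthetic rows, the subset of columns, the helper names `make_synthetic_sessions` and `build_preprocessor`, and the choice of two principal components are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Subset of the dataset's columns, chosen only to keep the sketch short.
NUMERIC = ["Administrative_Duration", "ProductRelated_Duration",
           "BounceRates", "ExitRates", "PageValues"]
CATEGORICAL = ["Month", "VisitorType", "Weekend"]


def make_synthetic_sessions(n=200, seed=0):
    """Generate made-up rows with the dataset's column names (not real data)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "Administrative_Duration": rng.exponential(80.0, n),
        "ProductRelated_Duration": rng.exponential(1200.0, n),
        "BounceRates": rng.uniform(0.0, 0.2, n),
        "ExitRates": rng.uniform(0.0, 0.2, n),
        "PageValues": rng.exponential(6.0, n),
        "Month": rng.choice(["Feb", "May", "Nov"], n),
        "VisitorType": rng.choice(
            ["Returning_Visitor", "New_Visitor", "Other"], n),
        "Weekend": rng.choice([True, False], n),
        "Revenue": rng.choice([True, False], n),
    })


def build_preprocessor(n_components=2):
    """Standardise numeric columns, one-hot encode categoricals, then apply PCA."""
    encode_and_scale = ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERIC),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array, which PCA expects
    )
    return Pipeline([
        ("prep", encode_and_scale),
        ("pca", PCA(n_components=n_components, random_state=0)),
    ])


sessions = make_synthetic_sessions()
# Drop the label for the unsupervised setting; cast the boolean flag to
# strings so every categorical column has a uniform type for the encoder.
features = sessions.drop(columns=["Revenue"]).astype({"Weekend": str})
components = build_preprocessor().fit_transform(features)
print(components.shape)
```

One-hot encoded columns are left unscaled here; whether to scale them as well before PCA is a design choice that the source does not settle.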

| Column Name | Type | Description |
| --- | --- | --- |
| Administrative | Numerical | Number of pages visited related to account management or administrative functions. |
| Administrative_Duration | Numerical | Total time (in seconds) spent on administrative pages during the session. |
| Informational | Numerical | Number of informational pages viewed, often related to policies or FAQs. |
| Informational_Duration | Numerical | Total time (in seconds) spent on informational pages during the session. |
| ProductRelated | Numerical | Number of product-related pages viewed, indicating browsing depth. |
| ProductRelated_Duration | Numerical | Total time (in seconds) spent on product pages; often associated with purchase intent. |
| BounceRates | Numerical | Average bounce rate of the pages visited in the session, where a page's bounce rate is the share of visits that enter on that page and leave without further interaction. |
| ExitRates | Numerical | Average exit rate of the pages visited in the session, where a page's exit rate is the share of its views that were the last in a session; signals drop-off likelihood. |
| PageValues | Numerical | Average value of the pages visited in the session, estimated from how often those pages precede a completed transaction. |
| SpecialDay | Numerical | Closeness of the session date to a special day (e.g., Valentine’s Day), when purchase probability is higher; ranges from 0 to 1. |
| Month | Categorical | Month of the session (e.g., Feb, Nov), capturing seasonal behaviour patterns. |
| OperatingSystems | Categorical | Operating system used by the visitor, stored as an integer code rather than a name. |
| Browser | Categorical | Browser used by the visitor, stored as an integer code rather than a name. |
| Region | Categorical | Geographic region of the visitor (coded values 1–9). |
| TrafficType | Categorical | Source of traffic (e.g., direct, referral, ads), stored as an integer code. |
| VisitorType | Categorical | Whether the visitor is a returning visitor, a new visitor, or other (`Returning_Visitor`, `New_Visitor`, `Other`). |
| Weekend | Categorical (Boolean) | Whether the session occurred on a weekend (TRUE/FALSE). |
| Revenue | Categorical (Boolean) | Target variable in the original dataset: whether the session resulted in a purchase. |
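
Several categorical columns in the table hold integer codes, so a CSV reader will infer them as numbers unless told otherwise. The sketch below shows one way to encode the schema explicitly and separate the label. The three sample rows are invented for illustration, and `split_features` is a hypothetical helper, not code from this repository.

```python
import io

import pandas as pd

NUMERICAL = [
    "Administrative", "Administrative_Duration",
    "Informational", "Informational_Duration",
    "ProductRelated", "ProductRelated_Duration",
    "BounceRates", "ExitRates", "PageValues", "SpecialDay",
]
CATEGORICAL = [
    "Month", "OperatingSystems", "Browser", "Region",
    "TrafficType", "VisitorType", "Weekend",
]
TARGET = "Revenue"

# Invented rows in the dataset's column layout (not real records).
SAMPLE_CSV = """\
Administrative,Administrative_Duration,Informational,Informational_Duration,ProductRelated,ProductRelated_Duration,BounceRates,ExitRates,PageValues,SpecialDay,Month,OperatingSystems,Browser,Region,TrafficType,VisitorType,Weekend,Revenue
0,0.0,0,0.0,1,0.0,0.2,0.2,0.0,0.0,Feb,1,1,1,1,Returning_Visitor,False,False
2,53.5,0,0.0,12,410.0,0.0,0.02,14.3,0.0,Nov,2,2,3,2,New_Visitor,True,True
1,20.0,1,35.0,30,1250.5,0.01,0.03,0.0,0.4,May,3,2,1,4,Returning_Visitor,False,False
"""


def split_features(df):
    """Return (features, target) with categorical columns cast explicitly."""
    features = df.drop(columns=[TARGET])
    # Integer-coded columns such as Browser would otherwise stay numeric.
    features = features.astype({col: "category" for col in CATEGORICAL})
    return features, df[TARGET]


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
features, target = split_features(raw)
print(features.dtypes)
```

Casting to `category` keeps the integer codes from being treated as ordered quantities by later steps such as scaling.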

About

A Data-Driven Approach to Online Shopper Segmentation Using PCA and Unsupervised Clustering Algorithms
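
As a rough illustration of the PCA-plus-clustering approach, the sketch below reduces synthetic data to two components and compares k-means solutions by silhouette score. The repository description does not say which clustering algorithms or selection criteria are used, so k-means, the silhouette criterion, the range of `k`, and the `make_blobs` data are all assumptions for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for preprocessed session features (not the real dataset).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6,
                  cluster_std=1.0, random_state=0)

# Standardise, then project onto the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Fit k-means for several candidate cluster counts and score each solution.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced)
    scores[k] = silhouette_score(X_reduced, labels)

# Pick the cluster count with the highest silhouette score.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The silhouette score is only one way to choose the number of clusters; other algorithms or criteria would slot into the same loop.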
