The Online Shoppers Purchasing Intention Dataset is sourced from the UCI Machine Learning Repository and contains 12,330 session-level records collected from an e-commerce website over a one-year period. Each row represents a single browsing session rather than an individual user, which makes the dataset suitable for studying behavioural patterns without directly identifying customers. The dataset includes 10 numerical features (such as administrative duration, product-related duration, bounce rate, exit rate, and page values) and 8 categorical attributes (such as month, visitor type, operating system, browser, traffic type, weekend indicator, and the Revenue label). Together, these variables describe user engagement, technical configuration, traffic origin, session timing, and likelihood of conversion.

The target attribute in the original dataset is “Revenue”, which indicates whether a session resulted in a purchase. For this clustering task, however, the dataset is used in an unsupervised context to uncover natural customer segments and behavioural groupings, so the label does not guide the grouping. Before modelling, the categorical variables must be encoded and the numerical features standardised, because the features mix data types and differ widely in scale; dimensionality reduction (PCA) can then compress the encoded feature space. Overall, the dataset provides a realistic basis for studying online shopping behaviour and extracting actionable business insights.

| Column Name | Type | Description |
|---|---|---|
| Administrative | Numerical | Number of pages visited related to account management or administrative functions. |
| Administrative_Duration | Numerical | Total time (in seconds) spent on administrative pages during the session. |
| Informational | Numerical | Count of informational pages viewed, often related to policies or FAQs. |
| Informational_Duration | Numerical | Total time (in seconds) spent on informational pages during the session. |
| ProductRelated | Numerical | Number of product-related pages viewed, indicating browsing depth. |
| ProductRelated_Duration | Numerical | Total time (in seconds) spent on product-related pages during the session; often treated as a signal of purchase intent. |
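| BounceRates | Numerical | Average bounce rate of the pages visited in the session. A page's bounce rate is the percentage of visitors who enter the site on that page and leave without any further interaction. |
| ExitRates | Numerical | Average exit rate of the pages visited in the session. A page's exit rate is the percentage of its page views that were the last in a session; signals drop-off likelihood. |
| PageValues | Numerical | Average value of the pages visited in the session, where a page's value reflects how often it was viewed before a completed e-commerce transaction. |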
| SpecialDay | Numerical | Closeness of the session date to a special day (e.g., Valentine’s Day), when purchase probability is higher; values range from 0 to 1. |
| Month | Categorical | Month of the session (e.g., Feb, Nov), capturing seasonal behaviour patterns. |
| OperatingSystems | Categorical | Operating system used by the visitor, stored as an integer code rather than an OS name. |
| Browser | Categorical | Browser used by the visitor, stored as an integer code rather than a browser name. |
| Region | Categorical | Geographic region of the visitor (coded values 1–9). |
| TrafficType | Categorical | Source of traffic (e.g., direct, referral, ads), stored as an integer code. |
| VisitorType | Categorical | Whether the visitor is returning, new, or other (stored as `Returning_Visitor`, `New_Visitor`, `Other`). |
| Weekend | Categorical (Boolean) | Indicates whether the session occurred on a weekend (TRUE/FALSE). |
| Revenue | Categorical (Boolean) | Target variable in the original dataset: whether the session resulted in a purchase (TRUE/FALSE). |
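
The preprocessing steps mentioned above (encoding categorical variables, standardisation, and PCA) can be sketched with pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the pipeline actually used for this analysis: the helper names (`make_toy_sessions`, `build_preprocessor`, `preprocess`) are hypothetical, one-hot encoding is only one possible way to handle the categorical columns, the choice of two principal components is arbitrary, and the rows are random placeholders that merely follow the column schema in the table.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column groups taken from the data dictionary above.
NUMERICAL = [
    "Administrative", "Administrative_Duration",
    "Informational", "Informational_Duration",
    "ProductRelated", "ProductRelated_Duration",
    "BounceRates", "ExitRates", "PageValues", "SpecialDay",
]
CATEGORICAL = [
    "Month", "OperatingSystems", "Browser", "Region",
    "TrafficType", "VisitorType", "Weekend",
]
TARGET = "Revenue"  # label from the original dataset; not a clustering input


def make_toy_sessions(n=8, seed=0):
    """Return made-up rows with the documented schema (NOT real data)."""
    rng = np.random.default_rng(seed)
    data = {col: rng.random(n) for col in NUMERICAL}
    data.update({
        "Month": rng.choice(["Feb", "May", "Nov"], n),
        "OperatingSystems": rng.integers(1, 4, n),
        "Browser": rng.integers(1, 4, n),
        "Region": rng.integers(1, 10, n),
        "TrafficType": rng.integers(1, 4, n),
        "VisitorType": rng.choice(["Returning_Visitor", "New_Visitor"], n),
        "Weekend": rng.choice([True, False], n),
        TARGET: rng.choice([True, False], n),
    })
    return pd.DataFrame(data)


def build_preprocessor(n_components=2):
    """Scale numerical columns, one-hot encode categorical ones, then apply PCA."""
    encode = ColumnTransformer(
        [
            ("num", StandardScaler(), NUMERICAL),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ],
        sparse_threshold=0.0,  # always return a dense array for PCA
    )
    return Pipeline([("encode", encode), ("pca", PCA(n_components=n_components))])


def preprocess(df, n_components=2):
    """Drop the label, treat coded columns as categories, and fit the pipeline."""
    features = df.drop(columns=[TARGET])
    # Integer codes and booleans are categories here, so cast them to strings.
    features = features.astype({col: str for col in CATEGORICAL})
    return build_preprocessor(n_components).fit_transform(features)


sessions = make_toy_sessions()
embedding = preprocess(sessions)
print(embedding.shape)  # one row per session, one column per component
```

With the real data, `make_toy_sessions()` would be replaced by loading the dataset file into a DataFrame, and the number of components would normally be chosen by inspecting the explained variance rather than fixed in advance.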