    The Machine Learning “Advent Calendar” Day 10: DBSCAN in Excel

    By Awais, December 10, 2025

    This is Day 10 of my Machine Learning “Advent Calendar”. I would like to thank you for your support.

    I have been building these Google Sheet files for years. They evolved little by little. But when it is time to publish them, I always need hours to reorganize everything, clean the layout, and make them pleasant to read.

    Today, we move to DBSCAN.

    DBSCAN Does Not Learn a Parametric Model

    Just like LOF, DBSCAN is not a parametric model. There is no formula to store, no rules, no centroids, and nothing compact to reuse later.

    We must keep the whole dataset because the density structure depends on all points.

    Its full name is Density-Based Spatial Clustering of Applications with Noise.

    But be careful: this “density” is not a Gaussian density.

    It is a count-based notion of density: just “how many neighbors live close to me”.

    Why DBSCAN Is Special

    As its name indicates, DBSCAN does two things at the same time:

    • it finds clusters
    • it marks anomalies (the points that do not belong to any cluster)

    This is exactly why I present the algorithms in this order:

    • k-means and GMM are clustering models. They output a compact object: centroids for k-means, means and variances for GMM.
    • Isolation Forest and LOF are pure anomaly detection models. Their only goal is to find unusual points.
    • DBSCAN sits in between. It does both clustering and anomaly detection, based only on the notion of neighborhood density.

    A Tiny Dataset to Keep Things Intuitive

    We stay with the same tiny dataset that we used for LOF: 1, 2, 3, 7, 8, 12

    If you look at these numbers, you already see two compact groups:
    one around 1–2–3, another around 7–8, and 12 living alone.

    DBSCAN captures exactly this intuition.
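
    To make the "two groups and one loner" intuition concrete, here is a minimal sketch that lists the gaps between consecutive values. The variable names are illustrative only and are not taken from the spreadsheet.

```python
# Toy dataset from the text (already sorted)
points = [1, 2, 3, 7, 8, 12]

# Gap between each value and the next one
gaps = [b - a for a, b in zip(points, points[1:])]

print(gaps)  # [1, 1, 4, 1, 4]
```

    The gaps of 1 sit inside the groups, and the gaps of 4 separate 3 from 7 and 8 from 12.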

    Summary in 3 Steps

    DBSCAN asks three simple questions for each point:

    1. How many neighbors do you have within a small radius (eps)?
    2. Do you have enough neighbors to become a Core point (minPts)?
    3. Once we know the Core points, to which connected group do you belong?

    Here is the summary of the DBSCAN algorithm in 3 steps:

    DBSCAN in Excel – all images by author

    Let us begin step by step.

    DBSCAN in 3 steps

    Now that we understand the idea of density and neighborhoods, DBSCAN becomes very easy to describe.
    Everything the algorithm does fits into three simple steps.

    Step 1 – Count the neighbors

    The goal is to check how many neighbors each point has.

    We take a small radius called eps.

    For each point, we look at all other points and mark those whose distance is at most eps.
    These are the neighbors.

    This gives us the first idea of density:
    a point with many neighbors is in a dense region,
    a point with few neighbors lives in a sparse region.

    For a 1-dimensional toy example like ours, a reasonable choice is:
    eps = 2

    We draw a little interval of radius 2 around each point.

    Why is it called eps?

    The name eps comes from the Greek letter ε (epsilon), which is traditionally used in mathematics to represent a small quantity or a small radius around a point.
    So in DBSCAN, eps is literally “the small neighborhood radius”.

    It answers the question:
    How far do we look around each point?

    So in Excel, the first step is to compute the pairwise distance matrix, then count how many neighbors each point has within eps.
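
    Outside the spreadsheet, the same step could be sketched in a few lines of Python. This is only an illustration, under two assumptions that follow the usual DBSCAN convention: a point is within the radius when its distance is less than or equal to eps, and each point is counted in its own neighborhood (the diagonal of the matrix is 0).

```python
points = [1, 2, 3, 7, 8, 12]
eps = 2

# Pairwise distance matrix: dist[i][j] = |x_i - x_j|
dist = [[abs(a - b) for b in points] for a in points]

# Number of points within eps of each point.
# Assumption: "<= eps", and the point itself is included (distance 0).
counts = [sum(d <= eps for d in row) for row in dist]

for p, c in zip(points, counts):
    print(p, c)
# 1 3
# 2 3
# 3 3
# 7 2
# 8 2
# 12 1
```

    The point 12 is the only one that finds nobody but itself inside its radius.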

    Step 2 – Core Points and Density Connectivity

    Now that we know the neighbors from Step 1, we apply minPts to decide which points are Core.

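    Here, minPts stands for the minimum number of points.

    It is the smallest number of points that must lie inside the eps radius of a point, counting the point itself (the usual DBSCAN convention), for it to be considered a Core point.

    A point is Core if it has at least minPts such points inside eps.
    Otherwise, it becomes Border if it lies within eps of a Core point, and Noise if it does not.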

    With eps = 2 and minPts = 2, every point except 12 is Core: 12 has no other point inside its radius.
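
    The Core test can be checked with a short sketch. It assumes the usual convention that a point counts itself among the points inside its radius; the names `min_pts` and `points_within_eps` are illustrative.

```python
points = [1, 2, 3, 7, 8, 12]
eps = 2
min_pts = 2  # plays the role of minPts in the text

def points_within_eps(p):
    """Count the points at distance <= eps from p, including p itself (assumption)."""
    return sum(abs(p - q) <= eps for q in points)

is_core = {p: points_within_eps(p) >= min_pts for p in points}
core_points = [p for p in points if is_core[p]]

print(core_points)  # [1, 2, 3, 7, 8]
```

    Note that the convention matters here: if a point did not count itself, 7 and 8 would each have a single neighbor and would fail the test with minPts = 2.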

    Once the Core points are known, we simply check which points are density-reachable from them. If a point can be reached by moving from one Core point to another within eps, it belongs to the same group.

    In Excel, we can represent this as a simple connectivity table that shows which points are linked through Core neighbors.

    This connectivity is what DBSCAN uses to form clusters in Step 3.
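
    One possible layout for such a table, sketched in Python rather than in Excel: row i has a 1 in column j when point i is Core and point j lies within eps of it. The exact layout of the spreadsheet may differ; this is an assumption made for illustration, with the point itself counted in its own neighborhood.

```python
points = [1, 2, 3, 7, 8, 12]
eps, min_pts = 2, 2  # min_pts plays the role of minPts

# Core flag per point (the count includes the point itself; distance <= eps)
is_core = [sum(abs(p - q) <= eps for q in points) >= min_pts for p in points]

# link[i][j] is 1 when point i is Core and point j is within eps of it,
# that is, when j can be reached directly from i
link = [
    [int(is_core[i] and abs(p - q) <= eps) for q in points]
    for i, p in enumerate(points)
]

for p, row in zip(points, link):
    print(f"{p:>2}: {row}")
#  1: [1, 1, 1, 0, 0, 0]
#  2: [1, 1, 1, 0, 0, 0]
#  3: [1, 1, 1, 0, 0, 0]
#  7: [0, 0, 0, 1, 1, 0]
#  8: [0, 0, 0, 1, 1, 0]
# 12: [0, 0, 0, 0, 0, 0]
```

    The two blocks of ones are the two groups, and the empty row and column for 12 show that nothing links to it.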

    Step 3 – Assign cluster labels

    The goal is to turn connectivity into actual clusters.

    Once the connectivity matrix is ready, the clusters appear naturally.
    DBSCAN simply groups all connected points together.

    To give each group a simple and reproducible name, we use a very intuitive rule:

    The cluster label is the smallest point in the connected group.

    For example:

    • Group {1, 2, 3} becomes cluster 1
    • Group {7, 8} becomes cluster 7
    • A point like 12 with no Core neighbors becomes Noise

    This is exactly what we will display in Excel using formulas.
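
    For readers who prefer code, the three steps can be combined into one small function. This is a sketch under stated assumptions, not a translation of the Excel formulas: it works on distinct 1-dimensional values, uses distance <= eps, counts a point in its own neighborhood, and the name `dbscan_1d` is made up for this example.

```python
def dbscan_1d(points, eps, min_pts):
    """Map each value to the smallest member of its cluster, or to 'Noise'."""
    # Steps 1 and 2: a point is Core if at least min_pts points
    # (itself included) lie within eps
    core = {p for p in points
            if sum(abs(p - q) <= eps for q in points) >= min_pts}

    # Step 3: grow a group from each unlabeled Core point
    labels = {}
    for seed in sorted(core):
        if seed in labels:
            continue
        group, stack = {seed}, [seed]
        while stack:                      # only Core points are expanded
            p = stack.pop()
            for q in points:
                if q not in group and abs(p - q) <= eps:
                    group.add(q)
                    if q in core:
                        stack.append(q)
        for q in group:
            labels.setdefault(q, min(group))  # label = smallest point
    return {p: labels.get(p, "Noise") for p in points}

result = dbscan_1d([1, 2, 3, 7, 8, 12], eps=2, min_pts=2)
print(result)  # {1: 1, 2: 1, 3: 1, 7: 7, 8: 7, 12: 'Noise'}
```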

    Final thoughts

    DBSCAN is perfect for teaching the idea of local density.

    There is no probability, no Gaussian formula, no estimation step.
    Just distances, neighbors, and a small radius.

    But this simplicity also limits it.
    Because DBSCAN uses one fixed radius for everyone, it cannot adapt when the dataset contains clusters of different scales.
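
    A small experiment illustrates this limit. The data below is hypothetical (it is not the dataset of this article): two tight groups close to each other, and one loose group far away. The function is a compact sketch with the same assumptions as before (distinct 1-dimensional values, distance <= eps, the point counts itself).

```python
def dbscan_1d(points, eps, min_pts):
    """Return one label per point: smallest member of its cluster, or 'Noise'."""
    core = {p for p in points
            if sum(abs(p - q) <= eps for q in points) >= min_pts}
    labels = {}
    for seed in sorted(core):
        if seed in labels:
            continue
        group, stack = {seed}, [seed]
        while stack:                      # expand only through Core points
            p = stack.pop()
            for q in points:
                if q not in group and abs(p - q) <= eps:
                    group.add(q)
                    if q in core:
                        stack.append(q)
        for q in group:
            labels.setdefault(q, min(group))
    return [labels.get(p, "Noise") for p in points]

# Hypothetical data: tight groups {0,1,2} and {12,13,14}, loose group {100,120,140}
data = [0, 1, 2, 12, 13, 14, 100, 120, 140]

small = dbscan_1d(data, eps=5, min_pts=2)
large = dbscan_1d(data, eps=20, min_pts=2)
print(small)  # [0, 0, 0, 12, 12, 12, 'Noise', 'Noise', 'Noise']
print(large)  # [0, 0, 0, 0, 0, 0, 100, 100, 100]
```

    A small radius separates the two tight groups but treats the loose group as noise. A radius large enough to connect the loose group also merges the two tight ones. On this data, no single eps recovers all three groups.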

    HDBSCAN keeps the same intuition, but looks at all radii and keeps what remains stable.
    It is far more robust, and much closer to how humans naturally see clusters.
