Tuesday, September 10, 2024

Gower’s Distance for Mixed Categorical and Numerical Data | by Haden Pelletier | Jul, 2024



A distance measure for clustering mixed data

Towards Data Science

You have most likely heard of Manhattan distance or Euclidean distance. These are two different metrics that indicate how distant (or how different) two given data points are.

Manhattan and Euclidean distance graphed. Image by author

In a nutshell, Euclidean distance is the shortest (straight-line) distance from point A to point B. Manhattan distance is the sum of the absolute differences between the x and y coordinates of the two points: it measures the distance between them as if they were placed on a grid where you can only move up, down, left, or right (not diagonally).
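To make the two definitions concrete, here is a minimal Python sketch. The function names and the two sample points are illustrative assumptions, not taken from the original article.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (grid distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical points chosen for illustration
point_a = (1, 2)
point_b = (4, 6)

print(euclidean_distance(point_a, point_b))  # 5.0  (sqrt(3^2 + 4^2))
print(manhattan_distance(point_a, point_b))  # 7    (3 + 4)
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance, since the grid path cannot be shorter than the straight line.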

Distance metrics often underlie clustering algorithms, such as k-means clustering, which uses Euclidean distance. This makes sense: in order to define clusters, you first have to know how similar or different 2 data points are (aka how distant they are from each other).
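As a rough illustration of how a distance metric drives clustering, the sketch below shows only the assignment step of k-means: each point is given the label of its nearest centroid under Euclidean distance. The function name, points, and centroids are hypothetical, and a full k-means implementation would also update the centroids and repeat until the labels stop changing.

```python
import math

def assign_to_nearest_centroid(points, centroids):
    """Return, for each point, the index of the closest centroid (Euclidean)."""
    labels = []
    for p in points:
        # math.dist computes the Euclidean distance between two points
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Hypothetical data: two loose groups of points and two starting centroids
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]

labels = assign_to_nearest_centroid(points, centroids)
print(labels)  # [0, 0, 1, 1]
```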

Calculating the distance between 2 points

To show this process in action, I’ll start with an example using Euclidean distance.



