DBSCAN Algorithm | Machine Learning with Scikit-Learn Python

  • Added 6 September 2024

Comments • 47

  • @crazyfootball2271 · 3 years ago

    Very easy to understand for first-timers. Great work. Appreciated.

  • @NhuNguyen-gl3dv · 3 years ago

    Excellent video. Very well explained. Thank you so much.

  • @MiyaBhai-dj4on · 3 years ago · +3

    Please provide the code so we can copy it.
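A minimal sketch of a basic DBSCAN run in scikit-learn. This is not the video's own code: the two-moons toy dataset and the `eps`/`min_samples` values are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy data: two interleaving half-moons (a non-convex shape).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points needed (including the point
# itself) for a point to be a core point. Values are illustrative guesses.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # one label per point; -1 marks noise

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")
```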

  • @anushamv3190 · 3 years ago · +1

    Hello sir,
    Which algorithm works well for customer segmentation with respect to Recency, Frequency, and Monetary value?
    And is it necessary to apply all the algorithms (K-means, DBSCAN, hierarchical) to the dataset and then come to a conclusion?

  • @poojachindarkar1207 · 4 years ago

    You made that easy! Glad that I found you :)

  • @kannavjiya_raja · 3 years ago

    Great! Can you please provide a more detailed explanation of the DBSCAN algorithm?

  • @arash_mehrabi · 3 years ago

    nice, clear explanation, thank you.

  • @zakariaghalmane1547 · 3 years ago

    Thank you for this very useful video

  • @saiakhileshande8486 · 2 years ago

    Thank you for the video and the clear explanation. Could you also show how to find the optimal z and epsilon in sklearn?

  • @laurynasgrusas8755 · 3 years ago

    This was very helpful. Thank you!

  • @elenatagliabue6625 · 3 years ago · +2

    Great! One question: what do you mean when you write "dist i=dist of the 5th neighbor of the ith data point"? What is the neighbor in this case? Thank you

    • @NormalizedNerd · 3 years ago

      n = number of data points
      dist = an array of n elements
      dist[i] = the distance from the i-th data point to its 5th nearest data point

  • @elvykamunyokomanunebo1441

    Hello Normalizer, I am wondering:
    if DBSCAN doesn't handle high dimensionality very well, does standardizing improve performance when there is a moderate degree of correlation between features/dimensions?

  • @stonecastle858 · 3 years ago · +1

    z surely can't refer to neighbours only; it must also include the point itself?

    • @NormalizedNerd · 3 years ago · +1

      Yes, z includes the point itself. (Sorry for the late reply)
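scikit-learn's `DBSCAN` uses the same convention: its `min_samples` parameter (the z here) counts the point itself. A small sketch on hypothetical 1-D points: with `min_samples=3`, the middle point has only two other points within `eps`, yet it still qualifies as a core point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 1-D points: 0, 1, 2 close together, 10 far away.
X = np.array([[0.0], [1.0], [2.0], [10.0]])

db = DBSCAN(eps=1.5, min_samples=3).fit(X)

# Point 1 sees itself plus points 0 and 2 within eps -> 3 points -> core.
# Points 0 and 2 see only 2 points each -> border points of the same cluster.
# Point 10 sees only itself -> noise (label -1).
print(db.core_sample_indices_)
print(db.labels_)
```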

  • @saylik1094 · 3 years ago

    Very nice explanation. Thank you!!
    Can you please make a video on HDBSCAN?

  • @mathavraj9662 · 3 years ago · +1

    By 5th neighbour, do you mean the 5th radially farthest point from the i-th point? What if many points lie in the 5th position?

    • @NormalizedNerd · 3 years ago · +1

      A point can have any number of equidistant neighbors. The algorithm just checks how many points are inside the circle.
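To illustrate the reply: a minimal NumPy sketch that simply counts every point inside the circle, so four neighbors tied at exactly the same distance all count. The toy points and the inclusive `<= eps` boundary are assumptions for illustration.

```python
import numpy as np

# Point 0 at the origin, four neighbors all at distance exactly 1, one far point.
X = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1], [3, 3]], dtype=float)
eps = 1.0

# Distance from point 0 to every point (including itself).
d = np.linalg.norm(X - X[0], axis=1)

# Ties do not matter: every point with distance <= eps is counted.
count = int(np.sum(d <= eps))  # the point itself plus the four tied neighbors
print(count)
```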

  • @haneulkim4902 · 3 years ago

    Thanks for great video! I have two questions that I want to ask:
    1. You said DBSCAN performs poorly on high-dimensional data. How many dimensions are considered high?
    2. Why is it bad for high-dimensional data?

    • @NormalizedNerd · 3 years ago · +1

      1. That's a very subjective question. For some datasets it's 100 for others it might be 1000. It depends on the distribution of the data.
      2. Because we use Euclidean distance to find the neighborhood points. Euclidean neighborhoods work poorly in higher dimensions because the epsilon-ball covers only a tiny fraction of the volume of its circumscribing hypercube!
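The shrinking fraction can be checked directly. A minimal sketch using the standard volume formula for a d-dimensional ball of radius r, V = π^(d/2) r^d / Γ(d/2 + 1), divided by the volume (2r)^d of the cube that encloses it; r cancels, so the ratio depends only on d.

```python
import math


def ball_to_cube_ratio(d: int) -> float:
    """Volume of a d-dimensional ball divided by the volume of its
    circumscribing hypercube (radius r cancels out)."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # unit-radius ball
    cube = 2 ** d                                       # cube of side 2
    return ball / cube


for d in (2, 3, 10, 50):
    print(d, ball_to_cube_ratio(d))
```

In 2-D the circle fills π/4 (about 79%) of its square; by d = 10 the ball fills well under 1% of the cube, which is the "tiny percentage of volume" the reply refers to.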

    • @haneulkim4902 · 3 years ago

      @NormalizedNerd Thanks for answering!
      1. Distribution of each feature? Can't we just normalize all features?

  • @Slypie2112 · 3 years ago

    How do we get the exact values of the outliers in the dataset from this DBSCAN clustering? Thank you
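In scikit-learn, `DBSCAN` marks noise points (the outliers) with the label -1 in `labels_`, so a boolean mask recovers both their positions and their values. A minimal sketch on hypothetical points; with a pandas DataFrame the same mask works as `df[db.labels_ == -1]`.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: four close points and one far-away point.
X = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.2], [8.0, 8.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)

outlier_mask = db.labels_ == -1          # True where DBSCAN labeled noise
outliers = X[outlier_mask]               # the outlier rows themselves
outlier_idx = np.flatnonzero(outlier_mask)  # their row indices
print(outlier_idx, outliers)
```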

  • @arijitRC473 · 4 years ago

    Well explained content!!

  • @Ajitshukla07 · 4 years ago

    Very well explained. Can we get more use cases for DBSCAN for better understanding?

    • @NormalizedNerd · 4 years ago

      give this a read: datascience.stackexchange.com/questions/10063/for-which-real-world-data-sets-does-dbscan-surpass-k-means

  • @rezamahendra8418 · 3 years ago

    How can we input Excel or CSV data when using this algorithm?

    • @NormalizedNerd · 3 years ago

      Pretty easy...
      import pandas as pd
      df = pd.read_csv("path_to_csv_file.csv")   # for Excel: pd.read_excel("path_to_file.xlsx")
      # then use iloc to select the feature columns (and the target column, if any) and put them in X and Y
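A self-contained sketch of the full path from a CSV file to DBSCAN labels. To keep it runnable, `io.StringIO` stands in for the file path; the column names (`id`, `feature_1`, `feature_2`), the data, and the `eps`/`min_samples` values are all hypothetical.

```python
import io

import pandas as pd
from sklearn.cluster import DBSCAN

# Stand-in for pd.read_csv("path_to_csv_file.csv"); columns are hypothetical.
csv_text = """id,feature_1,feature_2
a,1.0,2.0
b,1.2,2.1
c,0.9,1.8
d,8.0,8.0
e,8.1,8.2
f,15.0,1.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# iloc: all rows, columns 1 and 2 (skip the non-numeric id column).
X = df.iloc[:, 1:3].to_numpy()

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
df["cluster"] = db.labels_  # -1 = noise
print(df)
```

DBSCAN is unsupervised, so only the feature matrix X is passed to `fit`; a target column Y is only needed if you want to compare the clusters against known labels afterwards.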

  • @fitrianinasir1321 · 3 years ago

    Thank you so much, what a great explanation! I have a question: can we use PCA before clustering with DBSCAN? If yes, which dimensionality should I use: before PCA (in this case I have 30 dimensions), or after PCA with 3 dimensions?

    • @NormalizedNerd · 3 years ago

      Yes, you can try to reduce the dimension using PCA and then cluster using DBSCAN.
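A minimal sketch of that idea as a scikit-learn pipeline: standardize, reduce to 3 components with PCA, then cluster with DBSCAN. The synthetic 30-feature data and the `eps` value are assumptions; `min_samples` is tied to the reduced dimension (2 × 3) only as a starting point, following the commonly cited "about twice the dimensionality" heuristic.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a 30-feature dataset.
X, _ = make_blobs(n_samples=500, n_features=30, centers=4, random_state=0)

n_components = 3
pipe = make_pipeline(
    StandardScaler(),                     # put features on a common scale
    PCA(n_components=n_components),       # 30 -> 3 dimensions
    # DBSCAN sees only the 3 PCA dimensions, so min_samples is based on those.
    DBSCAN(eps=0.5, min_samples=2 * n_components),
)

labels = pipe.fit_predict(X)  # Pipeline.fit_predict calls DBSCAN.fit_predict last
```

Since DBSCAN only ever measures distances in the space it is given, any dimension-based heuristic for the minimum number of points would naturally refer to the PCA output rather than the original features.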

    • @fitrianinasir1321 · 3 years ago

      @NormalizedNerd Then for MinPts, if I use the PCA DataFrame to fit the DBSCAN algorithm, which one should I use? MinPts = 2*30 - 1 = 59 (original number of features) or MinPts = 2*3 - 1 = 5 (PCA features)? (This refers to the heuristic approach by the inventor of the DBSCAN algorithm, Martin Ester, 1996.)

  • @cruzab3153 · 3 years ago

    Very useful... So I have one doubt... Assuming we have created the clusters, how do we create a buffer or outer polygon for those clusters?

    • @NormalizedNerd · 3 years ago

      Thanks!
      You need something called a convex hull.

    • @cruzab3153 · 3 years ago

      @NormalizedNerd Thanks man... that's everything I need...

    • @cruzab3153 · 3 years ago

      OMG man, it's working... I had been searching in the wrong direction for over a week... this one word opened the door to all my answers😭😭... thanks again, man...

    • @NormalizedNerd · 3 years ago · +1

      @cruzab3153 Haha... yeah, it happens. Happy to help :D
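Following up on the convex hull suggestion: a minimal sketch that builds one outer polygon per DBSCAN cluster with `scipy.spatial.ConvexHull`, skipping noise points. The blob data and DBSCAN settings are assumptions; the resulting vertex arrays could then be handed to a geometry library if you also need a buffer around each polygon.

```python
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Stand-in 2-D data and illustrative DBSCAN settings.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

hulls = {}
for label in set(labels.tolist()):
    if label == -1:          # noise points belong to no cluster polygon
        continue
    pts = X[labels == label]
    if len(pts) < 3:         # a 2-D hull needs at least 3 points
        continue
    hull = ConvexHull(pts)
    # For 2-D input, hull.vertices are indices into pts in counterclockwise order,
    # so pts[hull.vertices] is the outer polygon of this cluster.
    hulls[label] = pts[hull.vertices]
```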

  • @dragoneagle11 · 3 years ago

    Great video! Is there any function built into scikit-learn that can plot the clusters like the show_clusters function you use in this video?

    • @NormalizedNerd · 3 years ago · +1

      IDK if scikit-learn can do that, but you can make a scatter plot with seaborn, colored by cluster label, to show the clusters.
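A minimal sketch of that seaborn approach: color each point by its DBSCAN label, with -1 shown as its own "noise" category. The data, the DBSCAN settings, and the output filename are assumptions; the `Agg` backend line only lets the script run without a display and can be dropped in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless runs
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Converting labels to strings makes seaborn treat them as categories
# (one color per cluster, plus one for "-1" = noise).
ax = sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=labels.astype(str), palette="tab10")
ax.set_title("DBSCAN clusters (label -1 = noise)")
plt.savefig("dbscan_clusters.png")  # hypothetical output filename
```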

  • @pratyakshmathur2334 · 3 years ago

    Can you please do it with an image?