Controlling Drones with AI (Python Reinforcement Learning Quadcopter)

  • Uploaded 29 May 2024
  • Teaching a Reinforcement Learning agent to pilot a quadcopter and navigate waypoints using careful environment shaping.
    GitHub Repo github.com/AlexandreSajus/Qua...
    0:00 Intro
    0:22 Physics
    1:08 Control Theory
    2:04 Reinforcement Learning
    3:45 Training
    4:13 Results
    4:46 Conclusion

Comments • 53

  • @abisheksunil
    @abisheksunil a year ago +6

    Cool!! Would love to see more videos on RL, especially in an environment with a lot more parameters for the agent to control.

    • @alexandresajus
      @alexandresajus  11 months ago +1

      Hey! I finally did another video on RL where I trained a humanoid AI to pass obstacles from Total Wipeout in Unity 3D. I hope you'll like it: czcams.com/video/_YXOLM2a41Q/video.html

  • @thinkindude5566
    @thinkindude5566 a year ago +3

    Great video love it

  • @manigoyal4872
    @manigoyal4872 7 months ago +3

    The stated disadvantage of PID, that it cannot reach higher speeds even when the target is far away, seems to be wrong, though:
    the control output of a PID controller is directly proportional (P) to the error, so if the target is far away, the error is larger, and therefore the output of the PID controller will be larger.

    • @alexandresajus
      @alexandresajus  7 months ago +2

      Yes, indeed, I should have phrased it better. What I meant is that PID coefficients are constants and do not change with the distance to the target. When the target is really far, I would ideally like to increase the integral coefficient to start off with more aggressive behaviour, then reduce it to become more careful. In my mind, that is the edge Reinforcement Learning has: the behaviour has more freedom and can adapt to more situations.
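A quick sketch of that gain-scheduling idea in Python. The threshold and gain values below are made-up illustrations, not tuned numbers from the project:

```python
def scheduled_gains(distance, far_threshold=5.0):
    """Hypothetical gain schedule: aggressive when far, careful when close."""
    if abs(distance) > far_threshold:
        return 1.2, 0.3, 0.4   # Kp, Ki, Kd when far: punchier proportional/integral action
    return 0.8, 0.05, 0.6      # Kp, Ki, Kd when close: gentler, more damping

class ScheduledPID:
    """PID controller whose gains are re-picked each step from the distance to target."""

    def __init__(self):
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        kp, ki, kd = scheduled_gains(error)
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative
```

An RL policy learns an arbitrary mapping from state to action, so it can represent this kind of distance-dependent behaviour (and much more) without hand-written schedule rules.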

  • @SP-db6sh
    @SP-db6sh a year ago +2

    Amazing! Could you make a video on a disaster-management drone trained with DRL?
    How would you design aerodynamically efficient and battery-efficient drones with DRL in a virtual environment like VR?

  • @EigenA
    @EigenA a year ago +1

    Cool, good job.

  • @jeremybertoncini6935
    @jeremybertoncini6935 a year ago +3

    Hi,
    as the task to fulfill is path planning, did you think about comparing results using Optimal Control theory too?
    For example, Model Predictive Control algorithms may be efficient and able to solve the presented scenario in real time. Moreover, MPC is also robust for collision avoidance, in case you would like to investigate more developed scenarios.
    In any case, very interesting framework of yours!

    • @alexandresajus
      @alexandresajus  a year ago +2

      Very interesting field I did not know about, thanks for sharing! Yeah, it would be interesting to try MPC and more complex scenarios. I might try it one day

    • @manigoyal4872
      @manigoyal4872 7 months ago +1

      A good setup is actually one where the path planning is done by a NN or some other planner, and MPC is used to track the planned path as closely as possible

  • @imannabiyouni3006
    @imannabiyouni3006 a year ago +2

    Keep doing the good work 👏.
    Just curious, what platform did you use to make this video?
    The editing and timeline are impressive.

    • @alexandresajus
      @alexandresajus  a year ago

      Thanks a lot! I use OBS for recording, Adobe After Effects for editing and Adobe Media Encoder for exporting

  • @Alpha725_
    @Alpha725_ a year ago +1

    Video got me to sub. Now you just need to write a couple thousand papers and you will have a million subs 😋

  • @freddy_bsc
    @freddy_bsc a year ago +4

    Very cool video, keep up the great work! I wonder how fast the AI-controlled drone could have been with more training.

    • @alexandresajus
      @alexandresajus  a year ago +3

      Thanks a lot! Unfortunately, at 500k training steps, the rewards had converged and continuing the training did not improve the agent's performance. It probably reached the optimal behaviour for this environment

  • @sanchaythalnerkar9736
    @sanchaythalnerkar9736 8 months ago +2

    Great video! Can you create a video where you explain the process step by step, for beginners?

    • @alexandresajus
      @alexandresajus  8 months ago

      This is not the first comment I have gotten about making a tutorial on this, so I think I will do a tutorial on this process. Meanwhile, there are a lot of resources on the internet about Control Theory and Reinforcement Learning. For RL, I really recommend following a tutorial on stable-baselines and gym; these are very easy-to-use RL frameworks.

  • @barissayin2985
    @barissayin2985 11 months ago +1

    you are awesome dude

    • @alexandresajus
      @alexandresajus  11 months ago

      Thank you!

    • @barissayin2985
      @barissayin2985 11 months ago +1

      @@alexandresajus I am planning on working on multi-agent systems with quadcopter drones as my final project at university. I would like to keep in touch and follow you more :)

    • @alexandresajus
      @alexandresajus  11 months ago

      Good idea! Sure we can keep in touch on Linkedin if you want: www.linkedin.com/in/alexandre-sajus

    • @barissayin2985
      @barissayin2985 11 months ago +1

      @@alexandresajus right, sent request

  • @kennethporst4359
    @kennethporst4359 a year ago +1

    I'm confused...how do you give a computer math and it equates that to moving forward?

    • @JohnDoe-rx3vn
      @JohnDoe-rx3vn 9 months ago +1

      PID is just three numbers you add together to tell the drone to get to the target without wasting time, but also to slow down so it doesn't overshoot the target. It's easy to use, and it only needs the distance to the target and the elapsed time in order to work. It's popular because it works really well for how simple it is and doesn't use a lot of computing power.
      P is the error (e), which is the distance to the target.
      I is e added up over time, which grows the longer you stay away from the target.
      D is how fast the error is changing: the current error minus the error the last time we checked, divided by the time that passed since we checked. As you close in on the target this number is negative, and it slows the drone down by reversing the throttle.
      All of these numbers are multiplied by their gains, Kp, Ki, and Kd respectively (which you manually change to "tune" the PID controller), and then added together. Whatever that number becomes is the throttle; in this video, it's the tilt of the drone. It's way simpler than the scary equations make it look.
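The P, I, and D terms described above can be sketched as a minimal closed-loop simulation in Python. The plant model (throttle acting as acceleration on a 1-D point) and the gain values are arbitrary illustrations, not numbers from the video:

```python
def simulate_pid(target=10.0, kp=0.6, ki=0.05, kd=0.8, dt=0.1, steps=800):
    """Drive a 1-D 'drone' toward target using the P, I, and D terms above."""
    pos, vel = 0.0, 0.0
    integral = 0.0
    prev_error = target - pos  # so the first derivative term is zero
    for _ in range(steps):
        error = target - pos                     # P: distance to the target
        integral += error * dt                   # I: error accumulated over time
        derivative = (error - prev_error) / dt   # D: how fast the error is changing
        prev_error = error
        throttle = kp * error + ki * integral + kd * derivative
        vel += throttle * dt                     # throttle acts as acceleration here
        pos += vel * dt
    return pos  # should end up very close to target
```

With these gains the simulated drone accelerates toward the target, the D term damps it as it closes in, and it settles near the target without a large overshoot.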

  • @rverm1000
    @rverm1000 a year ago +1

    Can you make a video on that process? I sure would like to apply this to real-life stuff: autonomous machines.

    • @alexandresajus
      @alexandresajus  a year ago

      I'll maybe do a more in-depth video on the process. Meanwhile, there are a lot of resources on the internet about Control Theory and Reinforcement Learning. For RL, I really recommend following a tutorial on stable-baselines and gym; these are very easy-to-use RL frameworks. Keep in mind that using RL for real-life machines is really complicated, as RL requires a lot of training steps to work, which is hard to do outside of a simulated environment. What do you have in mind in terms of real-life applications?

    • @rverm1000
      @rverm1000 a year ago +1

      I'm taking a course on it now. But putting everything together is something they never teach

    • @alexandresajus
      @alexandresajus  a year ago

      @@rverm1000 Yes! I had that same problem a few years ago. Every RL course is generally based on the Sutton & Barto book and is way too theoretical.
      If you want to learn the practical side of RL, I recommend you look up tutorials on how to create a custom gym environment and how to train a stable-baselines agent, on either TowardsDataScience or Medium. The tutorials there are generally very practical and easy to follow
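The practical recipe suggested here (a custom gym-style environment) mostly comes down to implementing reset() and step(). Below is a dependency-free Python sketch of that interface with a toy 1-D waypoint task; the class name, reward shaping, and termination rule are invented for illustration. In a real project you would subclass gym.Env (or gymnasium.Env), declare observation_space and action_space, and hand the environment to a stable-baselines agent:

```python
import random

class ToyWaypointEnv:
    """Gym-style environment sketch: a point must reach x = 0 from a random start."""

    def reset(self):
        self.x = random.uniform(-1.0, 1.0)
        self.t = 0
        return [self.x]  # observation

    def step(self, action):
        # action: a velocity command, clamped to [-1, 1]
        self.x += 0.1 * max(-1.0, min(1.0, action))
        self.t += 1
        reward = -abs(self.x)                       # shaped reward: closer is better
        done = abs(self.x) < 0.05 or self.t >= 200  # success or time limit
        return [self.x], reward, done, {}

# A random-policy rollout, mirroring how an RL training loop consumes the env:
env = ToyWaypointEnv()
obs = env.reset()
total = 0.0
done = False
while not done:
    action = random.uniform(-1.0, 1.0)
    obs, reward, done, info = env.step(action)
    total += reward
```

Once the environment follows this reset/step contract (plus the space declarations), training with stable-baselines is typically a few lines of model setup and a call to its learn method.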

  • @bignerd3783
    @bignerd3783 a year ago +2

    based hotline miami music

  • @perfumedsea
    @perfumedsea a year ago +2

    So it's a game? Any plan to port it to a real drone?

    • @alexandresajus
      @alexandresajus  a year ago +1

      Yeah, this is simulation only. I have no plans to test it on a real drone, since the gap between simulation and production in reinforcement learning is very big, but a good example of competent people achieving drone control in real life using RL is the research paper A Zero-Shot Adaptive Quadcopter Controller ( arxiv.org/abs/2209.09232 )

    • @perfumedsea
      @perfumedsea a year ago +1

      @@alexandresajus Thanks. I feel it might be feasible, or maybe someone has already done it but not published it yet.

  • @underlecht
    @underlecht a year ago +1

    Waiting for full quad

    • @alexandresajus
      @alexandresajus  a year ago

      You mean 3D with 4 propellers? Could be doable, I would need to use Unity for 3D rendering though

  • @user-sk4jp3ul4q
    @user-sk4jp3ul4q 4 months ago

    Running python -m quadai gave me: /usr/bin/python: No module named quadai. Running pip install -e . fixed it, thank you so much!

    • @alexandresajus
      @alexandresajus  4 months ago

      Check this issue and let me know if it solves your case: github.com/AlexandreSajus/Quadcopter-AI/issues/2

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago

      @@alexandresajus Thanks for your answers, that was solved I think. Another error:
      ERROR: Could not find a version that satisfies the requirement numpy==1.26.0 (from quadai)
      ERROR: No matching distribution found for numpy==1.26.0
      I am trying, man

    • @alexandresajus
      @alexandresajus  4 months ago

      @@user-sk4jp3ul4q What Python version do you have (python --version)? I think new versions of numpy only support Python 3.8+

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago

      @@alexandresajus Reinstalled Ubuntu 22 from 20, using VS Code; same problem: ModuleNotFoundError: No module named 'quadai'. Other libraries installed fine.

    • @user-sk4jp3ul4q
      @user-sk4jp3ul4q 4 months ago +1

      @@alexandresajus This helped: pip install -e . Thank you so much!

  • @TavoFourSeven
    @TavoFourSeven a year ago +2

    This needs to get to a point where nobody needs to tune a drone (only, like, a master multiplier in Betaflight). Just build it and go: slap any props on at any time, and the AI flight controller just knows in real time what to do for the best propwash handling, safely. It could probably make GPS use easier too. So many reasons why I just looked this up, and I'm happy this is only 2 months old. Toroidal prop = yawn*

    • @TavoFourSeven
      @TavoFourSeven a year ago +1

      I'm talking very adaptive P, I, and D

    • @alexandresajus
      @alexandresajus  a year ago +1

      Yeah, a team of researchers has a great paper on this subject: A Zero-Shot Adaptive Quadcopter Controller ( arxiv.org/abs/2209.09232 ). They used reinforcement learning to create a drone controller that didn't need tuning. They tried to use that controller to hover real drones of different sizes and weights, and the success rate was quite impressive, but I am guessing it'll be a while before something like this is production-ready.

    • @TavoFourSeven
      @TavoFourSeven a year ago +1

      @@alexandresajus Promising stuff indeed. Gonna be a great day when those are for sale. Maybe even ESCs too. 😮

  • @hradynarski
    @hradynarski 6 months ago +1

    Cool experiment, but that may also just prove that the human used in the experiment sucks at the drone game and at PID tuning, right? ;)

    • @alexandresajus
      @alexandresajus  6 months ago +1

      Haha, hey chill out you're talking about me here (but yes you are correct)

    • @hradynarski
      @hradynarski 6 months ago +1

      @@alexandresajus I'd like to suggest human-vs-AI PID tuning contests. That would be not only entertaining but also useful.

    • @alexandresajus
      @alexandresajus  6 months ago

      @@hradynarski Yeah could be fun! Hard to organize but fun

  • @govynela4176
    @govynela4176 a year ago +1

    Hi! I liked this video. I would like to read your paper. Can I have it?

    • @alexandresajus
      @alexandresajus  a year ago

      Yeah sure! In the description there is a link to the GitHub repo of the project; in this repo there is a Reinforcement_Learning_for_the_Control_of_Quadcopters.pdf file; that is the paper.
      Keep in mind that I am not a researcher and that the paper is not peer-reviewed so take everything in it with a grain of salt

    • @govynela4176
      @govynela4176 a year ago +1

      @@alexandresajus thanks ! I understand why I couldn't find it on Google Scholar. 😉

  • @pierrickbo
    @pierrickbo a year ago

    first