Faster geospatial queries in MySQL
- Added 21 July 2024
- Searching a table in MySQL based on latitude and longitude columns.
------------------
📚 Learn more about PlanetScale
• Website: planetscale.com/
• Docs: docs.planetscale.com/
------------------
00:00 A point of interest
00:21 Haversine formula
01:00 ST_DISTANCE_SPHERE distance calculation
03:25 Filtering by ST_DISTANCE_SPHERE
04:28 It's too slow
05:11 Bounding boxes
06:55 Using a bounding box
08:10 Adding an index
10:20 Adding a compound index
11:34 A note about POINT columns
------------------
💬 Follow PlanetScale on social media
• Twitter: / planetscaledata
• TikTok: / planetscale
• Twitch: / planetscale
• LinkedIn: / planetscale - Science & Technology
Good to know this function has the optional radius argument so we can still use it when we go to Mars
MultiPlanetScale
i really like the way how he teaches ...
🫡 thank you!
If you're wondering how he got the 4 values on lines 6 and 8 at 7:00, use these formulas:
LAT +/- (180/pi)*(1.05 miles / 3958.71 miles) and
LON +/- (180/pi)*(1.05 miles / 3958.71 miles) / cos(pi * LAT / 180)
Also, do note that these values are not universal. It depends on where your point of interest actually is located. Specifically how far it is from the equator.
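The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `bounding_box` and its parameters are made up here, and the radius of 3958.71 miles is simply the value quoted in the comment. It also ignores the awkward cases (boxes that touch a pole or cross the 180° meridian), where cos(LAT) approaches zero or the longitude wraps around.

```python
import math

# Earth radius in miles, as quoted in the comment above (an assumption, not a constant from MySQL).
EARTH_RADIUS_MILES = 3958.71


def bounding_box(lat, lon, distance_miles, radius_miles=EARTH_RADIUS_MILES):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Hypothetical helper: not valid near the poles or across the antimeridian.
    """
    # LAT +/- (180/pi) * (distance / radius)
    lat_delta = math.degrees(distance_miles / radius_miles)
    # LON +/- the same angle divided by cos(LAT), because a degree of
    # longitude covers less ground the further you are from the equator.
    lon_delta = lat_delta / math.cos(math.radians(lat))
    return (lat - lat_delta, lat + lat_delta, lon - lon_delta, lon + lon_delta)


if __name__ == "__main__":
    # 1.05 miles around an example point; the coordinates are arbitrary.
    print(bounding_box(30.25, -97.75, 1.05))
```

The four returned values are the kind of numbers you would plug into the latitude and longitude range conditions of the query, which is why they have to be recomputed for every point of interest.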
These videos by Aaron are so informative. I look forward to every single one and learn so much along the way. PlanetScale give this man a raise 😅🎉
You're kind to say so. Thank you so much Zac! Forwarding that last part to my boss
"see if i can explain it to you very quickly"
"... what the f--"
"okay, new plan"
every damn video you get a chuckle out of me.
Haha that's the goal! 🥰
It's always a good idea to cut down the number of rows first, before filtering them with an expensive function.
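As a rough illustration of that idea outside the database, here is a Python sketch of the two-stage filter: a cheap rectangle comparison first, then the expensive great-circle calculation only for the survivors. The names `haversine_miles` and `points_within` are hypothetical, the radius is an assumed value, and the box is assumed to be at least as large as the circle (otherwise valid points would be dropped).

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2, radius_miles=3958.71):
    """Great-circle distance between two lat/lon points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))


def points_within(points, lat, lon, miles, box):
    """Return the (lat, lon) pairs within `miles` of the centre.

    `box` is (min_lat, max_lat, min_lon, max_lon) and must contain the circle.
    """
    min_lat, max_lat, min_lon, max_lon = box
    hits = []
    for p_lat, p_lon in points:
        # Stage 1: cheap comparisons, the part an index can help with in SQL.
        if not (min_lat <= p_lat <= max_lat and min_lon <= p_lon <= max_lon):
            continue
        # Stage 2: the expensive distance check, run only on the survivors.
        if haversine_miles(lat, lon, p_lat, p_lon) <= miles:
            hits.append((p_lat, p_lon))
    return hits
```

The second condition is the only one that decides the result; the box test is redundant by design and exists purely to skip work.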
Keep it up, Aaron!
I had once used haversine formula for one of my freelance projects and didn't realize how much performance I left on the table after watching this video. Please keep making this awesome content!
amazing explanation ! love this video and of course thanks!
I read your post on this before! Really glad to see a video on it nicely explained :D
This is the type of quality in any tutorial. Looking forward to a LOT of these. Thanks ❤
We've only just begun 😎 Thanks for the kind words!
Such a good video, thank you, Aaron
Excellent video! Thank you! 🚀
wow! gonna use the same concept to improve job search by lat,long in my current project. Such a simple idea but a huge improvement. thank you man.
This really helped me at work today. Thanks.
fantastic video but if we use a composite index on "x", "y" and change the query to be
where x>x1 and y<y1 and x<x2 and y>y2
that should work and make full use of the composite index on (x,y). Again thx for the fantastic video
Very interesting topic! Thank you very much!
I really like the way you explain to people, especially adding some funny stuff in the middle XD Keep going Aaron!
Thank you! Trying my best 🫡
This is the most useful and interesting video I have seen in the last few days. Amazing and insightful
Thank you! We'll keep 'em coming 💪
My new research
Incredibly clear and useful!
This is a beneficial video. Hope you can upload more videos on this subject. Keep going !!!
Amazing video! It would be great to see how to calculate the box and how to use the "Point" column as well :)
I felt so smart when I figured this out on my own a couple of years ago
Cool video! I like the minimal and focused presentation style and the usage of the simple UI. It looks very focused and doesn't distract from what you are trying to do. I've never had a chance to use planetscale but I will keep on watching your videos.
These are a ton of work, so I really appreciate the kind feedback!
Excellent! You communicate concepts extremely well
Thank you! I really enjoy teaching
This video was great! I actually will try to implement this because even though the geolocation functions I do have are for Admin only, it would be great to release to the wild and have users have access to it with speed.
I knew the trick with the bounding box. What I didn't know was that MySQL has that nice function. So I always did that calculation in my applications. Still learning new things every day. Keeps the job exciting. ;-)
> Still learning new things every day. Keeps the job exciting. ;-)
Honestly this is such a good perspective. Thank you for the thoughtful comment
This is gold
This is a really fun problem, reminds me one of my first tasks as a developer 7 years ago. Although I did it in SQLServer
I feel like every developer has been asked to do this at some point!
great content, hoping there is new content about performance
Excellently explained 🎉
Great videos with clear explanations. Do you offer consulting? I'm running into a time series issue with timezones and I'd be willing to book a consultation to get an understanding of how to best approach the problem.
A follow up video about the spatial column type and the geospatial functions would be awesome.
Absolutely! On my list. I feel like this video could've been an hour long 😅 More to come
@@PlanetScale I store my lat and lng as GEOMETRY, but I still pull them out using X() and Y() to do some work. video wanted!
Very useful feature I just know. I must try it
Damn your videos are so good
These comments are wonderful to read. Thank you so much
i have an application that saves your geolocation position when you check in... isn't it best to use decimal for the EXACT location? and how long should these numbers be?
hmmm, i wonder how you could boost performance by putting the points in a grid to quickly handle grid squares that are entirely inside the circle
"because I don't speak meters" lmao😂
which editor/ide are you using?
What editor/IDE are you using?
I live in SA. very cool.
We need an entire MySQL course from scratch from Aaron
Boy do I have good news for you PlanetScale.com/mysql
32 second mark is legendary
Great video, clear explanations - this gets me looking forward to learn more about MySQL geospatial capabilities.
I wanted to ask what MySQL client are you using - it looks like it may run queries instantly as you type, almost like hmr- is that so?
TablePlus looks like 👍
beat me to it! TablePlus indeed
Nice video Aaron, but: You should use the “Restaurant” condition from the very beginning, because in that case the initial query will take less than ~2.2s. So it's a bit misleading and can lower the value of an otherwise very informative video. #feedback #keepGoing
I wonder if you would be able to gain some performance if you use a spatial index and compound index by doing a sub-query/join 🤔 Or having the sub-query would negate any potential gains from using a spatial index.
Interesting question... I think I'll do another video geo stuff. There's a lot more here to explore
05:20 how would the bounding box look if it contains the north or south pole?
That's... a great question. I don't know! I'd have to research
I think your longitude wouldn't matter - so just set a maximum latitude if you are looking around the north pole, and a minimum latitude if looking around the south pole?
Though I'm imagining the centre point is at the poles, but if you are 0.5miles south of the North Pole... not a clue! Yes, time to get researching!
sorry, what's the name of the sql too you're using please
found it, in your mysql tuts...
@PlanetScale, just as you created your compound type_lon index, I wondered:
"Could we create a compound index with truncated values of our latitude and longitude?"
It would be like indexing all points by which map grid square they fall into.
Then our 'bounding box' redundant query becomes 'the point falls into one of the 4/8/16 grid squares'
A bit more work perhaps in query design to calculate what the valid grid squares would be, but perhaps that would allow for the index reducing down to just the bounding box contents more immediately, rather than still reading all of our long rectangle?
Answering my own question: yes!
I work with UK postcode data occasionally, and adding a generated column and then indexing it improved the speed for finding postcodes nearby by a further MASSIVE factor:
ALTER TABLE postcodes ADD COLUMN gridsquare VARCHAR(15)
GENERATED ALWAYS AS (CONCAT(TRUNCATE(latitude, 2), ':', TRUNCATE(longitude, 2))) STORED;
ALTER TABLE postcodes ADD INDEX gridsquare_index (gridsquare);
My performance improvement is hard to work out as it got vanishingly small:
Find all postcodes within 10km of one point: 1452 postcodes returned
- using no bounding box: 0.4 seconds
- using bounding box and index: 0.1 seconds
- using gridsquares:
Yes! You can absolutely do that. I originally wrote an article with that method, but it didn't make the video: aaronfrancis.com/2021/efficient-distance-querying-in-my-sql Perhaps the next one 🤔
Bravo on the explanation 👏👏 This is delightful
@@PlanetScale Thank you! Ooh, I didn't realise you had an article on it too!
And compliment right back at you - it's a really well explained article! And, yes, that's so clear that by tranching one of the variables, we can make a compound index that is SOOO much better!
(For others who haven't read the article - and btw, REALLY do! - the idea is to create a new column rounding each decimal to the nearest 0.001, thereby allowing you to query for specific starts of the decimal like 'it must be 30.254 or 30.255' as a way of indexing bounding boxes and getting another performance boost! It's really neat and Aaron explains it very nicely!)
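To make the "it must be 30.254 or 30.255" idea concrete, here is a small Python sketch that lists the grid cells a coordinate range touches. It is an assumption-laden illustration, not the article's code: the names `cell` and `covering_cells` are invented, cells are stored as integers (thousandths of a degree) to avoid float comparison problems, and it floors values rather than rounding or truncating them, which matters for negative coordinates.

```python
import math


def cell(value, cells_per_degree=1000):
    """Integer grid index of a coordinate, e.g. 30.2549 -> 30254 for 0.001-degree cells."""
    # floor() keeps cells contiguous across zero; SQL TRUNCATE() would
    # instead round negative values toward zero.
    return math.floor(value * cells_per_degree)


def covering_cells(min_value, max_value, cells_per_degree=1000):
    """All grid indexes that a [min_value, max_value] range overlaps."""
    first = cell(min_value, cells_per_degree)
    last = cell(max_value, cells_per_degree)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # The latitude side of a small bounding box; values are arbitrary.
    print(covering_cells(30.2541, 30.2558))
```

The resulting short list is what would go into an `IN (...)` condition on the indexed grid column, with the exact distance check still applied afterwards.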
Also, I didn't know that MySQL `IN` was so much more optimised than `BETWEEN` - perhaps another weird one for humans vs computers!
Is there any benefit then to making the index ENTIRELY from tranched values - would it be quicker? Or is the point of an index that we don't want to have so many rows with the same index and so reduce its power/versatility for other query patterns?
(I feel like I have only started scratching the surface of the power of indexes recently, so I'm intrigued by how to get more out of them!)
Why are you storing your "point" in 2 different fields? Just use the POINT column type and add (spatial) index on that. Then buffer on your geometries of interest, and select those that intersect with the buffer. Done.
Could you expand on "buffer on your geometries of interest"?
excellent video, "bounding rectangle" is an excellent way to speed up such kind of queries (personally I just "assumed" that earth is "flat", so euclidean distance ftw :-P - of course, it's not correct, but it was about closest locations, not all locations in a range, so ... trading some accuracy for performance)
what I didn't understand from your video is why do you need some extra distance offset, shouldn't be the rectangle just tangent to the circle in all four points? (or maybe I make the same mistake thinking 2D, but the earth is 3D?)
Haha I made the square a little bigger because I was afraid I was gonna do it wrong! It's probably fine to do it tangentially, but I got scared 😂
What db client/tool is that ?
TablePlus!
Can you share the test data?
Unfortunately I can't. There might be some open data sets with lat/lon pairs though!
Why not group LONG and LAT in one POINT type column
You can! I mention that at the end. I'll have to do another full video on that, as it's a big topic.
the earth is not flat? and to think that I trusted this guy!
To be fair I do know more about databases than celestial bodies, so stay tuned for more database content
New to me
Why no mention of geohash?
They aren't super common, even if they are neat
This is God level content!!!!.Glad that i stumbled upon the video.Made my day❤
Thank you so much! Lots more to come
good explanation but im not getting the most accurate result the same as it is in google maps
Google Maps measures by driving distance typically. This is "as the crow flies"
Could someone please explain why he said he wouldn't use MySQL for driving directions?
@11:30
Because that's exactly what I wanted my app to do and is the reason I clicked on this video 😂
This is able to calculate the distance between two points in a straight line, which doesn't always correspond with roads or driving time. If two points are one mile apart, but there is a mountain or lake between them, that will dramatically change the travel time! You'd need a navigation API for that. Something that can tell you where roads are.
@@PlanetScale Thank you
Perhaps I'll look into using the Google maps API
Good video. You might also want to look into Geohash where you can represent every coordinate with a string. You can then index that column with a string and use like % query.
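For anyone curious what that string looks like, here is a compact Python sketch of the standard geohash encoding: longitude and latitude bits are interleaved by repeatedly halving the range, and every five bits become one base-32 character. The function name `geohash_encode` is made up for this example, and it is a teaching sketch rather than a replacement for a maintained library.

```python
# Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit, then alternates
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 6))
```

Because a longer hash always starts with the shorter hash of the same point, a prefix search such as `LIKE 'u4pru%'` selects one rectangular cell. Points just across a cell border share no prefix, so neighbouring cells usually have to be queried too.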
Shout out to the flat earthers that didn't like this video 0:33
To find the closest point you can probably just use pythagoras right? Unless you need to find points on the other half of the earth, this will work just fine.
I don't know enough about GIS to say, but even over several miles I think it would get inaccurate quickly
It works for short distances, like across a large city. But it will become quite inaccurate if you need, say, 100 miles
The earth is a sphere, so no matter in which direction you travel a 100 miles, you always follow the same curvature. So, it doesn't matter if you just use Pythagoras's theorem instead right? You only need a calculated distance when you want to select it or compare it to some arbitrary value. Otherwise I think it's safe to use a heuristic.
If your "points" are in longitude-latitude where units are degrees, then you'll get the "distance" in degrees as well. The distance per degree of longitude changes as a function of latitude, so accurate conversion isn't simple. That's why the haversine formula exists.
@@quillaja I understand that the haversine formula calculates the accurate distance between two points, but I think that when only comparing the distance, you can use a heuristic like Pythagoras
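A quick numeric check may help with this exchange. The Python sketch below (function names and the radius are assumptions for illustration) compares the haversine distance with a "flat" Pythagoras distance in which the longitude difference is first scaled by the cosine of the latitude. Without that scaling, raw degrees are misleading: one degree of longitude at 60° latitude covers only about half the ground it covers at the equator.

```python
import math

RADIUS_MILES = 3958.71  # assumed Earth radius, matching the figure used earlier in the thread


def haversine_miles(lat1, lon1, lat2, lon2, radius=RADIUS_MILES):
    """Great-circle distance via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def flat_miles(lat1, lon1, lat2, lon2, radius=RADIUS_MILES):
    """Pythagoras on a locally flat map, with longitude scaled by cos(mean latitude)."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return radius * math.hypot(x, y)


if __name__ == "__main__":
    # One degree of longitude at the equator versus at 60 degrees north.
    print(haversine_miles(0, 0, 0, 1), haversine_miles(60, 0, 60, 1))
    # Two nearby points: the flat estimate and haversine nearly agree.
    print(flat_miles(30.25, -97.75, 30.26, -97.74), haversine_miles(30.25, -97.75, 30.26, -97.74))
```

So a Pythagoras-style heuristic can be reasonable for ranking nearby points, provided the longitude is scaled; the error grows with distance and with latitude.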
practically open world chunks
> 2:45 you had π km distance and you converted that to miles. wtf
Did I do it wrong? Or you just prefer meters?
@@aarondfrancis I just meant you had such a nice number of km (3.1415...) between two first points and you promptly got rid of it. but I'm just joking - apologies if that wasn't clear. Great video in general - not many people know you can deal with spatial data on the db side at all, not to mention how to do it efficiently. Thanks mate and keep it up
btw which gui is that?
TablePlus!
@@PlanetScale
Thank you!
guys don't pause at 0:01
Why do you make the bounding box larger than the circle? I would have expected the box have sides of length 2*r, and thus perfectly bound the circle.
Honestly, just to be kinda safe? I don't think the bounding box calculation I used is done on a sphere, but rather on a plane. So to take into account the haversine deal, I just bumped it up a bit. The worst possible thing would be for my redundant condition not to be redundant after all, but rather filter out some things that should be there.
Converts meters to miles.
* close the tab
Thanks for the comment! Every little thing helps 🤗
Correct, earth is not flat. It is donut shaped!!
oh no
I don't speak meters... Muricas
wtf is a mile
~3576.32 cubits
@@PlanetScale already forgot it lol
The highest point of this video was "... since I don't speak meters.".
A bunch of POIs? A speckle of POIs? A murder of POIs? A database table of POIs?
When you explicitly converted from metres to miles, I had to unsubscribe. Sorry man 😂
Other than that part what'd you think?
@@PlanetScale awesome video! Love the incremental approach to optimising queries.
@@invinciblemode great! glad to hear it
the earth is flat
Big if true (it's not)
> 0:33
Dislike, unsubscribe
FlatPlanetScale
@@PlanetScale You can't spell planet scale without plane
The earth is flat. Disliked, didn't comment, unsubscribed
Even set your name to "Ban Aaron," good commitment to the bit! (My name is Aaron)
Is there a way to select all the points in a 1 hour radius, instead of 1 mile radius ?
Unfortunately not. You'd need to calculate drive times for that, and that's beyond the scope of a database.
I'm guessing you will want to do the time calculation outside of the database, as suggested.
That calculation is probably expensive, so the fewer times you have to run that, the better...
Hey presto! Another reason to use a bounding box!
Decide what is the max distance you could travel in 1 hour from your location in each direction.
Add & subtract from your centre, and that becomes a new bounding box.