As I hinted before, I will occasionally recycle posts here that I originally published on my company-internal blog, which is why they are in English. Today's topic is Uber's data mining:
Uber, widely known for its disruptive impact on the traditional cab business, decided to play around with its data. In doing so, it showed both what can be done with Big Data analysis and why we need to be careful (in general and also as an industry) about using it without thinking about the impact of the analysis.
What did they do?
They analyzed what they call "rides of glory". If you do not know what a ride of glory is supposed to be (I was not aware of it either), here is their explanation:
Recently, I have come to understand that some of you may have—and I’m not pointing any fingers here or anything—on occasion found love that you might immediately regret upon waking up the morning after. Let’s talk about that. In times of yore you would have woken up in a panic, scrambling in the dark trying to find your fur coat or velvet smoking jacket or whatever it is you cool kids wear. Then that long walk home in the pre-morning dawn. But that was then. …we came up with the Ride of Glory (RoG). A RoGer is anyone who took a ride between 10pm and 4am on a Friday or Saturday night, and then took a second ride from within 1/10th of a mile of the previous nights’ drop-off point 4-6 hours later (enough for a quick night’s sleep). (This time window may not be the best, but small changes don’t change the overall pattern.)
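To make the rule in that definition more concrete, here is a minimal Python sketch of such a check. It is not Uber's code; the `Ride` record, the function names, the haversine distance and the reading of the time window are all assumptions for illustration. Specifically, it assumes the first ride's pick-up time must fall between 22:00 and 04:00 of a Friday or Saturday night, and that the 4-6 hour gap is measured from the first drop-off to the second pick-up.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_MILES = 3958.8


@dataclass
class Ride:
    # Hypothetical ride record; real trip data will look different.
    pickup_time: datetime
    pickup_lat: float
    pickup_lon: float
    dropoff_time: datetime
    dropoff_lat: float
    dropoff_lon: float


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def in_weekend_night_window(t: datetime) -> bool:
    """True if t lies between 22:00 and 04:00 of a Friday or Saturday night (assumption)."""
    if t.hour >= 22:
        # Evening part: Friday (4) or Saturday (5).
        return t.weekday() in (4, 5)
    if t.hour < 4:
        # After-midnight part belongs to the previous evening: Saturday (5) or Sunday (6).
        return t.weekday() in (5, 6)
    return False


def is_ride_of_glory(first: Ride, second: Ride) -> bool:
    """Apply the RoG rule: weekend-night ride, then a pick-up within 0.1 mi of its drop-off 4-6 h later."""
    if not in_weekend_night_window(first.pickup_time):
        return False
    gap = second.pickup_time - first.dropoff_time
    if not (timedelta(hours=4) <= gap <= timedelta(hours=6)):
        return False
    distance = haversine_miles(first.dropoff_lat, first.dropoff_lon,
                               second.pickup_lat, second.pickup_lon)
    return distance <= 0.1


if __name__ == "__main__":
    # Friday 2024-01-05, 23:30 ride; next pick-up a few dozen metres away at 05:00.
    night = Ride(datetime(2024, 1, 5, 23, 30), 40.7100, -73.9950,
                 datetime(2024, 1, 5, 23, 50), 40.7200, -73.9900)
    morning = Ride(datetime(2024, 1, 6, 5, 0), 40.7205, -73.9902,
                   datetime(2024, 1, 6, 5, 20), 40.7100, -73.9950)
    print(is_ride_of_glory(night, morning))
```

The sketch also shows how little it takes: two timestamps and two coordinates per ride are enough to attach this kind of label to an individual user.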
So they crunched the numbers to see how many rides (might) have been used for one-night stands. And they did not stop there: they mapped the results to weekdays and special days (Valentine's Day, public holidays), broke the numbers down by city, and then went even further and broke them down by city district. See below the “heatmap” of New York …
It is really scary to see this being done; just think about the potential for misuse. Do you live in the US (or any other country or city where Uber offers its services), or have you travelled to a place where Uber operates? Do you want your private life investigated by Uber, which even gives them the potential to blackmail you (and yes, with this data they could drill down to you as an individual user)? They can even correlate data on your behaviour (not only rides of glory but in general) across all your travels and movements.
And yes, of course Big Data also has a good side, so I am not condemning it in general. But this clearly shows that we need to be aware of the potential impact of the analyses we carry out!
Edit: I had a deeper look at the Uberdata blog. Fascinating research is being done there, but most of it is scary, too. Predicting where you want to go: “In this post, we show you how Uber can use Bayesian statistics and where you get dropped off, to predict where you’re going 3 out of 4 times.” Mapping the crime rates of areas to Uber rides: “Areas of San Francisco with the most prostitution, alcohol, theft, and burglary also have the most Uber rides.” Working out where in San Francisco you should go out to meet people of the sex you are interested in: “There are 35% more women in the Marina and 47% more women in Pac Heights on weekend nights than expected. Conversely, there are 23% more men in SoMa, 16% more in the Castro, and 14% more in the Financial District…”
I just added the Uberdata blog to my feed reader. I am very curious what kinds of analyses they will perform (and publish) in the future!