People don't grasp how easy it is to build data models like this even without privileged first-party data access.
In 2012 I created a killer prototype that demonstrated that you could accurately reconstruct most people's flight history at scale from social media and/or ad data. Probably the first of its kind. This has been possible for a long time.
A quick sketch of how it worked:
We filtered out all spatiotemporal edges in the entity graph with an implied speed of <300 kilometers per hour or <200 kilometers distance, IIRC. This was the proxy for "was on a plane". It also implicitly provided the origin and destination.
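A minimal sketch of that filter, assuming each observation is just an entity ID with a latitude, longitude, and timestamp. The `Observation` type, the function names, and the hour-based timestamps are hypothetical; only the two thresholds come from the description above, and those were quoted from memory.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MIN_SPEED_KMH = 300.0    # threshold as recalled in the comment (IIRC)
MIN_DISTANCE_KM = 200.0  # threshold as recalled in the comment (IIRC)


@dataclass(frozen=True)
class Observation:
    """One sighting of an entity. Hypothetical schema for illustration."""
    entity_id: str
    lat: float
    lon: float
    t_hours: float  # timestamp in hours since some epoch (assumed unit)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flight_like_edges(observations):
    """Keep consecutive per-entity edges that are both long and fast.

    Returns (start, end, distance_km, speed_kmh) tuples. The start and end
    observations double as the implied origin and destination.
    """
    by_entity = {}
    for obs in sorted(observations, key=lambda o: (o.entity_id, o.t_hours)):
        by_entity.setdefault(obs.entity_id, []).append(obs)

    edges = []
    for seq in by_entity.values():
        for a, b in zip(seq, seq[1:]):
            dt = b.t_hours - a.t_hours
            if dt <= 0:
                continue  # duplicate or out-of-order timestamps carry no speed
            dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
            speed = dist / dt
            if dist >= MIN_DISTANCE_KM and speed >= MIN_SPEED_KMH:
                edges.append((a, b, dist, speed))
    return edges
```

Because observations are sparse, the implied speed is a lower bound on the real speed, which is why a threshold well below airliner cruise speed can still work as a proxy.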
These edges can be correlated with both public flight data and maintenance IoT data from jet engines to put entities on a specific flight. People overlook the extent to which innocuous industrial IoT data can be used as a proxy for relationships in unrelated domains.
In rare cases, there was more than one plausible commercial flight. Because we had their flight history, we assumed in these cases that it was the primary airline they had used in the past, either generally or for that specific origin and destination. This almost always resolved perfectly.
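The tie-break could be sketched roughly like this. The function name, the tuple layouts, and the choice to rank route-specific history ahead of overall history are my assumptions for illustration; the description above only says both signals were used.

```python
from collections import Counter


def pick_flight(candidates, history, origin, destination):
    """Choose among plausible flights using the entity's past airline usage.

    candidates: list of (airline, flight_no) tuples (hypothetical layout)
    history:    list of (airline, origin, destination) past trips
    Returns the chosen candidate, or None if there are no candidates.
    """
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]

    # How often each airline was used on this exact route, and overall.
    route_counts = Counter(
        airline for airline, o, d in history if (o, d) == (origin, destination)
    )
    overall_counts = Counter(airline for airline, _, _ in history)

    # Assumed ordering: prefer route-specific history, fall back to overall.
    return max(
        candidates,
        key=lambda c: (route_counts[c[0]], overall_counts[c[0]]),
    )
```

With no usable history at all, every candidate scores the same and the first one is returned, so a real system would want to flag those cases as unresolved rather than guess.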
This was impressively effective and it didn't require first-party data from airlines or particularly sophisticated analytics. Space and time are the primary keys of reality.
>We filtered out all spatiotemporal edges in the entity graph with an implied speed of <300 kilometers per hour or <200 kilometers distance, IIRC. This was the proxy for "was on a plane". It also implicitly provided the origin and destination.
Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place? Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history". Sure, it's kinda creepy that you can figure out which stores I went to, but the bigger problem is that you can get the transaction data in the first place. Moreover whatever "spatiotemporal" data needed to reconstruct such flight history is probably more valuable than the flight history itself. Who cares if you know Joe flew on United 8340 when you have hour-by-hour updates on his rough location?
> Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history".
The preposterous thing is that payment processors aren't just allowed to collect this information and tie it to your name, they're required to do that.
People talk a big game about fighting fascism, but how can you allow these laws to exist if you can contemplate what happens when actual fascists get hold of that data going back decades? They need to be dismantled now.
The vast majority of payments fraud is caused by the regulatory environment. Use cards with chips that can be read by commodity PCs and phones using published open standards and then require the card to be physically attached to a device to authorize a new merchant and criminals can no longer make fraudulent credit card charges without stealing the physical object or breaking strong cryptography.
The only reason we don't have this already is that the law makes it so hard to start a competing payments network -- in no small part as a result of KYC requirements -- that the incumbents are insulated from real competition and then don't have to fix the flaws in their systems.
Meanwhile you don't actually need everyone to do it, all you need is someone to do it and then that both becomes a competitive advantage in the market and allows any victim of official misconduct to use that one.
Arguing that we shouldn't do something because it would make it harder to enforce laws is not a convincing argument to me. It sounds like you want to enable people to be criminals.
> Arguing that we shouldn't do something because it would make it harder to enforce laws is not a convincing argument to me. It sounds like you want to enable people to be criminals.
I find this view to be lacking in nuance.
Laws are intended to exist with the consent of the governed. Substantially the whole of society agrees that murder should be illegal, so if someone commits murder we're willing to commit significant resources to investigating and prosecuting the perpetrator. It doesn't have to be efficient or have perfect enforcement because its purpose is to act as a deterrent. Everyone is willing to spend the resources to enforce those laws because everyone agrees that their enforcement is important. Enforcement efficiency is not required when there is popular consent.
Opposing laws that "help criminals" exposes society to shifts in the definition of a crime. When there is a law against being of a particular ethnicity or religion or political ideology, you want to enable people to be criminals. Preventing laws like that from ever being effective is worth sustaining a significant amount of inefficiency in the enforcement of other laws.
And this is not a binary distinction with "laws against murder" on one side and "laws against being Jewish" on the other. The latter is only the viscerally powerful extreme that once made us say never again.
The spectrum spans the full scale, where the middle is filled with police corruption and political retaliation against the opposition and petty busybodies inducing poverty and homelessness through the incompetent micromanagement of society.
Should governments have the ability to freeze the bank accounts of protesters? It doesn't matter what they're protesting or what crimes some minority of the protesters are alleged to have committed when the account freezes are instituted as collective punishment, the answer is no. The government should not have the ability to do that, because in that case they are the criminals, and structural defenses against government abuses are important.
> This is not necessarily a good thing and laws can change without requiring them to be broken.
That's kind of the problem, right? Suppose you have a system that actually allows perfect enforcement and then the government passes a law against some religious practice. Espousing atheism is banned, or Islam, or Christianity, depending on who controls the government this time; take your pick. If anybody who does it is instantly brought up on charges with severe penalties then nobody does it. But that's bad. That's the problem. You need to sustain enough friction to prevent things like that from being possible because enforcing laws like that is worse than anything that could come out of making ordinary law enforcement require more resources.
>If anybody who does it is instantly brought up on charges with severe penalties then nobody does it. But that's bad.
I don't think it's bad. Similar to closed and open source software there is room for closed and open societies. They are different approaches that have different pros and cons.
Okay, let's go with your approach. Then the closed society is China or Iran and the open society is the US and other western countries, right? In which case we shouldn't have any such thing in the open countries.
You can justify almost anything as "progressing society". Tech companies can be "making the world a better place", but that shouldn't give them permission to break laws.
The mental model of how the law works that most people have is wrong.
The law does not, by default, prosecute all crimes. There is no country in the world that has even close to the law enforcement capacity to investigate and prosecute all crimes. What tends to happen instead is that crimes that, to put it colloquially, "piss off the wrong people" get prosecuted, i.e., crimes that draw the attention of either the general public or specific people in power.
A reasonable approximation is that a single-digit percentage or less of crimes get investigated and prosecuted, with the rate obviously being higher for violent and visible crimes like murder and lower for less violent and visible crimes like stealing the office paperclips.
Another way of looking at this is, in the current system, if your house gets burgled, you need to report it to the police if you expect anything to happen, whereas one could imagine another system where the police already know your house has been burgled and you don't need to report it.
I believe with AI we will be able to scale enforcement much better than a single digit percent. This will allow for more fair enforcement and reviews and cleanup of old laws or punishments which don't make sense anymore.
At this point in human history, is it relevant to the individual whether someone is a criminal? What matters is whether they've injured someone else.
To use the US as an example (I doubt other countries are much better), it's estimated that every adult in the US commits multiple federal felonies per day[1], federal law is replete with ridiculous laws[2], and the number of federal laws is uncountable by Congressional Research Service staff. Does it matter at that point?
Is a statistical analysis of the specific number actually the point? Suppose it was three felonies a year. What difference does that make when the prison sentence for each felony is also at least a year? The problem is the same; a prosecutor can throw anyone in prison simply because there are so many laws nobody can follow them all or even realize when they're violating one.
You can check the rest of the thread, but I'm not even convinced that the median person commits 3 crimes a year. Maybe there's an average of 3 felonies per day/month/year if you count all the small businesses that aren't complying with federal product/safety regulation to the letter (thus dragging up the average), but I can't think how realistically the average joe is committing 3 felonies per year.
> I can't think how realistically the average joe is committing 3 felonies per year.
Someone who smokes weed daily in a place where it's illegal could easily commit multiple crimes a day just for drug possession and consumption, for example.
Only 16% of Americans smoke marijuana, according to Gallup. If you exclude people who are in states where it's legal/decriminalized, that'd probably be even lower. Needless to say, even if all 16% of them are criminals, that's far from the median person committing 3 felonies. Moreover, the weed example isn't even applicable to the thesis of the book or the commenter that invoked it, which is that the US has so many regulations that nobody can hope to comply with them.
If 1/6 of Americans are potential repeat federal felons based on just one activity, I find it highly dubious that the other 5/6 can't be as well in the other hundreds of activities we undertake each day. Using your parents' Netflix/Disney+/etc. password can technically be prosecuted under CFAA[1], for example. That's probably another 1/6 at least. Now it's 1/3 of the country.
>A few months after leaving Korn/Ferry, Nosal solicited three Korn/Ferry employees to help him start a competing executive search business. Before leaving the company, the employees downloaded a large volume of "highly confidential and proprietary" data from Korn/Ferry's computers, including source lists, names, and contact information for executives.
Extending that ruling to Netflix password sharing is a stretch.
Moreover, you can't say "I can think of one activity that many Americans do that is a felony", and then apply induction on it to claim that the other activities Americans do surely contain felonies.
>That's probably another 1/6 at least. Now it's 1/3 of the country.
That's only true if you assume the populations of weed smokers and Netflix watchers don't intersect, which is... doubtful.
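To make the arithmetic concrete, here is a small inclusion-exclusion sketch. The 1/6 shares are the rough figures used in this thread, and the overlap values are purely illustrative assumptions, not data.

```python
from fractions import Fraction


def union_share(share_a, share_b, overlap):
    """Share of people in group A or group B (inclusion-exclusion)."""
    return share_a + share_b - overlap


weed = Fraction(1, 6)     # rough share used upthread
netflix = Fraction(1, 6)  # the "probably another 1/6" guess upthread

# Disjoint groups: the only case where the shares simply add to 1/3.
disjoint = union_share(weed, netflix, Fraction(0))

# Statistically independent groups (assumed): overlap is the product.
independent = union_share(weed, netflix, weed * netflix)

# One group entirely inside the other: the union does not grow at all.
nested = union_share(weed, netflix, min(weed, netflix))
```

The union only reaches 1/3 in the disjoint case; any overlap pulls it down, all the way to 1/6 if the two groups coincide.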
> Maybe there's an average of 3 felonies per day/month/year if you count all the small businesses that aren't complying with federal product/safety regulation to the letter (thus dragging up the average), but I can't think how realistically the average joe is committing 3 felonies per year.
To begin with, let's not ignore how broad a category "small business" is. Laws requiring health inspections or licenses etc. often operate on the basis of frequency or number of patrons. If you have around a dozen people over for movie night every Saturday with the event published on social media and you all chip in for pizza, are you a food service business? For that matter, is that a public performance in violation of copyright?
If some criminals break into one of your devices or your personal website while you're traveling and you find out about it while you're out of state but don't have time to deal with it until you get back home, have you committed a crime? What if they put some illegal materials there and you clean off the device but still have a backup containing the illegal materials? What if you do delete all of them right away; is that destruction of evidence? What if there's a federal law against keeping the materials and a state law against destruction of evidence and a very specific way to comply with both of them at the same time that may not have been clearly decided by the appellate court when it was happening but has been decided by the time they bring the case against you? What if it was clear ahead of time but wasn't intuitive and you can't afford a lawyer and can't have one appointed until after you've been charged?
It's unreasonable to expect ordinary people to be able to navigate this.
>To begin with, let's not ignore how broad a category "small business" is. Laws requiring health inspections or licenses etc. often operate on the basis of frequency or number of patrons. If you have around a dozen people over for movie night every Saturday with the event published on social media and you all chip in for pizza, are you a food service business? For that matter, is that a public performance in violation of copyright?
That's what courts are for. I don't think there's any case where people tried to prosecute a shared movie night as a business, because it'd be laughed out of court. Same goes for whether it's copyright infringement or not. Moreover if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
That isn't really how courts work. If you're violating the letter of the law then you are breaking the law and an actual impartial judge would enforce it against you. In practice whether they let you get away with it is based in significant part on whether or not they like you. If the judge doesn't like the administration then maybe they do like you. But if the judge doesn't like you for the same reason the administration doesn't like you then you're going to jail. And it shouldn't have to depend on that; we shouldn't have laws that people are constantly in technical violation of so that the only thing keeping anyone out of jail is prosecutorial discretion and judicial affinity.
Meanwhile you can characterize anything in a negative light. A random home kitchen typically isn't going to meet the standards for commercial operation and the prosecutor's press release isn't going to say "we're prosecuting our enemies for movie night", it's going to say "defendants were operating a for-profit restaurant in violation of zoning rules and storing uncooked meat above fish in the freezer used for storing food sold for resale in violation of the health code" and then stick them with a fine that would make them lose their house.
> Moreover if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
When the dictator of petrolistan wants to retaliate against their enemies and those laws are available for that, sure.
When the mayor of some US town wants to do the same thing, they might very well resort to health code violations that wouldn't have otherwise been enforced.
Deterrents well short of political executions are still very much official misconduct.
Dissidents are most often prosecuted under those laws, yes, which is a good reason to not have those laws. But I’m aware of at least one case where a Cuban dissident was apprehended and prosecuted for buying cement on the black market, something the government was able to know because they most likely had somebody tailing the person 24/7. [^1]
But that exotic case is hardly needed to make the point. Laws will be abused by the powers that be whenever they want; you don’t need to look farther than the current USA administration and how the president is using war powers to treat poor laborers as enemy combatants and send them to concentration camps. And yet, the USA’s system of government was designed in a way that should have prevented the executive from abusing power; why it has failed is another (difficult) discussion, but the founding fathers seemed well acquainted with the despotism of other nations.
> Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place?
Almost all data is spatiotemporal data, people just aren't used to thinking about it like that. Everything that "happens" is an event with associated times and places.
Tagging of events with spatiotemporal attributes, or with metadata that can be used to infer spatiotemporal attributes, is pervasive. Every system data passes through, even if not the creator of it, observes the event of the data passing through it. Event observation is not trying to track things but it implicitly and necessarily creates the data that makes tracking and spatiotemporal inference possible.
These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
It is much more difficult to access the obvious first-party data sources than it used to be, mostly because people with that data are far more selective about who they give access to. It doesn't really matter; that is a speed bump for the unsophisticated. The exponential growth in the scale and diversity of network-connected telemetry of all types pretty much guarantees these data models will always be constructible.
The historical limiter has always been the absence of data infrastructure platforms that can handle these kinds of analytics at scale.
>Tagging of events with spatiotemporal attributes, or with metadata that can be used to infer spatiotemporal attributes, is pervasive. Every system data passes through, even if not the creator of it, observes the event of the data passing through it. Event observation is not trying to track things but it implicitly and necessarily creates the data that makes tracking and spatiotemporal inference possible.
>These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
>What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
That's a lot of technobabble for what essentially sounds like "there's some ad SDK that's phoning home with your GPS/IP geolocation every few minutes, if you cross reference that with when flights are, you can guess what flight someone took". How far off am I? Or is there some galaxy brained AI that can infer that from disparate facts like that you stopped posting on Twitter for 12 hours, your car's license plate was caught by an ALPR heading towards the airport, and 3 weeks ago you visited some Portuguese tourism site that had an ad beacon installed?
> Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place?
Yeah, this just sounds like it's written from the perspective of a data broker.
Tying particular ad analytics (presumably ip geolocation?) to thousands of particular individuals and having it well populated enough to track them is "privileged first-party data access" by another name.
Your location is leaked in many, many ways. Even if you have location services off on your phone, the first-party (Google, Apple) has access to your precise location. On Android, this bypasses VPNs, and I believe on iOS/Mac first-party apps also bypass VPNs. You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
Okay, fine, I'll just install another operating system then, like KDE Plasma Mobile or GrapheneOS. Your location is still leaked 24/7. This is because your cellular modem has its own operating system, running underneath your phone's operating system, which is triangulating your location at all times. Once again, you are trusting that telecommunications companies aren't misusing this - but please remember they're compelled, by law, to make a lot of this information available to numerous third parties.
Okay fine, let me just remove the SIM then and use my phone on Wi-Fi only, always through a VPN. Your location is still being leaked potentially, for example, by your car. Your car also has a cellular modem which leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third-parties.
Of course, all of this is assuming you don't use any social media. Social media can also leak your location, even without location services. If you review a restaurant - that's your location. Where are your friends? You're probably around them. And on and on.
> Your location is still being leaked potentially, for example, by your car. Your car also has a cellular modem which leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third-parties.
Ok, fine. I'll just drive classic cars for the rest of my life. Your location is still being leaked by a global network of automated license plate reading cameras:
https://deflock.me/
>You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
At least on Android you can theoretically disable "Google Location Accuracy", which stops it sending nearby hotspot MAC addresses to Google. That's the only public route where Google gets your location without you knowingly sending it. You also imply that mobile operating systems are surreptitiously sending locations back to Google/Apple even if users have all location-related features disabled, but I'm not aware of any evidence this is the case, and this falls into the same category as "Facebook is secretly listening to you" until proven otherwise.
I mean you're saying a lot for rhetorical effect, but it doesn't get around the fact that there aren't that many avenues to reliably collect this data, with high enough resolution and tied to identity, for thousands/millions of individuals, and if you do have that data, you're basically a data broker. I mean, yes, all those things are true, and they're pooled together and available for sale by data brokers.
It's also disappointing that the root comment is distracting from the 4th amendment violations by making the conversation about their vague claims of selling mini-palantir demos through abusing web ads.
The assumption that the data must be "high resolution" is erroneous. Low resolution noisy data works just fine, you just need a lot more of it. You can use standard signal processing tricks such as stacking noisy low-resolution data to extract high-resolution features. This requires a lot more processing but that isn't much of a limitation. These reconstruction techniques work even if the data is from unrelated sources that aren't even trying to measure the thing you are measuring.
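A toy illustration of the stacking idea, assuming the noise is independent and zero-mean, which real data exhaust often is not. The function names and the flat kilometer coordinates are hypothetical; this is the textbook averaging trick, not the pipeline described above.

```python
import random
from statistics import fmean


def simulate_noisy_fixes(true_xy, sigma_km, n, seed=0):
    """Generate n noisy position fixes around a true (x, y) point in km.

    Gaussian noise with standard deviation sigma_km per axis (assumed model).
    """
    rng = random.Random(seed)
    return [
        (true_xy[0] + rng.gauss(0, sigma_km), true_xy[1] + rng.gauss(0, sigma_km))
        for _ in range(n)
    ]


def stack(fixes):
    """Average many low-resolution fixes into one higher-resolution estimate."""
    xs, ys = zip(*fixes)
    return fmean(xs), fmean(ys)


def error_km(estimate, true_xy):
    """Euclidean distance between an estimate and the true position."""
    return ((estimate[0] - true_xy[0]) ** 2 + (estimate[1] - true_xy[1]) ** 2) ** 0.5
```

Under those assumptions the error of the stacked estimate shrinks roughly with the square root of the number of fixes, so 10,000 fixes that are each off by kilometers can pin a location down to tens of meters.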
Any data exhaust will work, people have created interesting PoCs leveraging things like HVAC data, RF attenuation, etc. High-precision weather models essentially work the same way, making inferences by stitching together diverse event data that has nothing to do with weather.
High-quality high-resolution data sources largely don't exist in the way people imagine they do, so you need to do this anyway. If you have a high-resolution spatiotemporal graph for entities, tying it to identity is always trivial.
It would be more common if it weren't for the fact that open source platforms scale poorly for this type of analytical processing.
Anyone could have acquired this data at the time; it was all either free or cheap. Like I said, my business was specialized data infrastructure (e.g. storage engines and analytical parallel processing), we just used these data sources for testing and demos because "free or cheap".
I also have a lot of experience with privileged first-party data but that is governed by a different set of rules and is often regulated. You have to be much more circumspect about how you use it.
Even though it might be convenient to e.g. slurp telemetry off a mobile carrier's backbone, what you eventually realize is the inability to do this isn't a real limitation and in some ways is a blessing in disguise.
Twitter has had timestamped and geotagged posts for ages. Just clustering things like hashtags of tweets spatiotemporally results in a treasure trove of information about events.
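A crude version of that clustering, using fixed grid cells and time windows instead of a real clustering algorithm. The post layout `(lat, lon, epoch_seconds, tags)`, the cell size, and the thresholds are all assumptions for illustration.

```python
from collections import Counter
from math import floor


def cell_key(lat, lon, t_epoch_s, cell_deg=0.1, window_s=3600):
    """Bucket a point into a (lat cell, lon cell, time window) key."""
    return (floor(lat / cell_deg), floor(lon / cell_deg), t_epoch_s // window_s)


def hashtag_hotspots(posts, min_count=3, cell_deg=0.1, window_s=3600):
    """Find hashtags that recur within the same place and time bucket.

    posts: iterable of (lat, lon, t_epoch_s, tags) tuples (hypothetical layout)
    Returns {(cell_key, hashtag): count} for buckets at or above min_count.
    """
    counts = Counter()
    for lat, lon, t_epoch_s, tags in posts:
        key = cell_key(lat, lon, t_epoch_s, cell_deg, window_s)
        # Count each hashtag once per post, case-insensitively.
        for tag in {tag.lower() for tag in tags}:
            counts[(key, tag)] += 1
    return {k: n for k, n in counts.items() if n >= min_count}
```

Fixed buckets split events that straddle a cell or window boundary, so anything serious would use overlapping windows or density-based clustering instead.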
I'm sure that other platforms attach the same kind of info to posts. It's just a matter of scraping it.
The thing that really tickles me is that there's supposedly all this frightening information that can be gathered on people, including by investigating the history of ads they were served; but then in the vast majority of cases the only use the Bad Guys ever seem to come up with for that information is to serve you more ads.
Sounds like if you have a record of a lot of location/timestamp data for people, you look at the distance difference divided by the time difference. Now you have average speed for any pair of points. Now filter where the average speed is as fast as a Boeing jet. That filters out most of the data except for people who are almost certainly on a plane. Et voila, you now look at those data points' geolocation and you have people who traveled from one city to another because you already have the location. Compare City1 -> City2 with any public flights in those cities around those times and you know who flew on what flight from where to where and at what time.
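The last step, comparing City1 -> City2 against public flights, might look something like this. It assumes the endpoints have already been mapped to airport codes and that all times share one timezone; the `Flight` type, the function name, and the slack window are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Flight(NamedTuple):
    """One row of a public schedule. Hypothetical schema for illustration."""
    flight_no: str
    origin: str
    destination: str
    departs: datetime
    arrives: datetime


def plausible_flights(origin, destination, last_seen_at_origin,
                      first_seen_at_destination, schedule,
                      slack=timedelta(minutes=30)):
    """Return scheduled flights consistent with an observed fast edge.

    A flight fits if it serves the same route, departs no earlier than the
    last sighting at the origin, and arrives no later than the first
    sighting at the destination, each with some slack for noisy timestamps.
    """
    return [
        f for f in schedule
        if f.origin == origin
        and f.destination == destination
        and f.departs >= last_seen_at_origin - slack
        and f.arrives <= first_seen_at_destination + slack
    ]
```

If this returns exactly one flight, you are done; if it returns several, you need a tie-break such as the person's airline history.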
from the parent post: `social media and/or ad data`
So if you have ad impression data you have IP geolocation, or maybe better, along with the timestamp. Similarly for socials sometimes you get location metadata, and with image uploads you can get location metadata (though today these are often stripped, historically they weren't).
You can also exploit it for personal profit. As for stopping it, good luck. Best case is probably to degrade or poison data sources in a preferably legal way.
In this particular case it was just a proof-of-concept, albeit at scale. We did not run a proper ground-truthing process but people actually running that type of data model in production could have ground-truthed the analytic model if they wanted to.
However, it turns out that thousands of people like to talk about their flights on social media, so we scraped that as a spot check and it mostly lined up perfectly. Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
The purpose of the PoC was to sell the data analysis infrastructure that made that type of analysis possible at scale; it wasn't about the data per se. It was a compelling demo we invented given the data that happened to be available. Startup life.
> Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
For fun edge cases, there's always Antarctica, where you can travel from a US base (which looks like you're in the US) to a NZ base (which looks like you're in NZ) in a couple of minutes: https://brr.fyi/posts/credit-card-shenanigans
I don't have any special knowledge in this area, but just thinking about it idly while sitting here, "robbing their homes while they are away" comes to mind as a good proxy.
Reminds me of this news story of footballer John Terry, whose house was robbed because he posted a picture of himself on holiday. The insurance company tried to use a 'reasonable care' clause of home insurance to deny his insurance claim.
FYI the source you posted never claimed that John Terry's insurance tried to deny the claim, only mentioning that "some" insurance companies warn of it. However even that claim is questionable, because it isn't even from an insurance company, it's from a content marketing piece by an insurance comparison website.
Wouldn’t that mean all celebrities are uninsurable? If a politician/singer/athlete has a public away event, there is little they can do to obscure that fact.
Basically a plot line on the show “The Blacklist”. Had an inside guy at the post office who would forward the names of people stopping mail delivery while on vacation. Then used the homes as safe houses.
It's funny to see ARC just being described as a "data broker," which strongly implies that it doesn't play a role in facilitating the actual underlying consumer activity.
ARC and IATA absolutely do play such a role, as the financial clearinghouses for ensuring that travel agents (online and offline) and airlines can pay each other, and as gatekeepers/certification bodies for agencies to ensure these financial systems aren't abused.
Now, they absolutely do sell access to data to third parties, governmental and nongovernmental. But the reason they have this data isn't because they buy it to resell it; they are fully part of the funds flow for the underlying transaction. Whether they should be allowed to sell or share non-anonymized data on passenger records and prices paid is a very good question, but at the very least this is about as first-party as data gets.
Two things can be true simultaneously: (a) it is worrisome that a company is selling PII at scale to government entities who would otherwise need to request that data through accountable warrant processes, and (b) we shouldn't call every such company a "data broker" lest we dilute the specificity of that term, particularly when the companies in question participate in the funds flow of the customer transaction.
The amount and extent of data that is available out there by brokers for purchase by literally any company is *mind-boggling*. However bad you think it is, multiply that by 10.
Seems just like retargeting in that case. Ask the “victim” to visit page A. On page A, place a retargeting pixel; now you can display a message for that user everywhere on the Internet, as long as you are willing to pay a high price for that impression (a high price here is still way, way less than 0.1 USD).
Reminds me of the time when Signal (the private messaging app) tried to get ad data from Facebook and show it to users with a high degree of specificity, e.g. “You got this ad because you’re a middle aged woman who enjoys kpop and loves reading about Christopher Nolan”.
Around 2014 I worked with recruiters and they had a tool that aggregated data on everyone through LinkedIn, Yelp, Twitter, GitHub, Eventbrite, etc. It was breathtaking the amount of information you could get on anyone, 10+ years ago.
I’m guessing with the help of Palantir, the government has even more data and can probably link Reddit posts etc. based on stylometry and can even perform psychological analysis on your personality and tendencies, etc.
Because none of it is really unknown? People know about it and don't care. Hell, even people on this forum who should know better and care don't, or when they hear about stuff like this they think it's FB pixel or Google Analytics stuff. The simple fact is that with a few basic pieces of information on somebody, there's almost nothing that is sacred or not for sale. People mistakenly believe they're protected by adblockers and stuff, or by avoiding social media, but it is unavoidable while simply existing. The 1000x comment is because, from my POV, the scale of it is astounding and growing every year, and people really don't have a good understanding of the subtle and not subtle ways it can affect you, or when told, don't care or dismiss it. So I don't really feel like explaining it anymore. If more people understood, I'd also stand to profit quite a bit from it, so that's where my frustrated tone is coming from.
I'm pretty sure it was over when we switched to debit/credit cards. Everywhere you go, how much you buy, all that stuff has been sold for quite a while now.
No, it was before this, with phone lines and wiretapping, because it was forcibly allowed by law. As soon as we said "okay, you're allowed to record stuff if it's for a good purpose", it was over.
cash is tracked as well, it's been over for a long time.
each bill has a serial # and it gets scanned going in and out of the bank. Yes, it's still marginally easier to launder cash but if you just take it out of the ATM and spend it at a store it'll get tracked accurately
I don't think this is as accurate as you are making out. Wawa (a convenience store in the Philly area) isn't tracking each $10 that goes in and out of the register. It could float all over the city before hitting a bank, and even then banks typically track serial numbers only for large denominations and when there's a suspicion of illegal activity. Happy to learn more about this if I have it wrong.
How would one find out what data brokers knew from their cash purchases?
Do banks sell this information? This bill was pulled from this ATM in Georgia by one Claudius McMoneyhands, and then deposited by one CashMoneyBusiness LLC in South Carolina three weeks later
Seems like there could still be intermediaries and a lack of what you actually bought with it at least?
My favorite example is the story about a data broker who, the day after 9/11 happened went from the name "Muhammad" to a list of ~1K people which included 1 out of 4 of the 9/11 terrorists.
I'm aware that using adblockers and avoiding social media doesn't entirely prevent tracking, shadow profiles, and such, but surely it makes things more difficult for these companies, no? Or would you say that there's practically no difference between making an effort to preserve one's privacy and just giving up entirely?
> the subtle and not subtle ways it can affect you
In Manufacturing Consent they measured column inches in the NYT-- IIRC it was something like measuring the total that support the relevant U.S. administration's official position on given policy vs. inches that went against the gov't position. In any case, they were measuring column inches.
What were you measuring to come to your conclusion?
I could give you some great horror stories, but honestly I don't see the benefit in either potentially harming former coworkers of mine that still work at those places or ending myself in some sort of career/legal trouble for something people generally don't care about (other than a few points on HN).
If you were caught demoing something both horrific and internal you would risk serious damage to your career, and ultimately will have zero impact on the industry as there's just too much data out there and too much money wrapped up in it.
Plus, most people working with the data don't bother to look at it. The places I've internally demo'd massive privacy risks were shocked because they didn't realize what their own data was capable of. Most people are just writing jobs that run and shuffle data around from one place to another, never really asking "what is this data?" Even among data scientists I'm routinely surprised (so maybe I shouldn't be surprised) how frequently data scientists never do any real error analysis by looking at what the model got wrong and trying to understand why.
Among the additional information Kochava collects and sells are non-anonymized individual home addresses, phone numbers, email addresses, gender, age, ethnicity, yearly income, “economic stability,” marital status, education level, political affiliation and “interests and behaviors,” compiling and selling dossiers on individuals marketed as offering a “360-degree perspective,” the FTC said.
...
According to the FTC, Kochava’s data can identify women who visit reproductive clinics by name and address along with, for example, when they visit particular buildings, their names, email and home addresses, number of children, race and app usage.
...
Kochava marketing materials tell customers it offers “rich geo data spanning billions of devices globally” and that its location data feed “delivers raw latitude/longitude data with volumes around 94B+ geo-transactions per month, 125 million monthly active users, and 35 million daily active users, on average observing more than 90 daily transactions per device.”
...
The complaint also alleges that the company has lax procedures for determining who it is selling data to, saying purchasers are allowed to use a generic personal email address, label an alleged company as “self” and explain they plan to use the data for “business.”
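As a rough sanity check on the marketing figures quoted above (reader arithmetic, not something stated in the complaint), the daily numbers do add up to the monthly one. The 30-day month is an assumption.

```python
# Back-of-the-envelope check of the quoted Kochava marketing figures.
# Assumption (not in the source): a 30-day month.
DAILY_ACTIVE_USERS = 35_000_000      # "35 million daily active users"
EVENTS_PER_DEVICE_PER_DAY = 90       # "more than 90 daily transactions per device"
DAYS_PER_MONTH = 30

monthly_events = DAILY_ACTIVE_USERS * EVENTS_PER_DEVICE_PER_DAY * DAYS_PER_MONTH
print(f"{monthly_events / 1e9:.1f}B geo-transactions per month")
```

That lands at about 94.5 billion, in line with the "94B+" claim, so the quoted numbers are at least internally consistent.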
I was on a team of about 25 involved in pitching a particularly large deal to a public sector client (think US state/local governments). The audience was about 50 people from different departments and agencies throughout the state and our pitch team consisted of about 6-8 very big shots + me the computer nerd. During our prep and rehearsals a "look book" was distributed which consisted of write-ups on each person expected to be in the audience. It was very detailed, with a career and education history of each person, a personality analysis, where their interests/passions lie both at work and personally, and what topics and key points set them off. The deck was very professional and not something thrown together. I was impressed, but a little taken aback too.
I asked this same thing in another comment here, but since you mention working in this space, I ask you directly. Where do the brokers obtain their data from? If it's easy for them to obtain, would those who buy it from brokers not be able to simply get it from its respective sources? I'm genuinely curious about how this dynamic works.
What are some good cheap sources to get this? I have an art project idea that I've wanted to make that would require invasive data profiles, but it's a very big project and I have no idea where to start.
I would say that in general the HN crowd doesn't understand the industry at all, and they need to change the direction of their understanding, rather than the magnitude. Your basic hackernews believes that e.g. Google is out there selling all your personal information. But compared to these other industries the tech industry is almost airtight. It has long been possible for someone to pick up the phone and order, in any format they want, transaction data as narrowly targeted as they wish. Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town? By end of day.
This is correct; what people fundamentally misunderstand is that data brokers directly sell personal information about people, but Google and Facebook only allow for targeted advertising while keeping personal information within the confines of their company.
The meta-conspiracy-theory would be that the dossier industry whips up conspiracy theories about online advertisers in order to maintain their own low profile.
I think the HN crowd is especially vocal about the tech industry in particular because that's the industry a lot of us have first-hand knowledge of - we know from personal observation that it is anything but airtight
It has been truly frustrating when people will blame the "tech industry" for what is essentially reckless behavior from other industries. For a while, it was often the finance sector that did most of the crazy stuff. With crypto being an obnoxious overlap of the two.
Data brokers are the OG tech industry. They've been around since the late 60s selling consumer data. Just because it's unsexy data storage and query work doesn't make it less tech.
I'm also surprised that this is so hidden from everyone. Where are the engineers leaking secrets? Much of the online discourse is pure speculation based on what can be observed from the very end of the chain. (ie, what your computer is giving up) The speculation is not necessarily _incorrect_ but is too vague to be useful to anyone. Where does my data _actually_ go? Does anyone know? Can anyone describe the life of my data as it goes through the whole ecosystem? Does anyone know what mitigations are, and are not effective?
Because what's the headline you're going to get out of it?
If the headline is "Mark Zuckerberg is amassing your data and you know it's for evil", it's an easy sell. If it's "there's an ecosystem of little-known companies that sell transaction, location and lifestyle data to marketers, journalists, PIs, and police departments alike", it's not exactly the kind of a message that spurs people to action. And yeah, the newspaper that would be breaking the news is a customer too.
Despite being near universally hated externally, data brokering is a boring industry and is seen as very mundane and routine. They don't attract the type of engineers that have a strong moral stance and will go rogue and blow the whistle. They attract the middle age suburbanite just trying to get through the day and make a living.
Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend to see if they will do it for that amount. Salespeople will always shoot high, hoping to not leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than you're actually willing to spend, for the same but converse reason.
Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.
The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.
Been busy, but since you seem to have been unable to find anybody by searching on your own for the past 6 hours, here's something I found with a quick little search.
Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine, for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally acting in good faith so it is generally a complete waste of time.
I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.
Pretty much every one of the big players people typically associate with other areas, such as personal credit, has some feet in this space somewhere. Then there are the hundreds of lesser-known fly-by-night guys that have their own DBs, built mostly off the same data but correlated in different ways and sold to different audiences.
There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.
>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.
Okay, but then why not name at least a couple of such services? Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from TransUnion is hardly revealing of anything.
I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.
Anyhow, thanks for taking the time to include some links.
For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care to on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, then, boy, I don’t know. You can lead a horse to water, but you can’t make it drink.
It could also mean that if you have to ask... or the first rule of data brokering...
Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games
Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheelhouse.
John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.
Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people that are doing this with first hand knowledge have too much to do than trying to convert doubting Thomas over here.
All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.
I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails
Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.
i think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who has bought product X but i've never seen a transaction by transaction purchase history being for sale.
Any way to opt out of this type of data collection per company? I know for some things you can contact each individual broker and opt out (via some identifier like your email address) of your data being at least publicly available.
> Your basic hackernews believes that e.g. Google is out there selling all your personal information
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
The industry betrayed consumers' trust to the point where no project can be trusted to be mindful of data anymore. Even Proton Mail ended up ratting to the French, and that was just IP and session info, so who can we even trust to get "good telemetry"?
Logs aren't telemetry and calling a response to a court order "ratting out" is exactly the kind of behavior that makes people increasingly skeptical of privacy advocates.
Or they architect their system better so that they never collect the IP addresses to begin with. I think Privacy Pass and other things Mullvad is doing help in this area, but I am not aware of Proton working with them to implement anything like this. But Proton should do this, because it’s relevant to customers of Proton.
Apparently not Privacy Pass related, will keep looking as I seem to remember that Mullvad was doing that implementation, but I may remember incorrectly.
Okay, and who are these people you contact for this data, and how do they themselves obtain it so precisely? You say the big tech industry is pretty air-tight about sharing data, so how does mysterious X company have on hand the credit ratings of all those youngish dentists on Elm street, among other kinds of information? How do these dynamics work, since you seem to know it internally?
A mobile provider enters into marketing sharing agreements with credit card companies. It extracts housing information from local property and tax records. It enters into marketing sharing agreements with retailers, payment processors like ADP. Same with license plate reading companies, loan companies, banks, professional organizations, etc.
It fills its data lakes with the vectorization and down tilt data that it collects every day. It uses federated batched Hadoop tasks to join the above data lakes into one large data lake. Mid-PB in size.
Then it looks for mobile phones that travel to the 400 block at night and stay there, that are buying dentist stuff from Walmart, travel to a dentist office every workday, have an income over $120k, and are a member of the local dentist society. Maybe look for someone with dentist student loans, graduated with a dental degree.
None of those data points can identify an individual. Taken together they can ID just about anybody.
But maybe there is a chance that you ID their wife/husband. So maybe include/exclude people that regularly visit OBGYN offices.
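A toy sketch of why that works, using made-up device IDs and signal names (nothing here reflects any real carrier's schema or data): each filter alone matches several devices, but intersecting them shrinks the candidate pool very quickly.

```python
from functools import reduce

# Hypothetical, made-up data: each weak signal maps to the set of device
# IDs that match it. No single signal identifies anyone.
signals = {
    "sleeps_on_400_block_elm": {"d01", "d02", "d03", "d04", "d05"},
    "weekday_visits_dental_office": {"d03", "d05", "d17", "d22"},
    "income_over_120k": {"d02", "d03", "d05", "d09", "d17"},
    "dental_society_member": {"d03", "d17", "d30"},
}

def candidates(signal_sets):
    """Intersect all signal sets; what is left matches every filter."""
    return reduce(set.intersection, signal_sets)

def narrowing(signals):
    """Record how the candidate pool shrinks as each signal is added."""
    pool, steps = None, []
    for name, ids in signals.items():
        pool = set(ids) if pool is None else pool & ids
        steps.append((name, len(pool)))
    return steps

for name, remaining in narrowing(signals):
    print(f"{name:30s} -> {remaining} candidate(s)")
print(candidates(signals.values()))
```

With this invented data the pool goes from five devices to one after four filters. A real pipeline would be a distributed join over far larger tables, but the narrowing logic is the same idea.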
Back in the day we could link cell numbers to credit card purchases in locations, to the point of being able to identify the name of the person, what they purchased, and where it was purchased. For all people in a metro area that were using credit cards and physically visiting stores.
My question here is also how the brokers obtain the data themselves? Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is in any case available, the real at-fault culprits aren't so much the brokers as those who store and so easily sell it in the first instance.
> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government and if you search for any common things you might want to know you often land on a data brokers page even though property assessor data is freely available in most counties. The problem is each county has their own system of storing data and their own process for searching it. It's a lot of work to learn how just this one dataset works, combining this for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used and what their reported income was at time of sale, as well as their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
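To illustrate why the per-county work adds up, here is a minimal sketch with two invented county export formats (field names, addresses, and prices are all made up): every source needs its own adapter before the records can be searched or joined as one dataset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parcel:
    county: str
    address: str
    sale_price: int   # whole dollars
    sale_year: int

# Hypothetical raw rows: each county exports the same facts differently.
county_a_rows = [{"SITUS_ADDR": "12 OAK ST", "SALE_AMT": "$350,000", "SALE_DT": "03/14/2019"}]
county_b_rows = [{"addr": "98 Pine Ave", "price_usd": 412500, "sold": "2021-07-02"}]

def from_county_a(row):
    """County A: upper-case address, formatted dollars, MM/DD/YYYY dates."""
    price = int(row["SALE_AMT"].replace("$", "").replace(",", ""))
    year = int(row["SALE_DT"].split("/")[2])
    return Parcel("A", row["SITUS_ADDR"].title(), price, year)

def from_county_b(row):
    """County B: mixed-case address, integer price, ISO dates."""
    year = int(row["sold"].split("-")[0])
    return Parcel("B", row["addr"], int(row["price_usd"]), year)

# One adapter per source; a national dataset needs one for every county system.
parcels = [from_county_a(r) for r in county_a_rows] + \
          [from_county_b(r) for r in county_b_rows]

for p in parcels:
    print(p)
```

Once everything is in one shape, joining against a second dataset is the easy part; writing and maintaining the adapters is where the time goes.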
> In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
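A small sketch of that re-identification join, with entirely fabricated postal codes, names, and purchases. The `max_group` threshold is an illustrative parameter: any postal code shared by that few people effectively stops being anonymous.

```python
from collections import defaultdict

# Hypothetical data. The retailer's export has no names, only postal codes.
purchases = [
    ("A1A 1A1", "garden hose"),
    ("A1A 1A1", "dog food"),
    ("B2B 2B2", "espresso machine"),
    ("A1A 1A1", "paint"),
    ("C3C 3C3", "telescope"),
]
# A second, separately obtained list maps people to postal codes.
residents = [
    ("Alice", "A1A 1A1"), ("Bob", "A1A 1A1"), ("Carol", "A1A 1A1"),
    ("Dan", "B2B 2B2"), ("Erin", "B2B 2B2"),
    ("Frank", "C3C 3C3"),
]

def reidentify(purchases, residents, max_group=2):
    """Attribute purchases to named people when a postal code
    covers at most max_group residents."""
    by_code = defaultdict(list)
    for name, code in residents:
        by_code[code].append(name)
    exposed = defaultdict(list)
    for code, item in purchases:
        names = by_code.get(code, [])
        if 0 < len(names) <= max_group:
            exposed[tuple(sorted(names))].append(item)
    return dict(exposed)

print(reidentify(purchases, residents))
```

Here the two-person and one-person postal codes are exposed, while the three-person one falls outside the threshold. Neither input list contains a name next to a purchase; the join creates that link.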
Thanks for the detailed reply! So essentially, what many of them do is scour public data sets of all kinds, cross-reference them and repackage the more complete product as their own, which people then buy simply because it's easier to get it that way, all wrapped up neatly than doing the legwork? This is the basic gist of it? As for the complex and highly specific data about individuals, they do the same thing or do they buy from still other sources? I also wonder if they buy any hacked information off the dark web.
It's amazing to me that the market for data is so well hidden from public view. So many large companies are mining and trading data on a daily basis - you would think that a data marketplace would have been a thing by now, especially with all the noise about "decentralisation" (yes, I know, crypto shill bros).
I've been touting this as a business model for years. Better still, I'd like to see it done with behavioural models (in the open). That would really blow the lid off the industry. Imagine people charging companies, instead of simply being the product...
Is it really that hidden? In 2021, a guy went to another person's home to exact revenge for something 50 years earlier. Security video showed him holding the PeopleFinders folder. What should surprise people is their governments are selling some of the data.
Here's some research aided by Perplexity, which estimates that the global data market is valued at about $1.7 Trillion, with data monetization growing at about 17.6% CAGR:
Also, Meta can identify you based on your movement and a few pieces of social data (all of which is in the open).
Tel Aviv airport has been running behavioural monitoring for about a decade, predicting crimes before they happen.
You mention a case from 2021, which is about $5 trillion ago, and think that the government selling data is surprising. This is a mature market that already knows everything about everyone, especially in the US, and is more concerned with what to do with it. The faucet is open, the ground floor is flooded, and we're discussing the different types of fish that have moved into our apartment.
Thinking of ways to profit from it is the absolute norm but, yes, it is perverse.
I'd happily run it as a non-profit with the purpose of highlighting the value of people's data. Tough gig though, when there are all these "off switch" guys around.
I don't get it. Why would CBP and ICE need to buy this from a data broker? The TSA is right there scanning everyone's boarding pass as part of going through security.
Because there is probably a well-defined regulatory framework for accessing data collected by the TSA, whereas there are few or no requirements when the same data is purchased from a broker.
It is not even certain that the data actually comes from the TSA. It could come from airlines, payment companies, etc.
There is no guarantee of quality when purchasing data from a broker.
The regulatory angle at least explains part of my wondering. I'm not really surprised that they have access to this information, I'm just surprised that they buy it, rather than just demanding it be handed over.
Probably because the tsa isn't able/allowed to hand out access willy nilly.
It's kinda like how the police need warrants to request cellphone data, but cellphone companies could sell realtime data to third parties who in turn sold it to the police.
When I worked for the federal government I wanted to collect some publicly visible tweets (this was before the Library of Congress started to harvest them, and back when the API was better). As a government employee I had to write a detailed document covering why I needed this data, what PII would be stored, how long it would be stored, and how I would ensure it had been deleted. Then that document had to be approved. Even though this is a project that any person could have done on the weekend, I still had to go through all this work for approval, then collect the data.
But you're proposing something even more outlandish: asking another agency for data. The politics of this are mind-bending. If one agency gives its data to another and that agency is successful using it, it will make the giving agency look bad, which is unacceptable. It was wild how many times another, supposedly friendly agency would not share data. In fact, I was cautioned not to even bring up the idea in shared meetings because it would create unnecessary friction.
If you buy it from a 3rd party government contractor, none of this has to happen.
Government uses corporations to get around laws and the constitution. Corporations in turn get to use government to get around regulation.
Same as it ever was.
Beyond the other reasons stated re: regulations and law, which this government seems to be more than willing to ignore, the process of setting up reliable feeds of usable data between organizational functions can be more difficult than buying the data from an entity whose profit derives from curation and distribution of the same data. It might seem absurd on the surface, but paying a premium for a repackaging of the data is often meaningfully easier and more reliable, and you probably save money in the end. The TSA tech team’s role isn’t to package and enrich data with useful metadata, with documentation and SLAs, and their incentives don’t naturally align no matter how hard a political appointee bangs a table. The data broker has every incentive, however, and will continue to in perpetuity.
Sure, but when is that purchase transferred to TSA? It’s not disclosed. I agree it’s a possibility, but having the flight purchase info is higher value and more complete.
This could actually be interesting because in many past egregious data broker cases, the offenders had no business in the EU so they could just laugh as they were handed one 20M fine after the other (e.g. Clearview), or they were making way more than 4% of their revenue in profit from privacy violations so they could just risk the fine.
But here, the controller of the data is the airline, the transfer to the data broker might be illegal, and an airline is the worst company to commit GDPR violations with: They have a lot of global revenue but a relatively thin margin, very little of that margin comes from data abuse (so they can't just shrug off the GDPR fine as a small cost of doing shady business), and they are reachable in the EU (worst case a member state can ground and confiscate their planes, and essentially ban them from flying to the EU by threatening to confiscate any other plane that lands). And yes, Germany will impound a plane to get debts paid: https://www.reuters.com/article/world/thai-prince-to-pay-bon...
While airlines are the obvious source for such data sets, there are a number of other sources.
The barcode in the boarding pass contains all the information that airlines know about you [1]. It is after all only encoded and not encrypted and so many companies manufacture readers for it.
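A minimal sketch of what "encoded, not encrypted" means in practice. It assumes the fixed-width 60-character mandatory block of the IATA bar-coded boarding pass (BCBP) "M" format for a single leg; the field labels are illustrative, the sample pass is fabricated, and the widths should be checked against the published specification before relying on them.

```python
# (label, width) pairs for the mandatory block of a single-leg pass.
# Labels are illustrative; widths are an assumption to verify against the spec.
FIELDS = [
    ("format_code", 1), ("legs", 1), ("passenger_name", 20),
    ("eticket_indicator", 1), ("pnr", 7), ("from_airport", 3),
    ("to_airport", 3), ("carrier", 3), ("flight_number", 5),
    ("julian_date", 3), ("cabin", 1), ("seat", 4),
    ("checkin_sequence", 5), ("passenger_status", 1), ("variable_size_hex", 2),
]

def encode(values):
    """Pad each field to its fixed width and concatenate (no key, no secret)."""
    return "".join(str(values[name]).ljust(width)[:width] for name, width in FIELDS)

def decode(barcode_text):
    """Slice the text back into fields by position alone."""
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = barcode_text[pos:pos + width].strip()
        pos += width
    return out

# Fabricated example passenger and flight.
sample = encode({
    "format_code": "M", "legs": "1", "passenger_name": "DOE/JANE",
    "eticket_indicator": "E", "pnr": "ABC123", "from_airport": "SFO",
    "to_airport": "JFK", "carrier": "UA", "flight_number": "0123",
    "julian_date": "200", "cabin": "Y", "seat": "012A",
    "checkin_sequence": "0045", "passenger_status": "1", "variable_size_hex": "00",
})

print(repr(sample))
print(decode(sample))
```

Anyone with a scanner that returns the raw text can do the slicing in `decode`; reading the name, booking reference, and route takes no key at all.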
It could come from airport check-in systems, the baggage handling system, the duty-free shop, the airport lounge, and so on.
There are so many different players who have access to most or all of the data that it would be hard to prove it came from any one source at all.
That is just the barcodes on the boarding pass; passport scanners are like a couple of hundred dollars and airport shops/car rentals use them all the time.
Many airports use facial scanning these days and don’t even ask for boarding pass/passport/visa during boarding at all.
There are auxiliary sources which could be used in conjunction with other sources like Uber booking and so on.
I agree that they can get the data through other means. Not so sure about
> There are so many different players who have access to most or all of the data that it would be hard to prove it came from any one source at all.
Because a prosecutor can obtain copies of all emails talking about this, they can examine your bank accounts for payments from data brokers, they can require legal to give them copies of any contracts, they can look at audit logs from the production database and airlines aren't Evil Inc -- stuff will inevitably leak and get out. You can't cover yourself that well as a CEO looking to make a quick buck...
Side topic: does anyone have an estimate of how much a regular company makes from selling this data? I do not mean those focused on advertising, but companies that willingly sell their customers' data and habits.
You can. I emailed ARC and they complied with my request. Helps if you're in California and mention your rights. You can also opt out of them sharing your data. Any consequences to this I guess I'll find out later this year when I'm flying a lot (guessing absolutely zero).
>>"Movement unrestricted by governments is a hallmark of a free society. "
The other half of the lede is that this govt is using Insert_Method of restricting the movements of its residents.
At this point, any persecuted activity, e.g., obtaining reproductive healthcare with a link to a person in a Red State, requires opsec procedures comparable to a CIA dark op just to not get persecuted.
Who funded many data brokers in the first place? Lots of three letter agencies, through intermediaries. Modern phones + social media = zero cost surveillance for the big brother.
As far as I know there is no definitive guide for how to carry out a 'digital privacy reset' or 'digital rebirth' - but your LLM should be able to give you good instructions.
To do it properly, not only would you have to change all your logins and email accounts, but simultaneously start using a new computer and phone. Also, move home.
In other words: very hard to achieve. But I wonder if there is a set of achievable actions one can take that gets you to 'very good privacy'?
Chilling things here, however - I guess that the US government in general would have access to flight info? at least those going to or travelling within the US.
I have given up keeping my data private from the government. It’s impossible to avoid, so I signed up for Clear, etc because I know they have that information already.
Frankly, Clear and TSA-Pre makes my life so much easier and since I don’t commit crimes I’m not very worried… just a little worried.
For me it's not about keeping my data private so much, more about making it harder for them to just have blanket easy access. I have a passport, precheck, global entry... they know who I am and where I go. But if I can make it just a little harder for the other gov agencies to know what I'm doing, that's a win in my book.
I hate the excuse "since I don't commit crimes". It's not about that. If they want your info that you're not directly giving them, they can get a warrant.
What if it affects your ability to get work? Have you ever made or viewed any posts that could be considered political or made comments on a political post? What agenda do you support with those actions?
Data brokers listen to everything, track your movements, buying habits, internet history, apps, app usage, etc.
Terms of service are meaningless if they keep the extent as secret as possible. Facebook has demonstrably shown this and as shocking as it is they are restrained compared to lots of companies.
Especially when you can out source the full evil to a wholly owned subsidiary for plausible deniability.
And if private corps know something, many foreign governments know all of it.
People would be surprised at how cheap data is. My company is offered credit card purchases with demographics, occupation, income level, down to the zip code for what is basically pennies. We didn't buy it but that's what advertisers know about you.
Customs & Border Patrol. Immigration and Customs Enforcement. Domestic travel does not involve immigration or customs. No international borders are crossed.
Giving CBP line-item access to the movements of Americans makes as much sense as giving the SEC access to healthcare records.
There is virtually no reason at all for these organizations to have any knowledge about the movements of people inside the US. They do not get to be super-police just because they want to be.
This is the meat of the "unitary executive" legal theory. Under this theory, CBP is just a convenient name for a part of the executive branch. If any part of the executive branch has the right to use (or even just isn't prohibited from using) a piece of data, then it's just a matter of the president issuing the right orders so that CBP can too. The same would go for getting medical records over to the SEC. Even if he forgets to issue those orders, the Justice Department (also part of the executive) obviously won't charge anyone with any kind of crime, and if they did he could pardon them or arrest the judges involved.
I happen to believe that this is all just a convoluted way to back into the fuhrerprinzip by way of originalism, but I have no power.
Either way, the US is not a liberal democracy any more. Laws do not apply to the powerful. Strength and power are the only things that matter. The Enlightenment project is dead in Washington (and most statehouses) and the only question of consequence left is: What will replace it?
100%. Neatly incorporated in there is that Congress cannot make any laws to bind the executive in any way. I have a feeling that many people cheering this are going to regret it when the Executive changes hands.
If the executive branch changes hands, I would be willing to bet money that the Supreme Court will immediately do a 180 and declare all of these things as no longer powers of the executive branch.
All of the things that were perfectly OK for Trump to do will suddenly be off limits for a Democrat president.
It's also not at all clear that they intend to let the White House change hands again. One would assume that they have all the power they need to make that happen, and why wouldn't they?
I think if you're a citizen, then I agree with you. However if you're an alien (legal or not) I think they should be allowed to figure out where you are.
They absolutely should, but the onus is on them to figure out a way to do that within the confines of existing law. Existing law notably does not make ICE or CBP a super-agency that can do whatever it wants.
There has been no credible argument why this logic is acceptable for CBP and ICE but not FBI or DEA.
Why are the misdemeanor civil violators ICE chases so important that federal laws no longer apply, when until a year ago it was generally understood that agencies like DEA had guardrails even when trying to apprehend people on felony charges?
If this stands, there is no logically consistent rationale for DEA not being able to perform warrantless wiretaps of all communications, etc.
One can use this logic to create an omnibus surveillance apparatus covering all aspects of communications, commerce, etc.
We do not give other law enforcement similar deference, even though it might help them in some fraction of cases. For example: SEC could prosecute more insider trading if it was able to wiretap all domestic communications.
And yet Americans are subject to TSA stupidity for all domestic flights. The actual lines of when the federal government does and does not have authority are very blurred, even though I would personally argue they shouldn't have any authority on anything unless expressly granted by the Constitution.
Because all of the 9/11 flights were domestic. One of the duties of ICE is investigating terrorism (along with transnational criminal organizations).
Air travel in the United States is managed by Federal law, not state. This is solidly in the Federal law enforcement wheelhouse. Anything that crosses state boundaries is ALSO under Federal law.
So what else is new? Have you heard about Palantir? The government literally sells (or gives) our private data to them. This should be illegal as they don't actually own this data legally as it's not covered by EULA which is generally how data brokers get around privacy violations and governments around unreasonable search and seizure.
People don't grasp how easy it is to build data models like this even without privileged first-party data access.
In 2012 I created a killer prototype that demonstrated that you could accurately reconstruct most people's flight history at scale from social media and/or ad data. Probably the first of its kind. This has been possible for a long time.
A quick sketch of how it worked:
We filtered out all spatiotemporal edges in the entity graph with an implied speed of <300 kilometers per hour or <200 kilometers distance, IIRC. This was the proxy for "was on a plane". It also implicitly provided the origin and destination.
These edges can be correlated with both public flight data and maintenance IoT data from jet engines to put entities on a specific flight. People overlook the extent to which innocuous industrial IoT data can be used as a proxy for relationships in unrelated domains.
In rare cases, there was more than one plausible commercial flight. Because we had their flight history, we assumed in these cases that it was the primary airline they had used in the past, either generally or for that specific origin and destination. This almost always resolved perfectly.
This was impressively effective and it didn't require first-party data from airlines or particularly sophisticated analytics. Space and time are the primary keys of reality.
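For anyone who wants to see the shape of that filter, here is a minimal sketch in Python. It is not the original prototype. The `Sighting` type, the function names, and the `(airline, flight_number)` candidate tuples are all hypothetical, and the thresholds are the from-memory numbers quoted above. It keeps only consecutive sightings that are both too far apart and too fast to be ground travel, then breaks ties between candidate flights using the airline that shows up most in the entity's history.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_KM = 6371.0
MIN_SPEED_KMH = 300.0    # assumed threshold, as recalled in the comment
MIN_DISTANCE_KM = 200.0  # assumed threshold, as recalled in the comment


@dataclass(frozen=True)
class Sighting:
    """One observation of an entity (hypothetical schema)."""
    t_hours: float  # hours since an arbitrary epoch
    lat: float      # degrees
    lon: float      # degrees


def haversine_km(a: Sighting, b: Sighting) -> float:
    """Great-circle distance between two sightings in kilometers."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flight_like_edges(sightings: list[Sighting]) -> list[tuple[Sighting, Sighting]]:
    """Drop edges that are slower than 300 km/h or shorter than 200 km."""
    ordered = sorted(sightings, key=lambda s: s.t_hours)
    edges = []
    for a, b in zip(ordered, ordered[1:]):
        dt = b.t_hours - a.t_hours
        if dt <= 0:
            continue  # simultaneous sightings give no usable speed
        dist = haversine_km(a, b)
        if dist >= MIN_DISTANCE_KM and dist / dt >= MIN_SPEED_KMH:
            edges.append((a, b))  # a is the implied origin, b the destination
    return edges


def pick_flight(candidates: list[tuple[str, str]],
                past_airlines: list[str]) -> Optional[tuple[str, str]]:
    """Among plausible (airline, flight_number) pairs, prefer the habitual airline."""
    if not candidates:
        return None
    counts = Counter(past_airlines)
    return max(candidates, key=lambda c: counts[c[0]])
```

The surviving edge's endpoints double as origin and destination, which is why the filter alone gets you most of the way. Matching an edge against public schedules to produce the candidate list is left out here.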
>We filtered out all spatiotemporal edges in the entity graph with an implied speed of <300 kilometers per hour or <200 kilometers distance, IIRC. This was the proxy for "was on a plane". It also implicitly provided the origin and destination.
Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place? Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history". Sure, it's kinda creepy that you can figure out which stores I went to, but the bigger problem is that you can get the transaction data in the first place. Moreover whatever "spatiotemporal" data needed to reconstruct such flight history is probably more valuable than the flight history itself. Who cares if you know Joe flew on United 8340 when you have hour-by-hour updates on his rough location?
> Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history".
The preposterous thing is that payment processors aren't just allowed to collect this information and tie it to your name, they're required to do that.
People talk a big game about fighting fascism, but how can you allow these laws to exist if you can contemplate what happens when actual fascists get hold of that data going back decades? They need to be dismantled now.
Even if they weren’t required to do it, they would do it anyway because it’s an important part of fraud detection.
The vast majority of payments fraud is caused by the regulatory environment. Use cards with chips that can be read by commodity PCs and phones using published open standards and then require the card to be physically attached to a device to authorize a new merchant and criminals can no longer make fraudulent credit card charges without stealing the physical object or breaking strong cryptography.
The only reason we don't have this already is that the law makes it so hard to start a competing payments network -- in no small part as a result of KYC requirements -- that the incumbents are insulated from real competition and then don't have to fix the flaws in their systems.
Meanwhile you don't actually need everyone to do it, all you need is someone to do it and then that both becomes a competitive advantage in the market and allows any victim of official misconduct to use that one.
The people most opposed to this IME have been the "fascists" who wanted bitcoin. It's another one of those horseshoe situations I guess.
Arguing that we shouldn't do something because it would make it harder to enforce laws is not a convincing argument to me. It sounds like you want to enable people to be criminals.
> Arguing that we shouldn't do something because it would make it harder to enforce laws is not a convincing argument to me. It sounds like you want to enable people to be criminals.
I find this view to be lacking in nuance.
Laws are intended to exist with the consent of the governed. Substantially the whole of society agrees that murder should be illegal, so if someone commits murder we're willing to commit significant resources to investigating and prosecuting the perpetrator. It doesn't have to be efficient or have perfect enforcement because its purpose is to act as a deterrent. Everyone is willing to spend the resources to enforce those laws because everyone agrees that their enforcement is important. Enforcement efficiency is not required when there is popular consent.
Opposing laws that "help criminals" exposes society to shifts in the definition of a crime. When there is a law against being of a particular ethnicity or religion or political ideology, you want to enable people to be criminals. Preventing laws like that from ever being effective is worth sustaining a significant amount of inefficiency in the enforcement of other laws.
And this is not a binary distinction with "laws against murder" on one side and "laws against being Jewish" on the other. The latter is only the viscerally powerful extreme that once made us say never again.
The spectrum spans the full scale, where the middle is filled with police corruption and political retaliation against the opposition and petty busybodies inducing poverty and homelessness through the incompetent micromanagement of society.
Should governments have the ability to freeze the bank accounts of protesters? It doesn't matter what they're protesting or what crimes some minority of the protesters are alleged to have committed when the account freezes are instituted as collective punishment, the answer is no. The government should not have the ability to do that, because in that case they are the criminals, and structural defenses against government abuses are important.
>Opposing laws that "help criminals" exposes society to shifts in the definition of a crime.
This is not necessarily a good thing and laws can change without requiring them to be broken.
> This is not necessarily a good thing and laws can change without requiring them to be broken.
That's kind of the problem, right? Suppose you have a system that actually allows perfect enforcement and then the government passes a law against some religious practice. Espousing atheism is banned, or Islam, or Christianity, depending on who controls the government this time; take your pick. If anybody who does it is instantly brought up on charges with severe penalties then nobody does it. But that's bad. That's the problem. You need to sustain enough friction to prevent things like that from being possible because enforcing laws like that is worse than anything that could come out of making ordinary law enforcement require more resources.
>If anybody who does it is instantly brought up on charges with severe penalties then nobody does it. But that's bad.
I don't think it's bad. Similar to closed and open source software there is room for closed and open societies. They are different approaches that have different pros and cons.
Okay, let's go with your approach. Then the closed society is China or Iran and the open society is the US and other western countries, right? In which case we shouldn't have any such thing in the open countries.
> It sounds like you want to enable people to be criminals.
Yes, wherever it is criminal to improve the wellbeing or support progress of society, I support the ability of people to be criminals.
Rosa Parks wasn't allowed to sit at the front of the bus. Criminal.
I doubt MLK had a permit for every march. Criminal.
I doubt the founding fathers were legally allowed to oppose the British taxes. Criminals.
A society with no crime is a dystopia.
You can justify almost anything as "progressing society". Tech companies can be "making the world a better place", but that shouldn't give them permission to break laws.
The mental model of how the law works that most people have is wrong.
The law does not, by default, prosecute all crimes. There is no country in the world that has even close to the law enforcement capacity to investigate and prosecute all crimes. What tends to happen instead is that crimes that, to put it colloquially, "piss off the wrong people" get prosecuted, i.e., crimes that draw the attention of either the general public or specific people in power.
A reasonable approximation is that a single-digit percentage or less of crimes get investigated and prosecuted, with the rate obviously being higher for violent and visible crimes like murder and lower for less violent and visible crimes like stealing the office paperclips.
Another way of looking at this is that, in the current system, if your house gets burgled, you need to report it to the police if you expect anything to happen, whereas one could imagine another system where the police already know your house has been burgled and you don't need to report it.
I believe with AI we will be able to scale enforcement much better than a single digit percent. This will allow for more fair enforcement and reviews and cleanup of old laws or punishments which don't make sense anymore.
> Arguing that we shouldn't do something because it would make it harder to enforce laws
If you want to do it, get a warrant.
At this point in human history, is it relevant to the individual whether someone is a criminal? What matters is whether they've injured someone else.
To use the US as an example (I doubt other countries are much better) it's estimated that every adult in the US commits multiple Federal felonies per day[1], Federal law is replete with ridiculous laws[2] and the number of federal laws is uncountable by Congressional Research Service staff. Does it matter at that point?
[1] Three Felonies A Day - ISBN 978-1594035227
[2] https://x.com/CrimeADay
>[1] Three Felonies A Day - ISBN 978-1594035227
That's not a serious estimate: https://news.ycombinator.com/item?id=43744267
Is a statistical analysis of the specific number actually the point? Suppose it was three felonies a year. What difference does that make when the prison sentence for each felony is also at least a year? The problem is the same; a prosecutor can throw anyone in prison simply because there are so many laws nobody can follow them all or even realize when they're violating one.
You can check the rest of the thread, but I'm not even convinced that the median person commits 3 crimes a year. Maybe there's an average of 3 felonies per day/month/year if you count all the small businesses that aren't complying with federal product/safety regulation to the letter (thus dragging up the average), but I can't think how realistically the average joe is committing 3 felonies per year.
> I can't think how realistically the average joe is committing 3 felonies per year.
Someone who smokes weed daily in a place where it's illegal could easily commit multiple crimes a day just for drug possession and consumption, for example.
Only 16% of Americans smoke marijuana, according to Gallup. If you exclude people who are in states where it's legal/decriminalized, that'd probably be even lower. Needless to say, even if all 16% of them are criminals, that's far from the median person committing 3 felonies. Moreover the weed example isn't even applicable to the thesis of the book or the commenter that invoked it, which is that the US has so many regulations that nobody can hope to comply with them.
If 1/6 of Americans are potential repeat federal felons based on just one activity, I find it highly dubious that the other 5/6 can't be as well in the other hundreds of activities we undertake each day. Using your parents' Netflix/ Disney+/ etc password can technically be prosecuted under CFAA[1], for example. That's probably another 1/6 at least. Now it's 1/3 of the country.
[1]: https://decider.com/2022/01/04/is-it-federal-crime-to-share-...
> In 2016, the US 9th Circuit Court of Appeals ruled that sharing online passwords is a crime prosecutable under the Computer Fraud and Abuse Act.
Wikipedia on the case in question:
https://en.wikipedia.org/wiki/United_States_v._Nosal
>A few months after leaving Korn/Ferry, Nosal solicited three Korn/Ferry employees to help him start a competing executive search business. Before leaving the company, the employees downloaded a large volume of "highly confidential and proprietary" data from Korn/Ferry's computers, including source lists, names, and contact information for executives.
Extending that ruling to netflix password sharing is a stretch.
Moreover you can't say "I can think of one activity that many Americans do that is a felony", and then apply induction on it to claim that the other activities Americans do surely contain felonies.
>That's probably another 1/6 at least. Now it's 1/3 of the country.
That's only true if you assume the populations of weed smokers and Netflix watchers don't intersect, which is... doubtful.
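To put numbers on the overlap point, here is a quick inclusion-exclusion check in Python. The two 1/6 shares are the rough figures from the comments above, not measured data, and `union_share` is just an illustrative helper. The combined share only reaches 1/3 when the two groups are completely disjoint.

```python
from fractions import Fraction


def union_share(p_a: Fraction, p_b: Fraction, p_both: Fraction) -> Fraction:
    """Inclusion-exclusion: share of people in group A or group B."""
    return p_a + p_b - p_both


weed = Fraction(1, 6)     # rough share from the thread, not a measurement
netflix = Fraction(1, 6)  # the "probably another 1/6" guess from the thread

# No overlap at all: the only case where the shares simply add.
disjoint = union_share(weed, netflix, Fraction(0))

# Statistically independent groups: overlap is the product of the shares.
independent = union_share(weed, netflix, weed * netflix)

# One group entirely inside the other: the union does not grow at all.
nested = union_share(weed, netflix, min(weed, netflix))

print(disjoint, independent, nested)  # 1/3 11/36 1/6
```

So anywhere between 1/6 and 1/3 is possible from those two figures alone, depending entirely on how much the groups overlap.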
> If you exclude people who are in states where it's legal/decriminalized
There is no state where cannabis derivatives are federally legal.
https://www.ecfr.gov/current/title-21/chapter-II/part-1308
Yeah I agree, though include "knowingly employing unauthorized immigrants" in those averages.
Exceeding the driving speed limit is more of an "infraction" and not a crime until it becomes reckless.
> Maybe there's an average of 3 felonies per day/month/year if you count all the small businesses that aren't complying with federal product/safety regulation to the letter (thus dragging up the average), but I can't think how realistically the average joe is committing 3 felonies per year.
To begin with, let's not ignore how broad a category "small business" is. Laws requiring health inspections or licenses etc. often operate on the basis of frequency or number of patrons. If you have around a dozen people over for movie night every Saturday with the event published on social media and you all chip in for pizza, are you a food service business? For that matter, is that a public performance in violation of copyright?
If some criminals break into one of your devices or your personal website while you're traveling and you find out about it while you're out of state but don't have time to deal with it until you get back home, have you committed a crime? What if they put some illegal materials there and you clean off the device but still have a backup containing the illegal materials? What if you do delete all of them right away; is that destruction of evidence? What if there's a federal law against keeping the materials and a state law against destruction of evidence and a very specific way to comply with both of them at the same time that may not have been clearly decided by the appellate court when it was happening but has been decided by the time they bring the case against you? What if it was clear ahead of time but wasn't intuitive and you can't afford a lawyer and can't have one appointed until after you've been charged?
It's unreasonable to expect ordinary people to be able to navigate this.
>To begin with, let's not ignore how broad a category "small business" is. Laws requiring health inspections or licenses etc. often operate on the basis of frequency or number of patrons. If you have around a dozen people over for movie night every Saturday with the event published on social media and you all chip in for pizza, are you a food service business? For that matter, is that a public performance in violation of copyright?
That's what courts are for. I don't think there's any case where people tried to prosecute a shared movie night as a business, because it'd be laughed out of court. Same goes for whether it's copyright infringement or not. Moreover if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
> That's what courts are for.
That isn't really how courts work. If you're violating the letter of the law then you are breaking the law and an actual impartial judge would enforce it against you. In practice whether they let you get away with it is based in significant part on whether or not they like you. If the judge doesn't like the administration then maybe they do like you. But if the judge doesn't like you for the same reason the administration doesn't like you then you're going to jail. And it shouldn't have to depend on that; we shouldn't have laws that people are constantly in technical violation of so that the only thing keeping anyone out of jail is prosecutorial discretion and judicial affinity.
Meanwhile you can characterize anything in a negative light. A random home kitchen typically isn't going to meet the standards for commercial operation and the prosecutor's press release isn't going to say "we're prosecuting our enemies for movie night", it's going to say "defendants were operating a for-profit restaurant in violation of zoning rules and storing uncooked meat above fish in the freezer used for storing food sold for resale in violation of the health code" and then stick them with a fine that would make them lose their house.
> Moreover if you look at how authoritarian regimes work in practice, dissidents are often prosecuted under national security laws, campaign finance violations, or libel laws, not because they violated the health code by having a movie night.
When the dictator of petrolistan wants to retaliate against their enemies and those laws are available for that, sure.
When the mayor of some US town wants to do the same thing, they might very well resort to health code violations that wouldn't have otherwise been enforced.
Deterrents well short of political executions are still very much official misconduct.
Dissidents are most often prosecuted under those laws, yes, which is a good reason to not have those laws. But I’m aware of at least one case where a Cuban dissident was apprehended and prosecuted for buying cement in the black market, something the government was able to know because they most likely had somebody tagging the person 24/7 [^1]
But that exotic case is hardly needed. Laws will be abused by the powers whenever they want; you don’t need to look farther than the current USA administration and how the president is using war powers to treat poor laborers as enemy combatants and send them to concentration camps. And yet, USA’s system of government was designed in a way that should have prevented the executive from abusing power; why it has failed is another (difficult) discussion, but the founding fathers seemed well acquainted with the despotism of other nations.
[^1]: https://www.rtve.es/noticias/20090828/cuba-detiene-a-disiden...
> Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place?
Almost all data is spatiotemporal data, people just aren't used to thinking about it like that. Everything that "happens" is an event with associated times and places.
Tagging of events with spatiotemporal attributes, or with metadata that can be used to infer spatiotemporal attributes, is pervasive. Every system data passes through, even if not the creator of it, observes the event of the data passing through it. Event observation is not trying to track things but it implicitly and necessarily creates the data that makes tracking and spatiotemporal inference possible.
These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
It is much more difficult to access the obvious first-party data sources than it used to be, mostly because people with that data are far more selective about who they give access. It doesn't really matter, that is a speed bump for the unsophisticated. The exponential growth in the scale and diversity of network-connected telemetry of all types pretty much guarantees these data models will always be constructible.
The historical limiter has always been the absence of data infrastructure platforms that can handle these kinds of analytics at scale.
>Tagging of events with spatiotemporal attributes, or with metadata that can be used to infer spatiotemporal attributes, is pervasive. Every system data passes through, even if not the creator of it, observes the event of the data passing through it. Event observation is not trying to track things but it implicitly and necessarily creates the data that makes tracking and spatiotemporal inference possible.
>These kinds of analyses rely almost entirely on knowing the events occurred; you could encrypt the contents of the data and it wouldn't matter. Software leaks spatiotemporal event context everywhere across myriad systems, internal and external, that incidentally collect it. There isn't anything nefarious about most of it and much of it is required for reasons of criminal and civil liability.
>What people underestimate is that you can analytically stitch together many unrelated sparse data sources with spatiotemporal attributes, many of which are quite crap or seemingly unfit for purpose, to reconstruct a dense high-quality graph. Counter-intuitively, diverse and seemingly irrelevant data sources often produce better data models. It surfaces bias, errors, manipulation, and processing artifacts in individual sources you might otherwise miss.
That's a lot of technobabble for what essentially sounds like "there's some ad SDK that's phoning home with your gps/ip geolocation every few minutes, if you cross reference that with when flights are, you can guess what flight someone took". How far off am I? Or is there some galaxy brained AI that can infer that from disparate facts like that you stopped posting on twitter for 12 hours, your car's license plate was caught by an ALPR to be heading towards the airport, and 3 weeks ago you visited some portuguese tourism site that had an ad beacon installed?
> Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place?
Yeah, this just sounds like it's written from the perspective of a data broker.
Tying particular ad analytics (presumably ip geolocation?) to thousands of particular individuals and having it well populated enough to track them is "privileged first-party data access" by another name.
Your location is leaked in many, many ways. Even if you have location services off on your phone, the first-party (Google, Apple) has access to your precise location. On Android, this bypasses VPNs, and I believe on iOS/Mac first-party apps also bypass VPNs. You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
Okay, fine, I'll just install another operating system then, like KDE plasma mobile or GrapheneOS. Your location is still leaked 24/7. This is because your cellular modem has its own operating system, running underneath your phone's operating system, which is triangulating your location at all times. Once again, you are trusting that telecommunications companies aren't misusing this - but please remember they're compelled, by law, to make a lot of this information available to numerous third parties.
Okay fine, let me just remove the Sim then and use my phone on Wifi only, always through a VPN. Your location is still being leaked potentially, for example, by your car. Your car also has a cellular modem which leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third-parties.
Of course, all of this is assuming you don't use any social media. Social media can also leak your location, even without location services. If you review a restaurant - that's your location. Where are your friends? You're probably around them. And on and on.
> Your location is still being leaked potentially, for example, by your car. Your car also has a cellular modem which leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third-parties.
Ok, fine. I'll just drive classic cars for the rest of my life. Your location is still being leaked by a global network of automated license plate reading cameras https://deflock.me/
>On Android, this bypasses VPNs
source?
>You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
At least on Android you can theoretically disable "Google Location Accuracy", which stops it sending nearby hotspot MAC addresses to Google. That's the only public route where Google gets your location without you knowingly sending it. You also imply that mobile operating systems are surreptitiously sending locations back to Google/Apple even if users have all location-related features disabled, but I'm not aware of any evidence this is the case, and it falls into the same category as "facebook is secretly listening to you" until proven otherwise.
I mean you're saying a lot for rhetorical effect, but it doesn't get around the fact that there aren't that many avenues to reliably collect this data, with high enough resolution and tied to identity, for thousands/millions of individuals, and if you do have that data, you're basically a data broker. I mean, yes, all those things are true, and they're pooled together and available for sale by data brokers.
It's also disappointing that the root comment is distracting from the 4th amendment violations by making the conversation about their vague claims of selling mini-palantir demos through abusing web ads.
The assumption that the data must be "high resolution" is erroneous. Low resolution noisy data works just fine, you just need a lot more of it. You can use standard signal processing tricks such as stacking noisy low-resolution data to extract high-resolution features. This requires a lot more processing but that isn't much of a limitation. These reconstruction techniques work even if the data is from unrelated sources that aren't even trying to measure the thing you are measuring.
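A rough illustration of the stacking point, not anything from the original prototype: if each fix is individually poor but the noise is roughly zero-mean and independent, averaging many of them tightens the estimate roughly with the square root of the count. The function names and the 0.5 degree noise level are made up for the example.

```python
import random
from statistics import fmean


def stack_fixes(fixes):
    """Average many noisy (lat, lon) fixes into a single estimate.

    Assumes zero-mean, independent noise and points that do not straddle
    the antimeridian (no longitude wraparound handling).
    """
    lats = [lat for lat, _ in fixes]
    lons = [lon for _, lon in fixes]
    return fmean(lats), fmean(lons)


def simulate_fixes(true_lat, true_lon, sigma_deg, n, seed=0):
    """Hypothetical data source: n fixes with Gaussian noise of sigma_deg."""
    rng = random.Random(seed)
    return [
        (rng.gauss(true_lat, sigma_deg), rng.gauss(true_lon, sigma_deg))
        for _ in range(n)
    ]


if __name__ == "__main__":
    truth = (48.85, 2.35)
    for n in (10, 1000, 10000):
        est = stack_fixes(simulate_fixes(*truth, sigma_deg=0.5, n=n))
        # Print the absolute error per coordinate for each sample size.
        print(n, abs(est[0] - truth[0]), abs(est[1] - truth[1]))
```

Real data exhaust is rarely this well behaved (biased, correlated, multi-modal), so treat this as the simplest possible case of the idea.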
Any data exhaust will work, people have created interesting PoCs leveraging things like HVAC data, RF attenuation, etc. High-precision weather models essentially work the same way, making inferences by stitching together diverse event data that has nothing to do with weather.
High-quality high-resolution data sources largely don't exist in the way people imagine they do, so you need to do this anyway. If you have a high-resolution spatiotemporal graph for entities, tying it to identity is always trivial.
It would be more common if it weren't for the fact that open source platforms scale poorly for this type of analytical processing.
Anyone could have acquired this data at the time, it was all either free or cheap. Like I said, my business was specialized data infrastructure (e.g. storage engines and analytical parallel processing), we just used these data sources for testing and demos because "free or cheap".
I also have a lot of experience with privileged first-party data but that is governed by a different set of rules and is often regulated. You have to be much more circumspect about how you use it.
Even though it might be convenient to e.g. slurp telemetry off a mobile carrier's backbone, what you eventually realize is the inability to do this isn't a real limitation and in some ways is a blessing in disguise.
Twitter has had timestamped and geotagged posts for ages. Just clustering things like hashtags of tweets spatiotemporally results in a treasure trove of information about events.
I'm sure that other platforms attach the same kind of info to posts. It's just a matter of scraping it.
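For what "clustering hashtags spatiotemporally" could look like in its crudest form, here is a sketch that buckets posts by hashtag, a coarse lat/lon grid cell, and a time window, then keeps the buckets with enough posts. The tuple layout, cell size, and thresholds are assumptions for illustration, not any platform's actual schema.

```python
import math
from collections import Counter


def event_buckets(posts, cell_deg=0.1, window_hours=1.0, min_posts=3):
    """Count posts per (hashtag, grid cell, time window) and keep busy buckets.

    posts: iterable of (hashtag, lat, lon, t_hours) tuples (hypothetical layout).
    Returns a dict mapping bucket key -> post count for buckets with at
    least min_posts posts.
    """
    counts = Counter()
    for tag, lat, lon, t_hours in posts:
        key = (
            tag.lower(),
            math.floor(lat / cell_deg),        # latitude grid cell
            math.floor(lon / cell_deg),        # longitude grid cell
            math.floor(t_hours / window_hours),  # time window index
        )
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_posts}
```

Fixed grids split events that straddle a cell or window boundary, so anything serious would use a proper density-based clustering step instead, but the grid version shows why geotag plus timestamp is already enough to surface events.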
but it's obviously very easy to get from social media? e.g. you have a post from paris and then later that day a post from brussels
There's dozens of flights per day from paris to brussels, so that wouldn't uniquely identify a flight.
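The root comment's answer to this was a tie-break: keep only scheduled flights that fit between the last sighting at the origin and the first sighting at the destination, and if more than one remains, prefer the airline the entity has used most before. A minimal sketch of that idea, with a made-up `Flight` record and made-up schedule data:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    airline: str      # hypothetical carrier label
    number: str       # hypothetical flight number
    origin: str
    dest: str
    dep_hour: float   # scheduled departure, hours since some epoch
    arr_hour: float   # scheduled arrival, same clock


def candidate_flights(flights, origin, dest, last_seen_origin, first_seen_dest):
    """Flights on the route that fit inside the observed time gap."""
    return [
        f for f in flights
        if f.origin == origin and f.dest == dest
        and f.dep_hour >= last_seen_origin
        and f.arr_hour <= first_seen_dest
    ]


def pick_flight(candidates, past_airlines):
    """Tie-break by how often the entity used each airline before.

    Returns None when there are no candidates; on an exact tie the first
    candidate in list order wins.
    """
    if not candidates:
        return None
    usage = Counter(past_airlines)
    return max(candidates, key=lambda f: usage[f.airline])
```

Usage would be something like `pick_flight(candidate_flights(schedule, "CDG", "BRU", 8.5, 11.0), history)`, where `schedule` and `history` are whatever public flight data and prior matches you already have.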
The thing that really tickles me is that there's supposedly all this frightening information that can be gathered on people, including by investigating the history of ads they were served; but then in the vast majority of cases the only use the Bad Guys ever seem to come up with for that information is to serve you more ads.
Where do you get "maintenance IoT data from jet engines"?
Exactly. This does not pass the sniff test.
Indeed, seems like it's way easier to just go the data broker route.
Presumably ICE is trying to determine what cities / countries a person has visited and when, ie your starting point.
Can you eli5 the implementation and how your prototype worked?
Sounds like if you have a record of a lot of location/timestamp data for people, you look at the distance difference divided by the time difference. Now you have average speed for any pair of points. Now filter where the average speed is as fast as a Boeing jet. That filters out most of the data except for people who are almost certainly on a plane. Et voila, you now look at those data points geolocation and you have people who traveled from one city to another because you already have the location. Compare City1 -> City2 with any public flights in those cities around those times and you know who flew on what flight from where to where and at what time.
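A minimal sketch of that filter, assuming you already have per-entity pings with a timestamp and a rough lat/lon. The `Ping` record is hypothetical; the 300 km/h and 200 km thresholds are the ones the root comment recalled ("IIRC"), so treat them as approximate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    entity: str
    t_hours: float  # timestamp in hours since an arbitrary epoch
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def flight_like_edges(pings, min_speed_kmh=300.0, min_dist_km=200.0):
    """Keep consecutive ping pairs that are both far apart and fast.

    Returns tuples of (entity, origin_ping, dest_ping, distance_km, speed_kmh).
    """
    by_entity = {}
    for p in pings:
        by_entity.setdefault(p.entity, []).append(p)

    edges = []
    for entity, ps in by_entity.items():
        ps.sort(key=lambda p: p.t_hours)
        for a, b in zip(ps, ps[1:]):
            dt = b.t_hours - a.t_hours
            if dt <= 0:
                continue  # duplicate or out-of-order timestamps
            dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
            speed = dist / dt
            if dist >= min_dist_km and speed >= min_speed_kmh:
                edges.append((entity, a, b, dist, speed))
    return edges
```

The distance floor matters as much as the speed floor: IP geolocation jitter can make someone appear to jump 10 km in under a minute, which is a very high implied speed but obviously not a flight.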
I'm more interested in this part:
> you have a record of a lot of location/timestamp data for people
What is the source of that data?
from the parent post: `social media and/or ad data`
So if you have ad impression data you have IP geolocation, or maybe better, along with the timestamp. Similarly for socials sometimes you get location metadata, and with image uploads you can get location metadata (though today these are often stripped, historically they weren't).
> People don't grasp how easy it is to build data models like this even without privileged first-party data access.
People on this site probably understand this better than 99% of the world.
The problem is "What can I, as an individual, do about it?"
block ads, stay off most social media, don't use mobile devices while traveling
> don't use mobile devices while traveling
Some airlines don't even allow you to check in without using their app, unless you are willing to pay a fee.
You can also exploit it for personal profit. As for stopping it, good luck. Best case is probably to degrade or poison data sources in a preferably legal way.
What was your accuracy rate for this? I imagine it was quite high, but do you happen to remember what your +/- was?
Honestly asking, How did you validate your results?
In this particular case it was just a proof-of-concept, albeit at scale. We did not run a proper ground-truthing process but people actually running that type of data model in production could have ground-truthed the analytic model if they wanted to.
However, it turns out that thousands of people like to talk about their flights on social media, so we scraped that as a spot check and it mostly lined up perfectly. Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
The purpose of the PoC was to sell the data analysis infrastructure that made that type of analysis possible at scale, it wasn't about the data per se. It was a compelling demo we invented given the data that happened to be available. Startup life.
> Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
For fun edge cases, there's always Antarctica, where you can travel from a US base (which looks like you're in the US) to a NZ base (which looks like you're in NZ) in a couple of minutes: https://brr.fyi/posts/credit-card-shenanigans
i don't have any special knowledge in this area, but just thinking about it idly while sitting here, "robbing their homes while they are away" comes to mind as a good proxy.
Reminds me of this news story of footballer John Terry whose house was robbed because he posted a picture of himself on holiday. The insurance company tried to use a 'reasonable care' clause of home insurance to deny his insurance claim.
- https://www.blakefire-security.co.uk/blog/social-media-and-j...
>The insurance company tried to use a 'reasonable care' clause of home insurance to deny his insurance claim.
>- https://www.blakefire-security.co.uk/blog/social-media-and-j...
FYI the source you posted never claimed that John Terry's insurance tried to deny the claim, only mentioning that "some" insurance companies warn of it. However even that claim is questionable, because it isn't even from an insurance company, it's from a content marketing piece by an insurance comparison website.
Wouldn’t that mean all celebrities are uninsurable? If politician/singer/athlete has a public away event, there is little they can do to obscure that fact.
Their policy could require a housesitter or security guard on those occasions, or some other risk countermeasure like an alarm system.
That seems like a risk, but not a validation method, unless you are feeling particularly bold.
Basically a plot line on the show “Black List”. Had an inside guy at the post office who would forward people stopping mail delivery on vacation. Then used homes as safe houses.
Great info
It's funny to see ARC just being described as a "data broker," which strongly implies that it doesn't play a role in facilitating the actual underlying consumer activity.
ARC and IATA absolutely do play such a role, as the financial clearinghouses for ensuring that travel agents (online and offline) and airlines can pay each other, and as gatekeepers/certification bodies for agencies to ensure these financial systems aren't abused.
Now, they absolutely do sell access to data to third parties, governmental and nongovernmental. But the reason they have this data isn't because they buy it to resell it; they are fully part of the funds flow for the underlying transaction. Whether they should be allowed to sell or share non-anonymized data on passenger records and prices paid is a very good question, but at the very least this is about as first-party as data gets.
https://www.altexsoft.com/blog/airline-reporting-corporation... describes some of these flows. (Here be dragons.)
this counters none of the points covered in the article
...nor is it meant to?
Two things can be true simultaneously: (a) it is worrisome that a company is selling PII at scale to government entities who would otherwise need to request that data through accountable warrant processes, and (b) we shouldn't call every such company a "data broker" lest we dilute the specificity of that term, particularly when the companies in question participate in the funds flow of the customer transaction.
The amount and extent of data that is available out there by brokers for purchase by literally any company is *mind-boggling*. However bad you think it is, multiply that by 10.
A colleague created a banner ad that was an image that had the text “told you I could do this mate!” and targeted an individual to prove a point.
The general public have no idea how much ad providers and data brokers know about them.
Seems just like retargeting in that case. Ask “victim” to visit page A. On that page A place a retargeting pixel, then now everywhere on the Internet you can display a message for that user as long as you are willing to pay a high price for that impression (high price is way way way less than 0.1 USD)
Reminds me of the time when Signal(the private messaging app) once tried to get ad data from Facebook and show it to users with a high degree of specificity eg “You got this ad because you’re a middle aged woman who enjoys kpop and loves reading about Christopher Nolan”
Relevant article: http://archive.today/fzUL4
Around 2014 I worked with recruiters and they had a tool that aggregated data on everyone through LinkedIn, yelp, twitter, GitHub, eventbrite, etc. it was breathtaking the amount of information you could get on anyone, over 10+ years ago.
I’m guessing with the help of Palantir, the government has even more data and can probably link Reddit posts etc based on stylometry and can even perform psychological analysis on your personality and tendencies, etc.
The government has been buying and funding R&D with data brokers since before Google existed.
> it was breathtaking the amount of information you could get on anyone, over 10+ years ago.
After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media
Am I "sweetly naive" to think that had an effect? I do think it did
Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly
My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.
I work in this space - I'd say 1000x.
Could you elaborate with specifics? If it's this bad, why haven't we heard anything from a whistleblower or seen a good demo?
Because none of it is really unknown? People know about it and don't care. Hell, even people on this forum who should know better and care don't, or when they hear about stuff like this they think it's FB pixel or Google Analytics stuff. The simple fact is that with a few basic pieces of information on somebody, there's almost nothing that is sacred or not for sale. People mistakenly believe they're protected by adblockers and stuff, or by avoiding social media, but it is unavoidable while simply existing. The 1000x comment is because, from my POV, the scale of it is astounding and growing every year, and people really don't have a good understanding of the subtle and not subtle ways it can affect you, or when told, don't care or dismiss it. So I don't really feel like explaining it anymore. If more people understood, I'd also stand to profit quite a bit from it, so that's where my frustrated tone is coming from.
I'm pretty sure it was over when we switched to debit/credit cards. Everywhere you go, how much you buy, all that stuff has been sold for quite a while now.
People voluntarily used loyalty cards well before then.
I remember when loyalty cards first came to England. There were consumer rights shows on TV devoting entire episodes to the evils of their spying.
It’s amazing how much worse things have gotten, yet how people seem to care less now than they used to.
I wonder if it’s just consumers being so overwhelmed by their lack of control that they’ve become apathetic to the problem as a whole.
No, it was before this, with phone lines and wiretapping because forcibly allowed by law. As soon as we said "okay, you're allowed to record stuff if it's for a good purpose", it was over.
cash is tracked as well, it's been over for a long time. each bill has a serial # and it gets scanned going in and out of the bank. Yes, it's still marginally easier to launder cash but if you just take it out of the ATM and spend it at a store it'll get tracked accurately
I don't think this is as accurate as you are making out. Wawa (a convenience store in the Philly area) isn't tracking each $10 that goes in and out of the register. It could float all over the city before hitting a bank, and even then banks typically track serial numbers for large demonizations and when there's a suspicion of illegal activity. Happy to learn more about this if I have it wrong.
> demonizations
denominations, perhaps?
How would one find out what data brokers knew from their cash purchases?
Do banks sell this information? This bill was pulled from this ATM in Georgia by one Claudius McMoneyhands, and then deposited by one CashMoneyBusiness LLC in South Carolina three weeks later
Seems like there could still be intermediaries and a lack of what you actually bought with it at least?
Oh boy, don't give them any more ideas. This would work.
Grocery store lets you draw $200 cashback out of their register.
My favorite example is the story about a data broker who, the day after 9/11 happened went from the name "Muhammad" to a list of ~1K people which included 1 out of 4 of the 9/11 terrorists.
https://www.nytimes.com/2023/09/22/magazine/hank-asher-data....
Thanks for your perspective.
I'm aware that using adblockers and avoiding social media doesn't entirely prevent tracking, shadow profiles, and such, but surely it makes things more difficult for these companies, no? Or would you say that there's practically no difference between making an effort to preserve one's privacy and just giving up entirely?
> the subtle and not subtle ways it can affect you
In Manufacturing Consent they measured column inches in the NYT-- IIRC it was something like measuring the total that support the relevant U.S. administration's official position on given policy vs. inches that went against the gov't position. In any case, they were measuring column inches.
What were you measuring to come to your conclusion?
I don't really understand the point of this comment.
I really don't think they "know". They have an idea. But they really don't understand any sort of extent or implication.
If the FTC could do anything here to make this situation better, it would be to give every person access to any data about them that gets sold.
I could give you some great horror stories, but honestly I don't see the benefit in either potentially harming former coworkers of mine that still work at those places or landing myself in some sort of career/legal trouble for something people generally don't care about (other than a few points on HN).
If you were caught demoing something both horrific and internal you would risk serious damage to your career, and ultimately will have zero impact on the industry as there's just too much data out there and too much money wrapped up in it.
Plus, most people working with the data don't bother to look at it. The places I've internally demo'd massive privacy risks were shocked because they didn't realize what their own data was capable of. Most people are just writing jobs that run and shuffle data around from one place to another never really asking "what is this data?" Even among data scientists I'm routinely surprised (so maybe I shouldn't be surprised) how frequently data scientists never do any real error analysis by looking at what the model got wrong and trying to understand why.
We hear about it all the time but no one cares.
I guess you were just distracted by all of the other house-on-fire crap going on.
https://therecord.media/ftc-complaint-against-kochava-unseal...
Among the additional information Kochava collects and sells are non-anonymized individual home addresses, phone numbers, email addresses, gender, age, ethnicity, yearly income, “economic stability,” marital status, education level, political affiliation and “interests and behaviors,” compiling and selling dossiers on individuals marketed as offering a “360-degree perspective,” the FTC said.
...
According to the FTC, Kochava’s data can identify women who visit reproductive clinics by name and address along with, for example, when they visit particular buildings, their names, email and home addresses, number of children, race and app usage.
...
Kochava marketing materials tell customers it offers “rich geo data spanning billions of devices globally” and that its location data feed “delivers raw latitude/longitude data with volumes around 94B+ geo-transactions per month, 125 million monthly active users, and 35 million daily active users, on average observing more than 90 daily transactions per device.”
...
The complaint also alleges that the company has lax procedures for determining who it is selling data to, saying purchasers are allowed to use a generic personal email address, label an alleged company as “self” and explain they plan to use the data for “business.”
And then there's this: https://therecord.media/data-brokers-are-selling-military-se...
I was on a team of about 25 involved in pitching a particularly large deal to a public sector client (think US state/local governments). The audience was about 50 people from different departments and agencies throughout the state and our pitch team consisted of about 6-8 very big shots + me the computer nerd. During our prep and rehearsals a "look book" was distributed which consisted of write-ups on each person expected to be in the audience. It was very detailed with a career and education history of each person, a personality analysis, where their interests/passions lie both at work and personally, and what topics and key points set them off. The deck was very professional and not something thrown together. I was impressed but a little taken aback too.
Cuz it's not really unknown nor is it illegal.
I know someone who bought the address of everyone with a specific first name.
> nor is it illegal
Where I live it is.
I simply don't believe you that all data brokers are completely and entirely illegal where you live.
Any way to combat it or stop your info from being overly harvested?
I asked this same thing in another comment here, but since you mention working in this space, I ask you directly. Where do the brokers obtain their data from? If it's easy for them to obtain, would those who buy it from brokers not be able to simply get it from its respective sources? I'm genuinely curious about how this dynamic works.
what are some good cheap sources to get this? i have an art project idea that i've wanted make that would require invasive data profiles, but it's very big project and i have no idea where to start
I would say that in general the HN crowd doesn't understand the industry at all, and they need to change the direction of their understanding, rather than the magnitude. Your basic hackernews believes that e.g. Google is out there selling all your personal information. But compared to these other industries the tech industry is almost airtight. It has long been possible for someone to pick up the phone and order, in any format they want, transaction data as narrowly targeted as they wish. Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town? By end of day.
This is correct; what people fundamentally misunderstand is that data brokers directly sell personal information about people, but Google and Facebook only allow for targeted advertising while keeping personal information within the confines of their company.
This isn't misunderstood, just not relevant. Google sells to a funnel that plays a numbers game, not for individuals to be targeted.
The meta-conspiracy-theory would be that the dossier industry whips up conspiracy theories about online advertisers in order to maintain their own low profile.
I think the HN crowd is especially vocal about the tech industry in particular because that's the industry a lot of us have first-hand knowledge of - we know from personal observation that it is anything but airtight
It has been truly frustrating when people will blame the "tech industry" for what is essentially reckless behavior from other industries. For a while, it was often the finance sector that did most of the crazy stuff. With crypto being an obnoxious overlap of the two.
It has been truly frustrating when arms dealers are punished for what is essentially reckless behavior from warlords, dictators, and drug cartels.
Data brokers are the OG tech industry. They've been around since the late 60s selling consumer data. Just because it's unsexy data storage and query work doesn't make it less tech.
I mean, somewhat fair. But when people decry "big tech," they aren't talking about these companies.
I'm also surprised that this is so hidden from everyone. Where are the engineers leaking secrets? Much of the online discourse is pure speculation based on what can be observed from the very end of the chain. (ie, what your computer is giving up) The speculation is not necessarily _incorrect_ but is too vague to be useful to anyone. Where does my data _actually_ go? Does anyone know? Can anyone describe the life of my data as it goes through the whole ecosystem? Does anyone know what mitigations are, and are not effective?
Because what's the headline you're going to get out of it?
If the headline is "Mark Zuckerberg is amassing your data and you know it's for evil", it's an easy sell. If it's "there's an ecosystem of little-known companies that sell transaction, location and lifestyle data to marketers, journalists, PIs, and police departments alike", it's not exactly the kind of a message that spurs people to action. And yeah, the newspaper that would be breaking the news is a customer too.
Despite being near universally hated externally, data brokering is a boring industry and is seen as very mundane and routine. They don't attract the type of engineers that have a strong moral stance and will go rogue and blow the whistle. They attract the middle age suburbanite just trying to get through the day and make a living.
Is that actually possible? Can we do a live test here?
Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town
How much do I have to pay you to get it?
How much you got?
Never ask a salesperson how much you have to pay when the prices are not already clearly stated. Tell them how much you are willing to spend to see if they will do it for that amount. Salespeople will always shoot high hoping to not leave money on the table. The price might change depending on how much you squeal and how high they shot. Your initial "willing to spend" should also be lower than you're actually willing to spend for the same but converse reason.
Ok, so nobody here knows directly of any case where such data has been purchased, or vaguely similar, and we have no pricing information whatsoever available, but we are somehow completely knowledgeable about it being possible and how to do it? That sounds unlikely.
The supposedly in-the-know responses here are full of bravado but not much other than "trust me, bro"
https://news.ycombinator.com/item?id=44565878
Yea, you know everything, don't you.
Wow the Transunion business site, that really proves it huh.
Experian is known to sell the data they have. Why is this even in question? If I provide you Experian's website, you would give the same BS response?
Let me google this for you...
https://duckduckgo.com/?q=how+to+buy+data+from+a+data+broker...
The conversation was for buying transaction data from specific people, something that many seem to insist is easy and cheap and doable. Meanwhile if you actually read the responses to that search you smugly cited you'll find that no one seems to know how to actually do anything remotely like this. Yes this data is definitely harvested and it seems like you should be able to buy it in bulk from someone somewhere, but again no one seems to know where or how much or what the purchase minimum would be etc.
Yeah people fail to provide examples but continue to be doomers about how easy it is.
Been busy, but since you seem to be unable to find anybody by searching on your own for the past 6 hours, here's something I found with a quick little search.
https://datarade.ai/data-categories/food-grocery-transaction...
Have we really lost the ability to use search functionality??
Of course people do. 5 seconds spent doing the most sparse-ass research will help you find plenty of stuff. If people don't respond, I imagine, for fear of 1) outing the specific area they work in, or 2) realizing these kinds of comments aren't generally acting in good faith so it is generally a complete waste of time.
I'll waste my own time and give a trivial example just off the top of my head. Go peruse some of the products offered on this page, put on your thinking cap or even look into them further and imagine what kind of data those services provide, where it likely comes from, and where it is sold to, and you'll be well on your way - and those are just the ones that are advertised openly.
https://www.transunion.com/business
Pretty much every one of the big players people typically associate with other areas such as personal credit has some feet in this space somewhere. Then there's the hundreds of lesser-known fly-by-night guys that have their own DBs they build off of what is mostly the same data, but correlated in different ways and sold to different audiences.
There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences. The fact you personally have never come across it, or are saying you aren't, is only a data point that is interesting to you, and no one else that actually knows what they are talking about in this space. Hope this post helps you somehow.
Literally all anyone is asking for is one single concrete example of a site where you can roll up and buy personal information.
>There are many, many services offering data-for-sale on practically anything to practically anyone. I heard of one recently claiming it can reliably determine someone's porn preferences.
Okay but then why not name at least a couple such services. Also, if the tech industry isn't selling data to them, where do they obtain it? Again, I see lots of ambiguity here, and the example link from transunion is hardly revealing of anything.
Credit card companies are known to sell data. https://www.cbsnews.com/news/mastercard-credit-card-customer...
Mobile service providers are known to have sold data. https://www.fcc.gov/document/fcc-fines-largest-wireless-carr...
Auto makers are known to sell data. https://www.caranddriver.com/news/a61711288/automakers-sold-...
You act like it doesn't happen, yet time and time again we learn about companies selling whatever data they can collect.
I can't believe we are still questioning this fact
What else do you need to know?
I think you misunderstand. I'm not doubting that it happens widely and pervasively. It's evident that this is the case. I just requested examples based on some of the very specific claims made here despite many ambiguities in how they were phrased.
Anyhow, thanks for taking the time to include some links.
For the most part, readers here are against it. Just because someone doesn’t know how to do it does not mean it is not doable. If it were not doable, these companies would not exist. I’ve already spent more time than I care to on the topic. So if you want to think that people are collecting the data and not selling it to interested parties, then, boy, I don’t know. You can only lead a horse to water, but you can’t make it drink.
But what type of range are we talking? Tens, hundreds, thousands?
It could also mean that if you have to ask... or the first rule of data brokering...
Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games
Or, you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in depth it could get. They have the money to do this, and it seems like something right in his team's wheel house
If you need John Oliver to do it maybe it's not such a big problem? If no one here is able to provide a single concrete example, maybe it's not real?
John Oliver likes to spend HBO's money to do things others can't do while entertaining the rest of us. I'm not spending my money on something to prove what is known as possible for you. At this point, even with receipts, you're coming across as someone that would argue that grass is not green, or water isn't wet, and fire isn't hot.
Just because someone doesn't answer your belligerent questions does not mean it's not possible. It probably means that the people that are doing this with first hand knowledge have too much to do than trying to convert doubting Thomas over here.
All of this started because in response to an extremely concrete question, what's the cost of transaction data for a tightly constrained population, you replied with a smug non-answer about the greed of salespeople. These questions only got "belligerent" because every single answer has been nonsense insisting that it's super easy and cheap but also I couldn't possibly name a single site where this data is sold or provide even an order of magnitude of cost. Or maybe now it requires HBO levels of funding, who knows.
I offered sage advice on how to negotiate when you don’t know a firm price on anything whether that be data or a car or a home remodeling. If you want to say that advice was a smug answer then that’s on you. Every answer after has just gone further and further off the rails
Nah there's no way you actually watch John Oliver because that was really funny. Anyways, you mentioned earlier that we wouldn't believe you even if you posted receipts but that's actually exactly what we want to see. Like, just the name of a business, the thing that was sold, and the price.
I think it could be feasible to get an ad in front of "35-year-old dentists living on the 400 block of Elm street in local town" who have bought product X, but I've never seen a transaction-by-transaction purchase history for sale.
> Your basic hackernews believes that e.g. Google is out there selling all your personal information.
I think most people here understand that Google sells ads against that data, but they aren't selling the data.
Any way to opt out of this type of data collection per company? I know for some things you can contact each individual broker and opt out (via some identifier like your email address) of your data being at least publicly available.
> Your basic hackernews believes that e.g. Google is out there selling all your personal information
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
The industry betrayed consumers' trust to the point where no project can be trusted to be mindful of data anymore. Even Proton Mail ended up ratting to the French, and that was just IP and session info, so who can we even trust to get "good telemetry"?
Logs aren't telemetry and calling a response to a court order "ratting out" is exactly the kind of behavior that makes people increasingly skeptical of privacy advocates.
> Even Proton Mail ended up ratting to the French,
Answering to court orders isn't "ratting". You either answer court orders or go to prison.
Or they architect their system better so that they never collect the IP addresses to begin with. I think Privacy Pass and other things Mullvad is doing help in this area, but I am not aware of Proton working with them to implement anything like this. But Proton should do this, because it’s relevant to customers of Proton.
https://discuss.privacyguides.net/t/privacy-pass-the-new-pro...
Apparently not Privacy Pass related, will keep looking as I seem to remember that Mullvad was doing that implementation, but I may remember incorrectly.
https://discuss.privacyguides.net/t/mullvad-has-partnered-wi...
I don't think it is common to refer to server logs as "telemetry".
> It can be both, but there's usually no place for differentiation
Fool me once, shame on you. Fool me 153,927,861 times, shame on me.
The place for differentiation, the place for "oh this is probably fine", the benefit of the doubt is, of course, lost.
Because someone (you? people shaped like you?) who misused telemetry destroyed trust.
> It can be both
should instead be "it usually is both and you the user have no way to know anyway."
> Credit card line items for 35-year-old dentists living on the 400 block of Elm street
I do not believe that. I would like evidence before I am convinced
If my bank is releasing that data I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal.
Okay, and who are these people you contact for this data, and how do they themselves obtain it so precisely? You say the big tech industry is pretty air-tight about sharing data, so how does mysterious X company have on hand the credit ratings of all those youngish dentists on Elm street, among other kinds of information? How do these dynamics work, since you seem to know it internally?
A mobile provider enters into marketing sharing agreements with credit card companies. It extracts housing information from local property and tax records. It enters into marketing sharing agreements with retailers, payment processors like ADP. Same with license plate reading companies, loan companies, banks, professional organizations, etc.
It fills its data lakes with the vectorization and down tilt data that it collects every day. It uses federated batched Hadoop tasks to join the above data lakes into one large data lake. Mid-PB in size.
Then it looks for mobile phones that travel to the 400 block at night and stay there, that are buying dentist stuff from Walmart, travel to a dentist office every workday, have an income over $120k, and are a member of the local dentist society. Maybe look for someone with dentist student loans, graduated with a dental degree.
None of those data points can identify an individual. Taken together they can ID just about anybody.
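A rough sketch of the narrowing described above, assuming the joined data lake has already been flattened into one record per device. Every record, field name, and threshold here is invented for illustration; it only shows how individually weak attributes shrink the candidate set when intersected, not how any real provider does it.

```python
# Toy illustration only: all records and field names below are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    device_id: str
    night_block: str      # block where the phone usually sits overnight
    weekday_place: str    # category of the place visited on workdays
    income_usd: int       # modeled household income
    dental_society: bool  # appears on a professional membership list


DEVICES = [
    Device("a1", "400 Elm St", "dentist office", 150_000, True),
    Device("b2", "400 Elm St", "school", 60_000, False),
    Device("c3", "400 Elm St", "dentist office", 45_000, False),
    Device("d4", "900 Oak Ave", "dentist office", 180_000, True),
    Device("e5", "400 Elm St", "hospital", 200_000, False),
]

# Each filter alone matches several devices; none is an identifier by itself.
FILTERS = [
    ("sleeps on the 400 block", lambda d: d.night_block == "400 Elm St"),
    ("visits a dentist office on workdays", lambda d: d.weekday_place == "dentist office"),
    ("income over $120k", lambda d: d.income_usd > 120_000),
    ("dental society member", lambda d: d.dental_society),
]


def narrow(devices, filters):
    """Apply each filter in turn and record how many candidates survive."""
    remaining = list(devices)
    sizes = [len(remaining)]
    for _label, keep in filters:
        remaining = [d for d in remaining if keep(d)]
        sizes.append(len(remaining))
    return remaining, sizes


candidates, sizes = narrow(DEVICES, FILTERS)
print(sizes)                                # candidate count after each step
print([d.device_id for d in candidates])    # whoever is left
```

In this toy data the candidate set is already down to a single device before the last filter is applied, which is the point: the final attributes mostly serve as confirmation.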
But maybe there is a chance that you ID their wife/husband. So maybe include/exclude people that regularly visit OBGYN offices.
Back in the day we could link cell numbers to credit card purchases in locations to the point of being able to identify the name of the person and what they purchased and where it was purchased. For all people in a metro area that were using credit cards and physically visiting stores.
My question here is also how the brokers obtain the data themselves? Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is in any case available, the real at-fault culprits aren't so much the brokers as those who store and so easily sell it in the first instance.
> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government and if you search for any common things you might want to know you often land on a data brokers page even though property assessor data is freely available in most counties. The problem is each county has their own system of storing data and their own process for searching it. It's a lot of work to learn how just this one dataset works, combining this for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used and what their reported income was at the time of sale, as well as their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
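To make the "each county has their own system" point concrete, here is a minimal sketch assuming two hypothetical county exports with different delimiters, column names, date formats, and price formats. The counties, columns, and rows are all made up; the real work is writing and maintaining one adapter like this per county.

```python
import csv
import io

# Hypothetical exports: same information, incompatible layouts.
COUNTY_A = "parcel_id,owner,sale_date,sale_price\nA-1,J SMITH,2019-04-02,350000\n"
COUNTY_B = "PIN;OwnerName;SoldOn;Amount\nB-9;DOE JANE;03/15/2021;$412,500\n"


def parse_county_a(text):
    """County A: comma-delimited, ISO dates, plain integer prices."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "county": "A",
            "parcel": row["parcel_id"],
            "owner": row["owner"],
            "sale_date": row["sale_date"],
            "sale_price": int(row["sale_price"]),
        }


def parse_county_b(text):
    """County B: semicolon-delimited, MM/DD/YYYY dates, '$1,234' prices."""
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        month, day, year = row["SoldOn"].split("/")
        price = int(row["Amount"].replace("$", "").replace(",", ""))
        yield {
            "county": "B",
            "parcel": row["PIN"],
            "owner": row["OwnerName"],
            "sale_date": f"{year}-{month}-{day}",
            "sale_price": price,
        }


# One common schema that downstream joins can rely on.
records = list(parse_county_a(COUNTY_A)) + list(parse_county_b(COUNTY_B))
for record in records:
    print(record)
```

Multiply the adapter count by the number of counties and the broker's markup starts to look like payment for the tedium rather than for the data itself.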
> In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
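A small sketch of that join, assuming a retailer export keyed only by postal code and a separate person-to-postal-code list. All names, codes, and products are invented, and the `max_group_size` cutoff is an arbitrary illustration of "few enough people that the purchases are effectively theirs".

```python
from collections import defaultdict

# Hypothetical "anonymized" retailer export: (postal_code, product).
purchases = [
    ("10001", "garden hose"),
    ("10001", "baby formula"),
    ("99999", "hearing aid batteries"),
    ("99999", "trail running shoes"),
]

# Hypothetical second dataset: (person, postal_code).
residents = [
    ("Alice", "10001"), ("Bob", "10001"), ("Carol", "10001"), ("Dan", "10001"),
    ("Erin", "99999"), ("Frank", "99999"),
]


def reidentify(purchases, residents, max_group_size=2):
    """Attach purchases to people wherever a postal code covers very few residents."""
    people_by_code = defaultdict(list)
    for person, code in residents:
        people_by_code[code].append(person)

    exposed = defaultdict(list)
    for code, product in purchases:
        people = people_by_code.get(code, [])
        if 0 < len(people) <= max_group_size:
            exposed[tuple(sorted(people))].append(product)
    return dict(exposed)


exposed = reidentify(purchases, residents)
print(exposed)
```

The shared postal code stays ambiguous, while the two-person one ties every purchase to a single household.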
Thanks for the detailed reply! So essentially, what many of them do is scour public data sets of all kinds, cross-reference them and repackage the more complete product as their own, which people then buy simply because it's easier to get it that way, all wrapped up neatly than doing the legwork? This is the basic gist of it? As for the complex and highly specific data about individuals, they do the same thing or do they buy from still other sources? I also wonder if they buy any hacked information off the dark web.
Sellers of the data wanna deal with one or a few buyers that buy in bulk. They don't wanna deal with thousands of customers.
Further, they are literally in the business of selling your data for a profit.
It should not be surprising that they are selling your data for a profit...
It's amazing to me that the market for data is so well hidden from public view. So many large companies are mining and trading data on a daily basis - you would think that a data marketplace would have been a thing by now, especially with all the noise about "decentralisation" (yes, I know, crypto shill bros).
I've been touting this as a business model for years. Better still, I'd like to see it done with behavioural models (in the open). That would really blow the lid off the industry. Imagine people charging companies, instead of simply being the product...
Is it really that hidden? In 2021, a guy went to another person's home to exact revenge for something 50 years earlier. Security video showed him holding the PeopleFinders folder. What should surprise people is their governments are selling some of the data.
Thank you for making my point.
Here's some research aided by Perplexity, which estimates that the global data market is valued at about $1.7 Trillion, with data monetization growing at about 17.6% CAGR:
https://www.perplexity.ai/search/today-i-would-like-to-try-a... (138 sources)
Also, Meta can identify you based on your movement and a few pieces of social data (all of which is in the open).
Tel Aviv airport has been running behavioural monitoring for about a decade, predicting crimes before they happen.
You mention a case from 2021, which is about $5 trillion ago, and think that the government selling data is surprising. This is a mature market that already knows everything about everyone, especially in the US, and is more concerned with what to do with it. The faucet is open, the ground floor is flooded, and we're discussing the different types of fish that have moved into our apartment.
Yes! It is hidden. Go and get your data from this company. Report the results.
Just shut it down and turn it all off. Thinking of ways to profit from this behavior is perverse.
Thinking of ways to profit from it is the absolute norm but, yes, it is perverse.
I'd happily run it as a non-profit with the purpose of highlighting the value of people's data. Tough gig though, when there are all these "off switch" guys around.
I don't get it. Why would CBP and ICE need to buy this from a data broker? The TSA is right there scanning everyone's boarding pass as part of going through security.
Because there is probably a well-defined regulatory framework for accessing data collected by the TSA, whereas there are few or no requirements when the same data is purchased from a broker.
It is not even certain that the data actually comes from the TSA. It could come from airlines, payment companies, etc.
There is no guarantee of quality when purchasing data from a broker.
The regulatory angle at least explains part of my wondering. I'm not really surprised that they have access to this information, I'm just surprised that they buy it, rather than just demanding it be handed over.
Buying has no accountability, no judges. Probably not even a proper paper trail.
They're spending public money, so the cost doesn't matter to them either. With this administration they can get unlimited funding.
Probably because the tsa isn't able/allowed to hand out access willy nilly.
It's kinda like how the police need warrants to request cellphone data, but cellphone companies could sell realtime data to third parties who in turn sold it to the police.
https://news.ycombinator.com/item?id=17081684
It's fine to speculate, but I really wish the article had made it explicit given that the EFF has actual lawyers on staff.
When I worked for the federal government I wanted to collect some publicly visible tweets (this was before the Library of Congress started to harvest them, and back when the API was better). As a government employee I had to write a detailed document of: why I needed this data, what PII would be stored, how long it would be stored and how I would ensure it had been deleted. Then that document had to be approved. Even though this is a project that any person could have done on the weekend, I still had to go through all this work for approval, then collect the data.
But you're proposing something even more outlandish: asking another agency for data. The politics of this are mind-bending. If one agency gives its data to another and that agency is successful using it, it will make the giving agency look bad, which is unacceptable. It was wild how many times another, supposedly friendly agency would not share data. In fact, I was cautioned not to even bring up the idea in shared meetings because it would create unnecessary friction.
If you buy it from a 3rd party government contractor, none of this has to happen.
Government uses corporations to get around laws and the constitution. Corporations in turn get to use government to get around regulation. Same as it ever was.
Beyond the other reasons stated re: regulations and law, which this government seems to be more than willing to ignore, the process of setting up reliable feeds of usable data between organizational functions can be more difficult than buying the data from an entity whose profit derives from curation and distribution of the same data. It might seem absurd on the surface, but paying a premium for a repackaging of the data is often meaningfully easier and more reliable, and you probably save money in the end. The TSA tech team’s role isn’t to package and enrich data with useful metadata, with documentation and SLAs, and their incentives don’t naturally align no matter how hard a political appointee bangs a table. The data broker has every incentive, however, and will continue to in perpetuity.
At a company I once worked at, the data division of a company bought a list of their stores from us. Full polygons, visit durations, etc.
Suspects purchase a flight weeks + months before the flight. The TSA screens them just minutes before getting on.
Flight purchases would be critical and distinct information for law enforcement.
This is wrong. You need to provide your travel document ID, and they share your personal info well before you get on the plane.
Sure, but when is that purchase transferred to TSA? It’s not disclosed. I agree it’s a possibility, but having the flight purchase info is higher value and more complete.
At least 24 hours before your flight when they assign pre or the dreaded SSSS status.
that's helpful i've been curious about that
leave it to hackernews to downvote the right answer. People were asking why, not "should they"
https://github.com/yaelwrites/Big-Ass-Data-Broker-Opt-Out-Li... is a useful place to start for opting out. As of this writing, this list does not include Airlines Reporting Corporation (ARC), a data broker mentioned in the article.
This could actually be interesting because in many past egregious data broker cases, the offenders had no business in the EU so they could just laugh as they were handed one 20M fine after the other (e.g. Clearview), or they were making way more than 4% of their revenue in profit from privacy violations so they could just risk the fine.
But here, the controller of the data is the airline, the transfer to the data broker might be illegal, and an airline is the worst company to commit GDPR violations with: They have a lot of global revenue but a relatively thin margin, very little of that margin comes from data abuse (so they can't just shrug off the GDPR fine as a small cost of doing shady business), and they are reachable in the EU (worst case a member state can ground and confiscate their planes, and essentially ban them from flying to the EU by threatening to confiscate any other plane that lands). And yes, Germany will impound a plane to get debts paid: https://www.reuters.com/article/world/thai-prince-to-pay-bon...
While airlines are the obvious source for such data sets, there are a number of other sources.
The barcode in the boarding pass contains all the information that airlines know about you [1]. It is after all only encoded and not encrypted and so many companies manufacture readers for it.
It could come from airport check-in systems, the baggage handling system, the duty-free shop, the airport lounge, and so on.
There are so many different players who have access to most or all of the data it would be hard to prove it came from any one source at all.
And that is just the barcode on the boarding pass; passport scanners cost a couple of hundred dollars, and airport shops and car rentals use them all the time.
Many airports use facial scanning these days and don’t even ask for boarding pass/passport/visa during boarding at all.
There are auxiliary sources which could be used in conjunction with other sources like Uber booking and so on.
[1] https://krebsonsecurity.com/2015/10/whats-in-a-boarding-pass...
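To illustrate "encoded and not encrypted": a minimal sketch that slices the fixed-width mandatory section of a boarding pass barcode string. The field names and widths are my understanding of the IATA BCBP "M" format and should be checked against the spec before relying on them; the sample pass is entirely made up (fake passenger, fake carrier code "XX") and is assembled in code so the widths line up.

```python
# Assumed layout of the 60-character mandatory section (verify against the
# IATA BCBP spec; treat these names and widths as an approximation).
FIELDS = [
    ("format_code", 1),
    ("legs", 1),
    ("passenger_name", 20),
    ("eticket_indicator", 1),
    ("pnr", 7),
    ("origin", 3),
    ("destination", 3),
    ("carrier", 3),
    ("flight_number", 5),
    ("julian_date", 3),
    ("cabin", 1),
    ("seat", 4),
    ("checkin_sequence", 5),
    ("passenger_status", 1),
    ("conditional_size_hex", 2),
]
MANDATORY_LENGTH = sum(width for _name, width in FIELDS)


def parse_bcbp(data):
    """Slice the mandatory section into named fields; no key or secret is needed."""
    if len(data) < MANDATORY_LENGTH:
        raise ValueError("barcode text is shorter than the mandatory section")
    out = {}
    pos = 0
    for name, width in FIELDS:
        out[name] = data[pos:pos + width].strip()
        pos += width
    return out


# Entirely fictional boarding pass, padded field by field.
sample = (
    "M" + "1" + "DOE/JANE".ljust(20) + "E" + "ABC123".ljust(7)
    + "SFO" + "JFK" + "XX".ljust(3) + "0123".ljust(5) + "226" + "Y"
    + "012A" + "0025".ljust(5) + "1" + "00"
)

parsed = parse_bcbp(sample)
print(parsed["passenger_name"], parsed["pnr"], parsed["origin"], parsed["destination"])
```

Anything that can read the barcode's text can do the same, which is why a cheap scanner in a shop or lounge sees the name, booking reference, and route.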
I agree that they can get the data through other means. Not so sure about
> There are so many different players who have access to most or all of the data it would be hard to prove it came from any one source at all.
Because a prosecutor can obtain copies of all emails talking about this, they can examine your bank accounts for payments from data brokers, they can require legal to give them copies of any contracts, they can look at audit logs from the production database and airlines aren't Evil Inc -- stuff will inevitably leak and get out. You can't cover yourself that well as a CEO looking to make a quick buck...
Little discussion 2 months ago (43+7 points, 2+3 comments) https://news.ycombinator.com/item?id=43949975 https://news.ycombinator.com/item?id=43952971
Sidestepping the topic a bit: does anyone have an estimate of how much a regular company makes from selling this data? I do not mean those focused on advertising, but companies that willingly sell their customers' data and habits.
How do I get access to a data broker? I'm curious what info I can get about myself and others in exchange for money.
Does anyone here have some tips how to ”opt out” from this?
It doesn't seem like you can. The airlines actually own the clearing house (ARC) that is selling the data.
You can. I emailed ARC and they complied with my request. Helps if you're in California and mention your rights. You can also opt out of them sharing your data. Any consequences to this I guess I'll find out later this year when I'm flying a lot (guessing absolutely zero).
Yes, email privacy@arccorp.com and cc legalteam@arccorp.com
You'll get a response from their legal counsel requesting some information for them to verify your request.
That's what I'm wondering - maybe a way to opt out when purchasing flights per airline?
What's the lede on this story, that data brokers are selling this data or that the purchasers are ICE/CBP?
The lede is buried, and only half said:
>>"Movement unrestricted by governments is a hallmark of a free society. "
The other half of the lede is that this govt is using Insert_Method of restricting the movements of its residents.
At this point, any persecuted activity, e.g., obtaining reproductive healthcare with a link to a person in a Red State, requires opsec procedures comparable to a CIA dark op just to not get persecuted.
Who funded many data brokers in the first place? Lots of three letter agencies, through intermediaries. Modern phones + social media = zero cost surveillance for the big brother.
>Who funded many data brokers in the first place? Lots of three letter agencies
okay...
>zero cost surveillance for the big brother
How is it "free" if they are the ones funding the data brokers?
As far as I know there is no definitive guide for how to carry out a 'digital privacy reset' or 'digital rebirth' - but your LLM should be able to give you good instructions.
To do it properly, not only would you have to change all your logins and email accounts, but simultaneously start using a new computer and phone. Also, move home.
In other words: very hard to achieve. But I wonder if there is a set of achievable actions one can take that gets you to 'very good privacy'?
What about the records of your purchase of that new home? Do you need to get a new bank again? What about credit history?
Of course, in ICE land, ditching your old identity is a disaster, because now you can't prove you're a citizen. Papers please.
An important part of data collection is dealing with edge cases. That's why I schedule all my travel with a layover in South Sudan.
Chilling things here, however - I guess that the US government in general would have access to flight info? at least those going to or travelling within the US.
what percentage of illegals travel on airplanes?
I have given up keeping my data private from the government. It’s impossible to avoid, so I signed up for Clear, etc because I know they have that information already.
Frankly, Clear and TSA-Pre makes my life so much easier and since I don’t commit crimes I’m not very worried… just a little worried.
For me it's not about keeping my data private so much, more about making it harder for them to just have blanket easy access. I have a passport, precheck, global entry... they know who I am and where I go. But if I can make it just a little harder for the other gov agencies to know what I'm doing that's a win in my book.
I hate the excuse "since I don't commit crimes". It's not about that. If they want your info that you're not directly giving them, they can get a warrant.
> I don’t commit crimes
What if it affects your ability to get work? Have you ever made or viewed any posts that could be considered political or made comments on a political post? What agenda do you support with those actions?
Selling... as in my tax dollars are being wasted on this???
Data brokers listen to everything, track your movements, buying habits, internet history, apps, app usage, etc.
Terms of service are meaningless if they keep the extent as secret as possible. Facebook has demonstrably shown this and as shocking as it is they are restrained compared to lots of companies.
Especially when you can out source the full evil to a wholly owned subsidiary for plausible deniability.
And if private corps know something, many foreign governments know all of it.
And who are these shadowy data brokers listening to everything. Heavy on the FUD, light on any details...
People would be surprised at how cheap data is. My company is offered credit card purchases with demographics, occupation, income level, down to the zip code for what is basically pennies. We didn't buy it but that's what advertisers know about you.
If they're selling it why wouldn't you name your company here?
Why would I?
Why does ICE need to know about domestic flights? They have as much right to that info as every bus ticket.
Exactly this. Let's spell it out:
Customs and Border Protection. Immigration and Customs Enforcement. Domestic travel does not involve immigration or customs. No international borders are crossed.
Giving CBP line-item access to the movements of Americans makes as much sense as giving the SEC access to healthcare records.
There is virtually no reason at all for these organizations to have any knowledge about the movements of people inside the US. They do not get to be super-police just because they want to be.
This is the meat of the "unitary executive" legal theory. Under this theory, CBP is just a convenient name for a part of the executive branch. If any part of the executive branch has the right to use (or even just isn't prohibited from using) a piece of data, then it's just a matter of the president issuing the right orders so that CBP can too. The same would go for getting medical records over to the SEC. Even if he forgets to issue those orders, the Justice Department (also part of the executive) obviously won't charge anyone with any kind of crime, and if they did he could pardon them or arrest the judges involved.
I happen to believe that this is all just a convoluted way to back into the fuhrerprinzip by way of originalism, but I have no power.
Either way, the US is not a liberal democracy any more. Laws do not apply to the powerful. Strength and power are the only things that matter. The Enlightenment project is dead in Washington (and most statehouses) and the only question of consequence left is: What will replace it?
100%. Neatly incorporated in there is that Congress cannot make any laws to bind the executive in any way. I have a feeling that many people cheering this are going to regret it when the Executive changes hands.
If the executive branch changes hands, I would be willing to bet money that the Supreme Court will immediately do a 180 and declare all of these things as no longer powers of the executive branch.
All of the things that were perfectly OK for Trump to do will suddenly be off limits for a Democrat president.
It's also not at all clear that they intend to let the White House change hands again. One would assume that they have all the power they need to make that happen, and why wouldn't they?
I think if you're a citizen, then I agree with you. However if you're an alien (legal or not) I think they should be allowed to figure out where you are.
They absolutely should, but the onus is on them to figure out a way to do that within the confines of existing law. Existing law notably does not make ICE or CBP a super-agency that can do whatever it wants.
I think data brokers are legal. I think purchasing data is legal. So what is the problem?
Arguably, because a person of interest may cross the border into the country and then travel around domestically.
There has been no credible argument why this logic is acceptable for CBP and ICE but not FBI or DEA.
Why are the misdemeanor civil violators ICE chases so important that federal laws no longer apply, when until a year ago it was generally understood that agencies like DEA had guardrails even when trying to apprehend people on felony charges?
If this stands, there is no logically consistent rationale for DEA not being able to perform warrantless wiretaps of all communications, etc.
There’s a new sheriff in town in case you hadn’t noticed.
Their job is to enforce immigration laws inside the U.S. When illegal immigrants are travelling domestically, it involves ICE.
The crime starts when the border is crossed, it doesn't end once you are illegally in the country and travelling domestically.
One can use this logic to create an omnibus surveillance apparatus covering all aspects of communications, commerce, etc.
We do not give other law enforcement similar deference, even though it might help them in some fraction of cases. For example: SEC could prosecute more insider trading if it was able to wiretap all domestic communications.
And yet Americans are subject to TSA stupidity for all domestic flights. The actual lines of when the federal government does and does not have authority are very blurred, even though I would personally argue they shouldn't have any authority on anything unless expressly granted by the Constitution.
TSA was created after 9/11, and iirc all of the planes involved were domestic flights.
but domestic flights do track with Transportation Security
Because all of the 9/11 flights were domestic. One of the duties of ICE is investigating terrorism (along with transnational criminal organizations).
Air travel in the United States is managed by Federal law, not state. This is solidly in the Federal law enforcement wheelhouse. Anything that crosses state boundaries is ALSO under Federal law.
So what else is new? Have you heard about Palantir? The government literally sells (or gives) our private data to them. This should be illegal as they don't actually own this data legally as it's not covered by EULA which is generally how data brokers get around privacy violations and governments around unreasonable search and seizure.
But hey, it makes Silicon Valley money.