
Facebook's independent research commission, Social Science One, will share a petabyte of user interactions

Back in April, Facebook announced it would be working with a group of academics to establish an independent research commission to look into issues of social and political significance using the company’s own extensive data collection. That commission just came out of stealth; it’s called Social Science One, and its first project will have researchers analyzing about a petabyte’s worth of sharing data and metadata.

The way the commission works is basically that a group of academics is assembled and given full access to the processes and data sets that Facebook could potentially provide. They identify and help design interesting data sets based on their experience as researchers themselves, then document them publicly — for instance, a set (imaginary for now) might be described as 10 million status updates taken during the week of the Brexit vote, with such and such metadata included.

This documentation describing the set doubles as a “request for proposals” from the research community. Other researchers interested in the data propose analyses or experiments, which are evaluated by the commission. These proposals will be peer-reviewed with help from the Social Science Research Council. If a proposal has merit, it may be awarded funding, data, and other benefits; resulting papers can be published however the researchers wish, with no restrictions like pre-approval by Facebook or the commission.

“The data collected by private companies has vast potential to help social scientists understand and solve society’s greatest challenges. But until now that data has typically been unavailable for academic research,” said Social Science One co-founder, Harvard’s Gary King, in a blog post announcing the initiative. “Social Science One has established an ethical structure for marshaling privacy preserving industry data for the greater social good while ensuring full academic publishing freedom.”

If you’re curious about the specifics of the partnership, it’s actually been described in a paper of its own, available here. Nate Persily is the other co-chair; he and King were selected by Facebook and the foundations funding the project (listed below), who then selected the other scholars in the group.

The first data set is a juicy one: “almost all” public URLs shared and clicked by Facebook users globally, accompanied by a host of useful metadata.

It will contain “on the order of 2 million unique URLs shared in 300 million posts, per week,” reads a document describing the set. “We estimate that the data will contain on the order of 30 billion rows, translating to an effective raw size on the order of a petabyte.”
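Those quoted figures hang together on a quick back-of-envelope check. As a sketch (the constants below come from the article's "on the order of" estimates, and the assumption that one row corresponds roughly to one post is mine, not the document's):

```python
# Back-of-envelope check of the figures quoted for the first data set.
# All constants are the article's order-of-magnitude estimates, taken at
# face value in decimal units; "one row per post" is an assumption.
POSTS_PER_WEEK = 300_000_000    # "300 million posts, per week"
TOTAL_ROWS = 30_000_000_000     # "on the order of 30 billion rows"
RAW_SIZE_BYTES = 1e15           # "on the order of a petabyte"

# 30 billion rows at 300 million posts/week implies roughly two years of data.
weeks_covered = TOTAL_ROWS / POSTS_PER_WEEK

# A petabyte over 30 billion rows implies ~33 KB of data and metadata per row.
bytes_per_row = RAW_SIZE_BYTES / TOTAL_ROWS

print(f"~{weeks_covered:.0f} weeks of data, ~{bytes_per_row / 1e3:.0f} KB per row")
```

So the petabyte figure is consistent with a couple of years of sharing activity carrying a few tens of kilobytes of metadata per row.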

The metadata includes country, user age, device and so on, but also dozens of other items, such as “ideological affiliation bucket,” the proportion of friends versus non-friends who viewed a post, feed position, the number of total shares, clicks, likes, hearts, flags… there’s going to be quite a lot to sort through. Naturally all this is carefully pruned to protect user privacy — this is a proper research data set, not a Cambridge Analytica-style catch-all siphoned from the service.

In a call accompanying the announcement, King explained that the commission had much more data coming down the pipeline, with a focus on disinformation, polarization, election integrity, political advertising and civic engagement.

“It really does get at some of the fundamental questions of social media and democracy,” King said on the call.

The other sets are in various stages of completeness or permission: post-election survey participants in Mexico and elsewhere are being asked if their responses can be connected with their Facebook profiles; the political ad archive will be formally made available; they’re working on something with CrowdTangle; there are various partnerships with other researchers and institutions around the world.

A “continuous feed of all public posts on Facebook and Instagram” and “a large random sample of Facebook newsfeeds” are also under consideration, though those will likely face serious scrutiny and caveats from the company.

Of course, quality research must be paid for, and it would be irresponsible not to note that Social Science One is funded not by Facebook but by a number of foundations: the Laura and John Arnold Foundation, The Democracy Fund, The William and Flora Hewlett Foundation, The John S. and James L. Knight Foundation, The Charles Koch Foundation, Omidyar Network’s Tech and Society Solutions Lab and The Alfred P. Sloan Foundation.

You can keep up with the organization’s work here; it really is a promising endeavor and will almost certainly produce some interesting science — though not for some time. We’ll keep an eye out for any research emerging from the partnership.

Update: The original headline described the dataset as “user data,” which I don’t think is inaccurate, but the organization’s suggested description of it as “URL data” is, I think, inadequate. I’ve settled for “user interactions,” since that’s more what the dataset is focused on anyway. I also made some slight changes to reflect that the SSRC reviews the proposals, not the papers, and to add the selection process for the co-chairs and other academics.
