
Facebook independent research commission, Social Science One, will share a petabyte of user data

Back in April, Facebook announced it would be working with a group of academics to establish an independent research commission to look into issues of social and political significance using the company’s own extensive data collection. That commission just came out of stealth; it’s called Social Science One, and its first project will have researchers analyzing about a petabyte’s worth of sharing data.

In essence, the commission brings together a group of academics who are given full access to the processes and data sets that Facebook could potentially provide. Drawing on their own experience as researchers, they identify and help design interesting data sets, then document them publicly: for instance, “this data set consists of 10 million status updates taken during the week of the Brexit vote, structured in such and such a way.”

This documentation doubles as a “request for proposals” to the research community. Other researchers interested in the data propose analyses or experiments, which are evaluated by the commission. Proposals are then granted, according to their merit, access to the data, funding and other privileges. Resulting papers will be peer-reviewed with help from the Social Science Research Council, and can be published without being approved (or even seen) by Facebook.

“The data collected by private companies has vast potential to help social scientists understand and solve society’s greatest challenges. But until now that data has typically been unavailable for academic research,” said Social Science One co-founder, Harvard’s Gary King, in a blog post announcing the initiative. “Social Science One has established an ethical structure for marshaling privacy preserving industry data for the greater social good while ensuring full academic publishing freedom.”

If you’re curious about the specifics of the partnership, they have been laid out in a paper of their own.

The first data set is a juicy one: “almost all” public URLs shared and clicked by Facebook users globally, accompanied by a host of useful metadata.

It will contain “on the order of 2 million unique URLs shared in 300 million posts, per week,” reads a document describing the set. “We estimate that the data will contain on the order of 30 billion rows, translating to an effective raw size on the order of a petabyte.”
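Those order-of-magnitude figures can be sanity-checked with a little arithmetic. The sketch below is our own back-of-the-envelope calculation, not anything taken from the data set’s documentation; it assumes a decimal petabyte (10^15 bytes) and treats each “on the order of” estimate as if it were exact.

```python
# Back-of-the-envelope arithmetic on the figures quoted for the URL data set.
# Assumption: a "petabyte" is the decimal unit, 10**15 bytes, and the
# "on the order of" estimates are treated as exact values.
ROWS = 30 * 10**9                  # "on the order of 30 billion rows"
RAW_BYTES = 10**15                 # "on the order of a petabyte"
UNIQUE_URLS_PER_WEEK = 2 * 10**6   # "2 million unique URLs ... per week"
POSTS_PER_WEEK = 300 * 10**6       # "shared in 300 million posts, per week"


def bytes_per_row(raw_bytes: int, rows: int) -> float:
    """Average raw size of a single row, in bytes."""
    return raw_bytes / rows


def posts_per_url(posts: int, urls: int) -> float:
    """Average number of posts sharing each unique URL in a week."""
    return posts / urls


if __name__ == "__main__":
    print(f"~{bytes_per_row(RAW_BYTES, ROWS) / 1000:.1f} KB per row")
    print(f"~{posts_per_url(POSTS_PER_WEEK, UNIQUE_URLS_PER_WEEK):.0f} posts per URL per week")
```

Run as a script, it prints roughly 33.3 KB per row and about 150 posts per unique URL per week. Because the inputs are only order-of-magnitude estimates, the outputs should be read the same way.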

The metadata includes country, user age, device and so on, but also dozens of other items, such as “ideological affiliation bucket,” the proportion of friends versus non-friends who viewed a post, feed position, and the total number of shares, clicks, likes, hearts, flags and more. There’s going to be quite a lot to sort through. Naturally, all this is carefully pruned to protect user privacy: this is a proper research data set, not a Cambridge Analytica-style catch-all siphoned from the service.

In a call accompanying the announcement, King explained that the commission had much more data coming down the pipeline, with a focus on disinformation, polarization, election integrity, political advertising and civic engagement.

“It really does get at some of the fundamental questions of social media and democracy,” King said on the call.

The other sets are in various stages of completeness or permission: post-election survey participants in Mexico and elsewhere are being asked if their responses can be connected with their Facebook profiles; the political ad archive will be formally made available; they’re working on something with CrowdTangle; there are various partnerships with other researchers and institutions around the world.

A “continuous feed of all public posts on Facebook and Instagram” and “a large random sample of Facebook newsfeeds” are also under consideration, though these will probably face serious scrutiny and caveats from the company.

Of course, quality research must be paid for, and it would be irresponsible not to note that Social Science One is funded not by Facebook but by a number of foundations: the Laura and John Arnold Foundation, The Democracy Fund, The William and Flora Hewlett Foundation, The John S. and James L. Knight Foundation, The Charles Koch Foundation, Omidyar Network’s Tech and Society Solutions Lab and The Alfred P. Sloan Foundation.

You can keep up with the organization’s work on its website. It really is a promising endeavor and will almost certainly produce some interesting science, though not for some time. We’ll keep an eye out for any research emerging from the partnership.


LinkedIn Learning now includes 3rd party content and Q&A interactive features

LinkedIn, the Microsoft-owned social network for the working world with some 580 million users, took a big step into professional development and education when it acquired Lynda.com for $1.5 billion and used it as the anchor for LinkedIn Learning. Now, with 13,000 courses on the platform, LinkedIn is announcing two new developments to get more people using the service. It will now offer videos, tutorials and courses from third parties such as Treehouse and the publishing division of Harvard Business School. And in a social twist, people who use LinkedIn Learning, both students and teachers, will now be able to ask and answer questions around LinkedIn Learning sessions, follow instructors on LinkedIn, and see others’ feedback on courses.

Unlimited access to LinkedIn Learning comes when a person pays for LinkedIn’s Premium Career tier, which costs around $30/month, or when a company takes out an enterprise team subscription for the Learning service. Today, LinkedIn tells me that it has around 11,000 enterprise customers. It doesn’t break out how much traffic it has overall on LinkedIn, but says that paid learners have grown 64 percent since the start of 2017, a number it’s clearly looking to boost with these new features.

James Raybould, the director of product for LinkedIn Learning, said that the third-party expansion will come slowly at first, with a handful of partners getting access to integrate with LinkedIn Learning. Over time, this could expand into a public API for anyone to integrate content, he added, but for now LinkedIn is doing the curating. Notably, he also said that LinkedIn itself is not planning to curtail the amount of content it produces for Learning: it’s currently adding more than 70 new courses each week on average.

The content in this first wave of third-party providers feels like a natural extension of the Influencer-based content that LinkedIn has been running in its main newsfeed: it runs the gamut from actual courses teaching new skills in specific disciplines to the more nebulous area of professional development. The first group includes Harvard ManageMentor (leadership development courses from Harvard Business School’s publishing arm); getAbstract (a Blinkist-style service that provides 10,000+ non-fiction book summaries plus TED talks); Big Think (500 short-form videos on topics of the day, which are not so much ‘courses’ as ‘life lessons’, with subjects including organising activism and an explainer on how to end bi-partisan politics); Treehouse, with courses on coding and product design skills; and Creative Live, with courses and tutorials for professionals in the creative industries to improve their skills and business acumen.

Adding learning material that naturally extends the kind of content LinkedIn already offers users in their timelines is not the only parallel between main LinkedIn and LinkedIn Learning. Raybould said that to help users discover the content most likely to interest them, LinkedIn uses data about what they browse and click on in the regular site. “We have rich information about the network, including on engagement,” he said, and that helps LinkedIn’s algorithms suggest what to populate in individual learning libraries.

This is also, presumably, one of the reasons third parties will want to integrate: to reach new audiences more closely targeted to the kind of content they produce. “At Harvard Business Publishing, we work to create the world’s best learning experiences to help organizations discover new ways to solve their most pressing leadership development challenges,” said Rich Gravelin, Director, Partnerships and Alliances, at Harvard Business Publishing, in a statement. “As an inaugural partner in the LinkedIn Learning Content Partner Program, we are bringing rich leadership development content to professionals across the globe, helping them navigate today’s complex business landscape. Thanks to the robust platform that LinkedIn Learning has built, we’re able to meet learners where they are and provide them with the unique and personalized learning experiences they need to succeed in their organizations.”

The social features also follow this model. Last year, LinkedIn rolled out a mentorship product in selected markets to pair users with people who can give them steers on their career development. That product set a precedent for how LinkedIn might use its wider social network and communication features to engage users in different ways, in the name of professional development. The new Q&A features follow on from that, giving those taking courses or watching videos a way of interacting and following up with those who are doing the teaching. Adding them could drive more engagement across the whole Learning product.

In a way, it’s surprising that it has taken LinkedIn this long to add an interactive Q&A feature, considering that direct messaging and users interacting with each other have been a cornerstone of the product. It will be interesting to see whether the feature proves compelling enough to bring more users to LinkedIn, luring them away from the Udemys and Skillsofts of the world.
