Microsoft Bing

Report March 2025

Submitted
Commitment 26
Relevant Signatories commit to provide access, wherever safe and practicable, to continuous, real-time or near real-time, searchable stable access to non-personal data and anonymised, aggregated, or manifestly-made public data for research purposes on Disinformation through automated means such as APIs or other open and accessible technical solutions allowing the analysis of said data.
We signed up to the following measures of this commitment
Measure 26.1, Measure 26.2, Measure 26.3
In line with this commitment, did you deploy new implementation measures (e.g. changes to your terms of service, new tools, new policies, etc)?
Yes
If yes, list these implementation measures here
Bing released a specialized dataset of European Parliament election-related queries in different EU languages for use by the research community and to support transparency. Researchers can apply using the form found here.

Do you plan to put further implementation measures in place in the next 6 months to substantially improve the maturity of the implementation of this commitment?
Yes
If yes, which further implementation measures do you plan to put in place in the next 6 months?
Bing is actively exploring additional mechanisms to meet this commitment and welcomes feedback from researchers and the Commission on the types of data that would be most useful. Bing is also working to provide additional open datasets and resources for use by the research community.
Measure 26.1
Relevant Signatories will provide public access to non-personal data and anonymised, aggregated or manifestly-made public data pertinent to undertaking research on Disinformation on their services, such as engagement and impressions (views) of content hosted by their services, with reasonable safeguards to address risks of abuse (e.g. API policies prohibiting malicious or commercial uses).
QRE 26.1.1
Relevant Signatories will describe the tools and processes in place to provide public access to non-personal data and anonymised, aggregated and manifestly-made public data pertinent to undertaking research on Disinformation, as well as the safeguards in place to address risks of abuse.
Bing Search and Microsoft are dedicated to supporting the research community and regularly provide information and data to the research community in a variety of ways.

Bing Search already provides researchers and the public with access to MS MARCO, a collection of datasets focused on deep learning in search that are derived from Bing Search queries and related data. Research organizations can gain access to the MS MARCO datasets instantaneously via the MS MARCO homepage. The MS MARCO dataset has been cited in numerous research papers since its release and has been utilized for a range of research issues, including in connection with misinformation and disinformation. Because the dataset is provided open source, the extent to which it has been used for disinformation related research purposes cannot easily be ascertained. 

Bing Search also provides researchers with access to ORCAS (Open Resource for Click Analysis in Search), a click-based dataset associated with the TREC Deep Learning Track that provides 18 million connections to 10 million distinct queries. ORCAS is hosted on the MS MARCO site (microsoft.github.io).
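As an illustration of how a researcher might summarize a click dataset of this kind, the minimal Python sketch below counts click connections and distinct queries in tab-separated rows. The four-column layout (query ID, query text, document ID, URL) and the sample rows are assumptions made for this example, not a statement of the published schema; the dataset's own documentation should be checked before use.

```python
import csv
import io
from typing import TextIO, Tuple


def summarize_clicks(handle: TextIO) -> Tuple[int, int]:
    """Return (connections, distinct_queries) for tab-separated click rows.

    Assumed (hypothetical) row layout: query_id, query, doc_id, url.
    """
    connections = 0
    queries = set()
    # QUOTE_NONE treats quotation marks inside queries as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip rows that do not match the assumed layout
        connections += 1
        queries.add(row[1])
    return connections, len(queries)


# Invented sample rows, for illustration only.
SAMPLE = (
    "1\tcovid symptoms\tD10\thttps://example.org/a\n"
    "1\tcovid symptoms\tD11\thttps://example.org/b\n"
    "2\telection results\tD12\thttps://example.org/c\n"
)

connections, distinct_queries = summarize_clicks(io.StringIO(SAMPLE))
print(connections, distinct_queries)
```

For a downloaded file, the same function could be given an open file handle in place of the in-memory sample.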

In 2020, Bing Search also shared a search dataset for Coronavirus Intent comprising queries from all over the world that had an intent related to the Coronavirus or Covid-19 (e.g., searches for “Coronavirus updates Seattle” or “Shelter in place”) for use by researchers and the public. This data, which is divisible by country, is particularly relevant to misinformation research on public health issues and the COVID-19 pandemic, as it provides insights into how users sought information related to the coronavirus during the pandemic. The dataset was also posted to Azure Open datasets for Machine Learning, Tensorflow.org, and Kaggle. Additional information on the dataset is available in the Bing Search Blog post “Extracting Covid-19 insights from Bing Search data”.

In 2024, Microsoft publicly released a new information-rich dataset, the MS MARCO Web Search dataset, leveraging Bing search data. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of downstream tasks, and encourages research in various areas. It also contains rich information from the web pages, such as visual representation rendered by web browsers, raw HTML structure, clean text, semantic annotations, and language and topic tags labeled by industry document understanding systems. MS MARCO Web Search further contains 10 million unique queries from 93 languages with millions of relevant labeled query-document pairs collected from the search log of the Microsoft Bing search engine to serve as the query set.

Additionally, researchers who are registered webmasters may utilize Bing Search’s Keyword Tools and Backlinks Webmaster Tools to gain insights into search usage and keywords. Bing is also working on ways to provide deeper research access to these tools across the research community and hopes to provide updates in its next report.

Bing Search also offers use of Bing APIs to the public, which include Bing Image Search, Bing News Search, Bing Video Search, Bing Visual Search, Bing Web Search, Bing Entity Search, Bing Autosuggest, and Bing Spell Check. Bing Search provides free access to these APIs for up to 1,000 transactions per month, which may be leveraged by the research community. 
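To illustrate how a researcher might stay within the free allowance described above, the Python sketch below builds (but does not send) a Bing Web Search request and counts it against a monthly budget. The endpoint URL, the subscription-key header name, and the `q` and `mkt` parameters are assumptions based on the public v7 API and should be verified against current documentation; `MonthlyBudget` and `build_search_request` are hypothetical helper names, and the key shown is a placeholder.

```python
import urllib.parse
import urllib.request
from dataclasses import dataclass

# Assumed endpoint and header name for the Bing Web Search API; verify before use.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"
KEY_HEADER = "Ocp-Apim-Subscription-Key"
FREE_MONTHLY_TRANSACTIONS = 1000  # free allowance stated in this report


@dataclass
class MonthlyBudget:
    """Hypothetical counter that refuses calls beyond the monthly allowance."""

    limit: int = FREE_MONTHLY_TRANSACTIONS
    used: int = 0

    def spend(self) -> None:
        if self.used >= self.limit:
            raise RuntimeError("monthly free-tier allowance exhausted")
        self.used += 1


def build_search_request(
    query: str, key: str, budget: MonthlyBudget, market: str = "en-GB"
) -> urllib.request.Request:
    """Charge one transaction to the budget and return an unsent request."""
    budget.spend()
    params = urllib.parse.urlencode({"q": query, "mkt": market})
    return urllib.request.Request(f"{ENDPOINT}?{params}", headers={KEY_HEADER: key})


budget = MonthlyBudget()
request = build_search_request("election results", "YOUR-KEY-HERE", budget)
print(request.full_url)
print(budget.used, "of", budget.limit, "transactions used")
```

Passing the returned object to `urllib.request.urlopen` would send the query. Keeping the budget check separate from the network call lets a study script fail early instead of exceeding the allowance partway through a data collection run.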

In addition to the above datasets, Microsoft Research maintains a public portal of code, APIs, software development kits, and datasets that are available to the research community on the Microsoft Research “Researcher tools: code & datasets” page. These public research tools can be accessed by researchers and downloaded instantaneously without formal applications or login credentials.

Bing launched a Qualified Researcher Program to enable EU researchers to easily request access to publicly accessible Bing data from a single landing page. However, because these datasets are already available open source (see above), we expect some researchers may elect to obtain datasets via the above means to avoid the burden of an application and credentialing process.

Bing compiled a specialized dataset of European Parliament election-related queries in different EU languages for use by the research community and to support transparency; researchers can apply using the form found here. Additionally, Bing has engaged with European researchers to discuss the types of data that will be most useful to the research community.
 
Microsoft is also a leader in research in Responsible AI and provides a range of tools and resources dedicated to promoting responsible usage of artificial intelligence to allow practitioners and researchers to maximize the benefits of AI systems while mitigating harms. 

Lastly, given the open nature of the Bing Search index and the public nature of search results, researchers can utilize Bing Search or Bing’s generative AI experiences to run specific queries and analyze results (unlike social media, which may require private accounts or connections between users to access certain materials).

QRE 26.1.2
Relevant Signatories will publish information related to data points available via Measure 25.1, as well as details regarding the technical protocols to be used to access these data points, in the relevant help centre. This information should also be reachable from the Transparency Centre. At minimum, this information will include definitions of the data points available, technical and methodological information about how they were created, and information about the representativeness of the data.
Bing Search will publish information as it continues to build further data research infrastructure pertinent to these commitments. 
SLI 26.1.1
Relevant Signatories will provide quantitative information on the uptake of the tools and processes described in Measure 26.1, such as number of users.
Because the tools described in QRE 26.1.1 predate the Code of Practice and were provided open source without tracking mechanisms, Microsoft is working on developing improved usage tracking for these publicly accessible researcher tools and datasets.