In April 2021, a large data set of over 500 million Facebook users was made freely available for download. Encompassing approximately 20% of Facebook's subscribers, the data was allegedly obtained by exploiting a vulnerability that Facebook says it rectified in August 2019. The primary value of the data is the association of phone numbers to identities; whilst each record included a phone number, only 2.5 million contained an email address. Most records contained names and genders, with many also including dates of birth, location, relationship status and employer.
In November 2022, a data leak involving WhatsApp was reported, in which approximately 487 million user records were offered for sale on an underground forum. The data included mobile phone numbers from 84 countries. It is speculated that the data was obtained through scraping, which violates WhatsApp's terms of service, but the seller did not disclose the exact method used.
In approximately August 2021, hundreds of gigabytes of data produced by Bureau van Dijk (BVD) were obtained and later published to a popular hacking forum. BVD claims to "capture and treat private company information for better decision making and increased efficiency", and the corpus of data released contained hundreds of millions of lines about corporations and individuals, including personal information such as names and dates of birth. The data also included 28 million unique email addresses along with physical addresses (presumably corporate locations), phone numbers and job titles.
This collection is part of a larger series of data dumps, including Collections #1 through #5, which compiled email addresses and passwords from thousands of sources, both previously known data breaches and some newly alleged ones. Collection #1 alone contained about 2.7 billion records, including 1.2 billion unique email and password combinations, 773 million unique email addresses, and 21 million unique plaintext passwords. Collections #2 through #5, along with "AP MYR&ZABUGOR #2" and "ANTIPUBLIC #1", were also discovered, significantly widening the scope of compromised data.