#TidyRainbow

A repo for LGBTQ data

The one stop for finding and using LGBTQ data for your data science, machine learning and research needs.
TidyRainbow
data
LGBTQ+
Author

Zane (She/They)

Published

June 25, 2022

TidyRainbow

This project grew out of a need for representation in data. As a Trans person, I hardly ever see a #TidyTuesday post, or any other #RStats tweet, that includes Transgender people. When I ask why the data covers only cisgender people, the response I usually get is that the data has only a gender binary. After seeing so many people in academia share gender-binary posts even though their institutions have LGBTQ students, the anger for change started inside me. At the initial relaunch meeting, the needs assessment included LGBTQ data, and that was the impetus for making #TidyRainbow.

Searching the internet for LGBTQ data was always going to be a challenge. I knew that few countries collect LGBTQ data and that even fewer make it accessible. My goal was to include non-English data, and I did try searching for sources from Germany, India, and Thailand, but English is my native language and Google Translate only goes so far. I want to share that I was mindful in my search, but I ultimately went with English sources because they offered the path of least resistance.

Searching the English-speaking internet for data is hard too, as few places have data and even fewer make it accessible. Everyone knows that most people never go beyond page 2 of a Google search, but I went to page 20 trying to find non-redundant data. On page one of Google, you find that Kaggle has 2 datasets, plus a few other websites, but ultimately very little data. This is where you spend so much time scouring for actual data and not just a news report citing something. We want actual data that includes LGBTQ people.

I normally search in private mode, but luckily this time I was searching with cookie tracking on, because I lost all of the links I had collected when changes were made to the repo. After tracking down all the links again, I put together a list and started pulling data from the European Union Agency for Fundamental Rights (FRA), which has 30 questions across more than 10 topics, with each dataset hosted on its website. Pulling down and cleaning the data was a pain, as the website offers either an “all” option or selecting the groups individually for each question and factor. The “all” option gives you only a column labelled “all”, which is not informative because you can’t distinguish between the groups in the LGBTQ data based on a factor. There are also a few xlsx datasets available, covering questions about safety and being open in society across the EU as a whole.

The need for data extraction and searching continues: there are government websites and university studies that still need to be looked at. The goal is to be the place for LGBTQ data that everyone can find and use for their data science needs. We need to be seen, and we need to be represented in data science visualizations.