Hate Speech Detection

Project Aim: To build a machine learning system that can detect hate speech in social media.

In recent years, the dissemination of hate speech around the world has increased significantly. To counter this growing threat to society, the research community has recently proposed many automated systems for detecting hate speech in social media. This project aims to develop separate hate speech detection systems for Turkish and Arabic, taking into account the specific characteristics and rich morphological structure of each language. Language resources such as labeled corpora and lexicons will be developed specifically for hate speech detection and shared with other researchers.

Hate speech spans a continuum, from negative stereotyping to aggravated hate speech. To understand this continuum and to obtain data for training a machine learning system that automatically categorizes tweets, we started labeling tweets with the hashtag #İstanbulsözleşmesiyaşatır into one of four categories:

Project team: To address this challenging problem, the project team brings together computer scientists and social scientists:

Berrin Yanıkoğlu, Reyyan Yeniterzi, Ayşecan Terzioğlu and İnanç Arın; in collaboration with Onur Varol and Kamer Kaya.

Graduate students: Buse Çarık and Fatih Beyhan

Support: Our project is supported by a TÜBİTAK Bilateral Collaboration Grant (119E358).

The datasets collected within the project can be downloaded from https://drive.google.com/drive/folders/1OYq1YLF7suQ-iPd32d61ocTj92G_RKD2?usp=drive_link