From: Celebrity profiling through linguistic analysis of digital social networks
Characteristic | Authors | Title | Database |
---|---|---|---|
Gender | Simaki et al. [50] | Evaluation and Sociolinguistic Analysis of Text Features for Gender and Age Identification | Blog posts by 19,320 bloggers, extracted from blogger.com in August 2004. The corpus contains 681,288 posts and more than 140 million words. |
Gender | Argamon et al. [33] | Gender, Genre, and Writing Style in Formal Written Texts | British National Corpus |
Gender | Simaki et al. [27] | Sociolinguistic Features for Author Gender Identification: from qualitative evidence to quantitative analysis | 2,936 posts from blog-hosting sites and blog search engines. |
Age | Moreno-Sandoval et al. [77] | Age Classification from Spanish Tweets: The Variable Age Analyzed by using Linear Classifiers | Colombian Spanish tweets; after pre-processing, 50,819 accounts of people linked to universities and 734,037 accounts of people linked to celebrities. |
Age and gender | Johannsen et al. [51] | Cross-lingual syntactic variation over age and gender | International user review websites |
Age and gender | Peersman et al. [28] | Predicting Age and Gender in Online Social Networks | Chat texts from the Belgian social networking site, Netlog. |
Age and gender | Rangel et al. [11] | Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter | Twitter texts and images in Arabic, English and Spanish. |
Age, gender, socioeconomic level | Moreno-Sandoval et al. [78] | Spanish Twitter Data Used as a Source of Information About Consumer Food Choice | 1.3 million Spanish tweets, of which 11,691 mentioned food; an initial food knowledge base of 1,128 words, extended with a knowledge base generated by the authors. |
Personality, gender and age | Schwartz et al. [8] | Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach | Facebook messages from 75,000 volunteers |
Age, gender, occupation and fame | Radivchev et al. [24] | Celebrity Profiling using TF-IDF, Logistic Regression, and SVM–Notebook for PAN at CLEF 2019 | PAN at CLEF 2019 celebrity profiling corpus (approximately 53 million tweets) |
Age, gender, occupation and fame | Petrik and Chuda [26] | Twitter feeds profiling with TF-IDF–Notebook for PAN at CLEF 2019 | PAN at CLEF 2019 celebrity profiling corpus (approximately 53 million tweets) |
Age, gender, occupation and fame | Martinc et al. [25] | Who is hot and who is not? Profiling celebs on Twitter–Notebook for PAN at CLEF 2019 | PAN at CLEF 2019 celebrity profiling corpus (approximately 53 million tweets) |
Gender, education, party affiliation and year of birth | Przybyła and Teisseyre [32] | Analyzing Utterances in Polish Parliament to Predict Speaker’s Background | Sets of 100 statements by the same speaker, with multilevel annotations from the source corpus. |
Gender and social class | Milroy [31] | Mechanisms of change in urban dialects: the role of class, social network and gender | Previous sociolinguistic studies |
Age, occupation and social class | Sloan et al. [35] | Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-data | Profiles of Twitter users in the United Kingdom (UK); metadata collected through the Collaborative Online Social Media Observatory (COSMOS). |
Occupation | Huang et al. [29] | Multi-source integration framework for user occupation inference in social media systems | Microblogging platform Sina Weibo. |
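Several of the PAN 2019 celebrity profiling entries above (Radivchev et al. [24], Petrik and Chuda [26]) pair TF-IDF features with linear classifiers. The sketch below illustrates that general combination in plain Python: smoothed TF-IDF weighting followed by binary logistic regression trained with gradient descent. It is not a reproduction of either system. The whitespace tokenizer, the helper names (`fit_idf`, `tfidf`, `train_logreg`, `predict_proba`), the hypothetical sports/music occupation label and the four toy "feeds" are all assumptions made for illustration; the real systems worked on the full PAN corpus with multi-class labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real tweet pipelines
    # would also handle URLs, mentions, hashtags and emoji.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    # Raw term counts weighted by IDF, then L2-normalized; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    # P(label = 1 | x) for a sparse feature dict x.
    return sigmoid(b + sum(v * w.get(t, 0.0) for t, v in x.items()))


def train_logreg(X, y, epochs=200, lr=0.5):
    # Binary logistic regression: full-batch gradient descent on the mean log loss.
    w, b = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, w, b) - label
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g / len(X)
        b -= lr * grad_b / len(X)
    return w, b


# Hypothetical toy feeds: label 1 = sports occupation, 0 = music occupation.
feeds = [
    "great match tonight goal team win",
    "training hard for the next match team",
    "new album out now tour dates",
    "studio session new song album",
]
labels = [1, 1, 0, 0]

idf = fit_idf(feeds)
X = [tfidf(f, idf) for f in feeds]
w, b = train_logreg(X, labels)

p_sports = predict_proba(tfidf("team match win", idf), w, b)
p_music = predict_proba(tfidf("new album tour", idf), w, b)
print(f"P(sports | 'team match win') = {p_sports:.2f}")
print(f"P(sports | 'new album tour') = {p_music:.2f}")
```

The IDF term down-weights words shared across many feeds, so the classifier leans on words that distinguish one group of accounts from another; in a multi-label celebrity setting, one such classifier (or a multinomial variant) would be needed per characteristic.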