Table 1 Demographic and social variables for profile detection

From: Celebrity profiling through linguistic analysis of digital social networks

Characteristic

Authors

Title

Database

Gender

Simaki et al. [50]

Evaluation and Sociolinguistic Analysis of Text Features for Gender and Age Identification

Blog posts by 19,320 bloggers, extracted from blogger.com in August 2004. The corpus comprises 681,288 posts containing more than 140 million words.

Gender

Argamon et al. [33]

Gender, Genre, and Writing Style in Formal Written Texts

British National Corpus

Gender

Simaki et al. [27]

Sociolinguistic Features for Author Gender Identification: From Qualitative Evidence to Quantitative Analysis

2936 posts from blog hosting sites and blog search engines.

Age

Moreno-Sandoval et al. [77]

Age Classification from Spanish Tweets: The Variable Age Analyzed by using Linear Classifiers

Colombian Spanish tweets, pre-processed from 50,819 accounts of people linked to universities and 734,037 accounts of people linked to celebrities.

Age and gender

Johannsen et al. [51]

Cross-lingual syntactic variation over age and gender

International user review websites

Age and gender

Peersman et al. [28]

Predicting Age and Gender in Online Social Networks

Chat texts from the Belgian social networking site Netlog.

Age and gender

Rangel et al. [11]

Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter

Twitter texts and images from a corpus covering Arabic, English and Spanish.

Age, gender, socioeconomic level

Moreno-Sandoval et al. [78]

Spanish Twitter Data Used as a Source of Information About Consumer Food Choice

1.3 million Spanish tweets, of which 11,691 mentioned food, analyzed with an initial food knowledge base of 1,128 words and a knowledge base generated by the authors.

Personality, gender and age

Schwartz et al. [8]

Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

Facebook messages from 75,000 volunteers

Age, gender, occupation and fame

Radivchev et al. [24]

Celebrity Profiling using TF-IDF, Logistic Regression, and SVM–Notebook for PAN at CLEF 2019

Approximately 53 million tweets from the PAN at CLEF 2019 corpus

Age, gender, occupation and fame

Petrik and Chuda [26]

Twitter feeds profiling with TF-IDF–Notebook for PAN at CLEF 2019

Approximately 53 million tweets from the PAN at CLEF 2019 corpus

Age, gender, occupation and fame

Martinc et al. [25]

Who is hot and who is not? Profiling celebs on Twitter–Notebook for PAN at CLEF 2019

Approximately 53 million tweets from the PAN at CLEF 2019 corpus

Gender, education, party affiliation and year of birth

Przybyła and Teisseyre [32]

Analyzing Utterances in Polish Parliament to Predict Speaker’s Background

100 statements by the same author and multilevel annotations from the corpus source.

Gender and class role

Milroy [31]

Mechanisms of change in urban dialects: the role of class, social network and gender

Previous sociolinguistic studies

Age, occupation and social class

Sloan et al. [35]

Who Tweets? Deriving the Demographic Characteristics of Age, Occupation and Social Class from Twitter User Meta-data

Profiles of Twitter users in the United Kingdom (UK). Metadata collected through the Collaborative Online Social Media Observatory (COSMOS).

Occupation

Huang et al. [29]

Multi-source integration framework for user occupation inference in social media systems

Microblogging platform: Sina Weibo.