    Harnessing Generative AI for Overcoming Labeled Data Challenges in Social Media NLP

    file
    liyanagec2023m-1a.pdf (1.176 MB)
    date
    2023
    author
    Liyanage, Chandreen Ravihari
    abstract
    With the introduction of transformers and large language models, the field of NLP has evolved significantly. Generative AI, a prominent transformer-based technology for crafting human-like content, has demonstrated strong capabilities across numerous NLP tasks. At the same time, social media has emerged as a rich source for NLP research, offering vast and diverse datasets that capture real-time language usage, which makes it a valuable resource for understanding and advancing NLP techniques. Because supervised learning is the most popular machine learning training method, numerous NLP studies require labor-intensive annotation of social media text. However, despite the large amount of data available, annotating social media data is usually difficult for human experts because of the unique characteristics of the text, such as shortness, lack of context, embedded socio-cultural perspectives, and varied writing styles. The challenges in constructing labeled social media datasets often result in a scarcity of labeled data and in low-quality labels. Moreover, these datasets frequently suffer from class imbalance because labeled samples are limited. Hence, ensuring a balanced, high-quality dataset of sufficient size is crucial for developing robust and accurate NLP models. To address these challenges, this study identifies generative AI as a means of generating labeled social media text. Specifically, the study focuses on two key objectives: augmenting existing labeled text samples and annotating unlabeled text samples using generative AI. As the generative AI technology, the Generative Pre-trained Transformer (GPT) model, a prevalent choice for AI-based content generation, is employed in different versions throughout the study, and its performance is evaluated against traditional text augmentation and annotation methods.
    While both studies center on multi-class classification problems, the text augmentation study addresses human wellness dimensions using Reddit posts, and the text annotation study tackles stance detection on abortion legalization using Twitter posts. By employing various classifiers, the investigations aim to improve classification performance in social media NLP, with the common goal of expanding labeled datasets while improving label quality.
    uri
    https://knowledgecommons.lakeheadu.ca/handle/2453/5275
    collections
    • electronic theses and dissertations from 2009 [1612]
