Ahmedabad: How can hate speech or offensive content online be detected when characters are masked, say by replacing the letter 'o' with the digit '0', when words are partly written with asterisks to evade algorithms, or when allusions and euphemisms are used? Detection becomes even more challenging when such social media messages mix languages, for example English and an Indian language, or are written in Roman script but with an Indian-language lexicon.
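To make that first hurdle concrete, here is a minimal Python sketch, written for illustration and not drawn from any FIRE 2024 system; the substitution map LEET_MAP and the function normalise are hypothetical names for a naive normaliser that undoes simple character masking before text reaches a classifier.

import re

# Hypothetical map of common character substitutions ('0' for 'o', '@' for 'a', ...).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})

def normalise(text: str) -> str:
    # Lower-case the text and undo single-character substitutions.
    text = text.lower().translate(LEET_MAP)
    # Strip asterisk masking, e.g. "st*pid" becomes "stpid", so that fuzzy
    # matching against an abuse lexicon still has a chance.
    return re.sub(r"\*+", "", text)

print(normalise("You are st00pid"))   # -> "you are stoopid"
print(normalise("what an idi*t"))     # -> "what an idit"

Rules like these are easy to write, but they say nothing about allusions, euphemisms or code-mixed text, which is part of why the forum's attention has shifted to trained language models rather than hand-crafted filters alone.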
In the era of fake news and deepfakes, experts gathered at DAIICT from Dec 12 to 15 for the Forum for Information Retrieval Evaluation (FIRE) 2024 to discuss the latest trends in large language models (LLMs) for natural language processing (NLP) tasks. These tasks range from social media surveillance and real-time translation of Indian languages to generative AI for programming code and for medical treatment.
Prof Prasenjit Majumder, coordinator of the event and a faculty member at DAIICT, said that top scientists in the domain participate in FIRE every year and share the latest research with the community. "With AI and ML remaining buzzwords for the past few years, there is more awareness about what they can do and how they should be trained further. For example, one of the major projects we are working on is real-time translation of Parliament proceedings into Indian languages such as Bengali, Gujarati, Odia, Marathi, Tamil, Telugu and Malayalam, among others. We are part of two of the four cohorts working in this domain, and it will be a game-changer in making Parliament proceedings accessible to a wider audience," said Prof Majumder.
One of the presentations at the conference was on generating software architecture code and analysing legacy code with tech giants. Srijoni Majumdar, a postdoctoral research fellow at the University of Leeds, said that companies have years of data when it comes to code for specific tasks, and much of it is termed 'legacy' or core code. "The firms often need to link more than one application or modify some of the functions. Coders often keep notes alongside this code, but the notes are not updated for years. Here we are employing the power of AI to analyse the code and notes and provide insight into the software architecture," said Majumdar.
Thomas Mandl, professor of information science at the University of Hildesheim, is working with an Indian cohort on hate speech patterns and detection. Mandl said the ongoing project indicates that both the generation and the detection of hate speech have improved over the past few years. "Thus, we must constantly update the system and train AI on patterns, lexicon, trending topics and specific interest groups. We are working on both English and Indian languages," he said.