Research Topics in English

Our research topics cover natural language processing and related fields.

Natural language processing (NLP) refers to the use of computers to process human language. Typical applications include machine translation and information retrieval. The ways we encounter language through PCs and smartphones are becoming increasingly diverse, and new applications keep appearing. NLP is also one of the fields that has benefited most from recent breakthroughs in artificial intelligence, such as ChatGPT.
Here are some of our current research topics.

Machine Translation

Machine translation is both a long-standing and a cutting-edge research topic. It has been a major application of computers since the dawn of the computer age. In recent years, a technology called neural machine translation has made translation from written Japanese into English quite fluent. However, human correction is still essential where accuracy is required, and the translation of spoken English still poses unsolved problems.

Correction system for Japanese-to-English translation exercises (demo system; currently under maintenance)
Evaluation of the machine translation capabilities of large language models

Automated Essay Scoring

In many settings a score must be assigned to a piece of writing, such as an answer to a written question on a school exam or a product review on an online shopping site. Scoring by hand is labor-intensive, which motivates us to automate it. We are currently working on predicting scores for descriptive (free-response) tasks for which a certain number of scored examples exist but whose scoring criteria are complex.

Automatic scoring of Japanese essays

Detection of Text Generated by Large Language Models

Text generated by large language models such as ChatGPT is far more fluent than text produced by earlier models and is becoming indistinguishable from human writing. Systems that judge whether a text was machine-generated can be built with the same kinds of models, and they can at least discriminate more accurately than humans, but they still cannot confirm with certainty that a given text was machine-generated. We focus on the problems that arise as text generation models come into widespread use, and on how automatic detection of generated text can address each of them.

Sentiment Analysis for Implicit Expressions

Sentiment analysis automatically determines the sentiment polarity of a text, e.g., positive or negative. When a text contains indirect expressions such as sarcasm, however, the correct sentiment cannot be derived from surface information alone. We aim to improve the detection of such implicit expressions by using knowledge about the background and the rationale for why an expression is regarded as sarcasm or hate speech.

Hate speech detection based on rationales
Analysis of sarcasm using a commonsense knowledge extraction system

Chat Dialogue Systems

Some dialogue systems are designed for casual chat with humans, and such systems are expected to help prevent dementia. Unlike dialogue systems that provide tourist information or similar services, however, a chat system must behave more like a human being. Our current research focuses on improving speech recognition accuracy in dialogue settings and on estimating how willing the conversation partner is to talk about the current topic, so that chats with the dialogue system can continue successfully.

Patent Information Processing

Patent documents are voluminous and, to secure legal validity, are written in a distinctive style that differs from ordinary written language. Natural language processing of patent-related documents therefore needs to be specialized for them. In addition, the particular circumstances of patents give rise to their own tasks, such as patent search, patent classification, and the translation of patent documents.

Self-Introduction Page (Faculty of Informatics, Shizuoka University)