Leveraging Linked Open Data to Automatically Answer Arabic Questions

DOI

10.1109/ACCESS.2019.2956233

Document Type

Journal Article

Publication Date

1-1-2019

Publication Title

IEEE Access

Volume

7

First Page

177122

Last Page

177136

Keywords

Arabic language, natural language processing, question answering systems, semantic web, structured data

Abstract

The growth of interconnected Web technologies and the rapid advances in semantic web content have raised many challenges for retrieving results, especially for the Arabic language. This research targets an important yet under-explored area: using Linked Open Data (LOD) for automatic question answering systems in Arabic. The significance of the work lies in its ability to overcome several challenges in querying Arabic content, among them: (a) bridging the gap between natural language and linked data by mapping users' queries to a standard semantic web query language such as SPARQL, (b) facilitating multilingual access to semantic data, and (c) maintaining the quality of data. A further challenge was the lack of related work and publicly available resources for Arabic question answering systems over linked data, despite the rapidly growing Arabic corpus on the web. This paper presents a novel hybrid approach to automatic Arabic question answering that bypasses many of the challenges noted in the field and evaluates the effectiveness of using LOD to answer Arabic questions automatically. The approach maps users' questions in Modern Standard Arabic to a standard query language for LOD (i.e., SPARQL) by: (i) extracting entities from questions and linking them over the web using Named-Entity Recognition and Disambiguation (NER/NED), and (ii) extracting the properties that relate the extracted named entities using a dependency parsing approach integrated with the Wikidata ontology. To evaluate the proposed system, an Arabic question dataset was created, comprising: (a) the question body in Arabic, (b) the question type, (c) the SPARQL query formulation, and (d) the question answer. Evaluation results are promising, with a Precision of 84%, a Recall of 81.3%, and an F-Measure of 82.8%.
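
The mapping described in the abstract, from an Arabic question to a SPARQL query, can be illustrated with a minimal sketch. This is not the paper's implementation: the two lookup tables are hypothetical stand-ins for the NER/NED and dependency-parsing stages, the function name `build_sparql` is invented for illustration, and the query assumes the prefixes (`wd:`, `wdt:`, `wikibase:`, `bd:`) predefined by the public Wikidata Query Service.

```python
# Toy lookup tables standing in for the NER/NED and property-extraction
# stages. They are illustrative assumptions, not the paper's resources.
ENTITY_IDS = {"مصر": "Q79"}        # Wikidata item for Egypt
PROPERTY_IDS = {"عاصمة": "P36"}    # Wikidata property "capital"


def build_sparql(question: str) -> str:
    """Map a simple Arabic question to a SPARQL query over Wikidata."""
    # Drop the Arabic question mark and split on whitespace.
    tokens = question.replace("؟", " ").split()
    # Pick the first token recognised as an entity and as a property.
    entity = next((ENTITY_IDS[t] for t in tokens if t in ENTITY_IDS), None)
    prop = next((PROPERTY_IDS[t] for t in tokens if t in PROPERTY_IDS), None)
    if entity is None or prop is None:
        raise ValueError("no entity or property recognised in the question")
    # Assemble a one-triple query and request Arabic labels for the answer.
    return (
        "SELECT ?answer ?answerLabel WHERE {\n"
        f"  wd:{entity} wdt:{prop} ?answer .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "ar". }\n'
        "}"
    )


if __name__ == "__main__":
    # "What is the capital of Egypt?"
    print(build_sparql("ما هي عاصمة مصر؟"))
```

A real system would replace the dictionaries with entity linking and parsing over the full question; the sketch only shows how a linked entity and a linked property combine into a single triple pattern.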

Open Access

Gold
