Arabic Language Computing applied to the Quran



Date: 31.07.2017
- YouTube and PowerPoint presentation by Kais Dukes, University of Leeds
TRANSCRIPT
=========================
[0:00] Hello. This is a talk on Arabic language computing applied to the Quran.

A PhD research project by Kais Dukes at the Institute for Artificial

Intelligence and Biological Systems, at the School of Computing,

University of Leeds.


[0:15] If you Google Kais Dukes you will discover his website

which shows he's a financial software engineer in the financial industry

in the City of London. He's also doing a part-time PhD. Unfortunately he is

very busy at the moment and unable to present in person.

So I, his supervisor Eric Atwell, am presenting this for him.
[0:30] The challenge is to try and find an interdisciplinary approach to

understanding the Quran using ideas from Quranic studies, traditional Arabic

linguistics and computing research, and hopefully feeding back to all three

areas.
[0:45] The Quran is the last in a series of five major religious texts.

Believers hold that God gave the message to the angel Gabriel to

pass it on to Muhammad to learn by heart and pass on to all mankind.


[1:00] It's written in the language of 1300 years ago.

All believers are supposed to try and understand the original text

rather than translations or interpretations. It has guided philosophy,

science and other aspects of knowledge...


[1:15] ...particularly Arabic linguistics, which was developed to try and help

understand the Quran, and it's been the guiding light in theories of syntax,

semantics and discourse analysis, used today on modern English too.
[1:30] As far as computers are concerned, there are many websites where you can

access the Quran, but you can only search verse by verse. You can search for

individual verses which contain words; so basically Google-style searching.
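As an illustration of the verse-by-verse keyword search described above, here is a minimal Python sketch. It is not taken from the talk or from any particular website: the `verses` dictionary, its placeholder strings, and the `search` function are all assumptions made up for this example.

```python
# Toy verse-by-verse keyword search. The texts are placeholder strings,
# not real translations, and the data layout is an assumption.
verses = {
    (1, 1): "placeholder text about mercy",
    (1, 2): "placeholder text about praise",
    (2, 1): "another placeholder mentioning mercy and guidance",
}


def search(verses, keyword):
    """Return the (chapter, verse) keys whose text contains the keyword as a whole word."""
    needle = keyword.lower()
    return sorted(
        ref for ref, text in verses.items() if needle in text.lower().split()
    )


hits = search(verses, "mercy")
```

A search of this kind only matches surface words, which is why a plain-English question would find nothing unless its exact words happen to occur in a verse.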
[1:45] It would be nice in theory to be able to ask questions in plain English,

like "How long should I breastfeed my child for?" and have an AI system which

computes the meaning, and finds the verse which has relevant meaning to answer

the question.


[2:00] Machine learning works by taking data, and then learning patterns and

classifications in the data. If we augment the data with linguistic and

semantic concepts, then the AI system can learn conceptual patterns.
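The idea of learning patterns from annotated data can be sketched with a very small Python example. This is a deliberately simple most-frequent-label baseline, not the method used in the project; the word forms, the labels, and the function names `train` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def train(examples):
    """Learn, for each word form, the label it most often carries in the data."""
    counts = defaultdict(Counter)
    for word, label in examples:
        counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def predict(model, word, default="UNKNOWN"):
    """Look up the learned label, falling back to a default for unseen words."""
    return model.get(word, default)


# Hypothetical annotated data: (word form, label) pairs.
examples = [("w1", "N"), ("w1", "N"), ("w1", "V"), ("w2", "V")]
model = train(examples)
```

The richer the annotations attached to each word, the richer the patterns such a learner can pick up; a real system would use far more context than the word form alone.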
[2:15] So, we need to augment the Quran text with linguistic annotations.

However, this is challenging, as the Quran is written in a complex script

with very difficult word structure, grammar and semantics.
[2:30] But Computational Linguistics research methods offer a solution.

We can get hold of the text of Traditional Arabic grammar textbooks,

extract the meanings of the grammatical descriptions, use this for

machine learning, and then put results online for volunteers to correct.


[2:45] The first task is to get hold of an authentic version of the text. If you

just use modern Windows encoding or Unicode, this doesn't display the original

text correctly.
[3:00] Luckily, there was a project, called the Tanzil project, which started

around the time Kais started this research effort, which came up with a Unicode

XML encoding which allows the text to be displayed authentically in its original

form.
[3:15] So, Kais had to start by developing a Java API or large set of code which

allowed you to read this XML and display the original text authentically on a

web page.


[3:30] This then allows us to do morphological analysis as the next stage. There

are tools for morphological analysis of Modern Standard Arabic, and there has

been some progress on analysing the Quran at the University of Haifa. There are

also formal lexical representations developed at Columbia University.


[3:45] The trouble with the Haifa corpus is that they didn't really complete it,

so each word has many possible analyses and they were not verified by experts

who know what was correct, and it's a non-standard annotation scheme.
[4:00] So Kais' answer was to develop the Quranic Arabic Corpus website, do a

lot of analysis, and put it all online, for people to see and use, and correct

if necessary, including word structure, word-for-word translations, grammatical

and semantic representations.


[4:15] So here we have the base - the verified Uthmani script for a word. You

have to read the Arabic from right-to-left. Of course, if you don’t speak

Arabic, this doesn’t mean much to you. But if you do speak Arabic, you can see

this is the correct original format.


[4:30] For non-Arabic speakers, there is also a phonetic transcription – not

using the true International Phonetic Alphabet, but something like the standard

Roman alphabet, so English speakers, even those who learnt English as a second

language, can probably work this out.


[4:45] Also the assumption is that an awful lot of learners of the Quran do

speak English. So we’ve added an interlinear word-by-word exact translation of

what the Arabic morphemes mean.
[5:00] And there is a referencing system which allows you to locate any

particular chapter, verse, word, and even segment so you can find others which

have the same ones - a complex referencing system.
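A referencing system of this kind can be sketched in Python as follows. The colon-separated `(chapter:verse:word:segment)` notation used here is an assumption for illustration, as are the `Location` type and the `parse_location` function; the talk does not spell out the exact format.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """A position in the text; word and segment are optional finer levels."""
    chapter: int
    verse: int
    word: Optional[int] = None
    segment: Optional[int] = None


def parse_location(text: str) -> Location:
    """Parse a reference such as '(2:233)' or '(2:233:5:1)' (assumed notation)."""
    parts = [int(p) for p in text.strip("() ").split(":")]
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 numeric fields, got {len(parts)}")
    return Location(*parts)


def same_verse(a: Location, b: Location) -> bool:
    """True when two locations fall in the same chapter and verse."""
    return (a.chapter, a.verse) == (b.chapter, b.verse)


loc = parse_location("(2:233:5:1)")
```

Because the fields are ordered from coarse to fine, locations sort naturally in reading order, and comparing prefixes is enough to tell whether two segments share a word or a verse.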
[5:15] Now on top of that, each Arabic word is quite complex, so a typical word

may have a root, for example a verb, and then a conjunction at the start of it,

and then a subject and object pronoun after it. So, you have to segment the word

into individual parts.


[5:30] And there is quite a lot of detailed information as to what the grammatical

categories of the individual parts are. So this is – reading from right-to-left

– a conjunction, followed by a main verb, followed by subject pronoun, followed

by an object pronoun.
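The segmentation just described (conjunction, verb, subject pronoun, object pronoun) could be represented along these lines. This is only a sketch: the `Segment` class, the placeholder forms, and the tag names `CONJ`, `V` and `PRON` are illustrative assumptions rather than the corpus's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    form: str         # placeholder surface form, not real Arabic
    tag: str          # illustrative part-of-speech tag
    description: str  # plain-English grammatical category


# One word, split into its parts in reading order (hypothetical example).
word = [
    Segment("<prefix>", "CONJ", "conjunction"),
    Segment("<stem>", "V", "main verb"),
    Segment("<suffix-1>", "PRON", "subject pronoun"),
    Segment("<suffix-2>", "PRON", "object pronoun"),
]


def tag_sequence(segments):
    """Join the tags of a word's segments into a compact summary."""
    return " + ".join(s.tag for s in segments)


def gloss(segments):
    """Produce a readable description of each part of the word."""
    return ", followed by ".join(s.description for s in segments)


summary = tag_sequence(word)
```

Storing each part as its own record is what makes it possible to attach a separate grammatical category, translation, or reference number to every segment of a word.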


[5:45] And for use by Arabic grammarians, as there are an awful lot of Arabic

grammarians in the Arab world who prefer to speak Arabic, there is also an

automatically generated Arabic translation of the grammatical description.
[6:00] Somewhat more complicatedly, there is a parse structure tree, or diagram,

showing the grammatical structure for each sentence, based on the traditional

Arabic grammar of I’rāb rather than modern linguistics.
[6:15] There is also a quite complex ontology, which is a set of all the

entities or ‘things’. Every noun or pronoun refers to some ‘thing’ and this is

linked to from the text, and you can find all the instances of that from the

ontology.
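The link between the text and the ontology can be pictured as an index from each concept to the places where it is mentioned. The Python sketch below shows that idea only; the concept names, the locations, and the `build_index` function are invented for illustration and do not reflect the project's actual data.

```python
from collections import defaultdict

# Hypothetical annotations: ((chapter, verse, word), concept) pairs.
mentions = [
    ((2, 10, 3), "concept-a"),
    ((1, 4, 2), "concept-a"),
    ((3, 7, 1), "concept-b"),
]


def build_index(mentions):
    """Map each concept to the sorted list of locations that refer to it."""
    index = defaultdict(list)
    for location, concept in mentions:
        index[concept].append(location)
    return {concept: sorted(locs) for concept, locs in index.items()}


index = build_index(mentions)
instances = index.get("concept-a", [])
```

With an index like this, following a link from a word to its concept and then listing every other instance of that concept is a pair of dictionary lookups.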


[6:30] On top of this there is quite a complex framework for collaboration: a

message board, so that anybody finding anything wrong can point it out, and a

large set of downloadable resources including the software and the data.
[6:45] This is used by researchers and members of the public worldwide.

This map shows where the users are. Many in America and Britain, but

also around the whole world. And these are not just lay people trying to

read the Quran, but many researchers worldwide.


[7:00] So, as far as AI and computational linguistics are concerned, what’s new? Well, it’s

the first treebank of parsed trees for Classical Arabic, and it’s the only one

that’s freely available. And it’s also a formalism for traditional Arabic

grammar, used in machine learning parsers.


[7:15] This is a novel part-of-speech tagging system. So for each word there is

quite a detailed grammatical category, gleaned from the traditional Arabic

grammar textbooks – but formalized in a computational sense.
[7:30] Kais has also developed a parser, which takes examples of these trees,

and can use machine learning to work out the patterns for parsing and then

apply the parser to new sentences, such as other Classical Arabic sentences.
[7:45] So how does he meet the criteria for postgraduate researcher of the year?

Able to communicate research to a lay and non-specialist audience, impact

on the rest of the world, and engagement with the public.
[8:00] Well, there is a feedback page, on the website, that includes lots of

feedback from members of the public, but also some academic researchers and

non-specialists, such as Professor Michael Arthur, Vice-Chancellor of Leeds

University.


[8:15] And in terms of impact, over a million users have used it in the past

year alone, and obviously it’s just starting, so there will be many more. And

there are lots of interesting users such as a chaplain in the correctional

center of the state of Missouri, as an interesting example.


[8:30] Scientific impact in terms of the subject area - he has already

published many papers including a significant journal entry, and quite a few

citations even though he is only half way through his PhD, and has lots of

positive feedback from other researchers.


[8:45] And there have been news articles, for example in the Muslim Post, and the

website itself has got lots of public users and feedback; that

is definitely public engagement on a worldwide scale, shall we say.
[9:00] So, to conclude, well this isn’t the conclusion because he’s only half

way through his PhD project, given it’s a part-time PhD project. So, I hope you

are going to give him the award of Postgraduate researcher of the year; if not,

he can come back and try again next year.
