dc.creator: Hakawati, Mohammed Ragheb
dc.date: 2018
dc.date.accessioned: 2023-09-05T03:17:36Z
dc.date.available: 2023-09-05T03:17:36Z
dc.identifier.uri: http://dspace.unimap.edu.my:80/xmlui/handle/123456789/79144
dc.description: Doctor of Philosophy in Computer Engineering [en_US]
dc.description.abstract: Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data. At more than 60% of the total, XML is considered the most dominant document type on the web; nevertheless, the quality of XML documents is not as expected. Consequently, it has become increasingly important to provide a full model that can detect and correct inconsistencies, recognized as violations of data dependencies, that decrease XML data quality. XML integrity constraints play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues remains limited. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to improve the quality of XML documents by introducing an enhanced cleaning model based on new types of XML integrity constraints called XML Conditional Inclusion Dependencies (XCIND) and XML Conditional Functional Dependencies (XCFD). The notations of the new rules are designed mainly to improve data instances, and they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. Subsequently, a set of minimal approximate conditional dependencies (XCFD, XCIND) is discovered and learned from the XML tree using a set of mining algorithms. Finally, data inconsistencies are detected using denial queries for the mined rules and repaired using a different set of update statements as solutions for inconsistent data values. In an extensive experimental evaluation on real XML datasets, the proposed mining algorithms demonstrated their efficacy and high performance in discovering all conditional dependencies under different support and confidence thresholds.
The results showed that the new model could increase XML quality by detecting more genuinely spurious data values than previous models based on traditional dependencies. Furthermore, the XML Cleaner can detect inconsistencies between the same tree tuples or even between multilevel tree tuples inside the XML tree using the conditional dependencies mentioned above. Moreover, the quality of the documents was assessed using two measures (Precision and Recall), and the accuracy of the XML documents improved to over 94% and 83%, respectively, for these measures. Thus, XML conditional integrity constraints, just like their relational counterparts, prove their ability to pave the way toward new standards of cleaning applications for the XML data model, especially in the big data era. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Universiti Malaysia Perlis (UniMAP) [en_US]
dc.rights: Universiti Malaysia Perlis (UniMAP) [en_US]
dc.subject: XML (Document markup language) [en_US]
dc.subject: Extensible Markup Language (XML) [en_US]
dc.title: XML cleaning model for data quality improvement using conditional integrity constraints [en_US]
dc.type: Thesis [en_US]
dc.contributor.advisor: Yasmin, Mohd Yacob, Dr.
dc.publisher.department: School of Computer and Communication Engineering [en_US]

