Ⅰ Introduction
Traffic monitoring and facility management are major challenges for Intelligent Transportation Systems (ITSs). An ITS cannot monitor every corner of the transportation network with a sensor network alone. It needs data that can be used to quickly identify places and activities that pose a risk to travelers. With the rapid growth of social networks over the last decade, communication platforms such as blogs, Facebook, Twitter, and online forums have become a rich source of real-time data [1, 2, 3]. People use social network websites to publish their views on various issues related to transportation activities (e.g., traffic accidents, jammed roads, and sliding). New users read the reviews of others and respond to them on the same topic (e.g., a street, a jammed city road, a state, or roadside companies and organizations). However, the continuous growth in reviews and tweets can make it hard for travelers to identify a direct and safe route to their destination. In many cases, people express their opinions about transportation activities in terms of features, such as "Traffic police information was in very detailed and helpful", "At least 3 seriously injured in a two-vehicle accident near downtown", or "The Victoria downtown has a lot of facility, but the road is jammed". In other cases, opinions are buried in blogs and forums, which makes it difficult for users to extract meaningful information from them. Opinion mining is the process of extracting meaningful information from public reviews and tweets about a specific topic using natural language processing or text analysis methods [4].
Presently, researchers use Naïve Bayes, Maximum Entropy, and SVM techniques to classify social network reviews [5, 6, 7]. Most of these approaches cannot correctly classify positive and negative opinion words or identify the degree of feature polarity. It is important to decide whether a tweet conveys strong positive, positive, neutral, negative, or strong negative polarity. Furthermore, information extraction systems are mostly based on crisp ontologies. A crisp ontology handles only crisp data and cannot retrieve the desired results from the vague data found on social networks. Various transport and travel systems and websites provide a rating score for a city based on percentages for different factors, such as green spaces, cleanliness, tourism facilities (restaurants, hotels, parks, etc.), safety, quietness, low crime rate, and so on [8]. These rating scores do not provide precise information, whereas tweets and reviews are meaningful because they help the transportation sector and travelers learn about each factor of the city. However, it is difficult for ITSs and travelers to read all the tweets and reviews and derive a meaningful sentiment about the city factors they care about. People often do not state their opinion of the city directly, instead expressing it in terms of city features, for example, "an awesome foggy weather of our beautiful Manhattan" [9]. Therefore, it is important to find the polarity of individual city features and then the overall city polarity. To this end, this paper proposes fuzzy domain ontology-based opinion mining, which addresses these problems and will help travelers and ITSs.
The proposed system retrieves tweets and reviews related to transportation activities (e.g., accident, collision, crash, congestion, and jam) and city features (e.g., parks, bus and train stations, bridges, airports, medical centers, restaurants, and hotels), extracts the individual transportation activities and city features together with their opinion words, and then uses a fuzzy domain ontology (FDO) to compute transportation activity polarity, city feature polarity, and overall city polarity on a finer scale of terms (strong positive, positive, neutral, negative, and strong negative). The system will improve the performance of ITSs by informing them of traffic problems, and will provide travelers with a safe route and an opinion map of the city.
This paper is structured as follows. Section 2 illustrates the basic concept of the proposed system architecture. The overall scenario and internal process of the proposed system are explained in Section 3. Section 4 presents the experimental results, and Section 5 concludes the paper.
Ⅱ The proposed system architecture
The core of the proposed system is a fuzzy domain ontology that handles real scenarios arising in transportation and city tweets and reviews. The architecture of the proposed system is shown in <Fig. 1>. It is divided into five parts, as follows.
- Retrieve tweets and reviews
- Pre-processing of tweets and reviews, feature extraction of the transportation activities and city
- Opinion mining
- Fuzzy domain ontology (FDO)
- Transportation polarity and city features polarity map
In the first task, the system uses the APIs of Twitter and e-commerce sites to retrieve tweets and reviews from social network and review sites (such as tripadvisor.com, Twitter, Facebook, and booking.com) for a given city and its transportation activities. In the second task, these tweets and reviews are pre-processed: stop words are removed, parts of speech are tagged, sentences are split, and transportation activities and city features are extracted. Feature extraction pulls features and their related reviews out of the unstructured review collection. The third task is opinion mining, which takes tweets and reviews and generates their individual and cumulative polarities. The fourth task is the FDO; it is executed offline and works together with opinion mining through semantic concepts. The FDO describes the concepts of transportation and city features and how their feature polarity is generated. It is used to categorize tweets and reviews. Precise data collection is the main task here, because it accelerates the development of the FDO. The FDO contains analyzed information such as
- "P_ticket" (sub-class) "is-a" pass of "Park" (super-class of P_ticket)
- "Hospital" (sub-class) "is-a" feature of "Medical centers" (super-class of Hospital)
- "Foods" (sub-class) "is-a" part of "Restaurant" (super-class of Foods)
An ontology is a set of concepts, instances, properties (datatypes), and relationships (object properties). A fuzzy concept is a concept whose instances belong to it to a certain degree; for example, High-park-ticket is a fuzzy concept. Because "high" is a vague predicate, the concept is also vague and should therefore be declared as a fuzzy concept. There are two types of relations: fuzzy object relations and fuzzy data type relations. Fuzzy object relations connect instances to a certain degree and allow fuzzy role declarations, such as Park-ticket has-rate high with a degree of 0.7. A fuzzy data type is used to assign a fuzzy value to an instance (e.g., park-ticket has-fuzzy-value high), which involves the fuzzy predicate for price. We gathered information on concepts such as "Speed", "Parks", "Medical centers", "Cemeteries", "Bridges", "Tunnels", "Bus station", "Train stations", "Airports", "Jail", "Sewage facility", "Accident", "Vehicle", "Street", "Crash", and "Traffic", and delivered it to the FDO. <Fig. 2> presents our ontology classes, data properties, object properties, and fuzzy data types. These classes capture city and transportation knowledge. Object and data properties describe the relationships between classes and the attributes that link classes to basic data types. The fuzzy data type gives the interval of the membership variable. Opinion mining uses this semantic knowledge to evaluate each pair of city and transportation features and verify its polarity. In the fifth task, the polarity scores of these features are gathered from all the tweets and reviews, yielding the final opinion mining result and the polarity values for transportation and city features.
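The is-a hierarchy and the fuzzy relation described above can be pictured with a minimal Python sketch. The class name `FuzzyRelation`, the dictionary `SUPER_CLASS`, and the helper `ancestors` are hypothetical names introduced only for illustration; the actual ontology in the paper is built in OWL, not in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyRelation:
    """A relation that holds to a degree in [0, 1] (illustrative structure only)."""
    subject: str
    predicate: str
    value: str
    degree: float

    def __post_init__(self):
        # A membership degree outside [0, 1] is not meaningful.
        if not 0.0 <= self.degree <= 1.0:
            raise ValueError("degree must lie in [0, 1]")


# Sub-class -> super-class pairs taken from the examples in the text.
SUPER_CLASS = {
    "P_ticket": "Park",
    "Hospital": "Medical centers",
    "Foods": "Restaurant",
}


def ancestors(concept):
    """Follow is-a links upward and return the chain of super-classes."""
    chain = []
    while concept in SUPER_CLASS:
        concept = SUPER_CLASS[concept]
        chain.append(concept)
    return chain


# "Park-ticket has-rate high at a degree of 0.7", as stated in the text.
park_ticket_rate = FuzzyRelation("Park-ticket", "has-rate", "high", 0.7)
```

A crisp relation is the special case in which the degree is exactly 1.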
An ontology represents shared domain knowledge among people and systems. It is written in a specific language called OWL. To build the proposed ontology efficiently, a classical ontology is first developed using Protégé OWL, and a fuzzy OWL plugin is then used to convert it into a fuzzy ontology [10]. The fuzzy OWL plugin is employed to declare fuzzy terms in the ontology. The axioms, instances, concepts, and classes of the classical and fuzzy ontologies are the same. However, the concept values of the fuzzy ontology are vague terms. A classical ontology cannot handle uncertainty, whereas a fuzzy ontology is defined to express vague knowledge using fuzzy concepts. Therefore, this system needs a fuzzy ontology to handle the scenarios that arise in opinion mining.
DL queries and SPARQL queries are used to retrieve feature polarity instances from the FDO. The idea of ontology-based knowledge management for traffic accidents is presented in [11]. The authors accumulate useful information about vehicles, climate, environment, and roads and deliver it to an ontology for a traffic accident management system. Opinion mining based on machine learning methods has been proposed to identify tourist attraction targets [12]. Both sentence-level and document-level opinions are analyzed and compared to obtain the target opinions. Mathematically, ontology can be defined as follows.
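A tuple form consistent with the five symbols explained next is sketched here. The name O for the ontology and the ordering of the components are assumptions, since only the component symbols are named in the text.

```latex
O = (C,\; P,\; R,\; V,\; V_c)
```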
In the above equation, the symbols C, P, R, V, and Vc stand for concepts, properties of concepts, relationships among concepts, values of concepts, and constraint values of properties, respectively [4, 13]. First, a classical ontology is designed, and then a semi-automatic Protégé plug-in called fuzzy OWL is used to convert the classical ontology into a fuzzy ontology. The concept of the fuzzy set was introduced by Lotfi Zadeh in 1965 [14]. Fuzzy set theory represents vague boundaries, such as positive, neutral, and negative. A fuzzy set F over the universe of discourse A can be represented by its membership function μF, which maps each element of A to a value in the interval [0, 1].
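The standard textbook form of this definition, written with the symbols F, A, A′, and μF used in the surrounding text, is given here as a sketch and is an assumption about the exact notation intended.

```latex
F = \{\, (A', \mu_F(A')) \mid A' \in A \,\}, \qquad \mu_F : A \to [0, 1]
```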
In the above equation, A′ is an element of A, and μF(A′) gives the degree to which A′ belongs to F. A′ is considered a full member of F if μF(A′) = 1, and a partial member of F if μF(A′) lies between zero and one (e.g., 0.63). <Fig. 3> illustrates the graphical model of the FDO classes, generated by the Protégé OntoGraf plugin.
The FDO shares knowledge among feature extraction, review classification, and feature polarity identification. Representing the polarity of each pair of features and classifying features with the FDO therefore speeds up the proposed opinion mining system. The system categorizes the features and extracts the correct feature polarity terms.
Ⅲ Opinion mining and city opinion mapping
The system uses the ontology terms to retrieve highly related tweets and reviews. Several queries were designed for tweet retrieval, and only those with more than 85% recall were used. After retrieval, the precision was observed to be very low. Therefore, an SVM classifier is used to identify related tweets and reviews and remove unrelated ones. Unigram, bigram, and trigram techniques are used to extract features from tweets. A bigram and a trigram are sequences of two and three adjacent feature words, respectively. These bigram and trigram features are shown in <Fig. 4>. The proposed system uses specific functions to compute a value for each tweet. If the value is greater than 0, the tweet is considered related to a transportation or city feature; otherwise, it is filtered out.
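The n-gram extraction and the "value greater than 0" filter can be sketched as follows. This is only an illustration: the weights in `WEIGHTS` and the value of `BIAS` are made-up numbers standing in for what a trained linear SVM might produce, and the function names are hypothetical rather than taken from the paper.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Map n (1, 2, 3) to the unigrams, bigrams, and trigrams of the text."""
    tokens = text.lower().split()
    return {n: ngrams(tokens, n) for n in (1, 2, 3)}


# Hypothetical weights and bias of a linear decision function (not from the paper).
WEIGHTS = {
    ("jammed",): 1.2,
    ("accident",): 1.5,
    ("road", "is"): 0.3,
    ("lunch",): -0.8,
}
BIAS = -0.5


def tweet_value(text):
    """Sum the weights of the n-grams present in the tweet, plus the bias."""
    feats = extract_features(text)
    grams = [g for n in feats for g in feats[n]]
    return sum(WEIGHTS.get(g, 0.0) for g in grams) + BIAS


def is_related(text):
    """Keep a tweet only if its value is greater than 0."""
    return tweet_value(text) > 0


features = extract_features("road is jammed near downtown")
```

With these illustrative weights, `is_related("road is jammed near downtown")` keeps the tweet, while `is_related("had lunch today")` filters it out.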
Reviews of city features are also retrieved from e-commerce sites. The FDO is applied to tweets and reviews to evaluate the polarity of each feature. To explain the polarity computation in more detail, reviews of the city features 'Restaurant' and 'Hotel' are extracted to find their polarity and then the overall city polarity. These reviews are "Service is very fast and the location is good", "The restaurant is not good as we expected", and "The room has a lot of facility, but a bit dusty." Before feature extraction, it is important to remove stop words, prepositions (on, in, of), and articles (the, a, an) from the reviews. After this, each review sentence is checked to confirm whether it is a complete clause with a noun phrase and a verb phrase. In the above reviews, connective words such as 'and' are used to separate the review sentences. The first review, "service is very fast and the location is good", and the third review, "the room has a lot of facility, but a bit dusty", contain conjunctions and verbs. Each should be split to identify complete clauses, each of which will include one noun, one conjunction, and one verb. The other sentences consist of one noun and one verb and are considered complete sentences. A feature is extracted from a single sentence by selecting its noun phrase. For example, 'service' is the noun phrase in the review sentence "service is very fast" and can be easily identified. The FDO describes the concepts of transportation activities and city features and the relations among those concepts. We imported the feature extraction and polarity computation information into the FDO. It is used to identify features in review sentences, such as 'medical center', 'restaurant', and 'hotels'. Every feature extracted from the reviews is compared with the FDO classes. A matched feature is retained so that its polarity can be predicted; otherwise, it is eliminated. Each feature has its own polarity words.
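The clause splitting and ontology matching described above can be approximated with a short sketch. The set `FDO_FEATURES` is a hypothetical stand-in for the FDO classes, and carrying the previous feature forward to a clause that names none (as in "a bit dusty") is an assumption made for illustration, not a rule stated in the paper.

```python
import re

# Hypothetical subset of FDO class names used for matching.
FDO_FEATURES = {"service", "location", "restaurant", "room", "hotel"}

# Connectives and commas mark clause boundaries in this sketch.
CONNECTIVES = r"\band\b|\bbut\b|,"


def split_clauses(review):
    """Split a review into clauses at 'and', 'but', or a comma."""
    parts = re.split(CONNECTIVES, review.lower())
    return [p.strip(" .") for p in parts if p.strip(" .")]


def extract_feature(clause):
    """Return the first word of the clause that matches an FDO class, if any."""
    for token in re.findall(r"[a-z]+", clause):
        if token in FDO_FEATURES:
            return token
    return None


def features_by_clause(review):
    """Pair each clause with its feature; clauses without a match are dropped
    unless an earlier clause in the same review supplied a feature."""
    pairs, last = [], None
    for clause in split_clauses(review):
        feature = extract_feature(clause) or last
        if feature is not None:
            pairs.append((feature, clause))
            last = feature
    return pairs
```

For the first sample review this yields one pair for 'service' and one for 'location'; a full system would rely on part-of-speech tagging to find noun phrases rather than on keyword lookup.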
For example, the feature 'room' has the opinion words 'clean', 'big', and 'good' with polarity value p=1, 'dusty' and 'small' with polarity value p=-1, and 'normal', 'average', and 'medium' with polarity value p=0. Similarly, the opinion words of the feature 'staff' are 'good', 'excellent', 'great', and 'satisfactory' with polarity value p=1, 'bad' and 'poor' with polarity value p=-1, and 'okay' with polarity value p=0. Furthermore, SentiWordNet is used to verify the initial value of the corresponding opinion words. SentiWordNet is a lexical resource in which each synset of WordNet is associated with three numerical scores: positive, objective, and negative [15]. These scores show how positive, neutral, and negative the terms contained in the synset are. The opinion word 'fast' is used as an adverb and its opinion value is 1, which is its linguistic value in the ontology. The interval for each output is as follows: strong negative is [0.0, 0.25], negative is [0.25, 0.5], neutral is [0.5], positive is [0.5, 0.75], and strong positive is [0.75, 1]. The proposed system uses these intervals to find the polarity from the values of the input opinion words. The proposed ontology uses various rules to compute the polarity, as shown in <Fig. 2>. Some of these rules are explained below so that readers can understand how all the rules are used.
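The mapping from a polarity score to one of the five terms can be sketched as below. Because the stated intervals share their end points, this sketch assumes that 0.25 counts as negative, exactly 0.5 as neutral, and 0.75 as positive; these tie-breaking choices and the function name `polarity_term` are assumptions, not part of the paper.

```python
def polarity_term(score):
    """Map a polarity score in [0, 1] to a linguistic term using the stated intervals."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.25:
        return "strong negative"
    if score < 0.5:
        return "negative"       # assumption: 0.25 itself is treated as negative
    if score == 0.5:
        return "neutral"
    if score <= 0.75:
        return "positive"       # assumption: 0.75 itself is treated as positive
    return "strong positive"
```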
Rule 1: Accident(?B), Road(?A), Traffic(?D), Vehicle(?C), OpinionOf(?B, StrongNegative), Speed(?C, VerySlow) -> OpinionOf(?D, StrongNegative), PolarityIs(?A, StrongNegative), TrafficIsJammedBy(?A, ?B)
Rule 2: hotel(?D), restaurant(?E), Airports(?C), City(?A), Medical_Centers(?G), Parks(?B), Traffic(?F), Weather(?H), OpinionOf(?A, StrongNegative), OpinionOf(?B, StrongNegative), OpinionOf(?C, StrongNegative), OpinionOf(?D, StrongNegative), OpinionOf(?E, StrongNegative), OpinionOf(?F, StrongNegative), OpinionOf(?G, StrongNegative), OpinionOf(?H, StrongNegative) -> PolarityIs(?A, StrongNegative)
Rule 1 states that if the opinion of 'accident' is 'strong negative' and the 'vehicle' speed is 'very slow', then the 'traffic' is jammed because of the 'accident', and both the 'traffic' opinion and the 'road' polarity are counted as 'strong negative.' Rule 2 states that the city polarity is 'strong negative' if the polarity of the city features is 'strong negative.' Every opinion word takes its value from SentiWordNet and is then passed to the fuzzy inference layer to find the value of the polarity terms (Strong Negative (SN), Negative (Neg), Neutral (Neu), Positive (P), and Strong Positive (SP)). The architecture of the fuzzy inference layer is based on the FDO and is shown in <Fig. 5>.
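The logic of the two rules can be paraphrased in plain Python to show what each antecedent and consequent does. This is a hand-written approximation, not the rule engine used with the ontology; the dictionary keys and function names are hypothetical.

```python
def apply_rule_1(facts):
    """If the accident opinion is StrongNegative and vehicle speed is VerySlow,
    derive the traffic opinion, the road polarity, and the cause of the jam."""
    derived = {}
    if (facts.get("accident_opinion") == "StrongNegative"
            and facts.get("vehicle_speed") == "VerySlow"):
        derived["traffic_opinion"] = "StrongNegative"
        derived["road_polarity"] = "StrongNegative"
        derived["road_traffic_jammed_by"] = "accident"
    return derived


def apply_rule_2(feature_opinions):
    """Return StrongNegative for the city when every listed city feature is
    StrongNegative; return None when the rule does not fire."""
    if feature_opinions and all(
        opinion == "StrongNegative" for opinion in feature_opinions.values()
    ):
        return "StrongNegative"
    return None
```

The other polarity terms would need analogous rules; only the strong negative case given in the text is covered here.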
The fuzzy inference layer combines the extracted opinion words and their polarity values to determine the feature polarity for transportation and the city. It has four parts: fuzzification, inference, knowledge and rule base, and defuzzification. First, the opinion words (adjectives, adverbs, and verbs) serve as inputs, and their parameters are the crisp values given by SentiWordNet, such as very (adjective) P: 0.5 O: 0.375 N: 0.125; noise (verb) P: 0 O: 1 N: 0; and not (adverb) P: 0 O: 0.375 N: 0.625. Second, a triangular membership function is defined to find the membership value of each input variable. There are five linguistic values (SN, Neg, Neu, P, SP) for each input variable. The fuzzification part obtains the membership values of the opinion words. The inference part applies the rules of the FDO to the fuzzy membership intervals. All of the rules and linguistic values are stored in the ontology. The defuzzifier converts the fuzzy output into a crisp value, which is called the polarity value.
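A minimal sketch of fuzzification with triangular membership functions, followed by a simple defuzzification step, is given here. The triangle break points in `TERMS`, the use of the maximum to combine several opinion words, and the weighted average of term peaks as the defuzzifier are all assumptions chosen for illustration; the paper does not state these details.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangle with feet a and c and peak b (a < b < c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed (foot, peak, foot) triples for the five linguistic values on [0, 1].
TERMS = {
    "SN": (-0.25, 0.0, 0.25),
    "Neg": (0.0, 0.25, 0.5),
    "Neu": (0.25, 0.5, 0.75),
    "P": (0.5, 0.75, 1.0),
    "SP": (0.75, 1.0, 1.25),
}


def fuzzify(score):
    """Return the membership degree of a crisp score in each linguistic value."""
    return {term: triangular(score, *abc) for term, abc in TERMS.items()}


def aggregate(scores):
    """Combine several opinion-word scores by taking the maximum degree per term."""
    combined = {term: 0.0 for term in TERMS}
    for score in scores:
        for term, degree in fuzzify(score).items():
            combined[term] = max(combined[term], degree)
    return combined


def defuzzify(memberships):
    """Weighted average of the term peaks, weighted by membership degree."""
    total = sum(memberships.values())
    if total == 0:
        raise ValueError("no term has a non-zero membership degree")
    return sum(degree * TERMS[term][1] for term, degree in memberships.items()) / total


polarity_value = defuzzify(aggregate([1.0, 0.5]))
```

With one fully strong positive word (1.0) and one neutral word (0.5), the combined polarity value lands midway between the two peaks.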
Ⅳ Experimental Results
To evaluate the effectiveness of the proposed system, different types of search queries were composed to retrieve as many highly related tweets and reviews as possible from Twitter and e-commerce sites (booking.com, tripadvisor.com, and hotel.com). These tweets and reviews were stored in a database for further processing. Irrelevant reviews were filtered out of the database, and every sentence was confirmed to be in a valid format. The system retrieved 3438 tweets and reviews related to six different features (three related to transportation and three related to the city). The average length is 35 words for a tweet and 56 words for a review. The total number of transportation opinion words is 3101, and the total number of city opinion words is 3775. At first, a simple classical ontology was used to classify tweets and reviews, predict polarity, and record the precision, recall, accuracy, and function measure (F-measure). Later, the fuzzy ontology was used to compute the results. Mathematically, the precision, recall, accuracy, and F-measure can be calculated using the following equations.
Here, A is the total number of records extracted from the internet, and B and C represent the true and false elements in the extracted records, respectively. <Table 1> illustrates the results for transportation and city features. <Fig. 6> compares the performance of the proposed approach with crisp ontology and with fuzzy domain ontology. The average accuracy and average precision increase significantly with the fuzzy domain ontology, whereas recall and F-measure decrease.
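For reference, the four measures named above can be computed from confusion-matrix counts using their standard definitions, sketched below. This sketch does not use the A, B, and C notation of this section, whose exact formulas are not reproduced here, and the counts in the example call are made-up numbers, not results from the experiment.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard precision, recall, accuracy, and F-measure from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Illustrative counts only (not taken from the paper's data).
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
```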
Ⅴ Conclusion
This paper proposes a fuzzy domain ontology-based opinion mining and information extraction system. It addresses a number of practical issues, including feature extraction, opinion word extraction, the declaration of feature polarity values in the ontology, and polarity computation using fuzzy logic. The proposed method helps transportation systems produce a timely traffic congestion map and offers travelers an opinion map of city features so that they can learn about the city before traveling. The proposed system also classifies highly vague reviews and computes the polarity of transportation and city features. It can be applied to various information retrieval, text classification, and opinion mining systems, because it can extract features from vague tweets and reviews, extract feature opinion words, and classify these opinion words into finer degrees of polarity.