PREDICTION OF DESIGN ASPECTS OF WEB PAGE BY HTML PARSER

Abstract: Information plays a very important role in life, and nowadays the world largely depends on the World Wide Web to obtain it. The Web comprises a large number of websites from every discipline, and each website consists of web pages that are interlinked by hyperlinks. The success of a website largely depends on the design aspects of its web pages. Researchers have done a lot of work to appraise web pages quantitatively. Keeping in mind the importance of the design aspects of a web page, this paper aims at the design of an automated evaluation tool which evaluates these aspects for any web page. The tool takes the HTML code of the web page as input, then extracts the HTML tags and checks them for uniformity. The tool comprises normaliser modules which quantify the measures of the design aspects. For realization, the tool has been applied to four web pages of distinct sites, and the design aspects have been reported for comparison. The tool offers several advantages to web developers, who can predict the design quality of web pages and enhance it before and after the implementation of a website, without user interaction.


Introduction
To survive and progress in the modern world, information is an essential need. The World Wide Web is the most widely used and fastest information source today. Websites are large repositories for obtaining information online as well as for availing services such as entertainment, online shopping, hotel booking and bill payment. It has become impractical for any organization to exist and survive without a website. Moreover, due to technological changes in hardware and software and to the desire for growth in users, websites are dynamic in nature and demand very frequent changes to their interface. The success of a website needs a well-designed interface which generates a positive effect on its users [19]. No site can attain fruitful results from poor technical design. Good design quality motivates the customers of a shopping site to proceed with online shopping [22]. To maximize gains and to increase the utility of their services and products, an effective website design is a must. A website design is sound only if its web pages are well organized. So, the preliminary target for organizations and researchers is to improve perceived design quality [41], as it stimulates the customer's purchase behavior [52]. It is not an easy task to predict design quality, as it depends upon human expectations [37], but the design aspects of a web page can be quantified and analyzed to improve the predicted quality. For user satisfaction, the main parameters used for evaluation include accessibility, cost, usability, speed, security, maintenance and quality [10]. Most researchers concentrate on one or two aspects arbitrarily, as represented in Table 1.
None of these methods describes how to quantify and normalize the measures for the structural aspects of an individual web page. This paper presents an automated tool, built on HTML parsing, to measure the parameters of structural aspects. The tool has been proposed to evaluate the technical design quality of a web page. The background study is presented in section 1, whereas section 2 explains the proposed structural aspects along with their evaluation tool. The tool implementation is presented in section 3 with the help of a case study of four web pages of distinct sites. Results are analyzed and proposals to improve quality are described in a further section.

Literature Review
In the present paper, the literature review has been carried out on the basis of design quality aspects, the criteria and methods adopted for their evaluation, and the studies performed by various researchers for evaluating websites.

Design Quality
As quality is a multidimensional concept, its assessment depends upon the purpose of evaluation. For a website developer, quality is a well-designed interface, whereas for a user, a quality website is one which provides complete interaction, security, content and usability. Similarly, for an organization, a site is perfect if it is fully devoted to achieving the organization's objectives. According to DeMarco, "Quality is the function of a product that changes the world for the better." [15]. ISO 9126 (1991) defines quality for software from the user's point of view [26]. ISO 13407 (1999) describes the task of measuring quality as multidisciplinary in nature, involving several human factors and knowledge of various disciplines such as ergonomics, behavioural studies, sociological measurement and working techniques [28]. The final aim is to improve operability by optimizing efficiency and productivity as well as by eliminating adverse effects on humans. ISO 9241-11 (1998) and ISO/IEC 25010 (2011) define a new standard, ISO 9126-1, which incorporates usability into the quality system and redefines quality as "The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use." [27,29].
Effendi and Alfina have defined web quality from five perspectives, viz. website design, usability, service interaction quality, information quality and playfulness/enjoyment [17]. The last four perspectives are from the user's point of view and are part of operational quality. So, ultimately, web quality has two aspects, i.e. design quality and operational quality [38], but operability is in fact highly dependent on the design of the interface. So, end quality can only be achieved if the website provides a well-designed interface.

Criteria and Methods Used
A large number of design parameters have been adopted by numerous website developers and academicians to evaluate and enhance the design quality of a website [3,6,22,30,32-35,41].
They have used different criteria as well as different methods. Some studies work on websites belonging to a specific domain with a fixed set of evaluation criteria, e.g. Cebi [6], Schafer et al. [47] and Orehovacki et al. [44] have evaluated commercial sites; Chiemeke et al. [7], Chmielarz et al. [9] and Kaur et al. [31] have evaluated e-banking sites; Garcia et al. [21] and Petricek et al. [46] have worked on e-government sites; whereas Bastida et al. [4], Corigliano et al. [11] and Gavalas & Kenteris [20] have assessed tourism sites. Law et al. [36] have classified the methods used to evaluate sites into five types:
Counting: In this method, the features, information and services provided to the user by the website, e.g. search engine, sitemap, number of images, number of hyperlinks and multimedia elements, are counted by preparing a checklist. The main evaluators are web developers or automated software.
Automation: Automated software such as a web miner or tracker is used to analyse the website pages using web log data such as pageviews, clicks and bounce rates. Data analysts are the evaluators, who apply statistical techniques to obtain results.
User Judgement: Data is collected from users and web developers through questionnaires and interviews, and their satisfaction levels are measured on a Likert scale.
Numerical Computation: Mathematical models and computations are used by domain experts and mathematicians to predict website performance.
Combined Methods: A combination of the above methods is used.
All methods involve one or two categories of persons: the first category is experts, i.e. web designers, programmers, domain-expert users or website owners, whereas the other category is novice users who actually operate the website. Only the first method can be implemented before uploading a website, whereas all the others are meant for online websites. Similarly, many criteria are involved in evaluation, and there is a lot of heterogeneity in the design criteria used to evaluate websites; which criteria one should use depends solely on the motive of evaluation. As Mich [42] has proposed to enhance overall quality by reducing the gaps in quality across different phases, this paper aims to evaluate the structural quality aspects of a single web page.

Materials and Methods
The adopted research methodology has been divided into two phases, comprising the selection of design quality aspects and the preparation of the HTML parser, followed by a discussion of results to enhance the quality of the web page.

Design Quality Aspects
After examining the aspects used in previous studies, six aspects have been selected for evaluating design quality, which are further divided into parameters and sub-parameters. The two primary aspects are shown in Table 2a, whereas the four secondary aspects are shown in Table 2b.

Aesthetics
This is the most important aspect, as it concerns the visual appeal of the site. A site must have an attractive user interface to hold the interest of users. So, this aspect evaluates the quality of images, tables and their resolution, color attributes and displayed text. Color attributes also measure the total number of colors used on a web page. The total is compared with an optimum number of colors, as a site becomes dull with too few colors, while too many colors create a mess for the user and reduce understandability. One sub-parameter, 'Climitations', also checks the accessibility (an important aspect according to W3C) of the site to users with various color recognition disabilities. One parameter, 'Refresh', checks that the content or layout of the site does not change automatically once it has been perceived by the user until the session is over, as such a change would interrupt the user's task.
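The colour count and the 'Refresh' check described above can be illustrated with a minimal Python sketch. It is not the tool's actual code (which the paper states was written in VB.net). The optimum of five colours, the scoring formula and the names AestheticsScan and aesthetics_measures are assumptions made for illustration; only hex colours in the style, color and bgcolor attributes are counted.

```python
import re
from html.parser import HTMLParser

# Matches #rgb and #rrggbb literals; named colours and rgb() are ignored here.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
OPTIMUM_COLORS = 5  # hypothetical optimum; the paper does not state a number


def normalise_hex(color):
    """Lower-case a hex colour and expand #rgb to #rrggbb."""
    color = color.lower()
    if len(color) == 4:
        color = "#" + "".join(ch * 2 for ch in color[1:])
    return color


class AestheticsScan(HTMLParser):
    """Collects distinct colours from attributes and flags auto-refresh."""

    def __init__(self):
        super().__init__()
        self.colors = set()
        self.has_refresh = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.has_refresh = True
        for name in ("style", "color", "bgcolor"):
            for match in HEX_COLOR.findall(attrs.get(name) or ""):
                self.colors.add(normalise_hex(match))


def aesthetics_measures(html):
    """Return assumed 0..1 scores for the colour count and the refresh check."""
    scan = AestheticsScan()
    scan.feed(html)
    scan.close()
    # Assumed scoring: distance from the optimum, clipped at zero.
    deviation = abs(len(scan.colors) - OPTIMUM_COLORS) / OPTIMUM_COLORS
    return {
        "Color": max(0.0, 1.0 - deviation),
        "Refresh": 0.0 if scan.has_refresh else 1.0,
    }
```

Colours declared in external style sheets or in style elements are outside the scope of this sketch, so a production check would need to fetch and parse CSS as well.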

Ease of Use
It means a user-friendly environment. The environment can be user friendly only if the site has consistency among elements (layout of elements, font type and size, color effects and mouse effects), efficient navigation facilities and proper annotation, i.e. meta-information, logos, trademarks etc. The site should also provide a site map and search engine facilities along with important sections such as contents, a news and reporting section, quick access pages and breadcrumbs. This aspect also checks whether contact information, company or author information and an e-mail address are provided. The last parameter, 'Lang', checks whether the website can be operated in multiple languages so that the number of users can be increased.
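Several of these checks reduce to the presence or absence of particular tags, which a short Python sketch can illustrate. The detection rules are assumptions rather than the paper's definitions: meta-information is taken to be a description or keywords meta tag, 'Lang' an hreflang attribute on a link, e-mail a mailto: hyperlink, and search an input of type search. The names EaseOfUseScan, ease_of_use_checks and checklist_score are hypothetical.

```python
from html.parser import HTMLParser


class EaseOfUseScan(HTMLParser):
    """Records yes/no facts for a few Ease of Use checks (assumed rules)."""

    def __init__(self):
        super().__init__()
        self.found = {"Meta": False, "Lang": False, "Email": False, "Search": False}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        href = (attrs.get("href") or "").lower()
        if tag == "meta" and name in ("description", "keywords"):
            self.found["Meta"] = True
        if tag in ("a", "link") and attrs.get("hreflang"):
            self.found["Lang"] = True  # an alternate-language version is linked
        if tag == "a" and href.startswith("mailto:"):
            self.found["Email"] = True
        if tag == "input" and (attrs.get("type") or "").lower() == "search":
            self.found["Search"] = True


def ease_of_use_checks(html):
    """Run the scan and return the dictionary of boolean findings."""
    scan = EaseOfUseScan()
    scan.feed(html)
    scan.close()
    return dict(scan.found)


def checklist_score(found):
    """Share of checks that passed; equal weighting is an assumption."""
    return sum(found.values()) / len(found)
```

Checks such as consistency of layout or the presence of a logo need rendering or image analysis and are not attempted here.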

Multimedia Support
Due to advancements in social media and the emergence of the internet as a primary channel for advertising, multimedia has become a primary need for sites. For its successful implementation, efficient planning of the multimedia design is mandatory. A page should include plugin support with completely defined attributes of the video, audio, object and embed tags for quality multimedia support. This aspect also checks that only one media element is displayed on one web page. It also determines whether all images with hyperlinks have defined thumbnails or not.
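A possible way to check the media tags is sketched below in Python. The paper does not list which attributes count as "completely defined", so the EXPECTED table is a hypothetical choice, as are the names MediaScan and multimedia_measures.

```python
from html.parser import HTMLParser

# Hypothetical attribute sets treated as "completely defined" per media tag.
EXPECTED = {
    "video": ("width", "height", "controls"),
    "audio": ("controls",),
    "object": ("data", "type"),
    "embed": ("src", "type"),
}


class MediaScan(HTMLParser):
    """Notes, for every media tag, whether all expected attributes exist."""

    def __init__(self):
        super().__init__()
        self.media = []  # one bool per media tag found

    def handle_starttag(self, tag, attrs):
        if tag in EXPECTED:
            present = {name for name, _ in attrs}
            self.media.append(all(a in present for a in EXPECTED[tag]))


def multimedia_measures(html):
    """Count media tags and the share whose attributes are complete."""
    scan = MediaScan()
    scan.feed(html)
    scan.close()
    count = len(scan.media)
    return {
        "media_count": count,
        "single_media": count <= 1,  # at most one media element per page
        "complete_share": sum(scan.media) / count if count else None,
    }
```

A page without any media yields a complete_share of None, leaving it to the caller to decide how that case should be scored.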

Throughput
This aspect measures the size and speed characteristics of the site. Size measures the total number of lists, text, buttons, forms, frames, documents, internal as well as external links, pictures and tables. One can use the online tool (http://www.freewebsubmission.com/cgibin/metatag-analyze.cgi) or code this job in the 'Normaliser'. Each type of element can be normalized by taking the ratio of its count to the total count of elements of all types present on the web page. The speed attribute can only be measured for sites which are online; however, one can predict the download speed from the page size and the speed of the internet connection. One can also use the tool (http://www.websiteoptimization.com/services/analyze) to represent speed characteristics.
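The ratio-based normalization of element counts can be sketched as follows. This is an illustrative Python version under stated assumptions: the mapping of tags to element types in CATEGORY is a guess at the paper's grouping, text and documents are omitted, and a link is treated as internal when it is relative or points to the page's own host. SizeCounter and size_ratios are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed mapping from HTML tags to the element types named in the text.
CATEGORY = {
    "ul": "lists", "ol": "lists", "table": "tables", "form": "forms",
    "img": "pictures", "button": "buttons", "frame": "frames", "iframe": "frames",
}


class SizeCounter(HTMLParser):
    """Counts elements per type, splitting links into internal and external."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            parts = urlparse(dict(attrs).get("href") or "")
            if parts.scheme in ("", "http", "https") and (parts.netloc or parts.path):
                internal = parts.netloc in ("", self.page_host)
                self.counts["internal_links" if internal else "external_links"] += 1
        elif tag in CATEGORY:
            self.counts[CATEGORY[tag]] += 1


def size_ratios(html, page_host):
    """Ratio of each element type's count to the count of all counted elements."""
    counter = SizeCounter(page_host)
    counter.feed(html)
    counter.close()
    total = sum(counter.counts.values())
    return {kind: n / total for kind, n in counter.counts.items()} if total else {}
```

By construction the ratios of one page sum to 1, so each value already lies in the '0' to '1' range used by the tool.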

Reputation
A site is reputable only if it has a standard domain. To check its reputation, it is also necessary to collect feedback from its live users and analyze it.

Security
This is the most important aspect for sites that deal with transactions and users' private data. Users will complete their end tasks only if they can fully rely on the site. Reliability can be measured by checking site addresses, which indicate SSL certification from third parties. A site can also have its own private security, and this can be predicted from its URL, as it should start with 'shttp'.
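The URL-based part of this check can be sketched in a few lines of Python. The text names the 'shttp' scheme; the sketch additionally accepts 'https', the scheme used by sites served over SSL/TLS, which is an assumption about how the check would be applied in practice. The name security_measure is hypothetical, and no certificate validation is performed.

```python
from urllib.parse import urlparse

# 'shttp' comes from the text; 'https' is added here as an assumption.
SECURE_SCHEMES = {"https", "shttp"}


def security_measure(url):
    """Return 1.0 when the URL scheme signals a secured connection, else 0.0."""
    scheme = urlparse(url).scheme.lower()
    return 1.0 if scheme in SECURE_SCHEMES else 0.0
```

The scheme only shows that a secured channel is requested; confirming third-party certification would require inspecting the certificate itself.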

Automated Tool
This step mainly focuses on the structure of the automated tool, i.e. the modules which are required to assess the quality aspects from the home page address of a website and then evaluate the sub-parameters as described in Table 2. To depict the different steps for computation of technical design quality aspects, three modules have been integrated together in the designed tool, as shown in Figure 1.
Interface: This module has two parts. The first part takes the website home address and so deals with the design of the input interface, whereas the other part displays the resulting values of the various key measures in tabulated form.
Parser: According to Jati & Dominic (2009), automated testing of websites presents an opportunity to researchers as well as a considerable challenge. Keeping this point in view, an HTML parser has been coded which passes the HTML code to the different normaliser modules and receives the parameter values from them. It passes the resulting values for a single page to the interface.
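The hand-off between the parser and the normaliser modules can be pictured with a minimal Python sketch. It only illustrates the data flow described above and is not the tool's VB.net code; TagCollector, run_parser and the two example normalisers are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Parses the page once and keeps every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag name, attribute dict)

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def run_parser(html, normalisers):
    """Pass the parsed tags to each normaliser and gather the returned values."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return {name: module(collector.tags) for name, module in normalisers.items()}


# Two toy normaliser modules, each returning a value in the range 0..1.
def meta_present(tags):
    return 1.0 if any(tag == "meta" for tag, _ in tags) else 0.0


def nav_present(tags):
    return 1.0 if any(tag == "nav" for tag, _ in tags) else 0.0
```

The dictionary returned by run_parser corresponds to the values for a single page that the parser would hand to the interface for tabulation.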

Normaliser modules
These modules are responsible for evaluating the parameters from the information given by the parser. They calculate the values of sub-parameters, or of parameters (when an aspect has no sub-parameters), for all web pages, and pass this information back to the parser. So, this step comprises the coding of those modules which process the HTML code and compute the final value of each sub-parameter, or of each parameter if it is not sub-parameterized. Each value is normalized to the range '0' to '1', where a value near '0' represents poor quality and a value near '1' represents very high quality.

Realization of Structure with Sub-Module
Normalized values can be presented with the help of bar charts to show the quality of each aspect. The algorithm for computation of the image parameter for a web page has been depicted in Figure 1. The algorithms have been coded in VB.net. The figure also demonstrates the different components of the tool that realize the execution of the algorithm in its various steps. Other parameters are derived similarly.
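Since the figure is not reproduced in the text, the following Python sketch shows only one plausible form of an image-parameter computation: the share of images with non-empty alternate text and the share of images placed inside a hyperlink. It should not be read as the algorithm of Figure 1; ImageScan and image_parameter are hypothetical names.

```python
from html.parser import HTMLParser


class ImageScan(HTMLParser):
    """Counts images, images with alt text, and images nested in a hyperlink."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.total = 0
        self.with_alt = 0
        self.linked = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag == "img":
            self.total += 1
            if (dict(attrs).get("alt") or "").strip():
                self.with_alt += 1
            if self.anchor_depth > 0:
                self.linked += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1


def image_parameter(html):
    """Return the alt-text share and the hyperlink share of all images."""
    scan = ImageScan()
    scan.feed(html)
    scan.close()
    if scan.total == 0:
        return {"alt_share": None, "link_share": None}
    return {
        "alt_share": scan.with_alt / scan.total,
        "link_share": scan.linked / scan.total,
    }
```

Both shares fall in the '0' to '1' range and could be plotted directly as bars, one per web page.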

Implementation of Tool
The tool has been exercised on the home pages of four distinct sites. The first site is http://www.bcetgsp.ac.in/default.php (Web page 1), the second is https://incometaxindiaefiling.gov.in/ (Web page 2), the third is http://web.gndu.ac.in/index.aspx (Web page 3) and the fourth is https://www.tripadvisor.in/Tourism-g293860-India-Vacations.html (Web page 4). The first and third pages are from academic institutional websites, whereas the second and fourth pages are from e-government and tourism sites respectively. The evaluated results for these web pages are shown in Table 3. A value towards '0' indicates poor quality, whereas a value towards '1' indicates high quality of the respective sub-parameter or parameter.

Results and Discussions
Aesthetics had very good values for Web page 3, but only 13% of its images had a defined alternate text field. Web page 1 had poor aesthetics, as it had neither a large image nor an option to display properly on all devices that can access it. Only 6.67% of images had hyperlinks for Web page 1, whereas 63%, 68% and 48% of images had hyperlinks for Web page 2, Web page 3 and Web page 4 respectively. The Resolution parameter was present only in Web page 3, whereas Color was evaluated as zero for Web page 4.
As far as the Ease of Use aspect is concerned, all web pages were consistent, as they had defined layout, font, color and mouse-over effects. For Navigation, no frame was defined in Web page 1, whereas Web page 2, Web page 3 and Web page 4 had embedded 58%, 93% and 90% frames. A link back to home was absent in Web page 2 and Web page 4. A navigation bar had not been defined for Web page 1. All web pages had a well-defined meta tag, whereas a logo was missing in Web page 1. No trademark symbol existed in any web page, but Web page 3 included a copyright symbol. The Ease of Access parameter had a very poor value for all web pages, as search engine, site map, table of contents and breadcrumbs were completely absent. A news and report section was present in the first three web pages, whereas a quick access pages section had been defined in Web page 3 only. For Ease of Interaction, Web page 4 did not provide any information whereas

Conclusions and Recommendations
An enormous amount of diversity exists among studies on website evaluation, as they are domain dependent. The evaluation objective is another reason which leads to different evaluation criteria. Each site needs popularity among users, and this can be achieved only if the site is well designed. So, evaluation of design quality aspects becomes mandatory.
As few studies have designed an automated evaluator for measuring the structural aspects of a web page, this paper has targeted the development of an automated HTML parser for web page evaluation. Three types of modules have been embedded in it: an Interface which receives the home address and presents the final measures of parameters or sub-parameters; a Parser which parses the HTML code of the web page to determine the measures of parameters and sub-parameters of each page; and Normaliser modules which normalize each value between '0' and '1'. In future, websites from all domains can be tested with the proposed tool by embedding a crawler to collect the addresses of all web pages of a website and pass them to the HTML parser. Furthermore, with the help of some weighting methods, all parameters and aspects can be evaluated and a final index value can be computed to predict the design quality of a website.
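As an indication of how such an index might be formed, the sketch below computes a weighted mean of normalized aspect values in Python. The paper does not specify a weighting method, so the weighted mean, the function name design_index and any scores or weights passed to it are purely illustrative assumptions and are not taken from Table 3.

```python
def design_index(scores, weights):
    """Weighted mean of normalised scores; weights need not sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Made-up inputs for illustration only.
example_scores = {"Aesthetics": 0.8, "Ease of Use": 0.5, "Security": 1.0}
example_weights = {"Aesthetics": 2.0, "Ease of Use": 1.0, "Security": 1.0}
example_index = design_index(example_scores, example_weights)
```

Because every input lies between '0' and '1' and the weights are positive, the resulting index stays in the same range and can be read on the same scale as the individual parameters.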