CONSTRUCTION OF A SOMATOSENSORY INTERACTIVE SYSTEM BASED ON COMPUTER VISION AND AUGMENTED REALITY TECHNIQUES USING THE KINECT DEVICE FOR FOOD AND AGRICULTURAL EDUCATION

© 2021 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


INTRODUCTION

BACKGROUND AND MOTIVATION
Agricultural development has always been one of the main concerns of human beings. The Food and Agriculture Organization (FAO) of the United Nations pointed out that agriculture not only has to provide people with food and clothing, but also needs to maintain sustainable operations while meeting the social objectives of food safety, nutrition supplies, and health promotion [1], [2], [3]. A good way to advocate this concept to the public [...] many interactive devices for different applications. Integrating proper interfaces for users to interact in practical situations and providing them with good system-performance experiences is important in the field of interactive design [26]. A good interactive design should follow the principles of the "3e indicators," namely, "effectiveness," "easiness," and "enjoyment," as suggested by Yeh [24], which are elaborated as follows: 1) effectiveness, corresponding to "functionality," meaning that the work must effectively guide users to complete tasks, solve problems, or achieve goals; 2) easiness, corresponding to "usability," meaning that the design of the work needs to reduce the memory, physical, visual, and comprehension effort required of users, and that the work must be easy to use; and 3) enjoyment, corresponding to "pleasantness," meaning that users must be able to enjoy the process while playing the work, which can be subdivided into four levels: physical, social, psychological, and ideological pleasures.
According to the surveys conducted by Sharma et al. [50], Preece et al. [51], Jaimes and Sebe [25], Gibbon et al. [52], and Turk [53], the various forms of human-machine interfacing are summarized in this study as shown in Tables 1 and 2 for the input and output parts of the interface.

• Touch
The user feels a special touch.

• Smell
The fragrance machine creates the smell.

COMPUTER VISION FOR HUMAN-MACHINE INTERACTION
Computer vision technology, when applied to human-machine interaction, is mainly used to locate objects in images [27]. Jaimes and Sebe [25] divided this process into four stages: motion segmentation, object classification, tracking, and interpretation, for which various techniques have been proposed [28], [29], [30].
In addition to requiring only low-cost hardware, computer vision technology has the advantage of supporting human-machine interactive activities in a wide range of fields [54]. Matching appropriate scenarios with computer vision techniques to implement perceptual human-machine interfaces allows users to break hardware limitations, explore the environment freely, and enjoy natural experiences [11], [55]. Therefore, computer vision technology is often employed by interactive designers as a powerful design tool; participants using this technology can realize abstract interaction between the intangible and the tangible [56].
Based on Crowley et al. [57] and Turk and Kölsch [58], the various computer vision techniques related to human beings' sensory capabilities for human-machine interaction are summarized in this study in Table 3. From the table, it can be found that the interactive interfaces implemented by computer vision techniques can be divided into two types, namely, object and human.
In this study, the "human" is adopted as a unit of the interface, for which the following capabilities with their details listed in Table 3 can be carried out to realize the function of human-machine interaction: 1) determination of the existence and position (of the human); 2) decision of the body posture (of the human); and 3) detection and recognition of the hand movement (of the human).
In the case of adopting the "object" as the unit of the interface, based on the tangible interfacing concept [31], [32], the following capabilities can be carried out: 1) determination of the existence and position; and 2) recognition of information. By contrast, the use of computer vision techniques with the human as the interface can achieve interesting and free interactions, so that users are no longer limited to physical button interfaces and can freely use their body postures to explore during the experiencing process.
Computer vision technology with the "human" as the interface is based on the sensing of the user's body posture to generate various interaction activities. Jaimes and Sebe [12] proposed the concept of "human-centered vision," which suggests the combination of computer vision techniques with image depth sensing and other technologies to track the user's body posture by using 1) the contour of the grasping posture; 2) the appearance features of the grasping posture, such as skin color and face; and 3) the real-time body model generated with components such as cylinders and spheres, as illustrated by Figure 1 [67]. By analyzing the shape, contour, and movement of the body, or analyzing the user's body structure model, computer vision can be utilized to capture and analyze the human posture for various applications, such as person recognition and movement analysis [65], [66].
Wang and Wu [33] derived four possible manifestation forms of computer vision with the human or object as the interface for computer vision-based interaction situations as shown in Table 4, where the red dotted arrows in each figure represent the image-taking directions of the cameras. In this study, an interactive game for food and agricultural education is proposed with the human as the interface, i.e., the interaction of the game is based on the human-machine interfacing realized by computer vision techniques.

Interactive space
• Hung on the ceiling, right above the user
• Hung on the wall, slightly tilted down to face the user
As a composite application of the interactive wall and the interactive floor, sensing the presence, location, overall outline or movement of the user based on the location of the camera.

INTERACTIVE EXPERIENCE AND EXISTING WORKS WITH THE "HUMAN" AS THE INTERFACE
In human-machine interaction activities, the interactive experience is based on the user's cognition, emotion, and feeling, with the goal of promoting the user's thinking, action, and pleasure [34]. Pine and Gilmore [35] hold that a "pleasant experience" should cover four states, namely, esthetic, entertainment, educational, and escapist. With the popularization of technology, it has become a trend for museums to carry out educational exhibitions that offer interactive experiences. Mitchell et al. [36] suggested that if museum education in the 21st century could integrate educators, professional knowledge, and digital media tools to form cross-field cooperation, it would be able to attract more visitors to exhibitions. By simulating the situation of the teaching theme and introducing interactive experiences, exhibition education can effectively convey knowledge and enhance the interactivity between visitors and exhibitions.
In recent years, education aided by digital media technology with the "human" as the interface to provide interactive experiences has also been implemented in various fields [59], including several works in the form of interactive walls. Some examples are introduced in the following.

"TABEGAMI SAMA" (Figure 2) [60]
The projected interactive-wall work "TABEGAMI SAMA" (Eating God in Japanese) [60] can be explored in a darkened, immersive space to see Japanese-style food ingredients that grow during the four seasons. Rice grains are stacked into a mountain, and a camera set up on the top detects the rhythm and position of the participant's hand turning the rice grains. Dynamic particle effects and contour lines are then generated and projected in real time, integrating the experiences of vision, touch, and smell and bringing people closer to rice agriculture.

"A Nong's Fantastic Adventure" (Figure 3) [61]
"A Nong's Fantastic Adventure" is an interactive wall of agricultural experience. Through the use of computer vision and image synthesis technology, participants can experience the fun of planting rice and the hard work of farming. In this work, a Kinect is placed in front of the wall to sense the positions of multiple participants' faces and feet, and the participants' appearances wearing hats and rain boots are synthesized on the front projection wall. The participants only need to simulate planting seedlings within the scope of the interactive wall, and the seedlings will be planted in front of the participants in the image shown on the wall screen.

"Fire & Ice" is a public interactive art installation whose interactive wall is made up of eight screens showing two opposing elements, "fire" and "ice," allowing two participants to act as the characters of fire and ice, respectively. Two Kinects are set up on the wall, facing the two participants from slightly above at an oblique angle, and computer vision techniques are used to track the participants' skeleton nodes and analyze their gestures and movements. When the participants stand in front of the interactive wall and begin to condense "magical energy," waving their arms at the other party launches condensed "magical special effects" towards the other party, forming a visual experience of ice and fire.

"University of Dayton Interactive Wall" (Figure 5) [63]
"University of Dayton Interactive Wall" shows what kind of school life a student can have when he/she enters the University of Dayton. Due to the large horizontal range of this work, a total of four Kinects were set up on the ceiling to widen the sensing range of the cameras, allowing the cameras to perceive whether there are pedestrians passing by. If not, the interactive wall shows a wave effect of square bricks; when pedestrians enter the sensing range of the wall, the square bricks in front of the participant's position produce a peeling effect of the same size as the participant's contour seen from above. After the tiles are peeled off, videos related to school life are displayed.

"NikeFuel Station" (Figure 6) [64]
"NikeFuel Station" is an interactive experience wall built in a NIKE store, on which hangs a Kinect facing the participant to sense his/her movement posture. When the participant stands in front of the interactive wall, a 3D body contour consisting of particles is generated in real time, and a video recording function can be turned on by touching a virtual button with the body contour. When the participant continues to move his/her body, the 3D outline gradually changes from red to green, conveying the idea that exercise can lead to a healthy life. After the experiencing process is over, the participant can obtain the recorded video on his/her mobile phone.
Figure 6: "NikeFuel Station" [64]. (a) Generating in real time a 3D body contour consisting of particles. (b) When the participant continues to move his/her body, the 3D outline will gradually change from red to green.

DEVICES FOR COMPUTER VISION APPLICATIONS
Computer vision technology mainly uses cameras as sensors. With the increasing demand for human-computer interaction in recent years, cameras with the additional capability of measuring depth data have been widely developed. In this study, three types of cameras used by computer vision applications are identified with their types and functions shown in Table 5.

Webcam
The cameras used in general computer video conferences, which can capture color images.

Infrared filter (IR-pass) camera
By adding an infrared filter (IR-pass filter) between the photosensitive element of the webcam and the lens to filter out the visible light, the camera can only see infrared light, simplifying the image captured by the camera.

Motion sensing camera
With the 2nd-generation Kinect as an example, in addition to the built-in color camera, this type of camera also has an infrared emitter and a depth sensor, which can obtain the depth information in the taken image.

BRIEF DESCRIPTION OF THE PROPOSED SYSTEM
The above literature survey provides reviews of various concepts and case studies about food and agricultural education, human-machine interaction, and computer vision technology. Accordingly, relevant principles for designing a system for food and agricultural education were derived. The principles were followed to construct a real system on which a game can be played to learn about some fruit harvesting processes through high-freedom human-machine interaction. The system architecture is implemented by computer vision and augmented reality techniques utilizing an interactive wall with the "human" adopted as the interface unit, as described in Table 6.

METHODS
The methods used in this study are introduced here, including the prototyping, observation, and interview methods.

THE PROTOTYPING METHOD
Prototyping is a method for quick and low-cost evaluation of a system before it is formally constructed [42]. According to the prototyping process proposed by Naumann and Jenkins [43] and Eliason [44], an interactive prototype system for conducting food and agricultural education was constructed in this study which includes the following six major steps: 1) conducting a literature survey about related theories and existing systems; 2) deriving principles accordingly and following them to design a prototype; 3) carrying out relevant experiments using the prototype; 4) evaluating the effectiveness of the prototype according to the users' opinions; 5) improving the prototype to be a formal system; and 6) exhibiting the system in a public space for further testing.

THE OBSERVATION METHOD
The observation method is useful for qualitative analysis of the data collected from subtle observations of the users' performances from the perspectives of onlookers [45]. In this study, this method was adopted to collect and analyze the data collected from observations of the participants' performances of the proposed system from two aspects, "operation situation of human-machine interface" and "participant's behavior," with the detailed observation items listed in Table 7. The observation results were used for further improvement on the proposed system resulting from the prototyping method. The details will be presented later in this paper.

2) The feedback given by the participants in the operation of this system.
3) The participants' reaction when interacting with this system.
4) The participants' additional actions in operating this system.

THE INTERVIEW METHOD
In the interview method, invited persons are asked questions about the theme of the survey to collect objective facts from their answers [46]. This method was used in this study in two ways, namely, interviews with experts and interviews with users.

INTERVIEW WITH EXPERTS AS THE INTERVIEWEES
In this study, experts in related fields were invited to take part in in-depth interviews both before and after the users' experiencing activities using the proposed system. The first expert interview was conducted during the design stage of the proposed system (i.e., after the prototype was constructed and before the final system was completed). Three experts were interviewed to collect their comments about the contents for the theme of food and agricultural education and about the design for the interactive experiencing process, based on which the prototype system was improved. The second expert interview was conducted during the analysis and evaluation stage of the proposed system, after the users had tried the system. Four experts were interviewed to collect their opinions on the usability, experiencing process, and education content of the proposed system.

RESULT OF THE FIRST EXPERT INTERVIEW
The result of the first expert interview is presented here, leaving that of the second interview to be described later in this paper. As shown in Table 8, the three experts invited for the first interview include an elementary school teacher, a founder of an enterprise, and a CEO of a design company. More information about their backgrounds and expertise can be found in the table as well. The experts were interviewed on three aspects of food and agricultural education: 1) the teaching content; 2) the experiencing activities; and 3) the introduction of computer vision and augmented reality technology. The opinions of the experts collected in the interview are listed in Table 9, from which the following conclusions can be drawn for prototype improvement: 1) the knowledge of food and agricultural education should be meaningful to the general public; 2) compared with textbook-style learning, the experience of food and agricultural education should add more than one sensory experience; 3) the "hands-on" sense of participation should be higher to effectively arouse the learners' interest; and 4) livelier animation may be used to enhance the interactive context for people of any age to watch.
According to the above summary of the experts' suggestions, special animation and sound effects were added into the prototype system to simulate the situations related to food and agricultural education, or specifically, related to fruit picking; and the Kinect device was adopted with the human as the interface to implement a gaming process with human-machine interaction, so as to make the interactive experience more vivid and to increase the degree of enjoyment of participants.

Table 9: Opinions of the experts collected in the first expert interview.

Teaching content
What is the importance of learning in food and agricultural education?
• P1: food and agricultural education allows the public to understand the relationship between "food" and "agriculture," and the importance of the land where food is produced.
• P1 & P2: the concept of food and agricultural education is not just a slogan, but should allow the public to understand the hardships of farmers to produce food.
How to arouse people's interest in the learning process?
• P1: the hands-on experiencing session can arouse interest in all age groups because there is a sense of actual participation.
• P1: introduction of information technology, using mobile phones, tablets or computers to assist teaching, is also a way to arouse interest.

Experiencing activity
What is the learning effect of the experiencing activity?
• P2: the experience of food and agricultural education should help the user learn things that cannot be learned in textbooks, and enhance his/her autonomy, life skills, and expression capabilities.
How to design the experiencing process?
• P2: more than one sensory experience can be added to the experience of food and agricultural education, and attention should be paid to the safety of the experiencing process.

Introduction of computer vision and augmented reality technology
What do you think about using digital media to simulate the experiencing process and adding interactivity?
• P1: this approach is at the forefront of the times and can achieve the effect of information education.
• P2: the focus of introducing interactive experiences is to interact with the user by game playing; it is necessary to consider what knowledge the user can obtain after playing.
What details should be paid attention to when developing the interactive experiencing process by computer vision technology?
• P3: in the design of exhibition venues and installations, it is necessary to eliminate interference from light sources as much as possible and maintain the stability of the environment.
For the current design of the system, are there any suggestions or what can be improved?
• P1: the system must show the user's game results, and add proper contents that allow the user to understand and reflect.
• P2: animation can be used to present the interactive context which will be more lively for any age group to watch.
• P3: it is possible to design the interactive experiencing process by use of things that modern people like.

RESULT OF THE SECOND EXPERT INTERVIEW
The four experts invited to accept the second interview are listed in Table 10, including an elementary school teacher and three university professors. They were asked questions of three aspects, namely, 1) man-machine interface operation; 2) experience content of the system; and 3) views on interactive experience of food and agricultural education, as listed in Table 11. The opinions of the four experts will be shown later in this paper when they, together with the participating users' comments obtained from a user interview, are used to evaluate the effectiveness of the proposed system.

Table 11: The list of the questions of three aspects asked in the second expert interview.

Aspect: Man-machine interface operation
1) What is your opinion on the interface of the proposed system?
2) Do you think the whole experience process of this system is smooth?

Aspect: Experience content of the system
1) What is your opinion on the exhibition layout and digital content design of the system?
2) Do you think the system has an entertaining effect?
3) What do you think of the educational nature of the system on agricultural knowledge?

Aspect: View on interactive experience of food and agricultural education
1) What is your opinion on the application of "the experiencing form via the somatosensory interactive wall" of the system to food and agricultural education?

INTERVIEW WITH USERS AS THE INTERVIEWEES
During the exhibition of the proposed system to the public, 50 users of the system were randomly selected for interviews, aiming at collecting their comments for verifying the effectiveness of the proposed system. The "3e indicators" proposed by Yeh [24], namely, "effectiveness," "easiness," and "enjoyment," as well as the "four states of pleasant experience" proposed by Pine and Gilmore [35], namely, "esthetic," "entertainment," "educational," and "escapist," were adopted to design the questions asked in the interview process, resulting in a set of questions of three aspects: "operation situation of human-machine interface," "operation experience," and "views on interactive experience of food and agricultural education," where 1) the first aspect of "operation situation of human-machine interface" comes from the two indicators of "easiness" and "effectiveness"; 2) the second aspect of "operation experience" comes from the "enjoyment" indicator and the "esthetic" and "educational" states; and 3) the last aspect of "views on interactive experience of food and agricultural education" is aimed at covering the remaining "entertainment" and "escapist" states.
The questions so designed are listed in Table 12 while the comments collected from the users accepting the interview will be presented later in this paper.

Table 12: List of the questions of three aspects asked in the interviews with the users.

Aspect: Operation situation of human-machine interface
1) What do you think about the use of body movements to interact?
2) What is your opinion on the operation interface of this system?

Aspect: Operation experience
1) What is your opinion on the layout of the exhibition or the design of the digital content of this system?
2) What do you think of the educational nature of this system on agricultural knowledge?
3) What are your thoughts and feelings while playing this work?

Aspect: Views on interactive experience of food and agricultural education
1) What is your view on applying the experience form of somatosensory interaction to food and agricultural education?

RESULTS
The details about the construction of the proposed system are described in this section, including the design idea, architecture, hardware, software, and game-play process.

IDEA FOR DESIGNING THE PROPOSED SYSTEM
In the countryside, fruit picking in most sightseeing farms allows the public to participate in related agricultural activities, providing a good way of food and agricultural education. During such an agricultural production experiencing process, participants can not only understand the agricultural process of fruit planting and harvesting, but also gain a good memory of farmers' daily tasks.
However, the traditional fruit picking activity is limited by the plant growth time and the available farm space, and so cannot be experienced by many people at all times. Therefore, in this study, it is desired to simulate the situation of harvesting fruits in the farmland by a somatosensory interactive game-play method, hoping to break the limitations of time and space. Specifically, the activities of picking three kinds of fruit, namely, banana, orange, and cantaloupe, are implemented via computer vision technology in this study to offer an interesting interactive experiencing process, followed by the action of taking an augmented-reality photo, so as to keep the happy time in memory as well as to shorten the distance between the agricultural industry and the public.

THE USE OF THE INTERACTIVE KINECT DEVICE AND THE DESIGN OF THE EXPERIENCING PROCESS
The design of the proposed system with an interactive game called "Fruit Picking Fun," as illustrated by Figure 7, is based on the use of the second-generation somatosensory camera Kinect as the sensor and a series of related computer programs written in this study, allowing the participant to use various body postures to play the game, by which the participant can experience the food and agricultural education of simulated fruit-picking activities in farmlands as mentioned previously. Besides the Kinect device, the system also includes an LED display screen and a loudspeaker, both connected to a computer for performing the tasks of image taking, computer vision processing, graphic displays, data transmission, message announcing, augmented-reality photo taking and downloading, etc.

DESIGN OF THE FRUIT-PICKING EXPERIENCING PROCESS
The game of the proposed system implemented in this study for interactive experiencing of fruit-picking activities is played in the following way.

Stage 1: initialization and AR-based dressing up
At the beginning, the system displays an initial screen, with a message inviting the participant to select, by a grabbing gesture (extending a hand to grab a fruit in front of the Kinect), one of the above-mentioned three kinds of fruit that he/she wants to "harvest," as shown in Figure 8(a). Then, the system displays a farm scene corresponding to the selected fruit with the participant appearing in the middle of a group of trees of the selected fruit, and a hat-like object (a fruit-tree leaf, a decorated hat, or a colored cap) is generated to appear over the participant's head to dress him/her up like a farmer, as in the example shown in Figure 8(b). The artificial object stays fixed on the participant's head even when he/she is moving around, and this augmented reality (AR) effect of dressing up the participant is realized by a face detection and tracking program written in this study.

Stage 2: fruit harvesting
The participant starts to harvest the selected fruit in this stage by making certain body actions and hand operations, as illustrated by the example shown in Figure 8(c). Harvesting is judged to be successful after the body action has been carried out a certain number of times. Each type of body action is characterized by certain body models and hand gestures with specific parameters, whose images can be recognized and tracked by the Kinect device using programs written in this study that carry out the operations of body-model matching and tracking and hand-gesture recognition.
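The rule that harvesting succeeds after the body action has been repeated a certain number of times can be illustrated with a minimal sketch. It is not the program written in this study: the class name, the choice of a "raise the hand above the shoulder, then lower it" action, the thresholds (in metres), and the required count are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RepetitionCounter:
    """Counts completed raise-then-lower cycles of one hand (hypothetical action)."""

    high: float = 0.15    # assumed: hand must rise this far above the shoulder
    low: float = -0.05    # assumed: hand must drop below this relative height
    required: int = 3     # assumed number of repetitions for a successful harvest
    count: int = 0
    _raised: bool = False

    def update(self, hand_y: float, shoulder_y: float) -> bool:
        """Feed one tracked frame; return True once enough repetitions are done."""
        rel = hand_y - shoulder_y
        if not self._raised and rel > self.high:
            # Hand has gone up far enough: first half of the cycle.
            self._raised = True
        elif self._raised and rel < self.low:
            # Hand has come back down: one full repetition completed.
            self._raised = False
            self.count += 1
        return self.count >= self.required
```

Using two separate thresholds (hysteresis) keeps small tracking jitter around a single threshold from being counted as extra repetitions.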

Stage 3: AR photo taking and downloading
After the harvest is successful, the participant can take a digital photo of him/herself with the harvested fruit held in both hands, as shown by the example of Figure 8(d). This photo is again generated by augmented reality techniques and is kept in the computer. Furthermore, the photo may be downloaded to the user's mobile phone by scanning a QR code appearing on the display screen using the phone, as illustrated in Figure 8(e). This photo-download operation is carried out by a commercially available QR code identification program.

Stage 4: game restarting
The user shows a "T-shaped pose" to trigger the system to go back to the initial screen, meaning that the game is started over again. Recognition of such a posture is carried out also by the body-model matching and tracking programs mentioned previously for Stage 2.
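A "T-shaped pose" check of this kind can be sketched from joint coordinates alone. The sketch below is only an illustration under stated assumptions: the joint names (such as "shoulder_left"), the dictionary layout, and both thresholds are hypothetical, and the code does not reproduce the body-model matching programs written in this study or any Kinect SDK call.

```python
from typing import Dict, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in metres, camera space (assumed)


def is_t_pose(joints: Dict[str, Joint],
              y_tol: float = 0.10,
              min_reach: float = 0.45) -> bool:
    """Return True if both arms are roughly horizontal and stretched sideways."""
    reaches = []
    for side in ("left", "right"):
        sx, sy, _ = joints[f"shoulder_{side}"]
        _, ey, _ = joints[f"elbow_{side}"]
        wx, wy, _ = joints[f"wrist_{side}"]
        # Elbow and wrist must stay near shoulder height (arm is level).
        if abs(ey - sy) > y_tol or abs(wy - sy) > y_tol:
            return False
        reaches.append(wx - sx)
    left_reach, right_reach = reaches
    # Both wrists must be far from their shoulders, on opposite sides.
    return (abs(left_reach) >= min_reach
            and abs(right_reach) >= min_reach
            and left_reach * right_reach < 0)
```

Checking that the two reaches have opposite signs avoids depending on whether the image is mirrored, since only the relative direction of the two arms matters.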

ARCHITECTURE OF THE PROPOSED SYSTEM
As shown in Figure 9, with the Kinect device used to capture the participant's motion images, the game engine Unity3D is utilized in this study for multimedia integration and development of the game "Fruit Picking Fun" played on the proposed system, and the Microsoft Kinect for Windows SDK (hereinafter referred to as the Kinect SDK) is used to develop the computer programs used in the proposed system.
The second-generation Kinect, as shown in the last row of Table 5, is used as a somatosensory camera, which is composed of a depth sensor, an RGB camera, and a four-unit microphone array. With the sensors of the Kinect, color, 3D depth, and infrared images, as well as sound information of the target object, can be acquired.
Furthermore, via programs written with the Kinect SDK associated with the Kinect device, the functions of human body tracking and body skeleton identification can be implemented by use of the three-dimensional coordinates of up to 25 joint points of the human body and fingers obtained from the acquired images. These functions can be used to implement the tasks of face detection and tracking, hand-gesture recognition, and body-model matching and tracking needed to implement the previously mentioned four stages of actions involved in the fruit-picking activity.
Once the participant has taken a digital photo at the end of the experiencing process, the system immediately transmits the photo to the cloud server via a Wi-Fi channel and generates a QR code which includes a photo-download link. The participant can download the photo to his/her mobile phone by scanning the QR code with the phone.
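The photo hand-off can be pictured as building a download link that is then encoded into the QR code. The following is a hedged sketch only: the base URL, the query parameter names, and both function names are hypothetical, since the link format used by the system is not described here, and the rendering of the QR image itself is left to a separate QR library.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical cloud endpoint; the real server address is not given in the text.
BASE_URL = "https://photos.example.org/download"


def make_download_link(photo_filename: str) -> str:
    """Build the link that would be encoded into the on-screen QR code."""
    # A random token makes the link hard to guess (an assumed design choice).
    token = secrets.token_urlsafe(16)
    query = urlencode({"file": photo_filename, "token": token})
    return f"{BASE_URL}?{query}"


def parse_download_link(link: str) -> dict:
    """Recover the file name and token from a scanned link (server side)."""
    params = parse_qs(urlsplit(link).query)
    return {"file": params["file"][0], "token": params["token"][0]}
```

Because `urlencode` escapes spaces and other special characters, file names survive the round trip through the QR code unchanged.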

IMPLEMENTED COMPUTER VISION TECHNIQUES WITH THE HUMAN AS THE INTERFACE
In this study, the Kinect device (shown in the third row of Table 5) and the Kinect SDK are used to implement the computer vision techniques used in the proposed system for playing the game of "Fruit Picking Fun" with the "human" as the interface. The Kinect device consists of a depth sensor, an RGB color camera, and a four-unit microphone array. These sensors can be used to obtain color images, 3D depth images, infrared images, and audio information. Combined with programs written using the Kinect SDK, the Kinect device can be used to recognize and track many features of the human body and hands appearing in the color image taken by the Kinect. Specifically, it can extract up to 25 joint points of the human body and fingers and their three-dimensional coordinates.
In this study, the Kinect device and its related functions are used to implement the previously-mentioned functions of "face detection and tracking," "hand-gesture recognition," and "body-model matching and tracking" for implementing the experiencing process of "Fruit Picking Fun." The details of the implementation results are described in the following.

FACE DETECTION AND TRACKING FOR AFFIXING A HAT-LIKE OBJECT OVER THE USER'S HEAD
In this study, the color and depth images acquired with the Kinect device itself are used to detect and track the participant's face so as to implement the desired AR function of imposing a hat-like object over the participant's head, as illustrated by the example of the banana leaf appearing in Figures 8(b)~8(d).
A program named F1 was written in this study to realize this AR function. It uses the Kinect device to detect the bone joint points of the head and then tracks them by use of the location coordinates and rotation values of these points. As an example, the rectangular frame drawn in Figure 10 is the human face detected and tracked by program F1. A more detailed pseudo-code algorithm describing this program is shown in Table 13. In the table, italic words like detect specify actions conducted by the Kinect device, and bold words like repeat or while are commands of the pseudocode.
Table 13. Algorithm F1. Steps.
Step 1: repeat detect the user's face in image i_c by the Kinect using the images i_c and i_d; until detected.
Step 2: track the user's face and compute the 3D position of the user's face as p_n.
Step 3: if v_off ≠ 0 then add (0, v_off, 0) to p_n; end if.
Step 4: exit with p_n as the new 3D position of the hat-like object h_o over the user's head.
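The offset logic of Steps 2 through 4 can be sketched in Python as follows. This is a minimal illustration and not the authors' program F1: the function name hat_position, the tuple representation of 3D positions, and the convention that a missing face is passed as None are assumptions made for this example, and the face position is taken as given rather than read from the Kinect SDK.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def hat_position(face_position: Optional[Vec3], v_off: float = 0.0) -> Optional[Vec3]:
    """Return the 3D position of the hat-like object, or None while no face is tracked.

    face_position plays the role of p_n (hypothetical input; a real system
    would obtain it from the tracked head joint).
    """
    if face_position is None:
        # Step 1 has not succeeded yet: the caller keeps detecting.
        return None
    x, y, z = face_position          # Step 2: tracked face position p_n
    if v_off != 0:
        y += v_off                   # Step 3: add (0, v_off, 0) to p_n
    return (x, y, z)                 # Step 4: new position of the hat-like object


# Example: a face at height 0.5 with a vertical offset of 0.25.
print(hat_position((0.0, 0.5, 2.0), v_off=0.25))
```

Calling the function once per frame keeps the virtual object attached to the head as the tracked position changes.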

BODY-MODEL MATCHING AND TRACKING
Some procedures of the Kinect SDK are used in this study to track the user's body joint points and obtain the coordinate information of each joint point for use in the interactive activities of playing "Fruit Picking Fun" on the proposed system. Only the posture of the user's upper body is needed in this study, from which 10 joint points are detected and tracked as shown in Figure 11.
Figure 11. The user's body joint points detected and tracked in this study using Kinect SDK procedures for playing the game "Fruit Picking Fun" on the proposed system.

HAND-GESTURE RECOGNITION
By use of the Kinect SDK procedures, the human-body joint points can be detected using the Kinect device, and the 3D coordinates of these points can be acquired. A hand-gesture recognition function has been implemented in this study as an algorithm that uses the relative 3D positions between these joint points, computed from their acquired 3D coordinates. The algorithm, named G0, is described in Table 14; it matches the set of parameters of the human-body joint points detected from the input images acquired by the Kinect device sequentially against the conditions of the various hand gestures listed in Table 15. The output of the algorithm is the hand-gesture event currently expressed by the user, which is used by other algorithms implemented in this study for the game "Fruit Picking Fun." Note that the hand-gesture procedures G1 through G7 in Table 15 were originally built with the Kinect SDK for the Kinect device, with their conditional parameters modified to fit the applications of this study, while G8 through G10 are new hand-gesture procedures created in this study.
The above-mentioned hand-gesture recognition procedures, or simply hand-gesture events, are used in the interactions between the game process control and the fruit picking scenes. By tracking the coordinate values of the body joints, the effect of touching objects can also be achieved.
Table 14. Algorithm G0 for hand-gesture recognition, using the gesture conditions of Table 15. Output: the gesture event g_e of the user. Steps.
Step 1: detect the user's body joint points by the Kinect device to form a set p_b.
Step 2: repeat match the parameters of p_b with conditions G1 through G10; until a condition G_i is matched.
Step 3: exit with the procedure corresponding to G_i in the 2nd column of Table 15 as the desired gesture event g_e.
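As a rough illustration of the matching loop in algorithm G0, the following Python sketch tests the joint set of one frame against an ordered list of gesture conditions and returns the first event matched. The joint names, the coordinate convention (metres, with the y axis pointing up), and the two toy conditions are assumptions made for this example; they are neither the Kinect SDK API nor the exact conditions of Table 15.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Joints = Dict[str, Vec3]                 # joint name -> (x, y, z); hypothetical layout
Condition = Callable[[Joints], bool]


def recognize_gesture(p_b: Joints, conditions: List[Tuple[str, Condition]]) -> Optional[str]:
    """Match p_b sequentially against the conditions and return the first event matched."""
    for event, is_matched in conditions:
        if is_matched(p_b):
            return event
    return None  # nothing matched in this frame; the caller repeats on the next frame


# Two hypothetical conditions, for illustration only.
def both_hands_above_head(j: Joints) -> bool:
    return j["hand_left"][1] > j["head"][1] and j["hand_right"][1] > j["head"][1]


def right_hand_above_head(j: Joints) -> bool:
    return j["hand_right"][1] > j["head"][1]


CONDITIONS = [
    ("both_hands_up", both_hands_above_head),
    ("right_hand_up", right_hand_above_head),
]

frame = {
    "head": (0.0, 1.6, 2.0),
    "hand_left": (-0.3, 1.0, 2.0),
    "hand_right": (0.3, 1.8, 2.0),
}
g_e = recognize_gesture(frame, CONDITIONS)
print(g_e)
```

Because the conditions are tested in order, a more specific gesture should be listed before a more general one that it also satisfies.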

Table 15. Hand-gesture procedures and their conditions.

G4
Raising two hands over shoulders (Gestures.Psi)
(1) The right hand is more than 10 cm above the spine.
(2) The left hand is more than 10 cm above the spine.

G5
Raising up either hand and swiping (Gestures.SwipeUp)
Stage 1 − either of the following two cases:
(1) the right hand is lower than the left elbow by less than 15 cm;
(2) the left hand is lower than the right elbow by less than 15 cm.
Stage 2 (conducted within 1.5 sec.) − either of the following two cases:
(1) the right hand is higher than the left shoulder by more than 5 cm, and the parallel movement of the right hand does not exceed 10 cm;
(2) the left hand is higher than the right shoulder by more than 5 cm, and the parallel movement of the left hand does not exceed 10 cm.

G6
Raising up the right hand (Gestures.RaiseRightHand)
(1) The right hand is higher than the left shoulder by more than 10 cm.
(2) The left hand is lower than the left shoulder.

G7
Raising up the left hand (Gestures.RaiseLeftHand)
(1) The left hand is higher than the right shoulder by more than 10 cm.
(2) The right hand is lower than the right shoulder.

G8
Holding objects by two hands (Gestures.RaiseHalfHands)
(1) The distance between the right/left hand and the right/left elbow is 15 to 80 cm.
(2) The parallel movement of the right/left hand should not exceed 15 cm from the right/left shoulder.
(3) The height of the right/left elbow from the right/left hand should not exceed 15 cm.
(4) The right/left hand is higher than the hip.

G9
Raising the right hand flat (Gestures.HoldRightHandUp)
(1) The height of the right shoulder should not exceed 10 cm from the right hand.
(2) The height of the right shoulder should not exceed 10 cm from the right elbow.
(3) The distance seen from the front between the right hand and the right shoulder should not exceed 15 cm.
(4) The parallel movement of the right hand is not more than 20 cm away from the right shoulder.
(5) The parallel movement of the right elbow should not exceed 20 cm from the right shoulder.
(6) The distance seen from the front between the right hand and the right shoulder should not exceed 20 cm.
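Conditions of this kind reduce to simple comparisons between joint coordinates. The sketch below expresses the conditions of G4, G6, and G7 as Python predicates, as one possible reading of the table. It assumes joint positions given in metres with the y axis pointing up, and the joint names (hand_right, shoulder_left, spine, and so on) are hypothetical labels chosen for this example rather than Kinect SDK identifiers.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Joints = Dict[str, Vec3]   # hypothetical layout: joint name -> (x, y, z) in metres


def height_above(j: Joints, a: str, b: str) -> float:
    """Vertical distance by which joint a lies above joint b (negative if below)."""
    return j[a][1] - j[b][1]


def is_psi(j: Joints) -> bool:
    """G4: both hands are more than 10 cm above the spine."""
    return (height_above(j, "hand_right", "spine") > 0.10
            and height_above(j, "hand_left", "spine") > 0.10)


def is_raise_right_hand(j: Joints) -> bool:
    """G6: right hand more than 10 cm above the left shoulder; left hand below the left shoulder."""
    return (height_above(j, "hand_right", "shoulder_left") > 0.10
            and height_above(j, "hand_left", "shoulder_left") < 0)


def is_raise_left_hand(j: Joints) -> bool:
    """G7: left hand more than 10 cm above the right shoulder; right hand below the right shoulder."""
    return (height_above(j, "hand_left", "shoulder_right") > 0.10
            and height_above(j, "hand_right", "shoulder_right") < 0)


# Example pose: right hand raised, left hand hanging down.
pose = {
    "spine": (0.0, 1.0, 2.0),
    "shoulder_left": (-0.2, 1.4, 2.0),
    "shoulder_right": (0.2, 1.4, 2.0),
    "hand_left": (-0.3, 0.9, 2.0),
    "hand_right": (0.3, 1.7, 2.0),
}
print(is_psi(pose), is_raise_right_hand(pose), is_raise_left_hand(pose))
```

Keeping the thresholds as explicit numbers mirrors the way the conditional parameters are described as adjustable to the application.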

INTERACTION PROCESS AND PROCESS CONTROL ALGORITHMS
The interactive process of the game "Fruit Picking Fun" is divided into 11 parts, including three farm scenarios for the fruits of banana, orange, and cantaloupe. The interactive scenario process is shown in Table 16. Each step in the table can be implemented by the algorithms in Table 13, Table 17, and Table 18, where Table 13 has been presented before for user face recognition and tracking, Table 17 includes the algorithms for flow control of the game "Fruit Picking Fun," and Table 18 includes the algorithms for interactions with the farm scenes.
Table 16. The interactive scenario process.

1 Selecting the fruit in the initial screen
Initial screen: a farm scene with three types of fruit appearing on the top and a basket shown below.
Messages: inviting the user to "extend a hand to grab a fruit" and put it into the basket to enter the experiencing farm scene of the selected fruit.
Algorithms: S1, S2

2-A Experiencing a banana farm
Screen: banana trees appear around, and a banana leaf appears on the user's head for use as a hat.
Messages: inviting the user to "wave his/her left and right hands," respectively, to chop off the bananas.

2-B Experiencing an orange farm
Screen: orange trees appear around, and a hat decorated with orange leaves appears on the user's head.
Message: inviting the user to "flick his/her arms up to touch the oranges to harvest them."

2-C Experiencing a cantaloupe farm
Screen: cantaloupe scaffoldings appear around, and a cantaloupe-colored cap appears on the user's head.
Message: inviting the user to "wave hands to cut the cantaloupes to let them bounce for a fixed number of times to harvest them."

3-A Harvesting bananas
Action: the user waves the right or left hand with a knife to cut off the banana bunch on the tree.
How to pass this level: the harvest is successful when each of the left and right banana bunches is cut three times and falls.

3-B Harvesting oranges
Action: the user flicks his/her arms up to touch the oranges on the tree like shaking them off the tree.
How to pass this level: the harvest is successful when all the oranges have been shaken off.

3-C Harvesting cantaloupes
Action: the user touches the cantaloupes to pick them from the vine, and makes the falling cantaloupes bounce.
How to pass this level: the harvest is successful when the cantaloupes are picked off and touched to create more than 20 bounces.

Algorithms: S5

4 Returning to the initial screen
Action: the user shows a "T-shaped pose" with both hands held to the sides for a period of time.

Retaking or confirming in the photo taking process
Screen: a photo preview appears in the middle.
Messages: inviting the user to select one of the two choices, "retake a photo by raising the left hand flat" or "OK to confirm by raising the right hand flat."
Algorithms: C3, C4

8 Transmitting the photo to the cloud server
Screen: a countryside with mountains far away.
Message: "The photo is being uploaded."
Algorithms: None

9 Inviting to download the photo
Screen: the photo and a QR code appear.
Messages: inviting the user to "scan the QR code to download the photo" using a mobile phone, and then "adopt a T-pose" to return to the initial screen (see Step 11).
Algorithms: None

10 Downloading the photo
Action: the user scans the QR code on the screen by a mobile phone to download the photo into the phone.
Result: the system downloads the photo to the user's mobile phone.
Algorithms: None

11 Going back to the initial screen
Action: either the user shows a "T-shaped pose" or no human motion is detected for 10 seconds.
Result: the system returns to the initial screen.
Algorithms: C1, S7
(2) the procedures described in Table 15. Output: the instruction for returning to the initial screen. Steps.
Step 1: repeat detect the T-shaped pose of the user's body by the Kinect device using the procedure G3: Gestures.Tpose as well as the input images i_c and i_d; until detected successfully three times consecutively.
Step 2: exit with the instruction for returning to the initial screen.
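The requirement that a pose be detected several times in a row before an instruction is issued acts as a simple debouncing filter. A minimal Python sketch of this idea is given below; the function name and the representation of per-frame detection results as a sequence of booleans are assumptions made for illustration, not part of the described system.

```python
from typing import Iterable


def confirmed(detections: Iterable[bool], required: int = 3) -> bool:
    """Return True once `required` consecutive positive detections have been seen."""
    streak = 0
    for hit in detections:
        # A miss resets the streak, so isolated false positives do not trigger the event.
        streak = streak + 1 if hit else 0
        if streak >= required:
            return True
    return False


# Two hits, one miss, then three hits in a row: confirmed on the last frame.
print(confirmed([True, True, False, True, True, True]))
# The streak is broken before reaching three: not confirmed.
print(confirmed([True, True, False, True, True]))
```

The same check applies to any gesture that is used as a trigger in this way, with only the underlying detection procedure changing.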
C2: triggering photo taking − maintaining holding objects in hand to trigger photo taking

Input:
(1) color image i_c and depth image i_d acquired by the Kinect device; (2) the procedures described in Table 15. Output: the instruction for photo taking. Steps.
Step 1: repeat detect the user's posture of holding objects by two hands by the Kinect device using the procedure G8: Gestures.RaiseHalfHands as well as the input images i_c and i_d; until detected successfully three times consecutively.
Step 2: exit with the instruction for photo taking.
C3: triggering photo downloading − maintaining raising up the right hand or keeping it flat afterwards to trigger photo downloading

Input:
(1) color image i_c and depth image i_d acquired by the Kinect device; (2) the procedures described in Table 15. Output: the instruction for photo downloading. Steps.
Step 1: repeat detect the user's posture of raising up the right hand or keeping it flat afterwards by the Kinect device using the procedure G6: Gestures.RaiseRightHand or the procedure G9: Gestures.HoldRightHandUp, respectively, as well as the input images i_c and i_d; until detected successfully three times consecutively.
Step 2: exit with the instruction for photo downloading.
C4: triggering photo re-taking − maintaining raising up the left hand or keeping it flat afterwards to trigger photo re-taking

Input:
(1) color image i_c and depth image i_d acquired by the Kinect device; (2) the procedures described in Table 15. Output: the instruction for photo re-taking. Steps.
Step 1: repeat detect the user's posture of raising up the left hand or keeping it flat afterwards by the Kinect device using the procedure G7: Gestures.RaiseLeftHand or the procedure G10: Gestures.HoldLeftHandUp, respectively, as well as the input images i_c and i_d; until detected successfully three times consecutively.
Step 2: exit with the instruction for photo re-taking.

(2) the procedures described in Table 15; (3) fruit objects O_1, O_2, and O_3 of the three kinds of fruit and their original positions P_1, P_2, and P_3, respectively. Output: a new position P_i' of a selected object O_i. Steps.
Step 1: repeat detect the joint point of the "left (or right) hand" (J9 or J10) as shown in Figure 11 by the Kinect device using the input images i_c and i_d; until detected.

(2) the procedures described in Table 15; (3) a fruit object O_i selected by S1 above and its position P_i' computed by S1; (4) the basket object B. Output: the farm scene of the selected fruit object O_i or going back to the initial screen. Steps.
Step 1: repeat detect the user's posture of opening his/her palm with the five fingers spread out by the Kinect device using the procedure G2: HandEventType.Release as well as the input images i_c and i_d; until detected.
Step 2: if the position P_i' of fruit object O_i is not close to the basket object B within a preset tolerable range then reset the position P_i' of fruit object O_i to its original value P_i and goto S1 in this

Step 3: if BB_right is not cut off then perform the following operations:
3.1 repeat detect the joint point of the right hand (J10) by the Kinect device using the input images i_c and i_d; until detected;
3.2 repeat track the joint point P of the right hand (J10); until P is found to be close to BB_right three times;
3.3 show the animation A_right of cutting off the right banana bunch BB_right and goto Step 1;
end if.

Input: (1) color image i_c and depth image i_d acquired by the Kinect device; (2) the procedures described in Table 15; Steps.
Step 2: if all the oranges in SO have been shaken off then exit;
Step 3: repeat detect all the user's 10 body joint points by the Kinect device using the input images i_c and i_d; until detected.
Step 4: repeat track the 10 joint points and detect the following user's postures by the Kinect device using the joint points and the procedures listed in Table 15:

Steps.
Step 1: if both sets of cantaloupes CT_left and CT_right have been picked then goto Step 4; end if.

Steps.
Step 1: repeat detect the joint points of both the left and right hands (J9 and J10) by the Kinect device using the input images i_c and i_d; until detected.
Step 2: repeat (a) track the joint points P_left and P_right of the left and right hands (J9 and J10), respectively; (b) case based on the type of selected fruit: case "banana": (i) display the graphic G_left of the left banana bunch over the joint point P_left; (ii) display the graphic G_right of the right banana bunch over the joint point P_right; case "orange": display the graphic G_p of the orange pile over the middle point of P_left and P_right; case "cantaloupe": display the graphic G_c of the cantaloupe group over the middle point of P_left and P_right; end case; until a 5-second countdown is ended.
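The case analysis in Step 2 amounts to choosing where each fruit graphic is anchored relative to the two tracked hand positions. The following Python sketch shows one way to compute these anchor positions for a single frame. The function names and the dictionary-based return value are assumptions made for this example; the keys G_left, G_right, G_p, and G_c simply reuse the graphic names of the algorithm, and the actual drawing is omitted.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def midpoint(a: Vec3, b: Vec3) -> Vec3:
    """Middle point of two 3D positions."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def fruit_graphic_anchors(fruit: str, p_left: Vec3, p_right: Vec3) -> Dict[str, Vec3]:
    """Map each graphic to the position where it is displayed for the selected fruit."""
    if fruit == "banana":
        # One banana bunch is shown over each hand.
        return {"G_left": p_left, "G_right": p_right}
    if fruit == "orange":
        return {"G_p": midpoint(p_left, p_right)}
    if fruit == "cantaloupe":
        return {"G_c": midpoint(p_left, p_right)}
    raise ValueError(f"unknown fruit type: {fruit}")


# Example: hands held symmetrically in front of the body.
left, right = (-0.5, 1.0, 2.0), (0.5, 1.0, 2.0)
print(fruit_graphic_anchors("banana", left, right))
print(fruit_graphic_anchors("orange", left, right))
```

Recomputing the anchors on every frame until the countdown ends keeps the harvested fruit visually attached to the user's hands in the photo.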
S7: returning to the initial screen − when no user is detected for more than 10 seconds.

Input:
(1) color image i_c and depth image i_d acquired by the Kinect device while the screen display shows the QR code; (2) a time counter T in unit of second. Output: display of the initial screen or doing nothing. Steps.
Step 2: while no human motion is detected and T < 10 do (a) wait for a second and set T = T + 1; (b) detect all the user's body joint points J1 through J10 by the Kinect device using the input images i_c and i_d; (c) if none of J1 through J10 is detected then decide "no human motion is detected"; else exit; end if; end while.
Step 3: display the initial screen and exit.
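The timeout loop of S7 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the joint detector is passed in as a hypothetical callable detect_joints that returns the joints found in the current frame (an empty sequence if none), the counter is assumed to start at zero because Step 1 is not shown above, and the wait function is injectable so that the loop can be exercised without real delays.

```python
import time
from typing import Callable, Sequence


def should_return_to_initial_screen(
    detect_joints: Callable[[], Sequence],
    timeout_s: int = 10,
    wait: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if no joint is detected for timeout_s seconds, False as soon as one is."""
    t = 0  # assumed initial value of the time counter T
    while t < timeout_s:
        wait(1.0)                      # (a) wait for a second and set T = T + 1
        t += 1
        if len(detect_joints()) > 0:   # (b), (c) a detected joint means a user is present
            return False               # exit without changing the screen
    return True                        # Step 3: the caller displays the initial screen


# Example with stub detectors and no real waiting.
print(should_return_to_initial_screen(lambda: [], wait=lambda s: None))
print(should_return_to_initial_screen(lambda: ["J1"], wait=lambda s: None))
```

Polling once per second keeps the check cheap while still returning to the initial screen close to the 10-second limit.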

PUBLIC EXHIBITION
The system was exhibited in a public space in a university for 10 days and visitors were invited to be participants in the study. The exhibition space is large enough to allow participants to interact with the system using body postures and conduct photo taking. Some pictures taken in the exhibition space are shown in Figure 12.

ANALYSIS
The proposed system was exhibited publicly in the design museum of a national university in May 2020 for 10 days. Members of the public over the age of 18 and four experts were invited to experience the proposed system. Each participant's experiencing activity lasted about 25 minutes, including 5 minutes to introduce the system to the participant, 10 minutes for carrying out the game-play process, and 10 minutes for an interview with the user. During the game-playing process, each user's interaction was observed and recorded. Afterwards, the four experts and 50 randomly-selected users were further invited for interviews, and their comments were collected.

ANALYSIS OF OBSERVED USERS' BEHAVIORS FOR EVALUATING THE EXPERIENCING PROCESS
The actions of the participants who used the proposed system were observed during the public exhibition period by video recording as well as by handwritten notes. The observations were directed to two aspects: "operation situation of human-machine interface" and "participant's behavior." The results are listed in Table 19 and summarized as follows.
1) The interaction process of "Fruit Picking Fun" can attract participants to come and experience; and the participants found it interesting and wanted to play different farm scenes again and again.
2) The experience of taking and downloading photos is attractive to the participants, who took the initiative to pick up their mobile phones and scan the QR code.
3) The participants' attention was on the display screen, so posture prompts and process guidance shown on the screen need to be strengthened.
4) The height and angle of the Kinect device should take into account the average height of most participants; and it is necessary to avoid placing objects too close to the side to make the experience smoother.
5) It is necessary to eliminate the interference of other onlookers to make the sensing of the Kinect device more stable.

Table 19. Observation results.
3) The location of the cantaloupe object is relatively nearby, and the participants cannot smoothly experience the harvest.
4) When the participant grabs the fruit and puts it in the basket in the initial screen, due to the illumination angle of the Kinect device, it is less sensitive to detect the hand gestures of grip and opening.
5) If the participants wear masks, it will affect the accuracy of face detection and body skeleton detection.

Participant's behavior

Whether the participants are interested in the system
1) The participants are interested in the exhibition layout and music of the proposed system and take the initiative to visit it.
2) The participants look forward to the context and interaction of the three kinds of fruits on the initial screen, and experience each of the three scenes.

The feedback given by the participants in the operation of this system
1) The participants were curious about how the Kinect detects their actions.
2) The participants were curious about where the photos will be uploaded.
3) The participants feel that it is commemorative to be able to download photos to their mobile phones.

The participants' reaction when interacting with this system
1) The participants felt very happy during the process of interacting with the fruit objects.
2) The participants were surprised when they saw decorations on their heads or harvested fruits on their hands.
3) The participants did not know that they could change their actions when taking photos and counting down.
4) The participants' attention was focused on the screen in front of them, and they were less likely to notice the operation instructions for harvesting and returning to the initial screen.

The participants' additional actions in operating this system
1) The participants wanted to take photos with other friends who were visiting together.
2) Some participants did not want to use the photo-taking function or scan the QR code to download photos and just wanted to experience the interaction with fruits.

ANALYSIS OF COMMENTS COLLECTED FROM INTERVIEWS WITH THE USERS
During the public exhibition period, as mentioned previously, 50 participants were randomly selected for interviews on three aspects, namely, "operation situation of human-machine interface," "operation experience," and "views on interactive experience of food and agricultural education." Researchers of this study recorded the responses and counted the number of people who expressed each opinion. If more than 75% of the participants expressed an identical opinion, the opinion is regarded as a majority suggestion; and if only five or fewer participants expressed it, it is regarded as a minority one. The collected opinions are listed in Table 20 and are summarized in the following.
1) The operation process of the proposed system is simple and intuitive.
2) The way to play the game on the proposed system was fun and interesting.
3) The participants have positive feelings about the feedback of getting photos immediately after the experiencing process.
4) The use of body movements for interaction, coupled with rich digital content design, can help the participants integrate into the interactive experiencing situation.
5) The participants gave positive comments on the experiencing form of introducing somatosensory interaction in food and agricultural education: in addition to attracting the public to understand the content of food and agricultural education, it also has an edutainment effect.
6) It is necessary to strengthen the part of prompting and guiding, so that participants can know what posture to interact with.
7) The height difference of participants and the placement of virtual objects affect the fluency of the operation.
8) More interactive actions, dynamic images, sound effects, or knowledge content feedbacks may be added into the system so as to create more educational effects.

Table 20. Collected opinions (number of participants in parentheses).
1) Using body movements to interact is a novel and interesting way of experience. (42)
2) Using body movements to interact is a convenient and easy-to-understand way of experience. (28)
3) Using body movements to interact can be well integrated into the interactive experience situation, and have a sense of being immersed in the environment. (7)

What is your opinion on the operation interface of this system?
1) The overall operation interface of this system is simple and easy to operate. (34)
2) The gesture design of this system is intuitive and in line with the real situation. (8)
3) The objects in the screen are placed at the two sides so that if the gestures are not standard enough, the operation will become unsmooth.
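The majority and minority rule stated for the interviews can be applied directly to the counts reported with the opinions. The short Python sketch below does so for the counts 42, 28, 7, 34, and 8 out of the 50 interviewed participants. The function name and the label "neither" for opinions falling between the two thresholds are assumptions made for this example.

```python
def classify_opinion(count: int, total: int = 50,
                     majority_ratio: float = 0.75, minority_max: int = 5) -> str:
    """Classify an opinion by the number of participants who expressed it."""
    if count / total > majority_ratio:   # more than 75% of the participants
        return "majority"
    if count <= minority_max:            # five or fewer participants
        return "minority"
    return "neither"


counts = [42, 28, 7, 34, 8]
labels = [classify_opinion(c) for c in counts]
for c, label in zip(counts, labels):
    print(f"{c}/50 = {c / 50:.0%}: {label}")
```

Under this rule only the opinion expressed by 42 participants (84%) qualifies as a majority suggestion among the counts shown.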

ANALYSIS OF COMMENTS COLLECTED FROM SECOND EXPERT INTERVIEWS
In the interviews with the four invited experts (named P4 through P7 as seen in Table 10), questions of three aspects, namely, "operation situation of human-machine interface," "experiencing the educational content of the system," and "view on the interactive experiencing process," were asked. The collected comments are listed in Table 21, from which the following conclusions can be drawn.
1) The play experience of "Fruit Picking Fun" is both entertaining and educational, and receives positive reviews from the experts.
2) The somatosensory posture used by "Fruit Picking Fun" is in line with the context of fruit harvesting and the range of human capabilities, and the operation is simple and intuitive.
3) The series of the photo-taking experiencing process of "Fruit Picking Fun" is smooth.
4) When the participants encounter difficulties in operation, they must be given more prompts on the system interface.
5) Introducing somatosensory interaction in food and agricultural education is more attractive than general education using books.
6) Sharing the photos after the experiencing process through social media helps promote food and agricultural education.
7) It is suggested to further define the target user group and present more suitable content of food and agricultural education for the group.
8) A series of contexts of fruit harvesting can be explored further to enhance the depth of food and agricultural education.

Table 21. Comments collected from the expert interviews.
What is your opinion on the interface of the proposed system?
• P4 and P7: the design of the somatosensory actions used in the system is intuitive and easy, in line with the scope of human capabilities.
• P4: when participants encounter difficulties in operation, they must be given more prompts on the interface and more guidance to operate the system.
• P6: somatosensory interaction is related to the user's body proportion, height, and behavioral ability; and the target group needs to be clearly defined.

Do you think the whole experience process of this system is smooth?
• P4 and P7: the entire interactive experiencing process is very smooth.
• P5 and P6: the Kinect is slightly delayed in gesture recognition due to hardware limitations, but the delay also depends on the operating context of the system; and sometimes such delays can be allowed.

Experience content of the system

What is your opinion on the exhibition layout and digital content design of the system?
• P5: it is good to create the feeling of having a good harvest by tracking the position of the hand and letting the fruit be held in the hands.
• P6: it is necessary to allow the participants to experience the difference between traditional agriculture and digital experience to a certain extent; and it is necessary to think about how to deepen the impression of participants in the feedback design during the experiencing process.
• P7: the same fruit can have images of different angles and sizes, so that the results of the interaction can have some changes.

Do you think the system has an entertaining effect?
• P4, P5 and P7: the concept and game-play of the overall system are fun and interesting.
• P5: because the participant's hands must be used for picking fruits, his/her limbs will have relatively large movements, which can drive interest.

What do you think of the educational nature of the system on agricultural knowledge?
• P4 and P7: more relevant knowledge content can be expanded for the system.
• P6: a more detailed investigation of the series of contexts behind the harvested fruits is needed, in order to understand the needs of the target user group for food and agricultural education and to present the experiencing process in more relevant details.

View on interactive experience of food and agricultural education

What is your opinion on the application of "the experiencing form via the somatosensory interactive wall" of the system to food and agricultural education?
• P4 and P7: compared with traditional "flat" written education, through somatosensory interaction, food and agricultural education can be more attractive.
• P5: the use of somatosensory interaction can create a situational atmosphere for food and agricultural education, and sharing photos through social media may attract people to experience the field. Such a virtuous cycle can help promote food and agricultural education.

CONCLUSIONS
Based on the design principles drawn from an extensive literature review of related human-machine interaction theories and existing cases of interactive devices, an interactive system with the Kinect as the core device and the "human" as the interface has been designed, on which a game named "Fruit Picking Fun" can be played for the aim of food and agricultural education. The interaction capability of the game is realized through computer vision and augmented reality (AR) techniques, with the Kinect device as the sensor, as well as a series of programs written in this study. The human-machine interaction is realized by these programs, which implement the somatosensory functions of face detection and tracking, hand-gesture recognition, and body-model matching and tracking. The education content is taught by playing the game to understand the harvesting processes of three typical types of fruit. An AR photo of the user with the harvested fruit held in hands may be taken by the system as a souvenir and downloaded to the user's mobile phone.
During the public display of the system, the observation and interview methods were used to collect opinions from the participants and several invited experts. The effectiveness of the proposed system was evaluated according to these comments to reach the following positive conclusions.
1) The interactive experience of this work is simple and intuitive: the design of gestures is simple, intuitive, and in line with the real situation; it is easy to connect with the actual fruit harvesting experience; and the experience flow of the overall system is smooth.
2) The use of body movements for interactive experiencing is given positive reviews: using body movements to interact is a novel and fun way of experience, which can be well integrated into the context of fruit harvesting and create a sense of personal experience. The posture of holding the fruits after playing the game also conveys the concept of a realistic harvest, allowing users to have the joy of harvest like a farmer.
3) The introduction of somatosensory interactive food and agricultural education can arouse the interest of the participants and achieve the effect of edutainment: the somatosensory interaction offered by the system is quite suitable for use in food and agricultural education, which is more attractive than ordinary book education.
4) In addition to being commemorative, the experience of taking AR photos can achieve the effect of publicity and promotion of food and agricultural education through sharing on social media: being able to scan the QR code to get the AR photos after the experiencing process is commemorative, and sharing through social media is also a way to publicize and promote food and agricultural education.
The three types of fruit used in the game "Fruit Picking Fun" are just examples; other agricultural products may also be included in the future. Furthermore, the interactivity of the game may be increased, and more special animation effects and sound feedback related to the knowledge of food and agricultural education can be added. Finally, the target groups of the system may be extended, and the knowledge content of food and agricultural education may be enriched.

SOURCES OF FUNDING
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.