Author: Shusaku Tsumoto
This article briefly outlines progress in Big Data education and training, research, and applications in Japan. Several universities have developed advanced programs and internships to prepare students to use and analyze Big Data in real-world settings. At the national level, a Data Scientist Training Network was established earlier this year to prepare researchers, consultants, and managers in this rapidly growing area. Two new research funding programs support collaboration and innovation in the use of Big Data to address social and economic challenges. In all these initiatives, the focus is on the practical application of Big Data, in areas such as e-commerce, traffic flow control, hospital management, agriculture, genome sequencing, and disaster mitigation. Big Data are changing the social life and economic development of Japan.
A master’s degree program in Human Resource Development for Big Data Innovation was developed in 2013 at the graduate schools of Keio University. Students are required to take one of three classes on Big Data tools during their first year, a course on Practical Big Data during their second year, and three of eight classes related to technologies for Big Data to complete their degree. Through practice the students develop skills and knowledge in Big Data management and analysis. The program features a unique curriculum in which students and professors work with Big Data providers, using tools such as Hadoop, MongoDB, and Mahout, to tackle real Big Data problems in the following five application areas: e-commerce, life science, administration, traffic flow, and crowd control. Three of these areas are described in the following sections.
For the program’s unit on e-commerce, Rakuten, which manages a major e-commerce site, is planning to serve as a data provider. The company makes recommendations for customers by using patterns extracted from Big Data analytics, especially temporal patterns. For example, purchases of commodities such as rice, mineral water, pet food, and alcoholic drinks have their own periodicity for each customer. If data analytics shows that over the past several months a customer purchased a particular product every 28 to 32 days, then as the next expected purchase date approaches, the site will prompt the customer.
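The periodic-purchase idea can be illustrated with a minimal Python sketch. It is not Rakuten's method; the function name `predict_next_purchase`, the thresholds, and the sample dates are all assumptions made for illustration. The sketch treats a purchase history as periodic when the gaps between purchases vary little, then projects the next purchase date and a reminder date shortly before it.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def predict_next_purchase(purchase_dates, max_spread_days=3.0, lead_days=3):
    """Return (expected_date, reminder_date) if purchases look periodic, else None.

    max_spread_days and lead_days are illustrative thresholds, not values
    taken from the article.
    """
    dates = sorted(purchase_dates)
    if len(dates) < 3:
        return None  # too few purchases to estimate an interval
    # Days between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if pstdev(gaps) > max_spread_days:
        return None  # gaps vary too much to call the pattern periodic
    interval = round(mean(gaps))
    expected = dates[-1] + timedelta(days=interval)
    return expected, expected - timedelta(days=lead_days)


# Hypothetical purchase history with gaps of 30, 28, 31, and 30 days.
history = [
    date(2014, 1, 5),
    date(2014, 2, 4),
    date(2014, 3, 4),
    date(2014, 4, 4),
    date(2014, 5, 4),
]
prediction = predict_next_purchase(history)
print(prediction)
```

A production recommender would presumably model each customer and product separately and handle skipped purchases; the population standard deviation is used here only as a simple regularity check.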
Data can also yield useful information about collective behavior. Hay fever, mainly due to the pollen of cedar trees, affects more than 20 million people in Japan from February to April each year, and a large number of Japanese customers seek pills or other healthcare products for the prevention or relief of allergic reactions during this period. This seasonal effect can be detected by tracking online searches for the keyword “hay fever”; analysis of these and related data may enable new service recommendations for customers.
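As a rough illustration of how such a seasonal effect might be picked out of keyword search counts, the following Python sketch flags months whose search volume is well above the yearly median. The function name `seasonal_peak_months`, the threshold ratio, and the monthly counts are invented for this example and are not data from the article.

```python
from statistics import median


def seasonal_peak_months(monthly_counts, ratio=2.0):
    """Return months whose count is at least `ratio` times the median month.

    The ratio of 2.0 is an illustrative assumption.
    """
    baseline = median(monthly_counts.values())
    return [month for month, count in monthly_counts.items()
            if count >= ratio * baseline]


# Made-up monthly search counts for the keyword "hay fever" (month -> count).
counts = {
    1: 120, 2: 900, 3: 1500, 4: 800, 5: 150, 6: 100,
    7: 90, 8: 95, 9: 110, 10: 105, 11: 100, 12: 115,
}
peaks = seasonal_peak_months(counts)
print(peaks)
```

With these made-up counts the flagged months are February through April, matching the season described above. The median is used as the baseline because a few very high months would inflate a mean.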
Rakuten is planning to open its Big Data to this program to support new joint projects with Keio University to train data scientists.
Life Sciences: Genome Sequencing
The availability of genomic data is expanding at an extraordinary rate thanks to technological advances in genome sequencing. Rapid progress in bioinformatics has made it possible to analyze these Big Data, yielding a significant amount of knowledge just since the beginning of this century.
In this component of the master’s program the focus is on genomic analysis of Bacillus subtilis natto, which is used for a traditional Japanese food made from fermented soybeans. Although the bacterium is very small, the genomic data are very rich and their analysis has produced the following information: the length of the genome is more than 4 million base pairs and the number of gene types is 4,429, which includes the genes responsible for three major products: gamma-polyglutamic acid and the enzymes nattokinase and elastase. The first is used in skin moisturizers; the latter two are used in products for the prevention of arteriosclerosis and vessel occlusion. Data analytics can thus increase knowledge of enzymes and lead to innovation in chemical products.
Traffic Flow Control
Traffic jams are a major problem in developed countries. In Japan, image processing devices and ultrasonic detectors are set above the traffic on major roads to monitor traffic flow and provide real-time information to drivers about traffic jams. The Ministry of Land, Infrastructure, Transport, and Tourism uses the Big Data generated for each road to assess and improve traffic flow. The results of data produced by about 60 sensors in the Shinjuku area of Tokyo are shown in Figure 1.
The ministry is planning to further analyze the data and combine them with GPS information. These and other types of related Big Data will be distributed to the students for analysis.
Data Scientist Training Network
This joint program of the Japanese Institute of Statistical Mathematics and the University of Tokyo, supported by the Japan DataScientist Society, was established in March 2014. The Society defines three types of data scientists: specialists in information technology/engineering who analyze data for their research, consultants who analyze data for their customers, and managers who use data for business management decisions (Figure 2). Because it is very rare for a single person to combine the characteristics of all three types, project teams should be assembled to include all three. Crowdsourcing may facilitate such coordination.
In an internship program designed to train data scientists, selected companies invite students who have studied data analytics in physics, information science, statistics, or management science. The program offers three types of experience for the interns, in which they (1) deeply analyze data focusing on a specific domain, (2) learn the total process of Big Data analytics (as shown in the upper left portion of Figure 2), and (3) work with managers who make decisions based on data analytics.
In 2013 two research funding programs were started by the Japan Science and Technology Agency (JST). “Advanced Core Technologies for Big Data Integration” supports the creation, advancement, and systematization of innovative information technologies and their underlying mathematical methods for obtaining new knowledge and insight from the use of Big Data across different fields. It is supervised by Prof. Masaru Kitsuregawa, director general of the National Institute of Informatics. The other program, “Advanced Application Technologies to Boost Big Data Utilization for Multiple-Field Scientific Discovery and Social Problem Solving,” focuses on the application of Big Data technologies. It is supervised by Prof. Yuzuru Tanaka in the Graduate School of Information Science and Technology at Hokkaido University. Competition for grants is intense: only four proposals for the first program and two for the second were approved for funding last year.
Advanced Core Technologies for Big Data Integration
This program supports (1) the creation, advancement, and systematization of next generation core technology to solve challenges common to a number of data domains and (2) integrated analysis of Big Data in a variety of fields. Specific development targets include technology for the stable operation of large-scale data management systems that compress, transfer, and store Big Data; technology for efficiently retrieving necessary knowledge by means of search, comparison, and visualization across diverse information; and the mathematical methods and algorithms that enable such services.
The following four proposals have been accepted for funding under this program.
Advanced Application Technologies to Boost Big Data Utilization for Multiple-Field Scientific Discovery and Social Problem Solving
This program supports collaborative projects and research in which the use of Big Data can bring about great social impact by solving challenging social and economic problems and achieving innovative value creation. Specific areas of interest are the life sciences, materials science, health and medical care, society and the economy, urban infrastructure systems, disaster prevention and mitigation, agriculture, forestry and fisheries industry, outer space, and the Earth’s environment. The long-term aims are new empirical creation and enhanced sophistication of next-generation application technologies necessary for achieving the objectives, and the establishment of comprehensive and integrated Big Data analytics system technology for use in a variety of areas.
The following two proposals have been accepted for funding under this program.
Current Practical Applications
Japan is facing a sharp drop in birth rate, which is associated not only with an aging society but also with the loss of traditions of craftsmanship and agricultural practice. A decrease in the farming population is of particular concern as it will directly influence food supply. Various kinds of sensors are being used to collect data on the behavior of farmers, environmental settings, and cultivation to develop principles of efficient agricultural techniques. One university research group is proposing a mining process for big agricultural data, shown in Figure 3, based on multilevel modeling.1
Social media generate large volumes of data and are thus an appropriate source for Big Data analysis. One of the most interesting aspects of this analysis is that the resulting patterns can reveal useful information about important events, such as large-scale incidents, accidents, and disasters, and show how people behave and think in real time.
Sakaki and colleagues (2013) applied web-mining techniques to Twitter data just after the 2009 earthquake in the Shizuoka area east of Nagoya and mapped the information flow based on messages associated with the event (Figure 4). Analysis showed that messages were distributed from the center of the earthquake to its environs, which means that the information flow may roughly reveal the location of the event.
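The idea that message flow can roughly reveal an event's location can be sketched in Python as follows. This is not the method of Sakaki and colleagues; it swaps in a simple coordinate-wise median of the earliest geotagged messages that mention a keyword. The function name `estimate_event_location`, the message fields (`t`, `lat`, `lon`, `text`), the time window, and the sample messages are all hypothetical.

```python
from statistics import median


def estimate_event_location(messages, keyword, window_seconds=600):
    """Estimate (lat, lon) of an event from early geotagged keyword messages.

    Each message is a dict with hypothetical fields: "t" (seconds),
    "lat", "lon", and "text". Returns None if no message matches.
    """
    hits = [m for m in messages if keyword in m["text"].lower()]
    if not hits:
        return None
    first = min(m["t"] for m in hits)
    # Keep only messages posted soon after the first mention; later ones
    # are more likely to come from people far from the event.
    early = [m for m in hits if m["t"] - first <= window_seconds]
    return (median(m["lat"] for m in early),
            median(m["lon"] for m in early))


# Invented sample messages for illustration only.
messages = [
    {"t": 0, "lat": 34.8, "lon": 138.5, "text": "Earthquake!"},
    {"t": 20, "lat": 35.0, "lon": 138.4, "text": "big earthquake just now"},
    {"t": 45, "lat": 34.7, "lon": 138.3, "text": "earthquake, everything shook"},
    {"t": 50, "lat": 35.2, "lon": 136.9, "text": "lunch time"},
    {"t": 5000, "lat": 43.1, "lon": 141.3, "text": "heard about the earthquake"},
]
location = estimate_event_location(messages, "earthquake")
print(location)
```

The median is chosen over the mean so that a few distant early messages do not pull the estimate far from the cluster of nearby reports.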
In recent years most large-scale hospitals in Japan have introduced an information system for the management of clinical activities. One of the main components is a computerized physician order entry (CPOE) system that not only transmits the orders of a physician or nurse to other medical staff but also can store order histories, which can be analyzed to reveal trends and particular events in the clinical process.
At Shimane University Hospital, for example, where about 1,000 patients visit per day and about half that number are admitted, more than 50 gigabytes of archived data are stored per month. Analysis of the hospital’s stored data provides knowledge about each clinician’s decision process and may also reveal sudden order revisions that could be signs of medical errors or the risk of such errors. Two sequential pattern mining algorithms have been introduced for the analysis of order histories to evaluate prescription order changes, which may increase the risk of medical errors (Tsumoto and Abe 2013). The empirical results showed that the method captured characteristics of clinician behavior in real time, shedding light on the decision-making process in clinical environments.
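To give a feel for what mining order histories for sequential patterns involves, here is a much-simplified Python sketch. It is not one of the algorithms used by Tsumoto and Abe (2013); it only counts how many histories contain each contiguous run of events of a fixed length and keeps the runs that meet a minimum support. The function name `frequent_patterns`, the event labels, and the sample histories are hypothetical.

```python
from collections import Counter


def frequent_patterns(histories, length=2, min_support=2):
    """Return {pattern: support} for contiguous event runs of a given length.

    Support is the number of histories that contain the run at least once.
    """
    support = Counter()
    for events in histories:
        # Use a set so a run repeated within one history is counted once.
        runs = {tuple(events[i:i + length])
                for i in range(len(events) - length + 1)}
        support.update(runs)
    return {run: count for run, count in support.items()
            if count >= min_support}


# Invented prescription order histories, one list of events per order.
histories = [
    ["order", "revise", "cancel"],
    ["order", "dispense"],
    ["order", "revise", "dispense"],
    ["order", "revise", "cancel"],
]
patterns = frequent_patterns(histories)
print(patterns)
```

In this toy data the run from "order" to "revise" appears in three histories and "revise" to "cancel" in two, the kind of recurring revision pattern an analyst might then examine as a possible risk signal. Full sequential pattern mining algorithms also handle non-contiguous patterns and variable lengths, which this sketch omits.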
The author would like to thank Profs. Takahira Yamaguchi, Satoshi Kurihara, and Yutaka Matsuo for providing information about their projects.
NILIM [National Institute for Land and Infrastructure Management]. 2014. Report of the Investigative Commission of Traffic Flow from Data of Advanced Cruise-Assist Highway System (AHS) in the Sangubashi Region in Shinjuku, Tokyo. Available at www.nilim.go.jp/lab/qcg/sangubashi/committee/pdf/no01/vics_driver.pdf.
Sakaki T, Okazaki M, Matsuo Y. 2013. Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering 25(4):919–931.
Tsumoto S, Abe H. 2013. Mining clinical process in order histories using sequential pattern mining approach. 2013 Pacific-Asia Conference on Knowledge Discovery and Data Mining Workshop, April 14–17, Gold Coast, Australia. Springer Verlag Lecture Notes in Computer Science 7867:234–246.
1 Personal communication with Satoshi Kurihara, professor, Graduate School of Information Systems, University of Electro-Communications in Tokyo.