Winter Bridge: A Global View of Big Data
December 15, 2014 Volume 44 Issue 4

Big Data Education and Research in Japan

Wednesday, January 7, 2015

Author: Shusaku Tsumoto

This article briefly outlines progress in Big Data education and training, research, and applications in Japan. Several universities have developed advanced programs and internships to prepare students to use and analyze Big Data in real-world settings. At the national level, a Data Scientist Training Network was established earlier this year to prepare researchers, consultants, and managers in this rapidly growing area. Two new research funding programs support collaboration and innovation in the use of Big Data to address social and economic challenges. In all these initiatives, the focus is on the practical application of Big Data, in areas such as e-commerce, traffic flow control, hospital management, agriculture, genome sequencing, and disaster mitigation. Big Data are changing the social life and economic development of Japan.

Educational Programs

A master’s degree program in Human Resource Development for Big Data Innovation was developed in 2013 at the graduate schools of Keio University. Students are required to take one of three classes on Big Data tools during their first year, a course on Practical Big Data during their second year, and three of eight classes related to technologies for Big Data to complete their degree. Through practice the students develop skills and knowledge in Big Data management and analysis. The program features a unique curriculum in which students and professors work with Big Data providers, using tools such as Hadoop, MongoDB, and Mahout, to tackle real Big Data problems in the following five application areas: e-commerce, life science, administration, traffic flow, and crowd control. Three of these areas are described in the following sections.

E-commerce

For the program’s unit on e-commerce, Rakuten, which manages a major e-commerce site, is planning to serve as a data provider. The company makes recommendations for customers by using patterns extracted from Big Data analytics, especially temporal patterns. For example, purchases of commodities such as rice, mineral water, pet food, and alcoholic drinks have their own periodicity for each customer. If data analytics shows that over the past several months a customer purchased a particular product every 28 to 32 days, then as the next expected purchase date approaches, the site will prompt the customer.
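The reminder logic described above can be illustrated with a minimal sketch. It is not Rakuten's method, which the article does not describe; the function names, the regularity threshold (`max_spread_days`), and the lead time (`lead_days`) are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import pstdev


def next_purchase_window(purchase_dates, max_spread_days=3.0):
    """Return (earliest, latest) expected next purchase dates, or None.

    purchase_dates: chronologically sorted list of datetime.date objects.
    A customer is treated as periodic when the standard deviation of the
    gaps between purchases is at most max_spread_days (assumed threshold).
    """
    if len(purchase_dates) < 3:
        return None  # too few purchases to estimate a period
    gaps = [(b - a).days for a, b in zip(purchase_dates, purchase_dates[1:])]
    if pstdev(gaps) > max_spread_days:
        return None  # intervals too irregular to call periodic
    last = purchase_dates[-1]
    return last + timedelta(days=min(gaps)), last + timedelta(days=max(gaps))


def should_prompt(purchase_dates, today, lead_days=2):
    """True when today is within lead_days before the window, or inside it."""
    window = next_purchase_window(purchase_dates)
    if window is None:
        return False
    earliest, latest = window
    return earliest - timedelta(days=lead_days) <= today <= latest


# Hypothetical purchase history with gaps of 28, 31, and 32 days.
history = [date(2014, 1, 1), date(2014, 1, 29), date(2014, 3, 1), date(2014, 4, 2)]
```

With this history the expected window runs from 28 to 32 days after the last purchase, so a prompt would be issued shortly before April 30 but not in mid-April.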

Data can also yield useful information about collective behavior. Hay fever, mainly due to the pollen of cedar trees, affects more than 20 million people in Japan from February to April each year, and a large number of Japanese customers seek pills or other healthcare products for the prevention or relief of allergic reactions during this period. This seasonal effect is detected by tracking online searches for the keyword “hay fever”; analysis of these and related data may enable new service recommendations for customers.
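One simple way to picture the detection of such a seasonal effect is to flag the weeks in which searches for a keyword rise well above their typical level. The sketch below is only an illustration under assumed inputs: the weekly counts are invented, and the median baseline and the `ratio` threshold are assumptions rather than anything described by the data provider.

```python
from statistics import median


def seasonal_weeks(weekly_counts, ratio=2.0):
    """Return indices of weeks whose search count exceeds ratio x the median.

    weekly_counts: list of search counts for one keyword, one entry per week.
    ratio: assumed multiplier above the median that marks a seasonal surge.
    """
    baseline = median(weekly_counts)
    return [week for week, count in enumerate(weekly_counts)
            if count > ratio * baseline]


# Invented weekly counts for the keyword "hay fever".
counts = [10, 12, 11, 9, 40, 55, 60, 30, 12, 10, 11, 10]
surge = seasonal_weeks(counts)
```

Here the median is 11.5, so weeks 4 through 7 are flagged as the surge period.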

Rakuten is planning to open its Big Data to this program to support new joint projects with Keio University to train data scientists.

Life Sciences: Genome Sequencing

The availability of genomic data is expanding at an extraordinary rate thanks to technological advances in genome sequencing. Rapid progress in bioinformatics has made it possible to analyze these Big Data, yielding a significant amount of knowledge just since the beginning of this century.

In this component of the master’s program the focus is on genomic analysis of Bacillus subtilis natto, which is used for a traditional Japanese food made from fermented soybeans. Although the bacterium is very small, the genomic data are very rich and their analysis has produced the following information: the length of the genome is more than 4 million base pairs and the number of gene types is 4,429, which includes the genes associated with three major products: gamma-polyglutamic acid, nattokinase, and elastase. The first is used in skin moisturizers; the latter two, both enzymes, are used in products for the prevention of arteriosclerosis and vessel occlusion. Data analytics can thus increase knowledge of enzymes and lead to innovation in chemical products.

Traffic Flow Control

Traffic jams are a major problem in developed countries. In Japan, image processing devices and ultrasonic detectors are set above the traffic on major roads to monitor traffic flow and provide real-time information to drivers about traffic jams. The Ministry of Land, Infrastructure, Transport, and Tourism uses the Big Data generated for each road to assess and improve traffic flow. The results of data produced by about 60 sensors in the Shinjuku area of Tokyo are shown in Figure 1.

Figure 1

The ministry is planning to further analyze the data and combine them with GPS information. These and other types of related Big Data will be distributed to the students for analysis.

Data Scientist Training Network

This joint program of Japan’s Institute of Statistical Mathematics and the University of Tokyo, supported by the Japan DataScientist Society, was established in March 2014. The Society defines three types of data scientists: specialists in information technology/engineering who analyze data for their research, consultants who analyze data for their customers, and managers who use data for business management decisions (Figure 2). Because a single person rarely combines the characteristics of all three types, project teams should be assembled to include all three. Crowdsourcing may facilitate such coordination.

Figure 2

In an internship program designed to train data scientists, selected companies invite students who have studied data analytics in physics, information science, statistics, or management science. The program offers three types of experience for the interns, in which they (1) deeply analyze data focusing on a specific domain, (2) learn the total process of Big Data analytics (as shown in the upper left portion of Figure 2), and (3) work with managers who make decisions based on data analytics.

Research Programs

In 2013 two research funding programs were started by the Japan Science and Technology Agency (JST). “Advanced Core Technologies for Big Data Integration” supports the creation, advancement, and systematization of innovative information technologies and their underlying mathematical methods for obtaining new knowledge and insight from the use of Big Data across different fields. It is supervised by Prof. Masaru Kitsuregawa, director general of the National Institute of Informatics. The other program, “Advanced Application Technologies to Boost Big Data Utilization for Multiple-Field Scientific Discovery and Social Problem Solving,” focuses on the application of Big Data technologies. It is supervised by Prof. Yuzuru Tanaka in the Graduate School of Information Science and Technology at Hokkaido University. Competition for grants is intense: only four proposals for the first program and two for the second were approved for funding last year.

Advanced Core Technologies for Big Data Integration

Program Description

This program supports (1) the creation, advancement, and systematization of next generation core technology to solve challenges common to a number of data domains and (2) integrated analysis of Big Data in a variety of fields. Specific development targets include technology for the stable operation of large-scale data management systems that compress, transfer, and store Big Data; technology for efficiently retrieving necessary knowledge by means of search, comparison, and visualization across diverse information; and the mathematical methods and algorithms that enable such services.

Accepted Proposals

The following four proposals have been accepted for funding under this program.

  • Establishment of Knowledge-Intensive Structural NLP [Natural Language Processing] and Construction of Knowledge Infrastructure: Sadao Kurohashi, professor, Graduate School of Informatics, Kyoto University
  • Privacy-Preserving Data Collection and Analytics with Guarantee of Information Control and Its Application to Personalized Medicine and Genetic Epidemiology: Jun Sakuma, associate professor, Graduate School of Systems and Information Engineering, University of Tsukuba
  • Extreme Big Data—Convergence of Big Data and HPC for Yottabyte Processing: Satoshi Matsuoka, professor, Global Scientific Information and Computing Center, Tokyo Institute of Technology
  • Discovering Deep Knowledge from Complex Data and Its Value Creation: Kenji Yamanishi, professor, Graduate School of Information Science and Technology, University of Tokyo

Advanced Application Technologies to Boost Big Data Utilization for Multiple-Field Scientific Discovery and Social Problem Solving

Program Description

This program supports collaborative projects and research in which the use of Big Data can bring about great social impact by solving challenging social and economic problems and achieving innovative value creation. Specific areas of interest are the life sciences, materials science, health and medical care, society and the economy, urban infrastructure systems, disaster prevention and mitigation, the agriculture, forestry, and fisheries industries, outer space, and the Earth’s environment. The long-term aims are to create and refine, through practical application, the next-generation application technologies needed to achieve these objectives, and to establish comprehensive and integrated Big Data analytics system technology for use in a variety of areas.

Accepted Proposals

The following two proposals have been accepted for funding under this program.

  • Development of a Knowledge-Generating Platform Driven by Big Data in Drug Discovery through Production Processes: Kimito Funatsu, professor, Laboratory of Chemoinformatics, Department of Chemical System Engineering, Graduate School of Engineering, University of Tokyo. Although massive amounts of quantitative data are accumulated from a drug candidate’s initial discovery through its production process, data analysis for each of the discovery and production processes remains isolated. The aim of this project is to establish a platform enabling unification of the data and to advance research that will optimize systems to approach pharmaceutical development from a comprehensive, correlated, and high-level perspective. The research will be driven by Big Data, beginning with the identification of patterns for directions in lead molecule development based on large volumes of compound and protein data. These patterns will be combined with a virtually generated compound library and identification of targets for the compounds, with candidate compounds further evaluated for their synthetic and production feasibility. Massive quantities of production-related data will also drive the development of methods for assessing the safe operation of production plants to ensure that they are adequately equipped. The ultimate goal is the establishment of enhanced models for risk assessment, risk management, and quality control.
  • Innovating Big Data Assimilation Technology for Revolutionizing Very-Short-Range Severe Weather Prediction: Dr. Takemasa Miyoshi, Data Assimilation Research Team, RIKEN Advanced Institute for Computational Science. This research aims to innovate assimilation technology to fully take advantage of Big Data from Japanese next-generation technologies such as the phased array weather radar, the geostationary weather satellite, and the world’s leading 10-Petaflops K computer. An innovative 30-second super-rapid-update numerical weather prediction system for 30-minute severe weather forecasting will be developed, aiding disaster prevention and mitigation while effecting a scientific breakthrough in meteorology.

Current Practical Applications

Agriculture

Japan is facing a sharp drop in birth rate, which is associated not only with an aging society but also with the loss of traditions of craftsmanship and agricultural practice. A decrease in the farming population is of particular concern as it will directly influence food supply. Various kinds of sensors are being used to collect data on the behavior of farmers, environmental settings, and cultivation to develop principles of efficient agricultural techniques. One university research group is proposing a mining process for big agricultural data, shown in Figure 3, based on multilevel modeling (see footnote 1).

Figure 3

Social Media

Social media generate large volumes of data and are thus an appropriate source for Big Data analysis. One of the most interesting aspects of this analysis is that the resulting patterns can reveal useful information about important events, such as large-scale incidents, accidents, and disasters, and show how people behave and think in real time.

Sakaki and colleagues (2013) applied web-mining techniques to Twitter data just after the 2009 earthquake in the Shizuoka area east of Nagoya and mapped the information flow based on messages associated with the event (Figure 4). Analysis showed that messages were distributed from the center of the earthquake to its environs, which means that the information flow may roughly reveal the location of the event.

Figure 4
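The idea that early messages cluster around the location of an event can be illustrated with a deliberately simple estimator: a weighted centroid of the first geotagged messages that mention the event. This is a stand-in chosen for brevity, not the estimation method of Sakaki and colleagues; the `Tweet` fields, the keyword filter, the 10-minute window, and the weighting scheme are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    lat: float                  # latitude in degrees
    lon: float                  # longitude in degrees
    seconds_after_first: float  # delay relative to the first event message


def estimate_event_location(tweets, keyword="earthquake", window_seconds=600.0):
    """Return a (lat, lon) estimate, or None if no message qualifies.

    Only messages containing the keyword and posted within window_seconds
    are used; earlier messages receive larger weights. Averaging raw
    coordinates is adequate only for a small region away from the
    antimeridian.
    """
    relevant = [t for t in tweets
                if keyword in t.text.lower()
                and t.seconds_after_first <= window_seconds]
    if not relevant:
        return None
    weights = [1.0 / (1.0 + t.seconds_after_first) for t in relevant]
    total = sum(weights)
    lat = sum(w * t.lat for w, t in zip(weights, relevant)) / total
    lon = sum(w * t.lon for w, t in zip(weights, relevant)) / total
    return lat, lon


# Invented messages: two early reports, one unrelated, one too late.
sample = [
    Tweet("Earthquake! The house is shaking", 34.0, 138.0, 0.0),
    Tweet("big earthquake just now", 35.0, 139.0, 0.0),
    Tweet("lunch time", 0.0, 0.0, 5.0),
    Tweet("earthquake news on TV", 43.0, 141.0, 1000.0),
]
estimate = estimate_event_location(sample)
```

With these invented messages the estimate falls midway between the two early reports, and the unrelated and late messages are ignored.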

Hospital Management

In recent years most large-scale hospitals in Japan have introduced an information system for the management of clinical activities. One of the main components is a computerized physician order entry (CPOE) system that not only transmits the orders of a physician or nurse to other medical staff but also can store order histories, which can be analyzed to reveal trends and particular events in the clinical process.

At Shimane University Hospital, for example, where about 1,000 patients visit per day and about half that number are admitted, more than 50 gigabytes of archived data are stored per month. Analysis of the hospital’s stored data provides knowledge about each clinician’s decision process and may also reveal sudden order revisions that could be signs of medical errors or the risk of such errors. Two sequential pattern mining algorithms have been introduced for the analysis of order histories to evaluate prescription order changes, which may increase the risk of medical errors (Tsumoto and Abe 2013). The empirical results showed that the method captured characteristics of clinician behavior in real time, shedding light on the decision-making process in clinical environments.
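To give a sense of what mining order histories involves, the following sketch counts how many histories contain each short run of consecutive order events and keeps the runs that recur. It is a basic contiguous-sequence count, not either of the two algorithms used in the cited study, and the event labels and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def frequent_order_sequences(histories, length=2, min_support=2):
    """Return {sequence: support} for contiguous event runs of a given length.

    histories: list of order histories, each a list of event labels in
    time order. Support is the number of histories containing the run at
    least once; runs below min_support are dropped.
    """
    support = Counter()
    for events in histories:
        # A set ensures each history contributes at most once per run.
        seen = {tuple(events[i:i + length])
                for i in range(len(events) - length + 1)}
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Hypothetical prescription order histories.
histories = [
    ["order", "change", "cancel"],
    ["order", "change", "dispense"],
    ["order", "dispense"],
]
patterns = frequent_order_sequences(histories)
```

In this toy data only the run ("order", "change") appears in two histories; in practice, an unusually frequent revision pattern of this kind would be a candidate for closer review.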

Acknowledgments

The author would like to thank Profs. Takahira Yamaguchi, Satoshi Kurihara, and Yutaka Matsuo for providing information about their projects.

References

NILIM [National Institute for Land and Infrastructure Management]. 2014. Report of the Investigative Commission of Traffic Flow from Data of Advanced Cruise-Assist Highway System (AHS) in the Sangubashi Region in Shinjuku, Tokyo. Available at www.nilim.go.jp/lab/qcg/sangubashi/committee/pdf/no01/vics_driver.pdf.

Sakaki T, Okazaki M, Matsuo Y. 2013. Tweet analysis for real-time event detection and earthquake reporting system development. IEEE Transactions on Knowledge and Data Engineering 25(4):919–931.

Tsumoto S, Abe H. 2013. Mining clinical process in order histories using sequential pattern mining approach. 2013 Pacific-Asia Conference on Knowledge Discovery and Data Mining Workshop, April 14–17, Gold Coast, Australia. Springer Verlag Lecture Notes in Computer Science 7867:234–246.

FOOTNOTES

1 Personal communication with Satoshi Kurihara, professor, Graduate School of Information Systems, University of Electro-Communications in Tokyo.

About the Author: Shusaku Tsumoto is a professor in the Department of Medical Informatics, Faculty of Medicine, Shimane University, Japan.