Nigul Olspert

Nigul Olspert · Wed, 11.01.2017


The 2016 IEEE International Conference on Big Data was held on Dec. 5-8, 2016 in Washington D.C., USA. The conference comprised a number of sessions on different topics, primarily related to datasets with massive volume and/or velocity (i.e. generation speed). In addition, there were six keynote speeches, panels, one symposium, workshops, tutorials and a poster session. I submitted a paper to the Workshop on Solar and Stellar Astronomy Big Data (SABID), which took place on the first day of the conference. The workshop had a total of ten talks, the majority of them related to data mining and applications of machine learning techniques to huge solar proxy datasets. Two talks were related to solar analogues, focusing on the Kepler mission and exoplanet detection. The workshop concluded with an invited talk about the Helioviewer project, web-based software for visualizing solar image data.

My talk was a little different from the others in the sense that, instead of dealing with observational datasets, it involved an artificially generated one. It concentrated on a time series analysis method designed for detecting cycles in multidimensional datasets. As a case study we analysed the magnetic field data from a magnetohydrodynamical simulation of a Sun-like star. The importance of analysing such datasets lies in the fact that, except through helioseismology, we cannot see into the real Sun and can draw conclusions only from surface activity tracers.

During the first day of the conference I also attended the Workshop on Big Data Challenges, Research, and Technologies in the Earth and Planetary Sciences. From this and the SABID workshop I would like to highlight one central point: since the datasets in both domains are truly massive, the algorithms for processing the data should be run where the data is stored, as downloading it is practically impossible. The other big challenge is how to handle the heterogeneity of the data (each data provider has its own formats, standards etc.). Tools that hide these complications from the end user, allowing queries that combine information from different datasets, have now been developed.

During the main conference I listened to all six keynote speeches, which were all very interesting. Here I would like to bring out three of them which I found especially fascinating. “Database Decay and How to Avoid It” addressed the question of why schemas in real-world relational databases diverge from third normal form and how to fix it. “Data Security and Privacy” focused on aspects of security and privacy in the domain of big data. It was shown that sensitive information can leak when data from different sources is linked, e.g. via a social security number. One of the important questions asked was: is privacy nowadays totally lost, or can it be regained? An illustrative example of cryptographic multiplication was given during the talk. “Cognitive Computing: From breakthroughs in the lab to applications on the field” introduced the current state of the art in cognitive systems, which learn from data and in certain situations perform better than humans. Most likely, in the near future these systems will serve as advisors to field specialists (e.g. radiologists) who need to make difficult decisions. Several examples of IBM Watson in practice were given during the talk.

Of the remaining sessions I attended those with topics most relevant or interesting to me, primarily cloud, high-performance and parallel computing. On the last day I also attended a tutorial on Dynamic Big Data Processing in the Web of Things (WoT): Challenges, Opportunities and Success Stories. WoT is definitely one of the hot topics in the area of big data, both now and in the near future. The full program of the conference can be found on the web at http://cci.drexel.edu/bigdata/bigdata2016/files/BigData2016ProgramSchedule.pdf.

Despite the quite tight schedule of the conference, I also had a little time to take a walk in the park in Washington D.C. and visit most of the famous memorials. The overall organization of the conference was very good, including the food and accommodation. In conclusion, I am very pleased to have had the opportunity to attend the 2016 IEEE Big Data Conference.