I recorded the panel session for the question on big data and cloud computing. I was sitting in the back and the audio is not very good. To help with that, I transcribed the video. This text is available as CC in the video. It should be pretty close to what Dawn Wright [ESRI] and Jenifer Austin Foulkes [Google] were saying. I've added some links and the video is at the end of this post.
MARCIA: [missed the first part of the question]
grabbing video data, photographic information, high bandwidth data. What are your thoughts, as representatives of organizations that are familiar with handling large amounts of data, as to what are the prospects for rapid QA/QC and effective technologies for serving up that data to the public, and in particular, when we think of like HD (high definition) data that scientists are going to demand for doing research, how one can actually deliver that data to those who might need it in its full quality?
[Jenifer passes the question to Dawn to go first]
DAWN: Fantastic question, Marcia. And I think in terms of dealing with all of these, I mentioned among the 4 V's, was "variety." We had not really had a discussion yet about video data as a part of that V, and that's going to be very critical.
[The 4 V's: Volume, Variety, Velocity, Veracity]
MARCIA: So that's a 5th V?
DAWN: No. That's actually within the V of variety. In terms of, we have photographs, videos, text files, points, lines and polygons and observations, visualizations, scientific models. But within all that variety, we have not spoken that much about videos. And Christian Germain's [possibly the right person] comments to that extent were very pertinent to this, because with these long archives of these videos that we want to not only preserve, but also make available quickly, that's again where partnerships are very important. Because, with our academic and government agencies, there has been a lot of work and a lot of funding to creating these archives, maintaining them, but sometimes the next step is not always easily attainable, particularly if your funding runs out. And so for instance with Kate's situation with Neptune Canada, where there's been a data management system put in place, and you're looking to do the next step, this is where I think the public-private partnership can really play a role. Because if you are able to work with companies who are actually looking at some of these problems in terms of research. So I'd like to just step to the side a little bit to talk about how important this is when we think about research versus exploration. We've talked about "Is exploration part of research?" Is there a continuum of these two? Are they broad end points? Or are you really a scientist? I think in terms of ocean exploration, the ocean exploration challenges are actually the research questions, the research problems of information technology or data science. So, we want those kinds of challenges. So, being able to archive, QA/QC, and quickly disseminate video and other kinds of observations is something that is DELICIOUS to Google, to ESRI, to our partners. So, Marinexplore is a new startup company in Silicon Valley that is building a marine platform. They are even building a marine operating system.
And they are working now on specific machine learning techniques to automatically go through and QA/QC satellite observations, and I think that could be applied to video. So I think it is just a matter of getting in. There was a discussion of a national "huddle." We could have a data or technology huddle and talk about some of these challenges in smaller groups and come up with very useful partnerships to solve these problems.
MARCIA: Okay. Great. Jen?
JENIFER: Yeah, I think that all sounds wonderful. You know, I would say that towards your question, Google has some tools that can help in dealing with large data. And of those that I'm familiar with, there's YouTube for video, and so there's several groups like NOAA [NOAA Ship Okeanos Explorer - Best VIDEOS Of 2012-2010] and Schmidt Ocean Institute who are using YouTube Live, you know, to record and share video from their expeditions.
MARCIA: Is it full HD TV?
JENIFER: It has full HD TV. Yeah. So I think that's a great tool for video. With regard to analysis, I'm not sure. I'm not sure if it can do automated feature extraction. That would be something to look into. And Chris might have mentioned that. So I think the other thing would be Kurt has done a lot of work using our cloud hosting infrastructure to upload large amounts of data. Specifically, looking at the space-based AIS ship tracking data [SpaceQuest] and then using Compute Engine to do really fast query analysis. I think that is a tool. He's been working on a bit of a how-to, how you can do it as well, to come out in the future. You can see his talk from Google I/O. [Google I/O 2013 - All the Ships in the World: Visualizing Data with Google Cloud and Maps] Kurt Schwehr, formerly from UNH, in the back there. He'd be happy to talk to anyone about his explorations using our Cloud Store for handling big data. And there is another tool, Google Maps Engine, which has been used for geospatial data, and Earth Engine is a project where they look at cloud hosting large amounts of imagery data. So they put all the Landsat imagery in the cloud to allow scientists to do really fast change analysis for forest fires now and forest coverage changes. And that's been really powerful. Analysis that before took months, they do in like a day or a really short period of time. So I think that whole power of being able to, you know, do queries across lots and lots and lots of machines is a future tool that can be really, really powerful. And I'm happy to be a part of the story.
DAWN: I think what Jenifer is referring to in terms of the cloud. You know, the cloud is another one of these terms that may be a mystery term. And I tend to think of a, literally, an atmospheric cloud in my head. But we are talking about really a paradigm shift in computing and data distribution, and there is some bit of trepidation about it. But I think it is something that is really becoming more secure, more powerful, and easier to use. In the information technology world, we are all going that way. And so the cloud infrastructure is something that is going to be very powerful and something that ocean exploration really needs to embrace to a certain extent, or to at least investigate. Be willing to have an open mind to investigate it. It's also going to be much more efficient, much faster. To the point that I made earlier about not just moving our datasets back and forth, but actually moving analysis, the analyses that Jenifer and I have been talking about, moving that to the data once, and then you have your outputs from that, which can go out much more quickly, including to the general public and to school kids on their tablets.
MARCIA: So, let's suppose we're at a point [video cut off]
Since Dawn mentioned security, you can check out: Google-CommonSecurity-WhitePaper-v1.4.pdf: Google's Approach to IT Security, A Google White Paper [2012]. Google has ISO 27001 certification. Michael Manoochehri: Google I/O 2011: Compliance and Security in the Cloud [YouTube video]