Cloud-based management of semantic urban data

Motivation

Enabling urban-scale social and physical sensing requires integrating highly heterogeneous data: data gathered through physical and social sensors, but also data made available through open data channels and social networks. While the processing of large-scale complex data is the topic of intensive research, the specifics of urban systems, which need to aggregate physical and social data, challenge data management and the associated cloud-based solutions.

In urban environments, large amounts of data are continuously collected from a variety of sensors and from people’s smartphones and wearable devices. Clouds are the infrastructure of choice for storing and processing the collected data, as cloud resources can be provisioned dynamically, on demand and in self-service mode. As the volume and velocity of the data to be stored and processed may vary over time, elastic cloud-based data management services are needed to enable near real-time processing of data coming from sensor networks and mobile phones, possibly combined with other data such as open data. The data processing chain may involve filtering, aggregating and ultimately storing the resulting data for further processing by data analytics applications. As personal data are involved, it is essential to follow a privacy-by-design approach in the design of cloud-based data management services.
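
As an illustration only, the filter, aggregate and store chain mentioned above can be sketched in a few lines of Python. The record fields (`sensor_id`, `timestamp`, `value`), the plausibility range and the one-minute window are assumptions made for this sketch; they are not part of the proposed platform, and an in-memory dictionary stands in for a real storage service.

```python
from collections import defaultdict
from statistics import mean


def is_valid(reading):
    """Filter step: drop readings with a missing or implausible value.

    The accepted range is an arbitrary assumption for this sketch.
    """
    value = reading.get("value")
    return value is not None and -50.0 <= value <= 60.0


def aggregate(readings):
    """Aggregate step: average the values per (sensor, one-minute window)."""
    windows = defaultdict(list)
    for r in readings:
        minute = r["timestamp"] // 60  # timestamps are assumed to be in seconds
        windows[(r["sensor_id"], minute)].append(r["value"])
    return {key: mean(values) for key, values in windows.items()}


def run_pipeline(readings, store):
    """Filter, aggregate, then store the results for later analytics."""
    valid = [r for r in readings if is_valid(r)]
    store.update(aggregate(valid))  # `store` is a plain dict in this sketch
    return store
```

In an elastic deployment, each of these steps would typically run as a separately scalable service rather than as a function call in a single process.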

Many stakeholders will be involved in the collection and storage of data from sensor networks, the storage of open data, and the provision of services and applications for citizens and city government. Thus, multiple private and public cloud providers are likely to be involved in providing storage and compute services to urban services and mobile social applications. This calls for elastic cloud-based services that are portable across clouds and can be deployed on several clouds.

Goals

The objective of this work is to design and implement a Platform-as-a-Service (PaaS) runtime environment to easily deploy and efficiently execute the cloud components of urban-scale applications involving urban and social data sensing and analytics. Key aspects include, on the one hand, the design of protocols for efficient data transfer from the sensors and mobile devices to the services running in the cloud and, on the other hand, the design of privacy-preserving data management services. The proposed PaaS runtime environment will be designed to enable near real-time processing of heterogeneous data streams in a multi-cloud setting. In a complementary way, and acknowledging the central role of semantic Web data for urban data engineering, we intend to develop a custom cloud-based semantic data engineering platform.

State of the art and challenges

Many mobile social applications have flourished recently. For the time being, these applications are relatively basic in terms of features and of the data they collect and process. With the advent of the smart city era, many more such applications will be made available to people and are likely to be broadly adopted as their usefulness and ease of use increase over time. We can foresee that mining and learning from the rich and varied data generated in cities from different sources (sensor networks, wearable devices, social networks, user-generated data, open data) will lead to a new generation of mobile social applications, with tasks executed on the mobile devices and/or in clouds depending on various considerations (data stream velocity, network, storage, processing capacity and power available…). Challenges relate to:

  • Scalability. At the city scale, there will be numerous urban services and applications to be executed, many of them processing large amounts of data from different sources. We also anticipate a wide adoption of mobile social applications by citizens, as these applications will become increasingly useful as they exploit data coming from different sources. Cloud-based data stream management services, which constitute a core part of urban services and applications, will have to scale to dozens to hundreds of thousands of end-users expecting a quasi-immediate response to their requests. This is an unprecedented scale for providing near real-time data processing in the cloud and an active research area; see, e.g., [Ananthanarayanan13].
  • Ease of deployment and elasticity management. Urban services and applications will be built as a composition of services, some of them running in clouds. It is essential to ease the deployment of such applications and their dynamic reconfiguration, both when resources must be added (e.g., when facing a load peak due to changing conditions) and when resources can be removed (e.g., to decrease the cost in periods of low load when using a commercial cloud). It should be simple to off-load tasks from mobile devices to clouds and vice versa. Some PaaS systems, such as ConPaaS[1] or Cloudfoundry[2], already exist for easily deploying web-based applications and high-performance services (MapReduce, task farming services) on a cloud. However, designing PaaS systems that ease the deployment and reconfiguration of data-intensive urban services and applications processing data streams remains an open research area.
  • Flexibility and extensibility. Considering the broad diversity of urban services and applications, users of the PaaS services should be able to easily customize the services to their needs. It should also be easy to integrate new services into the PaaS in order to extend the range of supported services and applications; see, e.g., [Moreno-Vozmediano 2013].
  • Privacy. It is important that data analytics can be efficiently performed on sensed data while preserving anonymity. The privacy-by-design approach is quite novel in the context of cloud computing. Many existing services and applications exploit storage and compute services offered by giant cloud providers such as Amazon with little regard for privacy. Some approaches based on community clouds (exploiting geographically distributed desktops, as desktop grids and volunteer computing do) and personal clouds (exploiting the devices and computers owned by a citizen to store their data and to host the applications and services processing their personal data) have recently been proposed. However, these systems have not been designed for urban-scale services and applications but mainly as a way to avoid storing data in commercial clouds.
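
To make the elasticity challenge above concrete, the following Python sketch shows one simple way a reconfiguration decision could be expressed: the number of workers follows the observed input rate, growing during load peaks and shrinking in periods of low load. The function name, the per-worker capacity and the worker bounds are hypothetical and chosen only for illustration; this is not the policy of any existing PaaS.

```python
import math


def target_workers(events_per_second, capacity_per_worker,
                   min_workers=1, max_workers=100):
    """Return how many stream-processing workers the current load calls for.

    All parameters are illustrative assumptions: `capacity_per_worker` is the
    rate one worker is assumed to sustain, and the bounds cap cost and keep
    at least one worker alive when the load drops to zero.
    """
    needed = math.ceil(events_per_second / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))
```

For instance, with an assumed capacity of 1000 events per second per worker, an input rate of 2500 events per second leads to 3 workers. A realistic policy would also have to take provisioning delays, cost and privacy constraints into account.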

Methodology

Our work will be divided in the following steps:

  • Based on a study of the software architecture of urban-scale applications and of the underlying middleware, we will propose a simple web interface and an API for a platform-as-a-service environment dedicated to the development and deployment of the application components that are to be executed in the cloud.
  • We will design and implement algorithms and mechanisms to seamlessly off-load storage and computing tasks from mobile devices into the cloud and to process heterogeneous data streams in near real-time.
  • We will validate the developed cloud-based data management services with some of the next generation urban services developed within CityLab@Inria.

The work will be conducted in collaboration with the other research efforts undertaken within CityLab@Inria. Indeed, middleware solutions developed as part of privacy-preserving city-scale sensing need to be designed in close relation with the supporting cloud infrastructure for data gathering and distribution. In addition, cloud-based data engineering needs to adequately support data analytics such as the solutions elaborated in the next two sections.


[2] http://www.cloudfoundry.com/

References

[Ananthanarayanan13] R. Ananthanarayanan, V. Basker, S. Das, A. Gupta, H. Jiang, T. Qiu, A. Reznichenko, D. Ryabkov, M. Singh, S. Venkataraman. Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams. SIGMOD ’13: Proceedings of the 2013 International Conference on Management of Data. 2013.

[Moreno-Vozmediano 2013] R. Moreno-Vozmediano, R.S. Montero, I.M. Llorente. Key Challenges in Cloud Computing: Enabling the Future Internet of Services. IEEE Internet Computing, 17(4). 2013.

Permanent link to this article: http://citylab.inria.fr/cloud-based-management-of-semantic-urban-data/