Semantic Scalability Through Abstraction
In the digital age, information is being generated at an extraordinary pace. More data has been created in the last three years than in the past 40,000. Around 2008, this rate of expansion surpassed the rate at which storage capacity is produced, pointing to a future in which data will overwhelm storage capacity. Within the next five years, the amount of sensor data on the Web will surpass social data, and sensors will become the dominant source of information. The efficient representation and interpretation of sensor data at scale has therefore become an important area of research. Abstraction may provide a solution.
What is Abstraction?
Context is a representation of the salient aspects of the environment in which a statement is uttered, or an observation is made. Such a representation of the environment may be utilized to enable a correct interpretation of the statement (or observation). For example, the statement "I’m hot" may be interpreted differently, depending on whether the speaker is in a doctor’s office or on the beach. If in a doctor’s office, then the speaker may be ill and the statement may be interpreted as the possibility of fever. Instead, if lying on the beach, then the statement may be interpreted as a comment on the ambient temperature due to exposure to direct sunlight.
Related to context is the concept of abstraction, which acts as a surrogate for a set of related concepts or a recurring pattern. Typically, only relevant information is retained by the abstraction. In perceptual theory, an abstraction is a conceptualization of some entity in the world, which is known only through sensory detection of its observable qualities. Thus, abstraction is a type of context that is derived through observation. The physical environment represented by an abstraction can be any physical entity, object, or event in the world; observable qualities are properties of the world that can be detected and/or measured by sensors (either people or machines).
(explained through a weather example)
Between April 1st and April 6th of 2003, a major blizzard hit the state of Nevada. Environmental data within the surrounding area was collected by weather-stations, encoded as RDF, and made accessible on the Web as Linked Data. For every two-hour interval, and for each weather-station within a 400-mile radius of the blizzard, we collected the corresponding observations and derived an abstraction. In this experiment, an abstraction is a representation of the weather condition, such as blizzard, flurry, rain-storm, or clear, and is derived from observations such as precipitation, temperature, and wind speed. In total, for 72 time intervals and 516 weather-stations, 37,152 abstractions were generated; the clear condition was found 70% of the time, and the blizzard condition was found less than 1% of the time (see Figure 1).
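As a minimal sketch of how an abstraction might be derived from observations, the Python below maps one station's readings for one interval to a single weather condition. The Observation schema, the derive_abstraction function, and every threshold are hypothetical, chosen only for illustration; they are not the rules used in the experiment. The final line checks the count reported above: one abstraction per station per interval.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One station's readings for one two-hour interval (hypothetical schema)."""
    temperature_c: float
    wind_speed_kmh: float
    precipitation: str  # "none", "rain", or "snow"


def derive_abstraction(obs: Observation) -> str:
    """Map low-level readings to one weather condition.

    The wind threshold is an illustrative assumption, not the
    experiment's actual rule.
    """
    if obs.precipitation == "snow":
        return "blizzard" if obs.wind_speed_kmh >= 56 else "flurry"
    if obs.precipitation == "rain":
        return "rain-storm"
    return "clear"


# Toy readings, invented for this sketch.
readings = [
    Observation(-5.0, 70.0, "snow"),
    Observation(-1.0, 10.0, "snow"),
    Observation(8.0, 20.0, "rain"),
    Observation(12.0, 5.0, "none"),
    Observation(15.0, 8.0, "none"),
]

abstractions = [derive_abstraction(o) for o in readings]
distribution = Counter(abstractions)

# One abstraction per (interval, station) pair, as in the experiment.
total_abstractions = 72 * 516
```

Each abstraction replaces several raw readings with a single label, which is what makes the later filtering step cheap.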
Figure 1. Distribution of abstractions found in the weather experiment.
Now consider a weather-alert service that issues an alert whenever a severe weather condition (e.g., blizzard, flurry, or rain-storm) is found. For this application, only the alert-generating abstractions are needed. The clear weather condition is probably not interesting, so neither that abstraction nor its corresponding observations need to be stored. To explore the storage benefits of generating abstractions, we defined and compared five dataset storage configurations that may be useful for different applications. Figure 2 shows the amount of data generated by each configuration during the weather experiment described above. The relevant abstractions include all weather conditions except clear, and the relevant observations include all observations used to derive a relevant abstraction. Notice that there is more than an order of magnitude difference between the number of relevant abstractions generated and the total number of observations.
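The filtering idea can be sketched in a few lines of Python. The text does not enumerate the five storage configurations, so the four counts below are illustrative quantities only, and the Record type, the storage_counts function, and the toy data are all hypothetical. The one rule taken from the text is that every condition except clear is relevant.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Observations and derived abstraction for one station and interval
    (hypothetical structure)."""
    station: str
    interval: int
    observations: dict
    abstraction: str


def is_relevant(record: Record) -> bool:
    # Per the text: every condition except "clear" is relevant.
    return record.abstraction != "clear"


def storage_counts(records: list) -> dict:
    """Count the items that would be stored under four illustrative choices."""
    relevant = [r for r in records if is_relevant(r)]
    return {
        "all_observations": sum(len(r.observations) for r in records),
        "all_abstractions": len(records),
        "relevant_observations": sum(len(r.observations) for r in relevant),
        "relevant_abstractions": len(relevant),
    }


# Toy data, invented for this sketch: one blizzard among three clear intervals.
calm = {"temperature_c": 12.0, "wind_speed_kmh": 5.0, "precipitation": "none"}
storm = {"temperature_c": -5.0, "wind_speed_kmh": 70.0, "precipitation": "snow"}
records = [
    Record("station-a", 0, calm, "clear"),
    Record("station-a", 1, storm, "blizzard"),
    Record("station-b", 0, calm, "clear"),
    Record("station-b", 1, calm, "clear"),
]

counts = storage_counts(records)
```

In this toy dataset, keeping only relevant abstractions stores 1 item instead of 12 observations; the gap widens as the share of clear intervals grows.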
Figure 2. Semantic scalability of a weather application.