On Path-Centric Navigation and Search Techniques for Personal Knowledge Stored in Topic Maps

Jens Heider and Julian Schütte
Fraunhofer Institute for Secure Information Technology, 64295 Darmstadt, Germany
Abstract. Leveraging interconnections across stored personal information is a novel concept to find desired information by context and structure. This paper introduces a combined navigation and search interface for homogeneous knowledge representations based on path and set calculation. It uses graph theory to provide a more intuitive way of supplying search criteria and retrieving related information.

1 Introduction

In many situations personal knowledge is stored in various locations and in different ways, requiring the user to use multiple applications to search for already known content. Even approaches providing search interfaces based on full-text search for multiple applications do not solve this sufficiently. They only shift the problem from finding the content to providing the right keyword contained in the content the user hopes to retrieve. Of course the office scenario of a knowledge worker handling e-mails, organizers, document management systems, and all other sorts of digital tools, trying to improve the daily work with information, is only one special domain to be covered.

This paper therefore introduces a search technique that leverages the redundancy of information contained in various data sources by autonomously building a homogeneous representation of it. Thus, useful connections between its metadata is created and used for a search based on graph theory. The approach presents one core aspect of the MIDMAY project1 (Mobile Information Distribution Management and Access for You!) initially described in [1]. It builds the framework for mobile devices such as smartphones to evolve into an everyday tool for the secure distribution, management and access of information. The proof of concept implementation uses Topic Maps as the underlying technology for handling personal knowledge and its homogeneous merging [2].

The need for search techniques in Topic Maps is often addressed by keyword search. In applications aiming at technical users, also query languages or Query-by-Example approaches are proposed [3]. Instead, the approach to be presented here is based on marking topics remembered by the user to be related in some way to the information to be searched. The user follows the paths he has in his mind to navigate to these topics in a quite natural way. Having marked two or more topics, the result set is then calculated using graph theory. Its application with Topic Maps is briefly described in Sect.2. Sect.3 shows the basic navigation methods inside the unified knowledge representation that are needed in order to be able to mark topics for the search process, which is described in Sect.4.

2 Preconditions

The topic maps used in the following are assumed to be generated from data sources containing information stored by the user. This is done with the help of software components, which we call extractors. They are built upon a framework, making it a simple task to provide access to new protocols of data sources, because only the mapping between the underlying information model to the access protocol has to be implemented. Each extractor is specialized this way and retrieves meta data from the data source to store it in a homogeneous topic map. Since each extractor is specialized on its data source, it can extract or create the meta data efficiently in the context of the extractors purpose. The resulting topic map structure and its transformation to a graph model is described in the following sections.

2.1 Topic Map Structure

The generated topic map contains resource references to data sources from where the actual information can be retrieved. The structure of the map therefore is designed to help the user find the desired references by preserving original structures he has given his data sources already. Additionally, the redundancy of data in multiple sources is used to interlink information in a way that enriches the representation. All information is then accessible via a single interface. In this context building a general representation with a homogeneous structure requires that

  1. unified Public Subject Indicators (PSI) are used for a global typing schema across all extractors that is applied strictly to all topics,
  2. associations also have a type to reflect their general semantic of a property, type or hierarchical relation for the global representation,
  3. there is no directionality inherent in an association as defined by XTM specification [4] and
  4. the topic map has to be made consistent, making each entry unique in the knowledge space of a user.

2.2 Graph Model

In contrast to a complete modeling of the entire Topic Maps concept as presented with the τ-model [5], we want to focus on the inherent structure of a topic map only. This has been done for directed acyclic graphs in [6], but in our application scenario we have to cover undirected cyclic graphs. The following model therefore represents our transformation from topic maps to a graph model that is useful for algorithms dealing with the concept of paths.

In our model the graph G, built by the topic map, is described by the pair (V,E). V is the finite set of vertices mapped to topics and E is a binary relation on V representing the undirected associations between the topics. Additionally, E explicitly contains the binary relations between topics and other Topic Maps’ entities (e.g., topic types) to preserve this structural information. Reification is addressed by creating a new vertice in V , built out of the reified topic map construct (e.g., an association), and including its relations from all involved topics into E. Each edge (vi,vj) ∈ E is given a constant configurable weight wij depending on the kind of connection and the search mode which will be described further in Sect.4.

A path of length k from topic ta to topic tb in this graph G = (V,E) is then a sequence v0,v1,v2,...,vkof vertices such that ta = v0,tb = vk and (vm-1,vm) ∈ E for m = 1,2,...,k. A path pi connecting ta and tb is written as (ta↔ptb)i. The set Pab of all n paths connecting the topics ta and tb is defined as Pab := { i=1n(ta↔pt b)i}. In this application field it can be assumed that the user is only interested in simple paths, so all paths in Pab are defined to have distinct vertices to prevent cycles.

3 Navigation

As Topic Maps represent a network of topics interconnected by associations, it is obvious to use these paths for navigating through the node-centric network (see [7]). The interesting question is, which topic should be chosen as the starting node. Although no information about the search interest has been specified so far, the user has to be offered an entrance point to the topic map. Some approaches therefore use rich visualization to present several levels of detail for the representation. The user can enter it trough different dimensions, which supports the user in visually exploring the representation (see [8]). However, these approaches often overburden non-technical users and also require large displays.

Finding information stored in databases is often solved by query languages. A similar approach has also been developed for Topic Maps with the query language Tolog2. These query languages with their precise description capabilities of the desired result have much power. However, the inherent complexity also has some drawbacks for the non-technical user. He has to know the correct syntax and some attributes of the data structure to produce meaningful queries.

Instead, this approach is based on choosing topics known to be connected somehow to the desired information. These topics are selected during navigation inside the topic map. Therefore, the general concepts for navigation are described fisrt, before Sect.4 goes into detail about the search techniques and their background in graph theory.

3.1 Points of Entrance

This section presents the methods we have integrated into MIDMAY to provide an easy way to enter the generated topic map for navigation. Each of them serves a slightly different purpose, depending on the kind of topic the user is looking for and the strategy he wants to follow.

Hierarchy Root

The most classic method to present an entry point is based on the hierarchy information contained in the topic map. The Hierarchical Classification Pattern and Faceted Classification Pattern proposed by Kal Ahmed [9] can be applied to unify a representation, as described in [2]. Therefore, all indexed data sources are represented in the topic map with their hierarchical structure. They share a common designated root that is presented to the user (see Fig.1). This way, the user is enabled to follow the paths to the leaves of the tree preserving the hierarchical structure. Of course, all other associations existing in parallel - not displayed in Fig.1 for the sake of clarity - can also be used to navigate within the topic map. This method is useful if the user remembers the hierarchical location of the original data source, which can now be accessed uniformly by using a single interface.


PIC

Fig. 1: Point of entrance using hierarchy root


Type List

This point of entrance leverages the typing information of all topics. Therefore, the user is presented with a list of all topic types contained in the topic map. By selecting one type, a list of all topics of that type (see Fig.2) is shown. In our scenario, this is useful for example to find documents of a certain file type regardless of its physical location in a data source. Additionally, this type list is useful to mark the complete topic type to be relevant for a search query instead of a single topic instance of the type.


PIC

Fig. 2: Point of entrance using type lists


Keyword Search

In other cases the user might remember a certain part of text contained in the metadata of the information he is looking for. Therefore, it is useful to provide an interface to enter keywords contained in the topic that represents the searched resource. To make it easier for the user, a personal vocabulary is automatically generated from all topics by splitting the Base Name Strings into terms. This way, the user can browse through available keywords and the search interface can selectively provide suggestions for keywords while he types some letters of them. Especially on mobile devices with limited input interfaces, this improves the search process significantly. Void search queries are prevented and the input of the search criteria is sped up.

A second benefit of a search based on pre-calculated terms is the implicit generation of linguistic relations, which can assist the user find related information that is not explicitly linked by the topic map. Of course, these relations sometimes are not correct in the sense of the actual search, like in the case of homonyms or acronyms, but as the framework is designed for humans, it can be assumed that they are used to deal with those multiple meanings.

3.2 Navigation Aid

After the user has entered the topic map at the desired point, he moves from topic to topic by following the associations between them (see Fig.3(a)). In this node-centric interface (in MIDMAY called Click and Cycle because of its closed interaction cycle), a list of terms is presented at first. Choosing an entry leads to a list of topics containing the selected term. In this topic view, either a topic is marked for a topic operation such as using it inside a search query, or the topic is selected to see all of its associations. In this association view the user can select the desired association, which leads to the view of all related terms contained in associated topics. An additional redo phase is possible in all views in case a list would contain too many items to handle. In this case, the items will be presented in a staged view, clustering items alphabetically to provide a tradeoff between view levels and displayed items per level.

To help the user keep track of the navigation we propose the integration of a path-centric approach. Its implications for the navigation will be shown in the next subsections. But it is also a key aspect for calculating search results as described in Sect.4.

History Path

When navigating through personal knowledge it is often necessary to have the possibility to go back and forth between previously visited topics. To help the user find the desired topic he has already used for navigation, a history presents his path from the entrance point to the current position in the graph (see Fig.3(b)). This path visualizes the relation of the intermediate topics and serves as a navigation shortcut to topics and associations. At the same time it can be displayed quite easily even on small displays. If more space is available on the display, the visualization can be enhanced with more sophisticated display techniques by showing adjacent topics in the surrounding of each contained node in the path as well.

Favorites

Topics used often can be marked as favorites. This is useful for example, if the topic represents a document used periodically, like the submitting of trade terms in reoccurring business processes. Choosing it from the list of favorites provides an instant access to the document, which can then be transmitted easily to any recipient without performing a search for it.

Depending on the data and its personal structuring it often also makes sense to use favorites for topics that are frequently used in search queries. This way selecting the required topics as input for the query can be reduced to only a few selections. Optionally it is also possible this way to store an actual selection of topics for a query to be easily reproducible for future requests.


PIC
(a) Navigation Graph
PIC
(b) History of Navigation

Fig. 3: Navigating through the Topic Map by cycling the interface


4 Search

Besides navigation techniques, the inherent structure of a topic map offers other interesting aspects. These mainly arise from the linked information contained in the topic map. Every portion of information is associated with other instances. This will be used in the following sections to provide an easy to use yet powerful search technique, without the need for the user to learn a query language. The approach is based on marking topics that are related in one way or another to the topics the user is looking for. To use a topic for a search, the described navigation inside the topic map is used to reach that desired topic. Then it is marked, which puts it in a virtual bag of accentuated topics. After this bag is filled, one possible interaction with it is using the search functionality.

At this point it is important to note that any topic can be used when specifying a query. Besides topics representing collected meta data, this also includes topics representing a type, such as person, event or date. As everything in a topic map is a topic, the user does not experience any restrictions nor must he perform different actions when formulating the search query, no matter what information he provides as input.

The next sections will describe the calculation of results based on marking topics. First the background for the used principle will be described, followed by the presentation of a simple algorithm we have used as proof of concept for this search technique. The shown sample search queries were performed with a MIDMAY prototype implementation presenting the results in its user interface.

4.1 Paths as Search Results

Let ta be the first topic marked and tb a second one. Because every topic has to have a topic type and the faced root topic connects all hierarchies, there is at least one path p with tap↔tb. For a simplified calculation we assume that paths with the smallest weight in Pab are of the most interest. These paths represent the most direct relation in the context of the search, and paths with increasing weight are deemed to decrease in interest for the user.

Therefore every path p ∈ Pab connecting the topics ta and tb is sorted by its weight and displayed sequentially in that order starting with the lowest value. The paths are shown together with the descriptive name of the corresponding association to visualize the relation between them. The name is chosen by the role the topic plays to display the path readable from left to right or, for smaller displays, from top to bottom. Every topic and association of a path can then be used as an entry point for a subsequent navigation to their associations and topics respectively.

Path Calculation

A first simple approach only uses the topic map as an unweighted graph using wij = 1 for all edges (vi,vj) ∈ E. As a proof of concept an adapted Bidirectional Search is used that starts at ta and tb simultaneously. At both ends the connected vertices are inspected with a weighted Breadth-first Search in spreading waves, until the path with the lowest value is found. To produce the next path, the edge that connects both waves is removed from the search graph and a new path with lowest value is calculated. This significantly limits the amount of displayed paths, which is a desirable effect in this case, as the user only wants to see relations between the starting topics. In this case it is considered sufficient, if all topics in the possible paths are presented at least one time in a displayed path, preventing the display of all other possible combinations of topics in paths computable in a dense connected graph. This way the user gets the chance to find any topic that is related to the marked topics ta and tb.

However, the topic map not only contains undirected edges represented by associations to be used for path calculation. The search can be further improved by taking the association types into account. These are assigned for all associations, assured by the extractor framework (see [2]). This way it is possible to distinguish for example between hierarchical relations and property relations during path calculation. By changing the weighting wij of the edges for an association type, the user is enabled to further specify his search interests.

An example for this is shown in Tab.1, which presents a weight assignment to the association types hierarchy (h), property (p) and type (t) to modify the search results. These values were chosen to deal with structures in the graph where multiple short paths connect two vertices. In the office scenario this is often the case because of a dense connection of topics (e.g., two topics ta and tb with associations to the same property topic tc and an additional hierarchical relation between ta and tb). By setting the weight w(x) of a relation type x to 23 < w(x) < 1 and the weights for the other types to 2, the paths through one or more of these typical three-cornered structures are selectable. Of course, this is not accurate in all and every case one can imagine, but these values have shown in many tests to work very well as an easy to use aid for the search process. By predefining these weights, the user only has to pick a descriptive mode to select the related weight assignment expressing his search interest.


Table 1: Example weight values for path search modification





 
 
  mode w(h)w(p)w(t)result
 
 





 
 
  0 1.0 1.0 1.0 default case; shortest paths is calculated
  1 0.7 2.0 2.0 hierarchical information in result path is desired
  2 2.0 0.7 2.0 path should contain interconnection of properties
  3 2.0 2.0 0.7 the interests in the relation of types is expressed





 

Example: Leveraging Paths

The following simple example should explain how marking topics and calculating paths can be used as a search technique. The example is based on a topic map autonomously extracted by MIDMAY from a digital personal organizer, filesystem folders used for storing documents, and the personal e-mail folder. The user is looking for a document but can’t remember its filename nor its author. However, the user recalls the town where he attended the conference and met the person who e-mailed the document. Provided that the personal organizer’s calendar application contains the meeting along with the location and attendants’ information, the extractors have already included this information into the topic map.

To use the path generation, the user navigates to the topics that build the endpoints of the query. In the example the location topic Graz and the filetype topic PPT are marked, because the user is searching for a Microsoft Powerpoint presentation sent by someone he met at a meeting in the town Graz. The computed paths (see Fig.4) now show the resulting relations between these topics and the user can browse through the paths to find the one containing the topic referencing the attachment he is looking for. There is also the possibility to narrow the search in a second step by using the topics contained in the paths to navigate. After finding a path containing the name of the person who sent the attachment, the user can replace the location topic with this person topic. This excludes all other persons who also attended meetings in Graz.

Another strategy would be to mark topic Graz and topic Person - the type representing the concept of a person - to find all persons that are related to Graz. After selecting one person topic of the calculated paths, the search can be continued by marking other known aspects about the searched document.


PIC

Fig. 4: Calculated shortest path example


4.2 Sets as Search Result

Section 4.1 assumes the selection of two topics to be used as query input. The next section deals with the case of a user wanting to provide more information in order to specify the desired result more precisely with three or more topics. The user will be presented with a set of topics that are considered related regarding the search parameters.

Calculation of Sets

To provide meaningful results in the case of multiple, marked topics, we decided to define the resulting sets based on the already defined shortest paths. First, all the shortest paths between all m marked topics in S := {s1,...,sm} are calculated to extract the relations between the topics. The resulting paths are then transformed to bit vectors. Therefore, all involved k vertices of all paths are indexed. These unique IDs are used as positions in all vectors. The vector Babp indicates the presence or absence of vertices v1,v2,v3,...,vk in path p between topic ta and tb. By using an OR operation defined as Babp Babp := {vj|(vj ∈ Babp) (vj ∈ Babp)} on all bit vectors the resulting vector Bab = Bab1 Bab2 ⋅⋅⋅Babn represents the topics that are contained in all n shortest paths between ta and tb. This is performed for all pairs of the topics selected in S. The results are m(m-1)
---2-- bit vectors involved in the actual search. These bit vectors are then combined to a single one by applying a binary operation. In the default case this is an AND operation defined as Babp Babp := {vj|(vj ∈ Babp) (vj ∈ Babp)}. The resulting vector Bres(S) := { i=1,j=2i=m-1,j=mBsisj|si,sj ∈ S,i < j} represents the topics with a maximum of interestingness defined by the smallest weight of the paths to all of the marked topics. The topic selection view presents these topics as a result of the search query for further interaction.

Using this approach together with the weight assignment for the underlying path calculation presented in Tab.1, the user of the described scenario has the option to calculate sets based on

Compared to query languages, this simple approach already produces valuable results and is easy to handle even for non-technical users. But of course the transformation into bit vectors was done to have the ability to expand the possibilities for search requests to other binary operations. In the first case the ordering of the marked topics does not matter. In an advanced mode it can be used to prioritize their meaning by specifying a binary operation between sets of marked topics. The operation is then used on the resulting bit vectors of the sets of marked topics. Other binary operations such as OR and XOR are defined similarly to the AND operation and produce according results. OR accumulates result topics, whereas XOR presents the difference between result sets.

Example: Marking Multiple Topics to Search

Imagine the situation of a user trying to find authors of presentations that are related to a certain project. His topic map is kept up to date automatically by MIDMAY and is composed by its extractors, indexing local document folders, his e-mail account and the company’s LDAP3 directory. The latter contains, besides many other useful information about the employees, their project memberships as well. Again the required information to answer the user’s question is already stored in maintained data sources. By letting MIDMAY combine them in a personal knowledge map represented by a topic map, the user can use a single interface to mark topics that come to his mind when thinking of the desired result. He marks the topic Person and the topic MIDMAY that represents the project he is interested in (see fig.5). This would show all persons involved in the project MIDMAY. But by also marking the topic PPT, representing the type “PowerPoint presentations”, the result is narrowed to persons related to this file type, too. The user can now navigate from this topic inside the topic map or browse for additional, related information such as e-mail addresses or telephone numbers. Additionally, the user can retrieve or distribute the information the topic stands for.


PIC

Fig. 5: Calculated result of multiple marked topics in search mode 0


5 Conclusion

Connecting existing data though a unified representation can be used to offer an intuitive way to search for information. The combined interface for browsing, managing and searching, resulting from this concept, is a first step to extend the possibilities users can gain from their stored data. Together with the circular interaction process, the user interface gets independent from current work tasks inside the new information environment. This simplifies the usage and is aligned with the concept of knowledge represented in topic maps.

The concept of marking topics to specify queries allows using the graph theory for searching, thereby leveraging the inherent structure across data sources, created by data occurring in multiple sources. The proof of concept implementation already showed the capabilities created by this approach. Technically, its usage in the field depends on efficient Topic Maps engines that are capable of handling maps containing hundreds of thousands of topics. Compared to the indexing technologies of full-text desktop search engines, this requires currently about ten times more disk space. But by investing these resources, users are not forced to recall specific attributes or keywords and benefit from relations between their stored data. Instead of manually tagging information, the users do not have to invest any effort, but gain the possibility to search directly via keywords, as well as via their own ways of correlating things.

These results are only a first step, but further search techniques with applied graph theory may even improve the approach in terms of flexibility and comprehensibility of the search process. Together with intuitive user interfaces that provide suitable interaction with the user’s personal knowledge, the Topic Maps paradigm shows new ways to tackle daily work with information.

References

1.     Heider, J.: Vision und Realisierung einer sicheren mobilen Informations-Verteilung, Verwaltung und Abfrage. In: Multikonferenz Wirtschaftsinformatik (MKWI) 2004. Bd 3. Mobile Business Systems. (2004)

2.    Heider, J., Bergmann, J.: Topicmaps: Unified access to everyday data. In: Proceedings of I-KNOW ’06, Graz, Austria. (2006) 473–480

3.    Wang, D., Dicheva, D., Dichev, C., Akouala, J.: Retrieving information in topic maps: the case of tm4l. In: ACM-SE 45: Proceedings of the 45th annual southeast regional conference, New York, NY, USA, ACM Press (2007) 88–93

4.    Topicmaps.org: XML Topic Maps (XTM) 1.0 Specification. http://www.topicmaps.org/xtm/1.0 (2001)

5.    Barta, R., Salzer, G.: The Tau Model, Formalizing Topic Maps. In Hartmann, S., Stumptner, M., eds.: Second Asia-Pacific Conference on Conceptual Modelling (APCCM2005). Volume 43 of CRPIT., Newcastle, Australia, ACS (2005) 37–42

6.    Ma, Q., Tanaka, K.: Topic-structure-based complementary information retrieval and its application (2005)

7.    Dave, P., Karadkar, U.P., Furuta, R., Francisco-Revilla, L., Shipman, F., Dash, S., Dalal, Z.: Browsing intricately interconnected paths. In: HYPERTEXT ’03: Conference on Hypertext and hypermedia, ACM Press (2003) 95–103

8.    Le Grand, B.; Soto, M.: Visualisation of the Semantic Web: Topic Maps visualisation. In: IV’02: 6th International Conference on Information Visualisation. 344–349

9.    Ahmed, K.: Beyond PSIs : Topic map design patterns. In: Extreme Markup Languages. (2003)

Footnotes

1  http://www.project-midmay.de

2  Implementation by Ontopia of the requirements stated by ISO Topic Maps query language TMQL standardization effort

3  Lightweight Directory Access Protocol, an application protocol for querying directory services