About Athena/Theseus Technology

The core of the Athena system is based upon the Theseus fuzzy logic word recognition system.  Here is the description of the Theseus Technology from the Artificial Ingenuity website:

"The Theseus system is a proprietary fuzzy-logic based word recognition algorithm capable of recognizing words too jumbled for human recognition. This technology is based upon Artificial Ingenuity's theories of the method used when humans process words visually. This new technology was originally conceived in September of 2003, and has proven to be vastly superior to previous spell-check or context based interpretation systems. The most effective comparable systems approach an average of 85% accuracy in recognition of misspelled or jumbled words. The Theseus technology approaches 100%.

This new technology can be applied to any system that requires human input, such as word processing, data entry, communications, etc. The Theseus system can be used in conjunction with our Proteus technology to create conversational interfaces that far outperform previous more limited technologies. 

Applications for this new software include but are not limited to: fault tolerant human input, optical text recognition systems, spell-check systems, automated translation systems, Spam filtering systems, automated data recovery, etc."

The Theseus system is used to turn the text contents of the call history database into a collection of tokenized records that represent a compressed view of the call notes.  In essence, each word is mapped to a unique 32-bit token.  The fuzzy logic recognizes words despite possible typographical errors, and includes a learning algorithm that extends the standard dictionary tokens with application-specific terms.
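The tokenization idea can be illustrated with a minimal sketch.  The Theseus algorithm itself is proprietary and is not reproduced here; this example substitutes the Python standard library's difflib similarity matching for the fuzzy logic, and the class name, the cutoff value, and the sequential token numbering are all assumptions made for illustration only.

```python
import difflib


class Tokenizer:
    """Hypothetical sketch: maps words to 32-bit integer tokens, tolerating typos.

    Uses difflib as a stand-in for the proprietary Theseus fuzzy matching.
    """

    MAX_TOKEN = 0xFFFFFFFF  # tokens must fit in 32 bits

    def __init__(self, dictionary, cutoff=0.8):
        self.cutoff = cutoff  # assumed similarity threshold, not from the source
        self.tokens = {}
        for word in dictionary:
            self._add(word.lower())

    def _add(self, word):
        # Assign the next sequential token (an assumption; 0 is left unused).
        token = len(self.tokens) + 1
        if token > self.MAX_TOKEN:
            raise OverflowError("32-bit token space exhausted")
        self.tokens[word] = token
        return token

    def token_for(self, word):
        word = word.lower()
        if word in self.tokens:
            return self.tokens[word]
        # Fuzzy step: accept the closest known word if it is similar enough.
        close = difflib.get_close_matches(
            word, list(self.tokens), n=1, cutoff=self.cutoff
        )
        if close:
            return self.tokens[close[0]]
        # Learning step: an unknown word becomes a new application-specific term.
        return self._add(word)

    def tokenize(self, text):
        return [self.token_for(w) for w in text.split()]


tok = Tokenizer(["printer", "offline", "restart"])
record = tok.tokenize("Printer ofline")   # the misspelling maps to "offline"
new_term = tok.token_for("toner")         # unknown word is learned as a new token
```

A call note then becomes a short list of integers, which is the compressed view that the later index structures operate on.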

Here is a diagram showing an outline of the tokenization process:


 


Once the call history database has been translated into collections of tokenized call records, a multi-step process constructs the search index structures that allow rapid access to lists of calls containing specific words and phrases.

The specific steps to generate these structures are as follows:

The call list index structure allows set operations on the call lists to determine the candidate set of calls to be scored for matching and/or phrase filtering.  This follows a very simple logic:

[Token1 Call Set]  {intersection}  [Token2 Call Set]  {intersection}  [TokenN Call Set]   =>  Candidate Call Set
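The intersection logic above can be sketched as follows.  This is a minimal illustration, assuming the index is an in-memory mapping from token to a set of call IDs; the function names and the sample call IDs are hypothetical, and ordering the sets by size is one common optimization rather than a confirmed detail of Athena.

```python
from collections import defaultdict


def build_call_index(tokenized_calls):
    """Map each token to the set of call IDs whose notes contain it."""
    index = defaultdict(set)
    for call_id, tokens in tokenized_calls.items():
        for token in tokens:
            index[token].add(call_id)
    return index


def candidate_calls(index, query_tokens):
    """Intersect the call sets of all query tokens, smallest set first."""
    call_sets = sorted((index.get(t, set()) for t in query_tokens), key=len)
    if not call_sets:
        return set()
    result = set(call_sets[0])  # copy so the index itself is never modified
    for call_set in call_sets[1:]:
        result &= call_set
        if not result:
            break  # no call can match once the intersection is empty
    return result


# Hypothetical tokenized call records: call ID -> list of tokens.
calls = {101: [1, 2, 3], 102: [2, 3], 103: [1, 3]}
index = build_call_index(calls)
candidates = candidate_calls(index, [1, 3])
```

Starting with the smallest set keeps every intermediate result small, which is why rare tokens are cheap to search for.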

Here is a diagram describing the call list index structure:



The list of the 1000 most common words is generated by traversing the call list index structure, which buffers the number of times each token is used per call and in total across all calls.  The most-common-tokens list is then used for the "exclude common words" functionality and for construction of the Biples index.  The scoring of the most common words in the text corpora is also of statistical interest, since it can indicate the proportional significance of specific problem terms.
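A minimal sketch of the counting step, assuming tokenized calls are held as a mapping from call ID to a token list.  The function names and sample data are hypothetical; only the per-call counts, the overall totals, and the 1000-word limit come from the description above.

```python
from collections import Counter


def token_statistics(tokenized_calls):
    """Return per-call token counts and total counts across all calls."""
    per_call = {cid: Counter(tokens) for cid, tokens in tokenized_calls.items()}
    totals = Counter()
    for counts in per_call.values():
        totals.update(counts)
    return per_call, totals


def most_common_tokens(totals, limit=1000):
    """Return up to `limit` tokens, most frequent first."""
    return [token for token, _ in totals.most_common(limit)]


def exclude_common(query_tokens, common_tokens):
    """Drop common tokens from a query (the "exclude common words" option)."""
    common = set(common_tokens)
    return [t for t in query_tokens if t not in common]


# Hypothetical tokenized call records: call ID -> list of tokens.
calls = {101: [1, 2, 2, 3], 102: [2, 3, 3], 103: [3]}
per_call, totals = token_statistics(calls)
top_two = most_common_tokens(totals, limit=2)
filtered = exclude_common([1, 2, 3], top_two)
```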

The Biples (word pair vector) index is a proprietary Artificial Ingenuity performance enhancement method derived from statistical modeling methods.  This structure provides a significant performance boost when identifying candidate call lists for searches that contain specific common word pairs, and for phrase searching.  The search time of the general call list set model implemented in Athena grows with the frequency of the tokens involved, so it is fastest at finding infrequently used words in a search phrase or word group.  Even with optimizations to the order in which the intersection sets are produced, extremely frequent word tokens can be problematic.  The Biples structure optimizes the search terms by parsing the search words into a list of Biples and individual tokens.  A Biple provides an already intersected call list for a phrase of two common words, which shortens the candidate calculation process significantly.
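Because the Biples method is proprietary, the following is only a sketch of the general idea of a precomputed word-pair index, not the actual implementation.  It assumes a Biple is an ordered pair of adjacent tokens that both appear in the common-token list; that assumption, the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def build_biples(tokenized_calls, common_tokens):
    """Map each adjacent pair of common tokens to the calls containing it."""
    common = set(common_tokens)
    biples = defaultdict(set)
    for call_id, tokens in tokenized_calls.items():
        for first, second in zip(tokens, tokens[1:]):
            if first in common and second in common:
                biples[(first, second)].add(call_id)
    return biples


def split_query(query_tokens, biples):
    """Parse a query into known word pairs and leftover individual tokens."""
    pairs, singles = [], []
    i = 0
    while i < len(query_tokens):
        pair = tuple(query_tokens[i:i + 2])
        if len(pair) == 2 and pair in biples:
            pairs.append(pair)  # call list for this pair is already intersected
            i += 2
        else:
            singles.append(query_tokens[i])
            i += 1
    return pairs, singles


# Hypothetical data: tokens 7 and 8 are "common", token 5 is rare.
calls = {101: [7, 8, 5], 102: [8, 7, 5], 103: [7, 8]}
biples = build_biples(calls, common_tokens=[7, 8])
pairs, singles = split_query([7, 8, 5], biples)
```

In this sketch the pair (7, 8) resolves directly to its stored call list, so the two large per-token call lists never need to be intersected at query time; only the rare token 5 still goes through the ordinary set intersection.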

Here is a diagram representing the Biples structure:



The application structure of the Athena Search system is based upon COM+ technology, with a dedicated server application that provides search results and text retrieval from the call history database.  The Athena Search Remote applications are simple COM+ clients that communicate with the server application and do very little processing beyond displaying retrieved text and highlighting search terms within a displayed call record.

The Athena Server supports as many connections as the hardware and operating system will allow, and should manage at least 100 connections on a relatively minimal platform.  The only requirement is 2 GB of RAM when full caching is enabled, which allows the index structures to be buffered in memory for a performance gain.  Full caching is optional; the server has been tested on a 600 MHz system with 256 MB of RAM and delivered adequate performance for most searches.

The unchanged textual call history database continues to reside on a separate SQL server and is accessed directly only by the Athena Server application, which limits the number of SQL connections for the system to one.  This provides a potential saving in the type of SQL server license required.  The remote search applications connect to the server application using COM+ via TCP/IP over a network, a VPN, or even an Internet connection.  The remote applications install themselves and register the COM+ interface upon first execution, and will attempt to connect to the server at the default location.  If the server application is already running on the Athena Server platform, remote applications will connect to the existing instance.  If the server is not running, it will be started automatically in the default caching configuration the first time a remote application attempts to connect.

Here is a basic configuration diagram:



The server application maintains a log of search requests and execution times, so that metrics are available when deciding on caching or other configuration changes.  The server application also displays a basic status screen that shows resource utilization.

Here is a screen shot of the server application in use:

 

 

 


Contact info@artificialingenuity.com
Copyright © 2005 Artificial Ingenuity, LLC
Last modified: June 29, 2005
Initial design by Webinizer, LLC