Featured Post

Trie implementation in C

To implement the kind of storage which stores strings as the search keys , there is a need to have special data structures which can store the strings efficiently and the searching of data based on the string keys is easier, efficient and faster. One such data structure is a tree based implementation called Trie.

Trie is a data structure which can be used to implement a dictionary kind of application. It provides all the functionality to insert a string, search a string and delete a string from the dictionary. The insertion and deletion operation takes O(n) time where n is the length of the string to be deleted or inserted.
Some of the application of tries involve web based search engines, URL completion in autocomplete feature, Spell checker etc.

Structure of Trie(Specific to this implementation):

The trie implemented here consists of nodes. Each node has these fields:
  1. Key - Part of the string to be serached,inserted or deleted.
  2. Value -  The value associated with a string (e.g In a dictionary it could be the meaning of the word which we are searching)
  3. Neighbour node address - It consists of the address of the neighbouring node at the same level.
  4. Previous neighbour address - It consists of the address of the previous node at the same level.
  5. Children node address - It consists of the address of the child nodes of the current node.
  6. Parent node address - It consists of the address of the parent node of the current node.
The additional nodes like Parent and Previous nodes are added to this implementation for making the search, and deletions easier.

Here is a diagrammatical view of a trie nodes I have used in this implementation. The field key is not represented in the diagram due to symmetry purposes.


  
Let us consider an example to understand tries in detail.

Suppose we have to implement a database for the HR department of an organisation in which we have to store an employee's name and their ages. There is an assumption for this example that there each employee's name is unique.So there is a strange policy in this organisation that any new employee which has a name that already exists in the organisation, it would not hire that new employee.

Let's use this hypothetical example just to understand how tries work.
  • Consider we have a new employee named Andrew with age 36. Lets populate our trie for "andrew".





  • Now add "tina".


  • Add "argo".



  • Add "tim".




  • Add "t".



  • Add "amy".


  • Add "aramis".



This is the complete Trie with all the entries. Now let us try deleting the names. I am not capturing the trivial cases.

  • Lets try deleting Argo.



  • Delete Tina



  • Delete Andrew





There is also a video from IIT Delhi which explains the tries. Tries Explained.
The implementation for this Trie is given below. Please provide your suggestions to further improve the implementation.
/*trie.h*/
typedef int trieVal_t;

typedef struct trieNode {
    char key;
    trieVal_t value;
    struct trieNode *next;
    struct trieNode *prev;
    struct trieNode *children;
    struct trieNode *parent;
} trieNode_t;

void TrieCreate(trieNode_t **root);
trieNode_t* TrieSearch(trieNode_t *root, const char *key);
void TrieAdd(trieNode_t **root, char *key, int data);
void TrieRemove(trieNode_t **root, char *key);
void TrieDestroy( trieNode_t* root );


/*trie.c*/
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>

trieNode_t *TrieCreateNode(char key, int data);

void TrieCreate(trieNode_t **root)
{
 *root = TrieCreateNode('\0', 0xffffffff);
}

trieNode_t *TrieCreateNode(char key, int data)
{
 trieNode_t *node = NULL;
 node = (trieNode_t *)malloc(sizeof(trieNode_t));

 if(NULL == node)
 {
  printf("Malloc failed\n");
  return node;
 }

 node->key = key;
 node->next = NULL;
 node->children = NULL;
 node->value = data;
 node->parent= NULL;
 node->prev= NULL;
 return node;
}

void TrieAdd(trieNode_t **root, char *key, int data)
{
 trieNode_t *pTrav = NULL;

 if(NULL == *root)
 {
  printf("NULL tree\n");
  return;
 }
#ifdef DEBUG
 printf("\nInserting key %s: \n",key);
#endif
 pTrav = (*root)->children;



 if(pTrav == NULL)
 {
  /*First Node*/
  for(pTrav = *root; *key; pTrav = pTrav->children)
  {
   pTrav->children = TrieCreateNode(*key, 0xffffffff);
   pTrav->children->parent = pTrav;
#ifdef DEBUG
   printf("Inserting: [%c]\n",pTrav->children->key);
#endif
   key++;
  }

  pTrav->children = TrieCreateNode('\0', data);
  pTrav->children->parent = pTrav;
#ifdef DEBUG
  printf("Inserting: [%c]\n",pTrav->children->key);
#endif
  return;
 }

 if(TrieSearch(pTrav, key))
 {
  printf("Duplicate!\n");
  return;
 }

 while(*key != '\0')
 {
  if(*key == pTrav->key)
  {
   key++;
#ifdef DEBUG
   printf("Traversing child: [%c]\n",pTrav->children->key);
#endif
   pTrav = pTrav->children;
  }
  else
   break;
 }

 while(pTrav->next)
 {
  if(*key == pTrav->next->key)
  {
   key++;
   TrieAdd(&(pTrav->next), key, data);
   return;
  }
  pTrav = pTrav->next;
 }

 if(*key)
 {
  pTrav->next = TrieCreateNode(*key, 0xffffffff);
 }
 else
 {
  pTrav->next = TrieCreateNode(*key, data);
 }

 pTrav->next->parent = pTrav->parent;
 pTrav->next->prev = pTrav;

#ifdef DEBUG
 printf("Inserting [%c] as neighbour of [%c] \n",pTrav->next->key, pTrav->key);
#endif

 if(!(*key))
  return;

 key++;

 for(pTrav = pTrav->next; *key; pTrav = pTrav->children)
 {
  pTrav->children = TrieCreateNode(*key, 0xffffffff);
  pTrav->children->parent = pTrav;
#ifdef DEBUG
  printf("Inserting: [%c]\n",pTrav->children->key);
#endif
  key++;
 }

 pTrav->children = TrieCreateNode('\0', data);
 pTrav->children->parent = pTrav;
#ifdef DEBUG
 printf("Inserting: [%c]\n",pTrav->children->key);
#endif
 return;
}

trieNode_t* TrieSearch(trieNode_t *root, const char *key)
{
 trieNode_t *level = root;
 trieNode_t *pPtr = NULL;

 int lvl=0;
 while(1)
 {
  trieNode_t *found = NULL;
  trieNode_t *curr;

  for (curr = level; curr != NULL; curr = curr->next)
  {
   if (curr->key == *key)
   {
    found = curr;
    lvl++;
    break;
   }
  }

  if (found == NULL)
   return NULL;

  if (*key == '\0')
  {
   pPtr = curr;
   return pPtr;
  }

  level = found->children;
  key++;
 }
}

void TrieRemove(trieNode_t **root, char *key)
{
 trieNode_t *tPtr = NULL;
 trieNode_t *tmp = NULL;

 if(NULL == *root || NULL == key)
  return;

 tPtr = TrieSearch((*root)->children, key);

 if(NULL == tPtr)
 {
  printf("Key [%s] not found in trie\n", key);
  return;
 }

#ifdef DEBUG
 printf("Deleting key [%s] from trie\n", key);
#endif

 while(1)
 {
  if( tPtr->prev && tPtr->next)
  {
   tmp = tPtr;
   tPtr->next->prev = tPtr->prev;
   tPtr->prev->next = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else if(tPtr->prev && !(tPtr->next))
  {
   tmp = tPtr;
   tPtr->prev->next = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else if(!(tPtr->prev) && tPtr->next)
  {
   tmp = tPtr;
   tPtr->parent->children = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
   break;
  }
  else
  {
   tmp = tPtr;
   tPtr = tPtr->parent;
   tPtr->children = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
 }

#ifdef DEBUG
 printf("Deleted key [%s] from trie\n", key);
#endif
}


void TrieDestroy( trieNode_t* root )
{
 trieNode_t *tPtr = root;
 trieNode_t *tmp = root;

    while(tPtr)
 {
  while(tPtr->children)
   tPtr = tPtr->children;

  if( tPtr->prev && tPtr->next)
  {
   tmp = tPtr;
   tPtr->next->prev = tPtr->prev;
   tPtr->prev->next = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else if(tPtr->prev && !(tPtr->next))
  {
   tmp = tPtr;
   tPtr->prev->next = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else if(!(tPtr->prev) && tPtr->next)
  {
   tmp = tPtr;
   tPtr->parent->children = tPtr->next;
   tPtr->next->prev = NULL;
   tPtr = tPtr->next;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
  else
  {
   tmp = tPtr;
   if(tPtr->parent == NULL)
   {
    /*Root*/
    free(tmp);
    return;
   }
   tPtr = tPtr->parent;
   tPtr->children = NULL;
#ifdef DEBUG
   printf("Deleted [%c] \n", tmp->key);
#endif
   free(tmp);
  }
 }

}


/*triedriver.c*/
/*
 * To Compile : gcc -o trie trie.c triedriver.c
 * To run: ./trie
 */
#include <stdio.h>
#include <stdlib.h>
#include "trie.h"

int main()
{
    trieNode_t *root;
    printf("Trie Example\n");
    
    /*Create a trie*/
    TrieCreate(&root);
    
    TrieAdd(&root, "andrew", 1);
    TrieAdd(&root, "tina", 2);
    TrieAdd(&root, "argo", 3);
    TrieAdd(&root, "timor", 5);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "tim", 6);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "ti", 6);
    TrieAdd(&root, "amy", 7);
    TrieAdd(&root, "aramis", 8);

    /*Destroy the trie*/
    TrieDestroy(root);
}


In order to print the debug messages, use -DDEBUG while compiling with gcc:
gcc -o trie trie.c triedriver.c -DDEBUG

Comments

  1. That's awesome!!! I have added it and results are amazing. Thanks for sharing.

    ReplyDelete
  2. Hi Varun ,

    Can u please explain the below mentioned condition for removing a key.

    if( tPtr->prev && tPtr->next)
    {
    tmp = tPtr;
    tPtr->next->prev = tPtr->prev;
    tPtr->prev->next = tPtr->next;
    free(tmp);
    break;
    }

    ReplyDelete
    Replies
    1. This condition is to delete the node which has both neighbors around it.

      Delete
    2. is this site still active? I tried to post something earlier and it is not appearing.

      Delete
    3. Yes the site is still active. It's just that the comments are moderated.

      Delete
  3. I think I may have found a bug, it seems it is not possible to insert words in non strict alphabetical order, for example if I insert "acceptable" and then "accept", the second one gets always inserted with a value (the trieVal_t member of trieNode) equal to 0, I'm now trying to understand what's wrong. I would be pleased if you could confirmmy problem.

    ReplyDelete
  4. As I previously posted I think you should change lines 93-95 to:

    if(*key)
    {
    pTrav->next = TrieCreateNode(*key, 0);
    }
    else
    {
    pTrav->next = TrieCreateNode(*key, index);
    res=pTrav->next;
    }
    pTrav->next->parent = pTrav->parent;
    pTrav->next->prev = pTrav;

    ReplyDelete
    Replies
    1. Thanks Walter for your suggestions. I'll look up the issue and update the code.

      Delete
    2. I have updated the code. Please try to run your tests now.

      Delete
  5. May i knw the advantages of trie algorithm over other algorithms... plz reply...

    ReplyDelete
    Replies
    1. Can you specify the other algorithms ?
      Tries are mostly used for efficient string based search.

      Delete
  6. Varun:
    Sorry to be a newbie, but wondering "where is main"? I'm trying to get an understanding of tries, but can't get this to compile without "main" in .c code.

    ReplyDelete
    Replies
    1. I have added a sample driver code and instructions in the comments.

      Delete
  7. Updated the code with few improvisations and bug fixes.

    ReplyDelete
  8. Wouldn't the linked list traversal be a massive bottleneck for large or non-Latin text? Indexing CJK characters could result in tens of thousands of traversals even if the strings themselves were only a few characters long, and long strings would result in an excessive amount of allocations and linked list traversals.

    That's not a knock on your code; I just heard recently that tries were perfect for everything but they seem incredibly limited outside of Latin dictionary lookups, and even then a pre-compressed structure optimized for lookups would be superior.

    ReplyDelete
    Replies
    1. That's an interesting point. Tries work well with limited set of characters as in Latin text. Allocations can be improved using some sort of memory pool , but yes, the traversals can become bottleneck in case of non-Latin text. Thanks for bringing out such a valuable point.

      Delete
    2. I researched it further and learned that Unicode is indeed a major limitation of the standard Trie implementation, but I was wrong about a few things:

      Since your code operates on individual bytes rather than Unicode codepoints, individual CJK characters will take no more than 256*4 lookups (four levels of 8-bit values), rather than 2^32 lookups (one level of 32-bit codepoints). That's over 4 MILLION times faster in the pathological worst case.

      The recommended fix seems to be replacing the linked list with another data structure. I'll leave it up to your readers to decide which is best for their needs. :]

      Also, it turns out that the major limitation regarding Unicode is using it for sorting (these tries operate on individual bytes, while Unicode sorting needs multiple bytes or characters), but your implementation doesn't sort anyway. So it looks like it works reasonably well!

      Delete
  9. If I tried to compile the code as above, gcc complains that TrieCreateNode is used before it is defined. Cheers,

    ReplyDelete
    Replies
    1. I'm experiencing the same issue, does anyone has a solution to this bug? I haven't enough skills in C yet to fix this code myself. Thanks!

      Delete
    2. Here's the exact error message:

      trie.c: In function 'TrieCreate':
      trie.c:10:8: warning: assignment makes pointer from integer without a cast [enabled by default]
      *root = TrieCreateNode('\0', 0xffffffff);
      ^
      trie.c: At top level:
      trie.c:13:13: error: conflicting types for 'TrieCreateNode'
      trieNode_t *TrieCreateNode(char key, int data)
      ^
      trie.c:10:10: note: previous implicit declaration of 'TrieCreateNode' was here
      *root = TrieCreateNode('\0', 0xffffffff);

      Delete
    3. Please try the updated code now. Only a declaration was needed before the usage.

      Delete
  10. When I run triedriver the output I get is:
    Trie Example
    Key [tim] not found in trie

    ReplyDelete
  11. Inside TrieCreateNode you return a NULL pointer in case malloc fails, but when you call TrieCreateNode you do not check if it has returned NULL.

    ReplyDelete
  12. In TrieSearch lvl and pPtr can be eliminated.
    Sorry for the many comments. Feel free to delete them once you've read them.

    ReplyDelete
    Replies
    1. Feel free to give more :-) Any thing which can improve the quality of the code is welcome.

      Delete
  13. Is TrieSearch supposed to work? I'm getting NULL while searching "tina" (after adding it).

    ReplyDelete
  14. Check TrieRemove function how it works. It has usage of TrieSearch.

    ReplyDelete
  15. For the TrieDestroy function, doesn't seem like you need the first two if statements( if( tPtr->prev && tPtr->next) and if(tPtr->prev && !(tPtr->next)) ) because you always loop until you get to the first child of a node, meaning that tPtr->prev is always NULL.

    ReplyDelete
  16. I'm confused on lines 50-66 and 120-136. What's the reason for the two print statements as they both create nodes in each section I'm confused on. Is the first create function just creating space and then the second create function actually puts information into the create function?

    ReplyDelete
  17. Hello,
    First thanks you to share your knowledges over this blog.

    I have a problem "double free or corruption (fasttop)" when I launch the code with this main :

    /*triedriver.c*/
    /*
    * To Compile : gcc -o trie trie.c triedriver.c
    * To run: ./trie
    */
    #include
    #include
    #include "trie.h"

    int main()
    {
    trieNode_t* root = NULL;
    TrieCreate(&root);
    TrieAdd(&root, "vic", 1);
    TrieAdd(&root, "joe", 2);
    TrieAdd(&root, "jar", 3);
    TrieRemove(&root, "joe");

    trieNode_t* ret = TrieSearch(root->children, "jar");

    if(NULL != ret) printf("value %d\n", ret->value);
    else printf("not found\n");

    TrieDestroy(root);
    return 0;
    }
    I would you know if someone already encountered this problem.

    Thanks,
    Victor

    ReplyDelete
    Replies
    1. Solved!

      In TrieRemove function, just add that :

      tPtr->next->prev = NULL;

      here :

      else if(!(tPtr->prev) && tPtr->next) {
      tmp = tPtr;
      // HERE
      tPtr->parent->children = tPtr->next;

      #ifdef DEBUG
      printf("Deleted [%c] \n", tmp->key);
      #endif
      free(tmp);
      break;

      Victor

      Delete
  18. I'm relatively new to C so I'm guessing there's a good reason, but why not simply use:

    void TrieDestroy (trieNode* node)
    {
    if(node)
    {
    if(node->children)
    {
    DeleteTrie(node->children);
    }

    if(node->next)
    {
    DeleteTrie(node->next);
    }

    free(node);
    }
    }

    ReplyDelete
  19. Hi Varun,

    Thanks for this great implementation. Any chance you publish a traverse function ? thanks ;)

    ReplyDelete
  20. Varun - How do you support special characters? I use a variation of the geeksforgeeks

    http://www.geeksforgeeks.org/trie-insert-and-search/

    implementation makes it even faster since, i reach head of a word at constant time. But, when I find special character, I fail to handle it.

    ReplyDelete
  21. in Add, instead of adding to the right most, we do a alphabetically sorted insertion, we would achieve faster search. - Lokesh Mogra

    ReplyDelete
  22. Hello, can you please explain how you decide how to set values in main() while doing tests on your Trie?
    TrieAdd(&root, "andrew", 1);
    TrieAdd(&root, "tina", 2);
    TrieAdd(&root, "argo", 3);
    TrieAdd(&root, "timor", 5);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "tim", 6);
    TrieRemove(&root, "tim");
    TrieAdd(&root, "ti", 6);
    TrieAdd(&root, "amy", 7);
    TrieAdd(&root, "aramis", 8);

    Is that simply an INT variable incrementing, or how do you do it?

    Thanks.

    ReplyDelete
    Replies
    1. You can choose any values. It's basically a sample data stored at a particular node.

      Delete
  23. Hi Varun, is it possible to predict other nodes with just two letters as input. For eg over here, if I give input as 'ar', I get argo and aramis as suggestions? Can you please tell me how?
    Also this is great implementation.

    ReplyDelete
  24. I've a segmentation error deleting "andrew" (the first entry in the trie).
    how may it be?

    ReplyDelete
  25. yesterday I observed a segFault deleting the first entry.
    here I propose a patch:
    http://www.cloc3.net/corsoOnline/trie/trie.patch

    ReplyDelete
  26. I have a problem with this implementation.
    When I try to compile the code, I constantly receive the following error:
    ./trie.h:21:4: error: typedef redefinition with different types ('struct trieNode_t' vs
    'struct trieNode_t')
    } trieNode_t;
    ^
    ./trie.h:21:4: note: previous definition is here
    } trieNode_t;
    ^
    //the typedef struct is identical to the given though. Can you help me with this issue?

    ReplyDelete
    Replies
    1. Which compiler have you used?

      Delete
    2. The code compiles fine with gcc and msvc. Maybe it has something to do with clang compiler.

      Delete
  27. Hi Varun, is it possible to predict other nodes with just two letters as input. For eg over here, if I give input as 'ar', I get argo and aramis as suggestions? Can you please tell me how?

    ReplyDelete
    Replies
    1. Certainly, that's what the tries are extensively used for. When you visit a node, you can get all the subtrees of that node to achieve this. This is a simple tree operation. You can easily find the code by googling.

      Delete
  28. Hey, thanks for sharing the code, I found it very helpful.

    some little comments about the remove function:
    1. if we have a single entry in the trie and we remove it, it cause the root itself to be freed, I am not sure we want it.
    2. We may want to add this line right after line 221: tPtr->next->prev = NULL, so we will not accidentally reach an unallocated memory.

    Thanks.

    ReplyDelete
  29. How would one go about displaying parts of the tree?

    Thanks
    rw

    ReplyDelete
    Replies
    1. You need to implement a traverse function somewhat similar to TrieDestroy minus the memory deallocation part. And then print each node.

      Delete
  30. very easy you only said the example only.
    please you can send a same program and it user interactive to create a trie data stracture.
    like that accept a string and store and delte.

    ReplyDelete
  31. How can I call TrieSearch in main?
    Thanks

    ReplyDelete
  32. How would I call TrieSearch in main?

    Thanks

    ReplyDelete
  33. when i compiled the program an error occurred highlighting the line #include "trie.h" , saying 21 18 D:\C Prog\trie.cpp [Error] trie.h: No such file or directory.
    please help

    ReplyDelete
  34. when i compiled the program an error occurred highlighting the line #include "trie.h" , and stating
    21 18 D:\C Prog\trie.cpp [Error] trie.h: No such file or directory
    please help

    ReplyDelete
  35. Great.Thanks for the information, I like the way you put things together in an organized way, and explain your thoughts. Tutor services and tuition services are provided by TheTuitionTeacher in Delhi.
    Home Tutors in Delhi | Home Tuition service

    ReplyDelete

Post a Comment

Please post your valuable suggestions