To implement the kind of storage which stores strings as the search keys , there is a need to have special data structures which can store the strings efficiently and the searching of data based on the string keys is easier, efficient and faster. One such data structure is a tree based implementation called Trie.
Trie is a data structure which can be used to implement a dictionary kind of application. It provides all the functionality to insert a string, search a string and delete a string from the dictionary. The insertion and deletion operation takes O(n) time where n is the length of the string to be deleted or inserted.
Here is a diagrammatical view of a trie nodes I have used in this implementation. The field key is not represented in the diagram due to symmetry purposes.
Let us consider an example to understand tries in detail.
Suppose we have to implement a database for the HR department of an organisation in which we have to store an employee's name and their ages. There is an assumption for this example that there each employee's name is unique.So there is a strange policy in this organisation that any new employee which has a name that already exists in the organisation, it would not hire that new employee.
Let's use this hypothetical example just to understand how tries work.
This is the complete Trie with all the entries. Now let us try deleting the names. I am not capturing the trivial cases.
There is also a video from IIT Delhi which explains the tries. Tries Explained.
The implementation for this Trie is given below. Please provide your suggestions to further improve the implementation.
Trie is a data structure which can be used to implement a dictionary kind of application. It provides all the functionality to insert a string, search a string and delete a string from the dictionary. The insertion and deletion operation takes O(n) time where n is the length of the string to be deleted or inserted.
Some of the application of tries involve web based search engines, URL completion in autocomplete feature, Spell checker etc.
Structure of Trie(Specific to this implementation):
The trie implemented here consists of nodes. Each node has these fields:- Key - Part of the string to be serached,inserted or deleted.
- Value - The value associated with a string (e.g In a dictionary it could be the meaning of the word which we are searching)
- Neighbour node address - It consists of the address of the neighbouring node at the same level.
- Previous neighbour address - It consists of the address of the previous node at the same level.
- Children node address - It consists of the address of the child nodes of the current node.
- Parent node address - It consists of the address of the parent node of the current node.
Here is a diagrammatical view of a trie nodes I have used in this implementation. The field key is not represented in the diagram due to symmetry purposes.
Suppose we have to implement a database for the HR department of an organisation in which we have to store an employee's name and their ages. There is an assumption for this example that there each employee's name is unique.So there is a strange policy in this organisation that any new employee which has a name that already exists in the organisation, it would not hire that new employee.
Let's use this hypothetical example just to understand how tries work.
- Consider we have a new employee named Andrew with age 36. Lets populate our trie for "andrew".
- Now add "tina".
- Add "argo".
- Add "tim".
- Add "t".
- Add "amy".
- Add "aramis".
This is the complete Trie with all the entries. Now let us try deleting the names. I am not capturing the trivial cases.
- Lets try deleting Argo.
- Delete Tina
- Delete Andrew
There is also a video from IIT Delhi which explains the tries. Tries Explained.
The implementation for this Trie is given below. Please provide your suggestions to further improve the implementation.
/*trie.h*/
typedef int trieVal_t;
typedef struct trieNode {
char key;
trieVal_t value;
struct trieNode *next;
struct trieNode *prev;
struct trieNode *children;
struct trieNode *parent;
} trieNode_t;
void TrieCreate(trieNode_t **root);
trieNode_t* TrieSearch(trieNode_t *root, const char *key);
void TrieAdd(trieNode_t **root, char *key, int data);
void TrieRemove(trieNode_t **root, char *key);
void TrieDestroy( trieNode_t* root );
/*trie.c*/
#include <stdio.h>
#include "trie.h"
#include <stdlib.h>
trieNode_t *TrieCreateNode(char key, int data);
void TrieCreate(trieNode_t **root)
{
*root = TrieCreateNode('\0', 0xffffffff);
}
trieNode_t *TrieCreateNode(char key, int data)
{
trieNode_t *node = NULL;
node = (trieNode_t *)malloc(sizeof(trieNode_t));
if(NULL == node)
{
printf("Malloc failed\n");
return node;
}
node->key = key;
node->next = NULL;
node->children = NULL;
node->value = data;
node->parent= NULL;
node->prev= NULL;
return node;
}
void TrieAdd(trieNode_t **root, char *key, int data)
{
trieNode_t *pTrav = NULL;
if(NULL == *root)
{
printf("NULL tree\n");
return;
}
#ifdef DEBUG
printf("\nInserting key %s: \n",key);
#endif
pTrav = (*root)->children;
if(pTrav == NULL)
{
/*First Node*/
for(pTrav = *root; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('\0', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
return;
}
if(TrieSearch(pTrav, key))
{
printf("Duplicate!\n");
return;
}
while(*key != '\0')
{
if(*key == pTrav->key)
{
key++;
#ifdef DEBUG
printf("Traversing child: [%c]\n",pTrav->children->key);
#endif
pTrav = pTrav->children;
}
else
break;
}
while(pTrav->next)
{
if(*key == pTrav->next->key)
{
key++;
TrieAdd(&(pTrav->next), key, data);
return;
}
pTrav = pTrav->next;
}
if(*key)
{
pTrav->next = TrieCreateNode(*key, 0xffffffff);
}
else
{
pTrav->next = TrieCreateNode(*key, data);
}
pTrav->next->parent = pTrav->parent;
pTrav->next->prev = pTrav;
#ifdef DEBUG
printf("Inserting [%c] as neighbour of [%c] \n",pTrav->next->key, pTrav->key);
#endif
if(!(*key))
return;
key++;
for(pTrav = pTrav->next; *key; pTrav = pTrav->children)
{
pTrav->children = TrieCreateNode(*key, 0xffffffff);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
key++;
}
pTrav->children = TrieCreateNode('\0', data);
pTrav->children->parent = pTrav;
#ifdef DEBUG
printf("Inserting: [%c]\n",pTrav->children->key);
#endif
return;
}
trieNode_t* TrieSearch(trieNode_t *root, const char *key)
{
trieNode_t *level = root;
trieNode_t *pPtr = NULL;
int lvl=0;
while(1)
{
trieNode_t *found = NULL;
trieNode_t *curr;
for (curr = level; curr != NULL; curr = curr->next)
{
if (curr->key == *key)
{
found = curr;
lvl++;
break;
}
}
if (found == NULL)
return NULL;
if (*key == '\0')
{
pPtr = curr;
return pPtr;
}
level = found->children;
key++;
}
}
void TrieRemove(trieNode_t **root, char *key)
{
trieNode_t *tPtr = NULL;
trieNode_t *tmp = NULL;
if(NULL == *root || NULL == key)
return;
tPtr = TrieSearch((*root)->children, key);
if(NULL == tPtr)
{
printf("Key [%s] not found in trie\n", key);
return;
}
#ifdef DEBUG
printf("Deleting key [%s] from trie\n", key);
#endif
while(1)
{
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else if(tPtr->prev && !(tPtr->next))
{
tmp = tPtr;
tPtr->prev->next = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else if(!(tPtr->prev) && tPtr->next)
{
tmp = tPtr;
tPtr->parent->children = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
}
else
{
tmp = tPtr;
tPtr = tPtr->parent;
tPtr->children = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
}
#ifdef DEBUG
printf("Deleted key [%s] from trie\n", key);
#endif
}
void TrieDestroy( trieNode_t* root )
{
trieNode_t *tPtr = root;
trieNode_t *tmp = root;
while(tPtr)
{
while(tPtr->children)
tPtr = tPtr->children;
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else if(tPtr->prev && !(tPtr->next))
{
tmp = tPtr;
tPtr->prev->next = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else if(!(tPtr->prev) && tPtr->next)
{
tmp = tPtr;
tPtr->parent->children = tPtr->next;
tPtr->next->prev = NULL;
tPtr = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
else
{
tmp = tPtr;
if(tPtr->parent == NULL)
{
/*Root*/
free(tmp);
return;
}
tPtr = tPtr->parent;
tPtr->children = NULL;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
}
}
}
/*triedriver.c*/
/*
* To Compile : gcc -o trie trie.c triedriver.c
* To run: ./trie
*/
#include <stdio.h>
#include <stdlib.h>
#include "trie.h"
int main()
{
trieNode_t *root;
printf("Trie Example\n");
/*Create a trie*/
TrieCreate(&root);
TrieAdd(&root, "andrew", 1);
TrieAdd(&root, "tina", 2);
TrieAdd(&root, "argo", 3);
TrieAdd(&root, "timor", 5);
TrieRemove(&root, "tim");
TrieAdd(&root, "tim", 6);
TrieRemove(&root, "tim");
TrieAdd(&root, "ti", 6);
TrieAdd(&root, "amy", 7);
TrieAdd(&root, "aramis", 8);
/*Destroy the trie*/
TrieDestroy(root);
}
In order to print the debug messages, use -DDEBUG while compiling with gcc:gcc -o trie trie.c triedriver.c -DDEBUG
That's awesome!!! I have added it and results are amazing. Thanks for sharing.
ReplyDeleteHi Varun ,
ReplyDeleteCan u please explain the below mentioned condition for removing a key.
if( tPtr->prev && tPtr->next)
{
tmp = tPtr;
tPtr->next->prev = tPtr->prev;
tPtr->prev->next = tPtr->next;
free(tmp);
break;
}
This condition is to delete the node which has both neighbors around it.
Deleteis this site still active? I tried to post something earlier and it is not appearing.
DeleteYes the site is still active. It's just that the comments are moderated.
DeleteI think I may have found a bug, it seems it is not possible to insert words in non strict alphabetical order, for example if I insert "acceptable" and then "accept", the second one gets always inserted with a value (the trieVal_t member of trieNode) equal to 0, I'm now trying to understand what's wrong. I would be pleased if you could confirmmy problem.
ReplyDeleteAs I previously posted I think you should change lines 93-95 to:
ReplyDeleteif(*key)
{
pTrav->next = TrieCreateNode(*key, 0);
}
else
{
pTrav->next = TrieCreateNode(*key, index);
res=pTrav->next;
}
pTrav->next->parent = pTrav->parent;
pTrav->next->prev = pTrav;
Thanks Walter for your suggestions. I'll look up the issue and update the code.
DeleteI have updated the code. Please try to run your tests now.
DeleteMay i knw the advantages of trie algorithm over other algorithms... plz reply...
ReplyDeleteCan you specify the other algorithms ?
DeleteTries are mostly used for efficient string based search.
Varun:
ReplyDeleteSorry to be a newbie, but wondering "where is main"? I'm trying to get an understanding of tries, but can't get this to compile without "main" in .c code.
I have added a sample driver code and instructions in the comments.
DeleteUpdated the code with few improvisations and bug fixes.
ReplyDeleteWouldn't the linked list traversal be a massive bottleneck for large or non-Latin text? Indexing CJK characters could result in tens of thousands of traversals even if the strings themselves were only a few characters long, and long strings would result in an excessive amount of allocations and linked list traversals.
ReplyDeleteThat's not a knock on your code; I just heard recently that tries were perfect for everything but they seem incredibly limited outside of Latin dictionary lookups, and even then a pre-compressed structure optimized for lookups would be superior.
That's an interesting point. Tries work well with limited set of characters as in Latin text. Allocations can be improved using some sort of memory pool , but yes, the traversals can become bottleneck in case of non-Latin text. Thanks for bringing out such a valuable point.
DeleteI researched it further and learned that Unicode is indeed a major limitation of the standard Trie implementation, but I was wrong about a few things:
DeleteSince your code operates on individual bytes rather than Unicode codepoints, individual CJK characters will take no more than 256*4 lookups (four levels of 8-bit values), rather than 2^32 lookups (one level of 32-bit codepoints). That's over 4 MILLION times faster in the pathological worst case.
The recommended fix seems to be replacing the linked list with another data structure. I'll leave it up to your readers to decide which is best for their needs. :]
Also, it turns out that the major limitation regarding Unicode is using it for sorting (these tries operate on individual bytes, while Unicode sorting needs multiple bytes or characters), but your implementation doesn't sort anyway. So it looks like it works reasonably well!
If I tried to compile the code as above, gcc complains that TrieCreateNode is used before it is defined. Cheers,
ReplyDeleteI'm experiencing the same issue, does anyone has a solution to this bug? I haven't enough skills in C yet to fix this code myself. Thanks!
DeleteHere's the exact error message:
Deletetrie.c: In function 'TrieCreate':
trie.c:10:8: warning: assignment makes pointer from integer without a cast [enabled by default]
*root = TrieCreateNode('\0', 0xffffffff);
^
trie.c: At top level:
trie.c:13:13: error: conflicting types for 'TrieCreateNode'
trieNode_t *TrieCreateNode(char key, int data)
^
trie.c:10:10: note: previous implicit declaration of 'TrieCreateNode' was here
*root = TrieCreateNode('\0', 0xffffffff);
Please try the updated code now. Only a declaration was needed before the usage.
DeleteWhen I run triedriver the output I get is:
ReplyDeleteTrie Example
Key [tim] not found in trie
Inside TrieCreateNode you return a NULL pointer in case malloc fails, but when you call TrieCreateNode you do not check if it has returned NULL.
ReplyDeleteIn TrieSearch lvl and pPtr can be eliminated.
ReplyDeleteSorry for the many comments. Feel free to delete them once you've read them.
Feel free to give more :-) Any thing which can improve the quality of the code is welcome.
DeleteIs TrieSearch supposed to work? I'm getting NULL while searching "tina" (after adding it).
ReplyDeleteCheck TrieRemove function how it works. It has usage of TrieSearch.
ReplyDeleteFor the TrieDestroy function, doesn't seem like you need the first two if statements( if( tPtr->prev && tPtr->next) and if(tPtr->prev && !(tPtr->next)) ) because you always loop until you get to the first child of a node, meaning that tPtr->prev is always NULL.
ReplyDeleteI'm confused on lines 50-66 and 120-136. What's the reason for the two print statements as they both create nodes in each section I'm confused on. Is the first create function just creating space and then the second create function actually puts information into the create function?
ReplyDeleteHello,
ReplyDeleteFirst thanks you to share your knowledges over this blog.
I have a problem "double free or corruption (fasttop)" when I launch the code with this main :
/*triedriver.c*/
/*
* To Compile : gcc -o trie trie.c triedriver.c
* To run: ./trie
*/
#include
#include
#include "trie.h"
int main()
{
trieNode_t* root = NULL;
TrieCreate(&root);
TrieAdd(&root, "vic", 1);
TrieAdd(&root, "joe", 2);
TrieAdd(&root, "jar", 3);
TrieRemove(&root, "joe");
trieNode_t* ret = TrieSearch(root->children, "jar");
if(NULL != ret) printf("value %d\n", ret->value);
else printf("not found\n");
TrieDestroy(root);
return 0;
}
I would you know if someone already encountered this problem.
Thanks,
Victor
Solved!
DeleteIn TrieRemove function, just add that :
tPtr->next->prev = NULL;
here :
else if(!(tPtr->prev) && tPtr->next) {
tmp = tPtr;
// HERE
tPtr->parent->children = tPtr->next;
#ifdef DEBUG
printf("Deleted [%c] \n", tmp->key);
#endif
free(tmp);
break;
Victor
I'm relatively new to C so I'm guessing there's a good reason, but why not simply use:
ReplyDeletevoid TrieDestroy (trieNode* node)
{
if(node)
{
if(node->children)
{
DeleteTrie(node->children);
}
if(node->next)
{
DeleteTrie(node->next);
}
free(node);
}
}
Hi Varun,
ReplyDeleteThanks for this great implementation. Any chance you publish a traverse function ? thanks ;)
Varun - How do you support special characters? I use a variation of the geeksforgeeks
ReplyDeletehttp://www.geeksforgeeks.org/trie-insert-and-search/
implementation makes it even faster since, i reach head of a word at constant time. But, when I find special character, I fail to handle it.
in Add, instead of adding to the right most, we do a alphabetically sorted insertion, we would achieve faster search. - Lokesh Mogra
ReplyDeleteHello, can you please explain how you decide how to set values in main() while doing tests on your Trie?
ReplyDeleteTrieAdd(&root, "andrew", 1);
TrieAdd(&root, "tina", 2);
TrieAdd(&root, "argo", 3);
TrieAdd(&root, "timor", 5);
TrieRemove(&root, "tim");
TrieAdd(&root, "tim", 6);
TrieRemove(&root, "tim");
TrieAdd(&root, "ti", 6);
TrieAdd(&root, "amy", 7);
TrieAdd(&root, "aramis", 8);
Is that simply an INT variable incrementing, or how do you do it?
Thanks.
You can choose any values. It's basically a sample data stored at a particular node.
DeleteHi Varun, is it possible to predict other nodes with just two letters as input. For eg over here, if I give input as 'ar', I get argo and aramis as suggestions? Can you please tell me how?
ReplyDeleteAlso this is great implementation.
I've a segmentation error deleting "andrew" (the first entry in the trie).
ReplyDeletehow may it be?
yesterday I observed a segFault deleting the first entry.
ReplyDeletehere I propose a patch:
http://www.cloc3.net/corsoOnline/trie/trie.patch
I have a problem with this implementation.
ReplyDeleteWhen I try to compile the code, I constantly receive the following error:
./trie.h:21:4: error: typedef redefinition with different types ('struct trieNode_t' vs
'struct trieNode_t')
} trieNode_t;
^
./trie.h:21:4: note: previous definition is here
} trieNode_t;
^
//the typedef struct is identical to the given though. Can you help me with this issue?
Which compiler have you used?
Deleteclang
DeleteThe code compiles fine with gcc and msvc. Maybe it has something to do with clang compiler.
DeleteHi Varun, is it possible to predict other nodes with just two letters as input. For eg over here, if I give input as 'ar', I get argo and aramis as suggestions? Can you please tell me how?
ReplyDeleteCertainly, that's what the tries are extensively used for. When you visit a node, you can get all the subtrees of that node to achieve this. This is a simple tree operation. You can easily find the code by googling.
DeleteHey, thanks for sharing the code, I found it very helpful.
ReplyDeletesome little comments about the remove function:
1. if we have a single entry in the trie and we remove it, it cause the root itself to be freed, I am not sure we want it.
2. We may want to add this line right after line 221: tPtr->next->prev = NULL, so we will not accidentally reach an unallocated memory.
Thanks.
How would one go about displaying parts of the tree?
ReplyDeleteThanks
rw
You need to implement a traverse function somewhat similar to TrieDestroy minus the memory deallocation part. And then print each node.
Deletevery easy you only said the example only.
ReplyDeleteplease you can send a same program and it user interactive to create a trie data stracture.
like that accept a string and store and delte.
How can I call TrieSearch in main?
ReplyDeleteThanks
How would I call TrieSearch in main?
ReplyDeleteThanks
when i compiled the program an error occurred highlighting the line #include "trie.h" , saying 21 18 D:\C Prog\trie.cpp [Error] trie.h: No such file or directory.
ReplyDeleteplease help
when i compiled the program an error occurred highlighting the line #include "trie.h" , and stating
ReplyDelete21 18 D:\C Prog\trie.cpp [Error] trie.h: No such file or directory
please help
Great.Thanks for the information, I like the way you put things together in an organized way, and explain your thoughts. Tutor services and tuition services are provided by TheTuitionTeacher in Delhi.
ReplyDeleteHome Tutors in Delhi | Home Tuition service
is it working fro unicodes
ReplyDelete"Wow, what an eye-opening article! Your exploration of [topic] shed light on so many aspects I hadn't considered before. I particularly liked how you delved into [specific aspect or point discussed in the post], as it really deepened my understanding.
ReplyDeleteThe way you structured the content was superb—it flowed logically from one point to the next, keeping me engaged throughout. I also appreciated the practical examples you provided, which made the concepts easier to grasp and apply.