Building Your Taxonomy (Part 3)

Expert.ai Team - 17 November 2022

This is the final post in our three-part blog series on how to use taxonomies to unlock the value of unstructured language data in your organization.

For today’s enterprise, unstructured language data is not merely a byproduct of operations but a vital resource to be mined for actionable insights. Whether it’s your employees who need internal information to work more effectively and efficiently, or your customers who rely on you for the knowledge contained in the documents, reports, transcripts, PDFs or social media posts you manage, the stakes couldn’t be higher.

Taxonomies help users find information on your website, your intranet or any other digital repository. They do so by addressing the very challenges that make information discovery so difficult in the first place. Specifically, they:

  • Manage the ambiguity in language so that all of the concepts and terminology in your content are understood in the proper context.
  • Allow you to designate meaning for all of the terms, acronyms and concepts that matter for your business and make sure that your content is tagged accordingly.
  • Transform your enterprise content into actionable intelligence by linking data across information repositories and silos.
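
To make the first two points concrete, here is a minimal sketch in Python. It assumes a taxonomy can be modeled as a mapping from each concept to the terms and acronyms that designate it. The concept names, the terms and the `tag_document` helper are all hypothetical, and the simple word matching only stands in for the language understanding a real system would apply.

```python
import re

# Hypothetical taxonomy: each concept lists the terms and acronyms that designate it.
TAXONOMY = {
    "Finance/Anti-Money Laundering": ["anti-money laundering", "aml"],
    "Finance/Know Your Customer": ["know your customer", "kyc"],
    "Privacy/Personal Data": ["personally identifiable information", "pii"],
}


def tag_document(text, taxonomy=TAXONOMY):
    """Return the sorted list of concepts whose terms appear in the text."""
    lowered = text.lower()
    tags = set()
    for concept, terms in taxonomy.items():
        for term in terms:
            # Word boundaries keep "aml" from matching inside "streamlined".
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                tags.add(concept)
                break
    return sorted(tags)


print(tag_document("Our KYC checks protect PII."))
# ['Finance/Know Your Customer', 'Privacy/Personal Data']
```

Because every acronym and full term resolves to the same concept, documents that use different wording for the same idea end up with the same tag, which is what allows content to be linked across repositories.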

So, how can you leverage taxonomies at your organization?  

Taxonomy Options: Build, Borrow or Both?  

The good news is that you have options, depending on your organizational preferences and the technology you can leverage.

A great starting point is to consider the following questions:   

  • Are there any existing standard taxonomies for your industry or the one you are targeting?   
  • Do you have any information, or in-house knowledge (aka folksonomies), that you can leverage?   

Luckily, you don’t have to start from scratch.  

There are many industry-specific taxonomies available (MeSH for medical topics and IPTC for media topics are just two examples) that you can use as is, or as a starting point to reduce the time spent developing and customizing your own taxonomy. The TaxoBank database and WAND Inc. are great resources for finding these industry-specific, foundational taxonomies.

However, if you work in a very specialized domain or find yourself somewhere between the standards available, you will need to consider building your own content organization scheme. As you might expect, this can be a labor-intensive task that requires a major investment of time and subject matter expertise to properly tag content. But again, you don’t have to start from zero.

Technology exists to help you jumpstart the creation of a taxonomy. Expert.ai offers an out-of-the-box feature, called Magic Taxonomy, that will automatically classify any kind of document, such as white papers, news and newsletters, books, frameworks, articles, manuals, reviews, etc. This ensures that your content meets the criteria we established above (properly understood, consistently and correctly tagged, and connected across all of your repositories) while avoiding the time and expense of manually creating a taxonomy from scratch.

Our knowledge models contain concepts and relationships specific to different industries, domains, roles and use cases. They can be used out of the box or further customized to meet the specific needs of your project. Our library includes knowledge models for the media, life sciences and healthcare, and finance domains, as well as for ESG, sentiment and personally identifiable information (PII).

Taxonomy Design Considerations 

Whether you borrow or build your taxonomy, there are several considerations that you’ll need to manage to make sure you’re successful. Here is a quick checklist: 

  • Make sure your taxonomy mirrors your domain knowledge. Your taxonomy is an opportunity to organize your information—your documents, insights and all your data assets—in a way that supports how your users would navigate your content.  
  • Understand your target users. Is your audience the general public or a professional audience? It’s important to make sure that the content or insight you deliver is understood by your users, and that you classify and deliver your information so that it can be navigated in a way your users understand, whether they are domain experts or members of the general public.
  • Set yourself up for success in content tagging. This is a critical part of the process that is best performed by those with the deepest subject matter expertise in your organization. However, content tagging takes time, and these same experts are likely already engaged in other, higher-value work. AI technology specialized in language understanding can provide the high-precision tagging you need, and subject matter experts can be brought in at various stages to help verify accuracy.
  • Enrich your content. We recommend that you link your established taxonomy to ALL of your target content, a step often referred to as content enrichment. This will optimize your results. Your taxonomy needs to be representative of the content it targets and to cover the variety of topics a dataset addresses, which requires annotation and testing to improve classification accuracy. Your taxonomy is a living organism that is fed by your content. Ultimately, you cannot create and test the taxonomy without a representative dataset.
  • Keep your users in the loop. Users must be involved to improve the relevance of the solution design for information access and discovery. This is where you can really influence user adoption and see the impact on the end solution you create.   

Driving Successful Data Discovery: Recommendations 

To summarize and review, here are some recommendations to guide you on your journey:    

  1. Start with the goal that drives your discovery initiative. Understand the obstacles you are faced with. Ask yourself: do you have unexplored archives, is your content hard to navigate, or is your search inefficient? Qualify the obstacles.     
  2. Remember your options. You don’t have to start with a blank slate. You can rely on existing taxonomies or knowledge models, or you can leverage AI tools to build out a new taxonomy. You can even reuse and recycle taxonomies. This is where technology can really help. 
  3. Make sure that your taxonomy properly covers the domain(s) represented by your content. We can’t emphasize this enough. The content that you will use to build your taxonomy must come from multiple representative data sets that reflect your business—don’t leave anything out.     
  4. Think about your taxonomy as a living organism. As your business, customers and markets evolve, it’s important to make sure your taxonomy evolves with them. You will want to occasionally measure how effectively you are solving your business problem with your taxonomy. Are you driving more engagement from your users? Is your team more efficient? Is your search more accurate? Are your archives fully represented in your taxonomy? What about your newest initiatives? These metrics will help ensure that you’re on the right track.
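
One of these checks, whether your archives are fully represented in your taxonomy, can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: it expects that documents have already been tagged, and the `coverage_report` helper, the document ids and the concept names are hypothetical.

```python
def coverage_report(doc_tags, concepts):
    """Summarize how well a taxonomy covers a set of tagged documents.

    doc_tags: dict mapping a document id to the list of concepts assigned to it.
    concepts: iterable of all concepts defined in the taxonomy.
    """
    total = len(doc_tags)
    # Documents with no tags point to content the taxonomy does not cover yet.
    untagged = sorted(doc_id for doc_id, tags in doc_tags.items() if not tags)
    # Concepts that never appear may be obsolete or may lack representative content.
    used = {tag for tags in doc_tags.values() for tag in tags}
    unused = sorted(set(concepts) - used)
    covered_share = (total - len(untagged)) / total if total else 0.0
    return {
        "covered_share": covered_share,
        "untagged_documents": untagged,
        "unused_concepts": unused,
    }


# Hypothetical archive of four documents and a three-concept taxonomy.
report = coverage_report(
    {
        "report-2021": ["Finance"],
        "memo-17": [],
        "faq": ["Privacy", "Finance"],
        "notes": ["Finance"],
    },
    ["Finance", "Privacy", "ESG"],
)
print(report["covered_share"])  # 0.75
```

Tracking these numbers over time gives a simple signal of whether the taxonomy is keeping pace with new content and new initiatives.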

We hope this blog series helps on your organization’s journey to data discovery! Start from the beginning: Part 1: The Data Discovery Challenge and Part 2: How Taxonomies Solve Your Data Discovery Problems.  

For more information, feel free to reach out to us here.