
Using Truffle Hunt

Updated by Maarten Truyens

What is Truffle Hunt?

Truffle Hunt enables you to search through thousands of extracted clauses from hundreds of old documents, to quickly find a relevant clause.

Truffle Hunt allows you to upload a substantial collection of DOC/DOCX/RTF files. ClauseBuddy will then search through those files, extract fragments of text that seem relevant, and discard the original file. You can then quickly search through all extracted clauses, in search of good clauses.

Why would I want to use it?

The biggest advantage of Truffle Hunt is that it requires almost no effort to build a collection of clauses: simply upload your documents once, and everything is processed automatically for you. This is extremely tempting for busy legal teams, who are always pressed for time, yet still want to benefit from low-hanging tech fruit.

Using Truffle Hunt is ideal when you know exactly what you are searching for, and you do not need any surrounding context to assess the merits of a clause. If those circumstances apply, then it will allow you to retrieve clauses quicker than Scroll Hunt, with almost no effort.

You should be aware, however, that there are tradeoffs:

  • You will save time upfront, but pay later with each and every search. For a typical legal team, the time spent on a day-to-day basis will exceed the time spent on curating a quality clause library.
  • Legal quality will suffer, because clauses are extracted from their context. Quality clause libraries simply offer much better guidance, particularly towards young team members.
  • Due to the sheer volume of clauses extracted, you may run into compliance issues. If you are like most legal experts, you are subject to client confidentiality rules, non-disclosure obligations, data protection rules and various internal security rules. No matter how secure ClauseBuddy is, storing thousands of confidential clauses on external systems for mere efficiency reasons is usually a bad idea.

The name "Truffle Hunt" is therefore deliberately chosen: it refers to the fact that it's a quick & dirty approach to search for hidden gems. It is OK to get started with, and has a few interesting unique use cases, but for 90% of legal teams quality clause libraries are a much better option.

When reading the above, it should be obvious that we here at ClauseBase are not big fans of this approach. In fact, we wrote an in-depth blog post about all the problems of the fairytale of automatic clause extraction databases, where we described how we had initially developed this module, but then threw it away once we saw how our initial testers started using it.

However, after publishing this blog post, we kept receiving requests from customers who wanted to use the module, despite all our warnings. In the end, instead of stubbornly preaching why it's a bad idea, we decided to make this module available. We did, however, change the name from "Haystack" to "Truffle Hunt" to emphasize the dirty approach.

Understanding the clause extraction process

ClauseBuddy automatically extracts clauses from your uploaded documents. This requires zero effort from your side — just a little bit of patience, about 1.5 seconds per document if you have a good internet connection.

However, you should be aware that automatic extraction is a hit-and-miss process. For example, have a look at the clause below. Assuming you think it is wonderfully written and you want to include it in your clause library, how would you deal with it? Keep it as a whole and insert it as-is, because that’s the nature of such a “miscellaneous” clause? Or store each of the individual paragraphs? Or perhaps store both?

ClauseBuddy uses advanced algorithms and a little bit of artificial intelligence to extract only relevant content, and to split that content in a reasonable way. But do not expect this to be perfect: there will be many situations where you would have split the text differently, where the software erroneously treats text as a title, where it inadvertently ignores certain paragraphs, or where it instead extracts irrelevant paragraphs.

While we continue improving the extraction process, this is simply the tradeoff of having everything extracted automatically, as opposed to manual clause curation. The quality of the extraction will depend on the type of document (the software is trained on contracts), the language, the formatting (automatically formatted documents work much better), and so on.

User interface for searching

Getting started with Truffle Hunt's search is quite easy, but there are several additional search tools that you may want to read about.

  1. The Basket Selection allows you to select different baskets to search in.
  2. The Language Selection (not shown if your account only supports English) allows you to specify the language of your search.
  3. The Definition checkbox allows you to limit searches to paragraphs that resemble contract definitions.
  4. In the Primary Query box you type words that should be present in the document. If you checked "Definition", then this box will instead allow you to search for a legal term.
  5. In the Secondary Query box — only shown when searching for definitions — you type words that should be present in the definition's body text.
  6. The Switch Buttons allow you to transfer your query to other search modules (when available): keyword search in your quality library, Sample Hunt and Scroll Hunt.
  7. In the Filter panel you can narrow down found documents to certain categories, authors, years, files or titles.
  8. The Undo/Redo buttons allow you to go back and forth in your search actions.

1. Basket Selection

Truffle Hunt allows you to create different "baskets" with clauses — e.g., one for the Corporate department, one for Employment, one for IT/IP, and perhaps also a personal basket for every legal expert.

Creating different baskets is not only recommended to create a primary segmentation within your clauses (i.e., most of the time an employment lawyer is not interested in finding IP-related clauses), but is also useful for compliance reasons and to ensure internal information rules are met.

When you switch between baskets, Truffle Hunt will re-submit your current search query. So if you do not find any useful results in one basket, you can easily try in another basket to which you have access.

2. Language selection

The Language Selection dropdown (not shown if your account only supports English) allows you to specify the language of your search.

Because searches take into account grammatical variations (e.g., searching for "confidential" will also find "confidentially"), it is necessary to specify the language for each search.

Upon import, ClauseBuddy automatically detects the language of a document, based on the language of the dominant content. The results may therefore be unpredictable with mixed-language documents.

You should be aware that when you select the wrong language, it is likely that no search results will be found.

3. Definition

When checked, the Primary Query box will be used to search for the legal term, and the Secondary Query box will be used to search for words in the body of the definition. See the explanation below.

4. Primary Query

In this box you type in your main search terms. Truffle Hunt will then search for clauses where those terms are found in either the title or the body of the clause.

Truffle Hunt will also take into account grammatical variations, depending on the language in which you are searching. For example, in the screenshot below, you can see that even though "liabilities" was searched, results with "liability" are also included.
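To get a feeling for why "liabilities" also finds "liability", here is a minimal sketch of the general idea: reduce both the query and the clause to rough word stems before comparing them. The `toy_stem` and `matches_all` helpers are hypothetical and only handle a few English endings; this is not how ClauseBuddy actually implements it.

```python
import re

def toy_stem(word: str) -> str:
    """Very rough English suffix stripping (illustrative assumption only)."""
    word = word.lower()
    for suffix, replacement in (("ies", "y"), ("ly", ""), ("s", "")):
        # Only strip when enough of the word is left over to be meaningful.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

def stems(text: str) -> set[str]:
    """Split text into lowercase words and reduce each word to its toy stem."""
    return {toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())}

def matches_all(query: str, clause: str) -> bool:
    """True when every stemmed query term occurs somewhere in the clause."""
    return stems(query) <= stems(clause)

clause = "The Supplier's liability is capped at the fees paid."
print(matches_all("liabilities", clause))  # the plural query still matches
```

Because stemming rules differ per language, a sketch like this also shows why the search language has to be selected correctly.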

Searching for a definition

If you have enabled the "Definition" checkbox, then the primary query converts into a defined term selector. If you want to narrow down the body of definitions, then you should enter search terms in the Secondary Query box.

For example, in the screenshot below, you can see a search for a definition of the term "contract", narrowed down to definitions whose body also contains the words "joint" and "venture".

Similar to searching in Google, you can quote search terms, to ensure that terms are not merely present within the clause, but are instead found next to each other.

For example, when you want to search for licensor's liability, the following search will contain both good results and bad results. The second search result, for example, includes both the word "licensors" and the word "liability", but not next to each other.

A quoted search works much better in this case, because only the first result is then shown.
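The difference between an unquoted and a quoted search can be sketched roughly as follows. The two clauses and the helper functions are made up for illustration, and the crude normalisation (dropping a trailing "'s" or "s") is an assumption, not ClauseBuddy's real behaviour.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercase words with possessive/plural endings crudely stripped."""
    words = re.findall(r"[a-z]+(?:'s)?", text.lower())
    return [re.sub(r"('s|s)$", "", w) for w in words]

def contains_all_terms(query: str, clause: str) -> bool:
    """Unquoted search: every term occurs somewhere in the clause."""
    return set(tokens(query)) <= set(tokens(clause))

def contains_phrase(query: str, clause: str) -> bool:
    """Quoted search: the terms occur next to each other, in order."""
    q, c = tokens(query), tokens(clause)
    return any(c[i:i + len(q)] == q for i in range(len(c) - len(q) + 1))

first = "The licensor's liability is limited to direct damages."
second = "The licensors grant a licence, and the licensee bears all liability."

for clause in (first, second):
    print(contains_all_terms("licensor's liability", clause),
          contains_phrase("licensor's liability", clause))
```

Both clauses pass the unquoted check, but only the first one passes the quoted check.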

Removing terms

By adding a hyphen to a term, you will remove search results having that term near the other terms specified. For example, this will search for clauses with the expression garden leave, but without the word executive.
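As a rough sketch of the idea, the hypothetical `parse_query` helper below splits a query into required terms and hyphen-prefixed excluded terms. It is a simplification: it rejects a clause when the excluded word appears anywhere in it, whereas the text above speaks of the word appearing near the other terms.

```python
import re

def parse_query(query: str) -> tuple[set[str], set[str]]:
    """Split a query into required terms and excluded (hyphen-prefixed) terms."""
    required, excluded = set(), set()
    for term in query.lower().split():
        if term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])
        else:
            required.add(term)
    return required, excluded

def matches(query: str, clause: str) -> bool:
    """True when all required terms are present and no excluded term is."""
    required, excluded = parse_query(query)
    words = set(re.findall(r"[a-z]+", clause.lower()))
    return required <= words and not (excluded & words)

query = "garden leave -executive"
print(matches(query, "The employee may be placed on garden leave during the notice period."))
print(matches(query, "The executive may be placed on garden leave."))
```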


Be aware that the software will always remove stopwords from your search terms, i.e. common words without much informational value, such as "the", "will", "shall", "if", "further", and so on. Even in literal (quoted) searches, those words will be ignored.
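In practice this means that a query is reduced to its informative words before the search runs. A minimal sketch, assuming a stopword list that contains only the five examples mentioned above (the real list is undoubtedly longer):

```python
import re

# Assumption: only the example stopwords named in this article.
STOPWORDS = {"the", "will", "shall", "if", "further"}

def strip_stopwords(query: str) -> list[str]:
    """Return the query's words in order, with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOPWORDS]

print(strip_stopwords("the licensor shall indemnify the licensee"))
```

So a literal search for the licensor shall indemnify effectively behaves like a literal search for licensor indemnify.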


Advanced tip: you can also search using OR. For example, when you want to search for clauses that must in any case contain the word "liability", and should additionally contain either the word "licensor" or the word "guarantee" (one of the two is enough, both are not necessary), you can use an OR search, similar to the screenshot below.

Usually you will want to include parentheses, to explicitly indicate the relationship between the words. Just as in high school "8 + 6 / 7" could be read as either "(8 + 6) divided by 7" or "8 plus (6 divided by 7)", an OR search without parentheses is ambiguous.
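The effect of grouping can be illustrated with a small sketch. The two functions below spell out the two possible readings of the liability / licensor / guarantee example; how Truffle Hunt itself interprets a query without parentheses is not covered here, so treat the second reading purely as an illustration of the risk.

```python
import re

# The arithmetic analogy: the grouping changes the outcome.
ungrouped_sum = 8 + 6 / 7    # division first, roughly 8.857
grouped_sum = (8 + 6) / 7    # addition first, exactly 2.0

def words(clause: str) -> set[str]:
    return set(re.findall(r"[a-z]+", clause.lower()))

def intended(clause: str) -> bool:
    """liability AND (licensor OR guarantee)"""
    w = words(clause)
    return "liability" in w and ("licensor" in w or "guarantee" in w)

def other_reading(clause: str) -> bool:
    """(liability AND licensor) OR guarantee"""
    w = words(clause)
    return ("liability" in w and "licensor" in w) or "guarantee" in w

clause = "The bank guarantee expires after twelve months."
print(intended(clause), other_reading(clause))
```

The sample clause never mentions liability, so the intended reading rejects it, while the other reading would let it through.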

5. Secondary Query

In this box, you can specify other words that need to be present in the definition's body. See explanation above.

6. Jump to other search modules

By pressing one of these buttons, you can easily switch to the other search modules in ClauseBuddy. For example, when you click on the antelope icon, your current search terms will be copied to Scroll Hunt, and ClauseBuddy will jump to that search module.

7. Filter Panel

In this panel you can filter found results on criteria such as the document's title, year, author, client or dossier name. If many different results are available for a certain criterion, then a filter box will be shown that allows you to easily narrow down the results.

8. Undo/redo buttons

With these buttons, you can go back & forth in your list of historical actions. So if you liked your initial search, but then made a few changes that were not so smart after all, you can go back to your initial search by hitting the Undo button a few times.
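Conceptually, this kind of undo/redo behaviour is commonly modelled with two stacks. The `SearchHistory` class below is a hypothetical sketch of that general technique, not a description of ClauseBuddy's internals.

```python
class SearchHistory:
    """Keeps past search states so you can step back and forth."""

    def __init__(self, initial: str) -> None:
        self._past = [initial]        # states up to and including the current one
        self._future: list[str] = []  # states that were undone and can be redone

    def record(self, state: str) -> None:
        """Store a new search; anything that was undone can no longer be redone."""
        self._past.append(state)
        self._future.clear()

    def undo(self) -> str:
        """Step back one search (stays on the first search if there is no earlier one)."""
        if len(self._past) > 1:
            self._future.append(self._past.pop())
        return self._past[-1]

    def redo(self) -> str:
        """Step forward again after an undo, if possible."""
        if self._future:
            self._past.append(self._future.pop())
        return self._past[-1]

history = SearchHistory("liability")
history.record("liability licensor")
history.record("liability licensor guarantee")
print(history.undo())  # back to the second search
print(history.undo())  # back to the initial search
print(history.redo())  # forward to the second search again
```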

User interface for using clauses

When you have found an interesting clause, you can use the buttons on the right side to interact with it.

Show entire document

This button allows you to show all the extracted clauses from the document to which the current clause belongs. All those clauses will then be shown in the order they were present in the original document.

This allows you to get some feeling for the original context of the clause. However, you should be aware that not all clauses are extracted (and that some clauses may afterwards be manually deleted), so that some "holes" will naturally exist when all clauses are lined up.

Show similar clauses

This button allows you to search for clauses in the basket that are similar to the current clause, from a semantic point of view. For example, when a clause contains words such as liability, claim and court, then the software may find clauses that talk about those words, or instead about closely related words: synonyms such as accountability and demand, but also associated words, such as arbitration.

While the software uses Artificial Intelligence to calculate similar clauses, you should be aware that there is quite a grey zone involved: what human legal experts would consider to be "similar", will not necessarily be what the software finds similar.
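To make the notion of "related words" a bit more tangible, here is a deliberately simplistic sketch. It maps words onto shared concepts through a tiny hand-written table (`RELATED`, a pure assumption based on the examples above) and then measures the overlap between two clauses. The actual software relies on Artificial Intelligence rather than a fixed table, so this only illustrates the principle.

```python
import re

# Hypothetical concept table, built from the examples in this article.
RELATED = {
    "liability": {"accountability"},
    "claim": {"demand"},
    "court": {"arbitration"},
}

def concepts(clause: str) -> set[str]:
    """Map each word of a clause onto the concept it belongs to, if any."""
    found = set()
    for word in re.findall(r"[a-z]+", clause.lower()):
        for concept, related in RELATED.items():
            if word == concept or word in related:
                found.add(concept)
    return found

def similarity(a: str, b: str) -> float:
    """Share of concepts that two clauses have in common (0.0 to 1.0)."""
    ca, cb = concepts(a), concepts(b)
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)

current = "Any claim regarding liability shall be brought before the court."
candidate = "Each demand concerning accountability is settled by arbitration."
unrelated = "The employee is entitled to twenty days of annual leave."

print(similarity(current, candidate))  # no words in common, yet the same concepts
print(similarity(current, unrelated))
```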

With this popup menu, you can delete the current clause, or instead the entire document (with all clauses inside). Please note that this action cannot be undone.

Compare text

This button allows you to compare the text of the current clause with any text selected in the document. (It is only shown when you are working inside MS Word.)

Add to quality library

This button extracts the title and body of the clause, and inserts them into the Add clause dialog box.

Send to curator

This button allows you to send the title and body of the clause to your curator, who will then decide whether (and how) to add it to your team's library.


This button allows you to have GPT-4 redraft the contents of your clause, through a simple prompt. Afterwards, you can request a comparison between the original text and the new text.
