

Updated by Maarten Truyens

Summary: Truffle Hunt is an inspirational tool for legal experts; it should not be used as a document archive. Think twice before uploading documents containing confidential or personal data, as this may undermine your non-disclosure and data protection obligations. ClauseBase takes no responsibility for the loss or disclosure of any data uploaded.

As explained in the introduction, Truffle Hunt is designed to allow you to easily search through your set of favourite documents.

Due to Truffle Hunt's sheer power and speed, you may be tempted to start uploading thousands of old documents, which you then share across your team. Depending on your jurisdiction, legal domain and the types of documents you are working with, you may run into compliance trouble.

Confidentiality obligations

Attorneys are bound by strict client confidentiality rules. While the rules differ across jurisdictions, they generally require you to ensure information is reasonably protected against leakage. In-house legal counsel and other professions are often subject to similar rules.

Even if you are not bound by such rules, you will probably be subject to various non-disclosure agreements that point in the same direction. It is therefore generally a bad idea to store an outsourcing agreement that you worked on four years ago and then make it accessible to your whole department: that agreement may very well still be in force.

Some bar associations are flexible in how "professional secrecy" applies to groups of people, for example allowing easy data exchange between all lawyers in a law firm, whether they work there temporarily or permanently. While the wide dissemination of old contracts (e.g., within an entire firm of over 100 lawyers) would then be acceptable from a deontological perspective, it remains a significant problem from a GDPR / data security / NDA perspective. In other words, the position of your bar association is an additional factor to take into account; it does not release you from your other obligations.

Persons with access

The risk of information leakage rises sharply with each additional person who gets access to information, even when each additional person is also bound by rules of confidentiality. This is why non-disclosure agreements tend to contain an obligation to only allow access to confidential information on a need-to-know basis.

You should therefore carefully consider whether you want to make documents available in a large team, particularly when team members tend to rotate.
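
To make the need-to-know argument concrete, here is a minimal sketch under a deliberately simplistic assumption that is not taken from any regulation or from Truffle Hunt itself: every person with access has the same small, independent chance of leaking a given document. The function name and the 1% figure are purely illustrative.

```python
def leak_probability(per_person_risk: float, persons: int) -> float:
    """Chance that at least one of `persons` people leaks a document.

    Assumes (hypothetically) that each person leaks independently
    with the same probability `per_person_risk`.
    """
    if not 0.0 <= per_person_risk <= 1.0:
        raise ValueError("per_person_risk must be between 0 and 1")
    if persons < 0:
        raise ValueError("persons must not be negative")
    # Probability that nobody leaks is (1 - p) ** n; take the complement.
    return 1.0 - (1.0 - per_person_risk) ** persons


if __name__ == "__main__":
    for team_size in (1, 10, 100):
        print(team_size, f"{leak_probability(0.01, team_size):.1%}")
    # 1 1.0%
    # 10 9.6%
    # 100 63.4%
```

Under this toy model, the chance that nobody leaks shrinks exponentially with team size, which is one way to read the need-to-know principle.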

Difference with other systems

You may be tempted to argue that Truffle Hunt is, in se, not so different from a typical content management system, or even a traditional drawer with paper files. After all, in most teams, all team members can also gain access to files in those systems.

However, Truffle Hunt's speed and its ability to jump instantly between paragraphs and documents mean that team members will come across all kinds of information they were not looking for. This is quite different from the targeted search a team member traditionally undertakes when looking for a specific file in a case filing system. Even a broad search in those systems takes considerably more time to surface pieces of information, because you have to open a file, scroll through it, and so on. Read more about this topic in our in-depth blog post about the downsides of unfiltered repositories of clauses.


The EU General Data Protection Regulation (GDPR) and similar legislation, such as its Canadian counterpart, contain rules, comparable to the confidentiality obligations above, that require you to treat personal data confidentially and securely, taking into account factors such as the type of data, the available technology and the cost.

Because personal data is interpreted very broadly (basically anything relating to a potentially identifiable person), the scope of the GDPR's obligations is both broader and narrower than the scope of typical non-disclosure agreements.

For example, personal data will also cover data that is not necessarily "confidential", such as fairly trivial information about a person. Conversely, some data that may be highly confidential (e.g., a multinational's internal financials) will not be within the scope of the GDPR obligations, because it does not relate to a natural person.

Accordingly, even if no non-disclosure agreement or professional secrecy rules would apply, the GDPR may impose limitations on how you treat documents.

Data minimisation

Article 5.1.c of the GDPR explicitly requires personal data to be minimised. When you upload thousands of files into Clause Hunt to ensure that you have access to all potentially interesting clauses, you are performing data maximisation instead. While there are some grey zones, most data protection authorities take the position that you should not collect data that is only potentially useful.


Less well known is the subtle repurposing prohibition of the GDPR, which applies separately from the confidentiality obligations.

Article 5.1.b of the GDPR states that personal data shall be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes".

Article 6.4 then states that, when no consent was obtained, several factors must be taken into account to assess whether the reuse of personal data is compatible. These factors include the link between the initial and the new purpose, the context, the type of personal data, the impact on the persons concerned, and the security safeguards.

In other words, completely separate from its confidentiality & security obligations, the GDPR may outright prohibit storing documents in Clause Hunt, because the purpose of storing them in Clause Hunt (i.e., for getting inspiration during drafting) is quite different from the initial purpose for which this data was stored (i.e. for handling a file, e.g. contract negotiation, litigation, legal advice, etc).

All of this depends on the factors mentioned above. It remains a grey area, but note in particular the factor of the security safeguards. Applied to Clause Hunt, this probably means that making content available to a wide range of persons will weigh against the compatibility between the initial and the new purpose.

Processing ground

The GDPR requires an explicit legal ground in order for the processing of personal data to be lawful. Article 6.1 GDPR lists the six possible legal grounds. While processing personal data for handling client files is perfectly fine (6.1.a: consent, or 6.1.b: necessity for the performance of a contract), only the grey zone of the "legitimate interests" of article 6.1.f can be invoked to process personal data for efficiency or quality reasons, as is the case when uploading documents to Clause Hunt.

That "legitimate interests" legal ground is often invoked as a last resort, but it is under increased scrutiny by authorities and courts. For example, in case C‑252/21 (Meta/Facebook) of 4 July 2023, the EU Court of Justice emphasized that the processing of the personal data should be "strictly necessary" in order to reach the legitimate interests specified by the company, meaning "that the legitimate data processing interests pursued cannot reasonably be achieved just as effectively by other means less restrictive of the fundamental rights and freedoms of data subjects" (nr. 108).

It remains to be seen to what extent uploading documents with personal data to Clause Hunt is "strictly necessary" for efficiency/quality reasons, as equally valid alternatives exist (such as manually cleaning documents and uploading them to a quality clause library, which in our opinion is a much better idea from a quality and time-saving perspective).

Also interesting to note is that the ECJ did not accept "product improvements" as a reason, in light of the scale of the processing that Meta/Facebook undertakes, as well as the fact that consumers did not expect their data to be used for this purpose. This is highly relevant for our discussion here, because legal teams will want to extract clauses for service improvement reasons.

Rules of thumb

  • Limit the number of persons who have access to a certain basket with documents. For this reason, Truffle Hunt allows you to create an unlimited number of baskets, so you can segment as you see fit.
  • Limit the persons who have access to a certain basket to those who already had access to the documents in that basket (e.g., because they were previously involved in handling the file).
    • Consider subdividing an individual department into separate sub-baskets. For example, in a law firm, you may want to create a basket for each partner, and grant access only to that partner and the associates they work closely with on a day-to-day basis. Similarly, for in-house legal departments, you may want to create a basket per subject matter.
    • Also consider allowing personal baskets. We often hear from legal experts that, even if they are strongly inclined to share content in a clause library, it "feels" different to bluntly share significant amounts of past working documents with random colleagues, particularly when those documents have their own "history" that is only known to the original author.
  • Don't blindly upload all existing files into Truffle Hunt; instead, make a selection of your best (favourite) documents.
  • Don't treat Truffle Hunt as a document archive. Instead, select documents from your document management system and upload copies of them.
  • Do not upload very old files, as the information they contain is more likely to be outdated. Take into account that the GDPR contains several provisions regarding data quality and a "right to be forgotten".
  • Upload templates if possible (instead of actual contracts). Usually they contain no confidential or personal data.
  • Manually clean your old documents, so that they do not contain any confidential or personal data.
  • Consider using scrubbing software before upload, such as the open source MAPA project or Microsoft Presidio. Be aware, however, that the effectiveness of these tools varies between languages and documents, so you should generally not expect their performance to exceed 80%. You will probably need to combine them with some human screening. (We are investigating the possibility of using "Large Language Models" (LLMs) such as GPT and Google Bard to perform the scrubbing, so in the future we may offer such tools ourselves.)
  • If you are subject to the GDPR and rely on the "legitimate interests" legal ground to reuse the data of previous files, then remember to transparently inform your data subjects — upfront, i.e. not afterwards.
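
As an illustration of the scrubbing step in the rules of thumb above, here is a minimal pre-screening sketch that uses only Python's standard library. It is not how MAPA or Microsoft Presidio work, and it is not a Truffle Hunt feature; the `scrub` function and its two patterns are hypothetical and far too crude for real documents. It merely shows the idea of replacing obvious identifiers with placeholders and counting the hits, so that a human knows where to look.

```python
import re

# Hypothetical, deliberately simple patterns. They will miss many identifiers
# and will also over-match (the phone pattern can hit dates, for instance).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d ./-]{7,}\d"),
}


def scrub(text: str) -> tuple[str, dict[str, int]]:
    """Replace pattern matches by placeholders; return the text and hit counts."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, hits = pattern.subn(f"[{label.upper()}]", text)
        counts[label] = hits
    return text, counts


if __name__ == "__main__":
    sample = "Contact Jane Doe at jane.doe@example.com or +32 2 123 45 67."
    scrubbed, counts = scrub(sample)
    print(scrubbed)  # Contact Jane Doe at [EMAIL] or [PHONE].
    print(counts)    # {'email': 1, 'phone': 1}
```

Note that the name "Jane Doe" survives untouched: pattern matching alone does not recognise names, which is why human screening remains necessary.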

For all the reasons above, we deliberately do not support importing your entire content management system in one go (e.g., an entire iManage or NetDocuments hierarchy). Such a blind mass import is a very bad idea, because it amounts to data maximisation (the opposite of what the GDPR requires) and because the signal-to-noise ratio of relevant versus irrelevant documents will be very poor. We therefore require you to do at least a minimum of prefiltering of your documents before uploading.

Of course, an even worse idea would be to automatically synchronise your content management system with ClauseBuddy, so that essentially every new file that would get added to your content management system will automatically end up in ClauseBuddy. In terms of compliance nightmares, this obviously trumps everything, and we don't want to be part of it.


This is the unfortunate part where we have to tell you that we are really serious about the information in the paragraphs above. Please check clause 8.3 of our T&C, where we explicitly disclaim quite a lot of responsibility.

More information

You may want to read an in-depth blog post from the renowned data protection law firm Timelex about legal tech compliance pitfalls.
