Updated by Maarten Truyens

Summary: Clause Hunt is an inspirational tool for legal experts; it should not be used as a document archive. Think twice before uploading documents containing confidential or personal data, as this may undermine your non-disclosure and data protection obligations. ClauseBase takes no responsibility for the loss or disclosure of any data uploaded to standard lockers, but offers you the option of creating private servers that are entirely under your own control.

Due to Clause Hunt's sheer power and speed, you may be tempted to start uploading thousands of old documents, which you then share across your team. Depending on your jurisdiction, legal domain and the types of documents you are working with, you may run into compliance trouble.

Confidentiality obligations

Attorneys are bound by strict client confidentiality rules. While the rules differ across jurisdictions, they generally require you to ensure information is reasonably protected against leakage. In-house legal counsel and other professions are often subject to similar rules.

Even if you are not bound by such rules, you will probably be subject to various non-disclosure agreements that impose similar duties.

Persons with access

The risk of information leakage rises sharply with each additional person who gets access to information, even when each additional person is also bound by rules of confidentiality. This is why non-disclosure agreements tend to contain an obligation to allow access to confidential information only on a need-to-know basis.

You should therefore carefully consider whether you want to make documents available in a large team, particularly when team members tend to rotate.

Difference with other systems

You may be tempted to argue that Clause Hunt is, in se, not so different from a typical content management system, or even a traditional drawer with paper files. After all, in most teams, all team members can also gain access to files in those systems.

However, Clause Hunt's speed and its ability to jump instantly between paragraphs and documents mean that team members will come across various bits of information at random. This is quite different from the targeted search a team member would traditionally undertake when looking for a specific file in a case filing system. Even a broad search in those systems takes considerably more time to surface pieces of information, because you have to open a file, scroll through it, and so on. Read more about this topic in our in-depth blog post about the downsides of unfiltered repositories of clauses.


The EU General Data Protection Regulation (GDPR) contains similar rules that require you to treat personal data confidentially/securely, taking into account factors such as the type of data, technology, cost, etc.

Because personal data is interpreted in a very broad way (basically anything relating to a potentially identifiable person), the scope of the GDPR's obligations is at the same time larger and smaller than the scope of typical non-disclosure agreements.

For example, personal data will also cover data that is not necessarily "confidential", such as fairly trivial information about a person. Conversely, some data that may be highly confidential (e.g., a multinational's internal financials) will not be within the scope of the GDPR obligations, because it does not relate to a natural person.

Accordingly, even if no non-disclosure agreement or professional secrecy rules apply, the GDPR may impose limitations on how you treat documents.


Less well known is the subtle repurposing prohibition of the GDPR, which applies separately from the confidentiality obligations.

Article 5(1)(b) of the GDPR states that personal data shall be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes".

Article 6(4) then states that in situations where no consent was obtained, several factors must be taken into account to assess whether reuse of personal data is compatible. These factors include the link between the initial and the new purpose, the context, the type of personal data, the impact on the persons concerned, and the security safeguards.

In other words, quite apart from its confidentiality and security obligations, the GDPR may outright prohibit storing documents in Clause Hunt, because the purpose of storing them in Clause Hunt (i.e., getting inspiration during drafting) is quite different from the initial purpose for which this data was stored (i.e., handling a file, such as a contract negotiation, litigation or legal advice).

The outcome depends on all the factors mentioned. It remains a gray area, but notice in particular the factor of the security safeguards. Applied to Clause Hunt, this probably means that making content available to a wide range of persons will weigh against the compatibility between the initial and the new purpose.

Rules of thumb

  • Limit the number of persons who have access to a certain locker with documents. For this reason, Clause Hunt allows you to create an unlimited number of lockers, so you can segment as you see fit.
  • Limit the persons who have access to a certain locker to those who already had access to the documents in that locker (e.g., because they were previously involved in handling the file).
    • Consider subdividing an individual department into separate sub-lockers. For example, in a law firm, you may want to consider creating a locker for each partner, and grant access to only that partner and the associates who work closely with that partner on a day-to-day basis. Similarly, for in-house legal departments, you may want to create a locker per subject matter.
    • Also consider allowing personal lockers. We often hear from legal experts that, even if they are strongly inclined to share content in a clause library, it "feels" different to bluntly share significant amounts of past working documents with random colleagues, particularly when those documents have their own "history" that is only known to the original author.
  • Don't blindly upload all existing files into Clause Hunt; instead, make a selection of your best documents.
  • Don't treat Clause Hunt as a document archive. Instead, select documents from your document management system and upload copies of them.
  • Do not upload very old files, as the information they contain is more likely to be outdated. Take into account that the GDPR contains several provisions regarding data quality and a "right to be forgotten".
  • Upload templates if possible. Usually they contain no confidential or personal data.
  • Consider using scrubbing software before upload, such as the open source MAPA project or Microsoft Presidio. However, you should be aware that the effectiveness of these tools varies between languages and documents, so you should expect their performance to generally not exceed 80%. You will probably need to combine this with some human screening. (We are investigating the possibility of using "Large Language Models" (LLMs) such as GPT and Google Bard to perform the scrubbing, so in the future we may offer such tools ourselves.)
  • Consider setting up a private server if you want to upload documents containing confidential or personal data. ClauseBase offers paid customers the option of setting up their own private server, which is entirely under their own control.
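
To make the scrubbing rule of thumb above more concrete, here is a minimal sketch of a pre-upload scrubbing pass. It is not the MAPA or Presidio API and not a ClauseBase feature: the function name scrub and the two patterns are assumptions made purely for illustration. It only catches email addresses and phone-like numbers, which also shows why human screening remains necessary.

```python
from __future__ import annotations

import re

# Hypothetical patterns, for illustration only. Real documents need far
# broader coverage (names, addresses, company numbers, ...), which is
# where dedicated scrubbing tools and human review come in.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scrub(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count the hits."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label}]", text)
        counts[label] = hits
    return text, counts


sample = "Contact Jane Doe at jane.doe@example.com or +32 2 123 45 67 before signing."
scrubbed, counts = scrub(sample)
print(scrubbed)  # Contact Jane Doe at [EMAIL] or [PHONE] before signing.
print(counts)    # {'EMAIL': 1, 'PHONE': 1}
```

Note that the name "Jane Doe" survives the pass: simple patterns miss most personal data, so a result like this should be treated as a first filter before a human check, not as a guarantee.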


This is the unfortunate part where we have to tell you that we are really serious about the information in the paragraphs above. So here we go:

To the maximum extent allowed by law, ClauseBase shall not be responsible for any use you make of Clause Hunt outside of the rules set forth above. You are entirely responsible for any data loss and any data leakage that would occur when using the Clause Hunt feature. Moreover, you will indemnify ClauseBase for any third-party claim (e.g., from your clients, counterparties or regulatory authorities) that would be submitted against ClauseBase related to your use of the Clause Hunt feature.

More information

You may want to read an in-depth blog post from the renowned data protection law firm Timelex about legal tech compliance pitfalls.
