Updated by Maarten Truyens
DocAnswer contains two modules. Both are ideal for quickly getting to know your document.
- Ask Questions allows you to interactively "interrogate" your Word documents with the help of Artificial Intelligence (AI). Ask a question, and get both a concise answer and an overview of all the relevant parts of the document that deal with your question.
- Semantic Search allows you to search in your document for semantically related words.
How does Semantic Search work?
This module is the easiest to understand from a technical perspective.
When you enter one or more keywords, ClauseBuddy will not only search for those exact keywords, but also for grammatical variations (e.g., "confidential" would also lead to "confidentially", and "disclose" would also lead to "disclosing" and "discloses") and for words that are — semantically speaking — closely related.
Examples of semantically related words:
- "payment" will also find variations of the verb "pay", as well as adjectives such as "payable" and nouns such as "cash", "invoice" and "money"
- "court" will also find related words such as "judge", "trial" and "jury". Be aware that it may also find expressions such as "basketball court", if they are present in your document
It then highlights all paragraphs that contain all those words. With a simple click, MS Word will then highlight that particular paragraph.
Semantic Search is ideal for situations where you want to quickly find something in a long document, and you are not sure which particular keyword is being used. After all, Word's standard search function only works well when you know which keywords are used.
Semantic Search performs its calculations locally, which is also why it can be very quick: it can search through a document of 100 pages in less than a second.
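The keyword expansion described above can be illustrated with a minimal sketch. It is not ClauseBuddy's actual implementation: the `RELATED` table, the `crude_stem` helper and the function names are hypothetical, and the sketch assumes that a paragraph matches as soon as it contains any one of the expanded terms.

```python
import re

# Hypothetical, hand-written table of related words. A real engine would
# derive these relations from a language model rather than a fixed list.
RELATED = {
    "payment": {"pay", "payable", "cash", "invoice", "money"},
    "court": {"judge", "trial", "jury"},
}

# Suffixes are tried in this order; only the first match is stripped.
SUFFIXES = ("ing", "ly", "es", "ed", "s", "e")


def crude_stem(word: str) -> str:
    """Strip one common English suffix (a rough stand-in for real stemming)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(keyword: str) -> set[str]:
    """Return the stems to search for: the keyword plus its related words."""
    terms = {keyword.lower()} | RELATED.get(keyword.lower(), set())
    return {crude_stem(term) for term in terms}


def matching_paragraphs(paragraphs: list[str], keyword: str) -> list[int]:
    """Return the indices of paragraphs that contain any expanded term."""
    stems = expand(keyword)
    hits = []
    for index, paragraph in enumerate(paragraphs):
        words = re.findall(r"[A-Za-z]+", paragraph)
        if any(crude_stem(word) in stems for word in words):
            hits.append(index)
    return hits


paragraphs = [
    "The Customer shall pay all invoices within thirty days.",
    "Each party shall keep the information confidential.",
    "Amounts are payable in euro.",
    "The basketball court is closed on Sundays.",
]
print(matching_paragraphs(paragraphs, "payment"))
print(matching_paragraphs(paragraphs, "court"))
```

Searching for "court" in this toy example also returns the "basketball court" paragraph, which mirrors the caveat mentioned in the examples.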
How does Ask Questions work?
While Ask Questions often feels like magic — in many cases you can really "talk" to your documents — you should be aware of what's going on beneath the surface in order to understand the limitations of the technology.
When you submit a question, the following steps take place:
- Your document is split into individual clauses.
- A semantic AI engine converts your question and each individual clause into hundreds of tiny semantic datapoints that mathematically express the meaning of the text. For example, a typical confidentiality clause will be converted into datapoints such as "confidentiality", "obligations", "protection", "electronic data", "storage", "need-to-know", "exceptions", and so on.
- All clauses in the document are evaluated and filtered, so that only the clauses with the highest semantic matches are kept, similar to how the Semantic Search explained above works. The selection is performed by comparing the semantic datapoints of your question with the semantic datapoints of each clause. This goes beyond literal comparisons, so that if your question talks about "intellectual property", clauses with "IP" and "licenses" will likely get a high matching score.
- The highest matching clauses are fed into another AI engine (GPT-4) that then answers your question on the basis of the contents of those highest matching clauses.
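The steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `CONCEPTS` table is a hypothetical toy replacement for the semantic AI engine (which produces hundreds of numeric datapoints without any hand-written vocabulary), cosine similarity is assumed as the comparison measure, and `build_prompt` merely assembles text instead of calling GPT-4.

```python
import math
import re
from collections import Counter

# Toy stand-in for a semantic engine: each known word maps to a "concept".
# This table is hypothetical and exists only to make the sketch runnable.
CONCEPTS = {
    "intellectual": "ip", "property": "ip", "ip": "ip",
    "license": "ip", "licenses": "ip",
    "confidential": "confidentiality", "confidentiality": "confidentiality",
    "terminate": "termination", "termination": "termination",
    "notice": "termination",
}


def embed(text: str) -> Counter:
    """Convert text into concept counts (a crude 'semantic datapoint' vector)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[word] for word in words if word in CONCEPTS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_clauses(question: str, clauses: list[str], keep: int = 2) -> list[str]:
    """Keep the clauses whose vectors are closest to the question's vector."""
    q = embed(question)
    ranked = sorted(clauses, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:keep]


def build_prompt(question: str, clauses: list[str]) -> str:
    """Assemble the text that would be handed to the answering model."""
    kept = top_clauses(question, clauses)
    context = "\n".join(f"- {clause}" for clause in kept)
    return f"Answer using only these clauses:\n{context}\n\nQuestion: {question}"


clauses = [
    "All IP created under this agreement vests in the Customer.",
    "Either party may terminate this agreement with thirty days notice.",
    "The Supplier grants the Customer licenses to use the software.",
    "This agreement is governed by Belgian law.",
]
question = "Who owns the intellectual property?"
print(build_prompt(question, clauses))
```

In this toy run, the clauses mentioning "IP" and "licenses" are kept even though neither contains the literal words "intellectual property", while the termination and governing-law clauses never reach the answering model.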
Due to the filtering process, not all questions can be answered by ClauseBuddy. In particular, you should be cautious in the following situations:
- The answer to your question is distributed over several different clauses in your document.
In such a situation, the clause filter may miss a few relevant clauses, particularly when clauses use wording that is very distant from the wording in your question. The contents of those clauses are then simply not taken into account when the answer is formulated.
For example, when you ask "Is the service provider liable for gross negligence?", the typical "liability cap" clause will likely be taken into account. However, a separate clause that talks about the scope of the provider's responsibility may get skipped, depending on how semantically close the actual wording of that clause is to the question.
Similarly, when you ask a question about the duration and termination of a contract, it is likely that the "applicable law" clause will be filtered away, because it is semantically unrelated to duration and termination. When answering your question, GPT-4 will therefore be unaware of the applicable law. Particularly for continental European jurisdictions, this may lead to wrong answers, because the default (complementary) rules of the relevant civil code may provide a partially different answer.
- The answer to your question relies on clauses that use semantically unrelated terms.
For example, when you ask a question about the transfer of intellectual property, a clause that talks about "Foreground Material" (which is formally defined in the contract as foreground intellectual property) may get removed during the filtering process.
- The answer to your question relies on specific or recent knowledge about the world.
Be aware that GPT-4 is trained on what is publicly available on the Internet, so it has no knowledge about your organisation. Also remember that GPT-4 was trained with information up to September 2021, so it is unaware of what happened in the world after that date.
- The answer to your question requires significant legal reasoning that goes much deeper than what is literally mentioned within the document.
Remember, GPT-4 is a Large Language Model — it is not a Large Legal Model. While it has a good understanding of popular legal domains (e.g., commercial law), particularly in larger jurisdictions, its knowledge of specialised legal areas and smaller jurisdictions is limited.
You should therefore not expect it to accurately answer questions that require significant legal reasoning, particularly when that reasoning relies on information that is not explicitly mentioned in your document. For example, when you ask whether the service provider is liable in a certain situation, GPT-4 may answer "no", while in reality the rules of the applicable jurisdiction say the opposite.
- The primary use case of DocAnswer is to quickly get a feel for a document. As with other types of AI, please do not rely on the answers provided by the software without double-checking them yourself.
- Try to be as explicit as possible in your wording. Longer and more detailed questions usually lead to better results, because the AI-filter will then perceive more semantic proximity between your question and relevant paragraphs.
- Do not treat Large Language Models such as GPT-4 as experienced lawyers with deep knowledge of specific legal subjects. Instead, think of them as very bright law students who have only one year of experience with legal subjects.
- Do not apply DocAnswer to poorly structured documents. Its performance is highest with documents that are split into clauses (i.e., text parts with a title and a handful of associated paragraphs); it will not perform well in documents that have long chapters with unstructured text.
- Fun fact: the semantic datapoints are actually multilingual. In many cases you can therefore ask a question in a language other than the document's language.
Be aware that your entire document, split into independent clauses, will be sent to Microsoft's semantic AI engine for conversion to semantic datapoints. In the next stage, the relevant (filtered) paragraphs are sent to Microsoft's GPT-4 engine. For our paid customers, we use the European region of Microsoft, while for our free users we use the American region.
Microsoft provides strong confidentiality guarantees: your data will not be reused in any way, neither by Microsoft, nor by OpenAI.