Suggested article to read first: Orbit NG: New infrastructure for the Search
TABLE OF CONTENTS
- Data related explanations
- Differences on aliases and search fields
- F operator’s related differences
- S and P operators' differences
- D and W operators' differences
Data related explanations
These differences affect only a small number of documents. However, they can explain many slightly different results, because neither the two search engines nor their data are identical.
QP and ES do not index the data in exactly the same way:
- In QP, FULLTEXT publications are loaded as soon as they are received from the providers, whereas families in FAMPAT and patents in FULLPAT are updated once a week (on Sunday), when the collections are rebuilt. In ES, all publications, patents and families are loaded and updated at the same time. This keeps the 3 collections synchronized and makes updates faster than before.
- As a direct consequence, on Tuesdays and the following days, your searches in Elastic Search can retrieve more results than the same searches in QP, because the ES collections then contain more data than the QP ones.
- Maintenance operations and fixes are now applied to the Elastic Search collections, but not always to the QP databases.
- When these improvements affect only a small number of documents and are barely noticeable, we do not apply them to the QP data, so as to avoid introducing discrepancies there and to keep QP running reliably for a longer period.
Differences on aliases and search fields
An alias is a search field that groups several search fields, so that you do not have to name each field individually. Below is the list of affected aliases, with their previous and current behaviour:
- /IPC and /CPC
- On QP, these aliases searched all current and historical classification codes.
- On ES, /IPC and /CPC are no longer aliases: they search current classification codes only. To include the code history, explicitly add the /ICH and /CPCH fields.
- /DESC and /CLMS
- On QP, these aliases searched all descriptions and all claims, in all languages.
- On ES, “original” claims and descriptions (i.e. those not in Latin-script languages) are no longer included in these fields. Here again, you can explicitly add the corresponding fields, such as /OCLM and /ODES.
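To recover the broader QP coverage on ES, the extra fields simply have to be appended to the query. The short Python sketch below builds such queries. The mapping only restates the field pairs named in this article, and the helper name `widen_to_qp_scope` is hypothetical; it assumes that several fields can be chained after a query, as in the `(Usb F Key)/TI/AB` example further down.

```python
# Hypothetical mapping: it only restates the field pairs named in this
# article and is not an official list of Orbit aliases or fields.
QP_EQUIVALENT_FIELDS = {
    "/IPC": ["/IPC", "/ICH"],
    "/CPC": ["/CPC", "/CPCH"],
    "/DESC": ["/DESC", "/ODES"],
    "/CLMS": ["/CLMS", "/OCLM"],
}


def widen_to_qp_scope(terms: str, field: str) -> str:
    """Append the extra fields needed on ES to cover what the QP alias
    used to search. Fields that are not in the mapping are left unchanged."""
    fields = QP_EQUIVALENT_FIELDS.get(field.upper(), [field])
    return f"({terms})" + "".join(fields)


print(widen_to_qp_scope("usb S keyboard", "/CLMS"))  # (usb S keyboard)/CLMS/OCLM
print(widen_to_qp_scope("keyboard", "/DESC"))        # (keyboard)/DESC/ODES
```

Keeping the mapping in one table makes it easy to extend if other aliases change in the same way.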
F operator’s related differences
The F operator searches for terms within the same field. In QP, the query
(Usb F Key)/TI/AB
retrieves documents where “Usb” and “Key” are both present in the title, or both present in the abstract.
On ES, however, this operator works slightly differently:
- On QP, the search is restricted to the same publication stage, as it is based on the SDOC operator.
- On ES, the F operator searches within the same field across all the publication stages of the family or patent. To reproduce the previous behaviour, use the 99D or P operator instead.
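The difference between the two behaviours can be illustrated with a simplified Python model. It is a sketch under stated assumptions, not how either engine is implemented: a family is a list of publication stages, each stage maps a field name to a set of lowercase words, and word positions are ignored. The function names `f_match_qp` and `f_match_es` are hypothetical.

```python
from typing import Dict, List, Set

# Toy model (an assumption, not the engines' real data structures):
# each publication stage maps a field name to the set of its words.
Stage = Dict[str, Set[str]]


def f_match_qp(stages: List[Stage], terms: Set[str], fields: List[str]) -> bool:
    """QP-style F: all terms in the same field of the same publication stage."""
    return any(
        terms <= stage.get(field, set())
        for stage in stages
        for field in fields
    )


def f_match_es(stages: List[Stage], terms: Set[str], fields: List[str]) -> bool:
    """ES-style F: all terms in the same field, with that field pooled
    across every publication stage of the family or patent."""
    for field in fields:
        pooled: Set[str] = set()
        for stage in stages:
            pooled |= stage.get(field, set())
        if terms <= pooled:
            return True
    return False


# "usb" appears only in the title of one stage, "key" only in the title of another.
family = [
    {"TI": {"usb", "connector"}, "AB": {"a", "connector"}},
    {"TI": {"key", "holder"}, "AB": {"a", "holder"}},
]
print(f_match_qp(family, {"usb", "key"}, ["TI", "AB"]))  # False
print(f_match_es(family, {"usb", "key"}, ["TI", "AB"]))  # True
```

In this model, (Usb F Key)/TI/AB matches the family on ES but not on QP, because the two words never share a title within a single publication stage.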
S and P operators' differences
Just as the F operator changes slightly, the S and P operators also evolve, mainly because of the way sentences and paragraphs are structured on ES:
- On Elastic Search, a sentence is limited to 200 words, and a paragraph is defined as a block of up to 20,000 words. Beyond these limits, a sentence or paragraph is split into 2 parts, or as many parts as necessary to cover it. As a result, a document may be found in QP but not in ES when the two searched words end up in different parts of a split sentence or paragraph, for instance when they are separated by more words than these limits.
- When more than 2 words are combined with S or P, for instance:
(Usb S key S keyboard)/TI
- On QP, the engine first searches for Usb in the same sentence as Key, then performs a second search for Key in the same sentence as Keyboard.
- On ES, all the terms must be in the same sentence (that is, within the same block of at most 200 words).
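Both effects can be reproduced with a small Python model. It is a simplified sketch, not the engines' actual tokenizer: text is given as lists of lowercase words, an over-long sentence or paragraph is assumed to be cut into consecutive chunks of at most 200 (or 20,000) words, and the function names are hypothetical. The same logic applies to P with paragraphs instead of sentences.

```python
from typing import List, Set

ES_SENTENCE_LIMIT = 200      # max words per sentence on ES (from the article)
ES_PARAGRAPH_LIMIT = 20_000  # max words per paragraph on ES (from the article)


def split_unit(words: List[str], limit: int) -> List[List[str]]:
    """Cut an over-long sentence or paragraph into consecutive parts of at most `limit` words."""
    return [words[i:i + limit] for i in range(0, len(words), limit)]


def same_unit(units: List[List[str]], terms: Set[str]) -> bool:
    """True if a single sentence (or sentence part) contains every term."""
    return any(terms <= set(unit) for unit in units)


def chained_s_qp(units: List[List[str]], terms: List[str]) -> bool:
    """QP: each adjacent pair of terms must share a sentence, checked pair by pair."""
    return all(same_unit(units, {a, b}) for a, b in zip(terms, terms[1:]))


def chained_s_es(units: List[List[str]], terms: List[str]) -> bool:
    """ES: all the terms must share one and the same sentence."""
    return same_unit(units, set(terms))


# 1) One 250-word sentence: "usb" is word 1, "key" is word 241.
long_sentence = ["usb"] + ["filler"] * 239 + ["key"] + ["filler"] * 9
print(same_unit([long_sentence], {"usb", "key"}))  # True: treated as one sentence
print(same_unit(split_unit(long_sentence, ES_SENTENCE_LIMIT), {"usb", "key"}))  # False: split at 200 words
print(len(split_unit(["w"] * 45_000, ES_PARAGRAPH_LIMIT)))  # 3 paragraph parts

# 2) (Usb S key S keyboard): each pair shares a sentence, the triple does not.
text = [["usb", "key", "for", "laptops"], ["key", "and", "keyboard", "layout"]]
terms = ["usb", "key", "keyboard"]
print(chained_s_qp(text, terms))  # True
print(chained_s_es(text, terms))  # False
```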
D and W operators' differences
The D and W operators respect the sentence and paragraph structures described above.
For instance,
(Key 3D Usb)
will not find a document where Key is at the end of one sentence and Usb is at the beginning of the next sentence, even if the two words are close together when you read the text.
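A minimal Python sketch of this rule follows. It assumes that nD means "in either order, with at most n words between the two terms", and that text is given as a list of sentences made of lowercase words; the function name `d_match` is hypothetical. The flattened version only shows how the text looks to a reader once sentence boundaries are ignored.

```python
from typing import List


def d_match(sentences: List[List[str]], a: str, b: str, n: int) -> bool:
    """nD restricted to sentences: a and b in the same sentence, in either
    order, with at most n words between them."""
    for words in sentences:
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(i != j and abs(i - j) - 1 <= n for i in pos_a for j in pos_b):
            return True
    return False


# "key" ends one sentence and "usb" starts the next one.
text = [["insert", "the", "key"], ["usb", "ports", "are", "located"]]
print(d_match(text, "key", "usb", 3))  # False: different sentences

flat = [w for sentence in text for w in sentence]  # as a reader sees it
print(d_match([flat], "key", "usb", 3))  # True: adjacent once boundaries are ignored
```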
The 200-word sentence limit may affect exceptionally long sentences. However, the main difference lies in how the distance between words is calculated, as in the following search:
key 1D usb 3D keyboard
On QP, as described for S and P, the search is done in two steps: first Key and Usb separated by at most 1 word, then Usb and Keyboard separated by at most 3 words.
On ES, the engine searches for the 3 words within a single interval, with at most 4 words (1 + 3) between them in total.
To reproduce the QP search, add parentheses as follows:
((key 1D usb) 3D keyboard)
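The two distance calculations for key 1D usb 3D keyboard can be compared with the Python sketch below. It works on a single sentence of lowercase words, and it assumes that the ES "interval of 4 words (1 + 3)" means that one occurrence of each term is chosen and at most 4 other words lie between the first and the last of them, in any order. That reading, and all the function names, are assumptions for illustration only.

```python
from itertools import product
from typing import List


def positions(words: List[str], term: str) -> List[int]:
    return [i for i, w in enumerate(words) if w == term]


def within(words: List[str], a: str, b: str, n: int) -> bool:
    """Pairwise nD: some a and some b with at most n words between them."""
    return any(i != j and abs(i - j) - 1 <= n
               for i in positions(words, a) for j in positions(words, b))


def chained_d_qp(words: List[str], terms: List[str], gaps: List[int]) -> bool:
    """QP reading: one independent pairwise check per D operator."""
    pairs = zip(terms, terms[1:])
    return all(within(words, a, b, n) for (a, b), n in zip(pairs, gaps))


def chained_d_es(words: List[str], terms: List[str], gaps: List[int]) -> bool:
    """ES reading (assumed): one occurrence of each term, at distinct
    positions, with at most sum(gaps) other words inside the span they cover."""
    budget = sum(gaps)
    for combo in product(*(positions(words, t) for t in terms)):
        if len(set(combo)) < len(combo):
            continue  # one word cannot stand for two terms
        span = max(combo) - min(combo) + 1
        if span - len(combo) <= budget:
            return True
    return False


terms, gaps = ["key", "usb", "keyboard"], [1, 3]
doc_a = "key a b c usb keyboard".split()
doc_b = "key x usb then later usb y y y keyboard".split()
print(chained_d_qp(doc_a, terms, gaps), chained_d_es(doc_a, terms, gaps))  # False True
print(chained_d_qp(doc_b, terms, gaps), chained_d_es(doc_b, terms, gaps))  # True False
```

In doc_a, Key and Usb are 3 words apart, which breaks the 1D condition on QP but fits the combined budget of 4 on ES. In doc_b, each pair matches separately on QP, but no single interval holds all 3 words on ES.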