Today’s post is about an implementation approach for visibility rules using Apache Solr. Recently my colleagues and I faced a rather tricky B2B case in practice. The result of the collaborative research and design work was the article "Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases". In contrast to the approach presented in that article, the one described below is less general but should provide better performance thanks to broader use of Apache Solr capabilities.
Assume visibility rules are defined as a Solr collection (or a core). The collection stores access control lists (ACLs) of a special form. Each document in the Solr collection consists of four fields:
- product_id: an identifier of the product the ACL record belongs to
- inclusion: a list of user or user group IDs who can see the product
- exclusion: a list of user or user group IDs who are restricted from seeing the product
- exclusivity: a list of user or user group IDs who have exclusive rights to see the product
The lists take priority in the following order:
Exclusivity > Exclusion > Inclusion.
The exclusivity concept can be explained with the following rules:
- If the current user is in the exclusivity list, then all other lists should be ignored, and the user will be able to see the product in the search result list.
- If the current user is not in the exclusivity list and the list is not empty, then there is no need to check the other lists, and the current user won't see the product.
- If the exclusivity field is empty, then all other lists should be used.
From the business standpoint, the logic behind exclusivity is to give exclusive rights to a customer for a specific product and make sure that no other customer will be able to see and, as a result, buy the product.
The task is to make Apache Solr aware of those rules.
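Before moving to Solr, the rules can be written down as a plain function, which is handy as a reference when checking query results. This is only a sketch: the AclRecord class and the is_visible name are hypothetical, and it assumes that a product is hidden when the user matches none of the lists.

```python
from dataclasses import dataclass, field


@dataclass
class AclRecord:
    """Hypothetical in-memory mirror of one ACL document."""
    product_id: str
    inclusion: set = field(default_factory=set)
    exclusion: set = field(default_factory=set)
    exclusivity: set = field(default_factory=set)


def is_visible(acl: AclRecord, principals: set) -> bool:
    """Apply Exclusivity > Exclusion > Inclusion for one product.

    `principals` holds the current user's ID plus the IDs of all
    groups the user belongs to.
    """
    if acl.exclusivity:
        # A non-empty exclusivity list decides on its own.
        return bool(acl.exclusivity & principals)
    if acl.exclusion & principals:
        return False
    # Assumption: no match in inclusion means the product is hidden.
    return bool(acl.inclusion & principals)


shared = AclRecord("p1", inclusion={"g1"}, exclusion={"u2"})
reserved = AclRecord("p2", inclusion={"g1"}, exclusivity={"u9"})
print(is_visible(shared, {"u1", "g1"}))    # True
print(is_visible(shared, {"u2", "g1"}))    # False
print(is_visible(reserved, {"u1", "g1"}))  # False
print(is_visible(reserved, {"u9"}))        # True
```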
The first step is to teach Solr how to honor inclusion. This can be done by combining the Join Query Parser and a Filter Query to join and filter a separate Solr collection with the product one:
.. "fq": "{!join from=id to=id fromIndex=acl_core}inclusion:(user user_group1 user_group2 ..)"}}
where user is the ID of the current user and user_group1 user_group2 .. are all the user groups the current user belongs to. In SolrCloud, the joined collection (the fromIndex parameter) must have a single shard and a replica on all Solr nodes where the collection you’re joining to has a replica. This can be achieved through so-called collection colocation.
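As a rough illustration, the filter above could be assembled on the client side like this. The helper names acl_join_filter and search_params are hypothetical; the join syntax, the acl_core name and the inclusion field follow the text, and the resulting dictionary is meant to be passed as ordinary q/fq request parameters by whatever HTTP client is in use.

```python
def acl_join_filter(acl_query, acl_index="acl_core"):
    """Wrap a query against the ACL collection into a join filter."""
    return "{!join from=id to=id fromIndex=%s}%s" % (acl_index, acl_query)


def search_params(user_query, principals):
    """Build q/fq parameters for a product search.

    `principals` is the current user's ID followed by the IDs of the
    groups the user belongs to.
    """
    ids = " ".join(principals)
    return {"q": user_query, "fq": acl_join_filter("inclusion:(%s)" % ids)}


params = search_params("*:*", ["u1", "g1", "g2"])
print(params["fq"])
# {!join from=id to=id fromIndex=acl_core}inclusion:(u1 g1 g2)
```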
The exclusion can be modeled as the difference operation applied on sets (see Solr boolean operator NOT):
inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..)
Exclusivity can be added on top by subtracting from the previous result all documents where exclusivity is non-empty and includes none of the user/user group IDs, so that only documents where exclusivity is empty or includes one of the IDs remain (see an answer on Stack Overflow):
inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..) AND -(exclusivity:* -exclusivity:(user user_group1 user_group2 ..))
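Since the same ID list is repeated three times, it may be convenient to generate the expression rather than type it. The sketch below is one way to do that; the acl_clause name and the ID validation rule are assumptions added for illustration, and the field names are the ones from the field list.

```python
import re

# Assumption: IDs are plain alphanumeric tokens, so they can be placed
# into the query without escaping. Anything else is rejected.
_SAFE_ID = re.compile(r"[A-Za-z0-9_]+")


def acl_clause(principals):
    """Build the inclusion/exclusion/exclusivity expression.

    `principals` is the current user's ID plus all of the user's
    group IDs.
    """
    if not principals:
        raise ValueError("at least one user or group ID is required")
    for p in principals:
        if not _SAFE_ID.fullmatch(p):
            raise ValueError("unexpected characters in ID: %r" % p)
    ids = "(" + " ".join(principals) + ")"
    return (
        f"inclusion:{ids} NOT exclusion:{ids} "
        f"AND -(exclusivity:* -exclusivity:{ids})"
    )


print(acl_clause(["u27362"]))
# inclusion:(u27362) NOT exclusion:(u27362) AND -(exclusivity:* -exclusivity:(u27362))
```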
A random collection of ACLs was generated for testing. Each user ID in it consists of the prefix u and a random number.
Command | Time, ms |
---|---|
inclusion:(u27362) | 15 |
inclusion:(u27362) NOT exclusion:(u27362) | 10 |
inclusion:(u27362) NOT exclusion:(u27362) AND -(exclusivity:* -exclusivity:(u27362)) | 8916 |
Clearly, ~9 s is not acceptable, and the next couple of tests show that the issue lies in the way the 'empty' field is handled:
Command | Time, ms |
---|---|
-(exclusivity:* -exclusivity:(u27362)) | 6631 |
-(exclusivity:*) | 3553 |
If NOT everything is replaced with an exact match, then everything starts running fast again:
Command | Time, ms |
---|---|
-exclusivity:(u32175) | 4 |
So, the crux is to avoid using NOT everything. This can be done if there is control over how the data is loaded into Solr, which is true in most cases. In this case the preprocessing logic should put a predefined term, for example None, into all documents where the exclusivity field is empty. Once this is done, the requests are processed as fast as expected:
Command | Time, ms |
---|---|
inclusion:(u52591) NOT exclusion:(u52591) AND -(exclusivity:None -exclusivity:(u52591)) | 10 |
inclusion:(u18801 u67517 u26398 u82115 u45460) NOT exclusion:(u18801 u67517 u26398 u82115 u45460) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460)) | 89 |
inclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) NOT exclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269)) | 278 |
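The preprocessing step mentioned before the table could look roughly like the following. It is a minimal sketch, assuming ACL records are prepared as plain dictionaries before being sent to Solr; the prepare_acl_doc name is hypothetical, while the field names and the None marker come from the text.

```python
EMPTY_MARKER = "None"  # the predefined term used in the text


def prepare_acl_doc(product_id, inclusion=(), exclusion=(), exclusivity=()):
    """Return an ACL document ready for indexing.

    An empty exclusivity list is replaced with the marker term, so that
    queries never have to test the field with a wildcard.
    """
    exclusivity = [e for e in exclusivity if e]  # drop blank entries
    return {
        "product_id": product_id,
        "inclusion": list(inclusion),
        "exclusion": list(exclusion),
        "exclusivity": exclusivity or [EMPTY_MARKER],
    }


print(prepare_acl_doc("p1", inclusion=["u52591"])["exclusivity"])
# ['None']
print(prepare_acl_doc("p2", exclusivity=["u18801"])["exclusivity"])
# ['u18801']
```

Once every document carries either real IDs or the marker, the condition "empty or lists the user" can also be written as a single positive clause such as exclusivity:(None u52591). That form was not part of the measurements above.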
The post demonstrated the basic principle of how to achieve behavior and results similar to those described in the article "Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases" published on HYBRISMART. The major difference is that the approach presented here doesn’t require much pre-processing to evaluate the full list of all user and user group IDs for the ACL. This should significantly reduce the size of the data stored in the ACL collection as well as the pre-processing time. The downside is limited changeability and tight coupling to how the business currently defines visibility. For example, if the logic and/or the visibility model changes significantly, Solr's set operations might not be powerful enough to address the new business logic, and the presented approach won’t work.