Another approach for Apache Solr-driven product visibility rules

Posted at — May 29, 2019

Today’s post is about an approach to implementing visibility rules with Apache Solr. Recently my colleagues and I faced a rather tricky B2B case in practice. The result of our collaborative research and design work was the article Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases. Compared with the approach presented in that article, the one described below is less general but should perform better because it makes broader use of Apache Solr capabilities.

Task definition

Assume visibility rules are defined as a Solr collection (or a core). The collection stores access control lists (ACLs) of a special form. Each document in the Solr collection consists of four fields:

product_id - the identifier of the product that the ACL record belongs to
inclusion - a list of user or user group IDs that can see the product
exclusion - a list of user or user group IDs that are not allowed to see the product
exclusivity - a list of user or user group IDs that have exclusive rights to see the product

The lists take precedence in the following order:

Exclusivity > Exclusion > Inclusion.

The exclusivity concept can be explained by the following rules:

if the ID of the current user occurs explicitly in the exclusivity list, then all other lists are ignored and the user can see the product in the search results
if the ID of the current user is not in the exclusivity list and the list is not empty, then there is no need to check the other lists: the current user won't see the product
if the exclusivity field is empty, then the other lists apply. From the business standpoint, the purpose of exclusivity is to give a customer exclusive rights to a specific product and to make sure that no other customer can see, and therefore buy, that product.
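
The three rules can be restated as a small reference function, which is handy for checking Solr results against the intended semantics. This is only a sketch: the function name is hypothetical, and it assumes that the user's own ID and the IDs of the user's groups are treated uniformly, as the queries in the Solution section do.

```python
from typing import Iterable, Set


def is_visible(principals: Iterable[str],
               inclusion: Set[str],
               exclusion: Set[str],
               exclusivity: Set[str]) -> bool:
    """Return True if a user with the given IDs may see the product."""
    # The user's own ID plus the IDs of all groups the user belongs to.
    ids = set(principals)
    if exclusivity:
        # A non-empty exclusivity list overrides the other two lists.
        return bool(ids & exclusivity)
    if ids & exclusion:
        # Exclusion takes precedence over inclusion.
        return False
    return bool(ids & inclusion)
```

For example, a user listed in both exclusion and exclusivity still sees the product, while any user outside a non-empty exclusivity list does not, whatever the inclusion list says.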

The task is to make Apache Solr aware of those rules.

Solution

The first step is to teach Solr to honor inclusion. This can be done by combining the Join Query Parser with a Filter Query, which joins the separate ACL collection to the product collection and filters by it:

.. "fq": "{!join from=id to=id fromIndex=acl_core}inclusion:(user user_group1 user_group2 ..)"}}

where user is the ID of the current user and user_group1 user_group2 .. are all the user groups the current user belongs to. Note that the joined collection (the fromIndex parameter) must have a single shard and a replica on every Solr node where the collection you’re joining to has a replica. This can be achieved through so-called collection colocation.
The exclusion can be modeled as the difference operation applied on sets (see Solr boolean operator NOT): inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..)
The exclusivity can be added on top by subtracting from the previous result all documents whose exclusivity list is non-empty and contains none of the user/user group IDs, so that only documents where exclusivity is empty or includes one of the IDs remain (see an answer on stackoverflow): inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..) AND -(exclusivity:* -exclusivity:(user user_group1 user_group2 ..))
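
The three steps above can be assembled programmatically. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, the join parameters are copied from the filter query shown earlier, and the IDs are assumed to be plain tokens that need no escaping for the Solr query parser.

```python
from typing import Iterable


def build_acl_query(principals: Iterable[str]) -> str:
    """Combine inclusion, exclusion and exclusivity into one ACL query."""
    ids = " ".join(principals)
    if not ids:
        raise ValueError("at least one user or user group ID is required")
    return (
        f"inclusion:({ids}) NOT exclusion:({ids}) "
        f"AND -(exclusivity:* -exclusivity:({ids}))"
    )


def build_join_filter(principals: Iterable[str], acl_core: str = "acl_core") -> str:
    """Wrap the ACL query into a join filter (fq) for the product collection."""
    # from/to/fromIndex mirror the example in the post; adjust to your schema.
    return "{!join from=id to=id fromIndex=" + acl_core + "}" + build_acl_query(principals)
```

The returned string would be passed as the fq parameter of a search request against the product collection.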

Testing and improvement

The queries below were timed against a random ACL collection generated with the following parameters:

The number of customers and customer groups was 100,000. All customer and customer group IDs followed the pattern u plus a random number.
The number of products was 1,000,000
The maximum number of inclusions per product was 1,000
The maximum number of exclusions per product was 200
The maximum number of exclusivity entries per product was 200
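
A data set of this shape could be produced with a generator like the one below. It is a sketch under assumptions: the post does not say how list sizes were distributed, so a uniform choice between zero and the maximum is assumed here, and the p-prefixed product IDs and the function names are hypothetical.

```python
import random
from typing import Dict, Iterator, List


def random_ids(rng: random.Random, max_count: int, principal_count: int) -> List[str]:
    """Pick between 0 and max_count distinct IDs of the form u<number>."""
    count = rng.randint(0, min(max_count, principal_count))
    return [f"u{n}" for n in rng.sample(range(principal_count), count)]


def generate_acl(product_count: int,
                 principal_count: int,
                 max_inclusion: int,
                 max_exclusion: int,
                 max_exclusivity: int,
                 seed: int = 0) -> Iterator[Dict[str, object]]:
    """Yield one random ACL document per product."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    for n in range(product_count):
        yield {
            "product_id": f"p{n}",  # hypothetical product ID pattern
            "inclusion": random_ids(rng, max_inclusion, principal_count),
            "exclusion": random_ids(rng, max_exclusion, principal_count),
            "exclusivity": random_ids(rng, max_exclusivity, principal_count),
        }
```

With the parameters listed above, the call would be generate_acl(1_000_000, 100_000, 1_000, 200, 200).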

Command	Time, ms
inclusion:(u27362)	15
inclusion:(u27362) NOT exclusion:(u27362)	10
inclusion:(u27362) NOT exclusion:(u27362) AND -(exclusivity:* -exclusivity:(u27362))	8916

A response time of about 9 s is clearly not acceptable, and the next couple of tests show that the problem lies in how the 'empty' field is handled:

Command	Time, ms
-(exclusivity:* -exclusivity:(u27362))	6631
-(exclusivity:*)	3553

If the "NOT everything" clause is replaced with an exact match, the query runs fast again:

Command	Time, ms
-exclusivity:(u32175)	4

So the crux is to avoid the "NOT everything" pattern. This can be done when there is control over how the data is loaded into Solr, which is true in most cases. The preprocessing logic should then put a predefined term, for example None, into the exclusivity field of every document where that field is empty. Once this is done, the requests are processed as fast as expected:

Command	Time, ms
inclusion:(u52591) NOT exclusion:(u52591) AND -(exclusivity:None -exclusivity:(u52591))	10
inclusion:(u18801 u67517 u26398 u82115 u45460) NOT exclusion:(u18801 u67517 u26398 u82115 u45460) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460))	89
inclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) NOT exclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269))	278
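
The preprocessing step described above can be sketched as follows. The names are hypothetical, and the marker None is assumed never to collide with a real user or user group ID. To express "exclusivity is empty or contains one of the IDs", the sketch uses the positive form exclusivity:(None ...); that form is an assumption of this example and was not among the timed queries.

```python
from typing import Any, Dict, Iterable

EMPTY_MARKER = "None"  # predefined term for an empty exclusivity list


def mark_empty_exclusivity(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an ACL document with the marker in an empty exclusivity list."""
    result = dict(doc)
    if not result.get("exclusivity"):
        result["exclusivity"] = [EMPTY_MARKER]
    return result


def build_marker_query(principals: Iterable[str]) -> str:
    """Build an ACL query that relies on the marker instead of exclusivity:*."""
    ids = " ".join(principals)
    if not ids:
        raise ValueError("at least one user or user group ID is required")
    # Exact term matches only: no "match everything" clause is needed.
    return (
        f"inclusion:({ids}) NOT exclusion:({ids}) "
        f"AND exclusivity:({EMPTY_MARKER} {ids})"
    )
```

The function mark_empty_exclusivity would be applied to every ACL document before it is sent to Solr for indexing.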

Conclusion

The post demonstrated a basic principle for achieving behavior and results similar to those described in the article "Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases" published on HYBRISMART. The major difference is that the approach presented here doesn’t require much pre-processing to evaluate the full list of all user and user group IDs for an ACL. This should significantly reduce both the size of the data stored in the ACL collection and the pre-processing time. The downside is limited flexibility and tight coupling to the current business definition of visibility. For example, if the logic or the visibility model changes significantly, the power of Solr set operations might not be enough to address the new business logic, and the presented approach won’t work.

Igor Sokolov's Blog
