Igor Sokolov's Blog

IT blog

Another approach for Apache Solr-driven product visibility rules

Posted at — May 29, 2019

Today’s post is about an approach to implementing visibility rules with Apache Solr. Recently my colleagues and I faced a rather tricky B2B case in practice. The result of our collaborative research and design work was the article Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases. In contrast to the approach presented in that article, the approach described below is less general but should provide better performance because it makes broader use of Apache Solr capabilities.

Task definition

Assume visibility rules are defined as a Solr collection (or a core). The collection stores access control lists (ACLs) of a special form. Each document in the Solr collection consists of four fields:

The exclusivity concept can be uncovered using the following rules:

Exclusivity > Exclusion > Inclusion.

The task is to make Apache Solr aware of those rules.
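To make the target behavior concrete before moving to Solr, here is a minimal in-memory sketch in Python. It assumes the ACL fields are named inclusion, exclusion and exclusivity (the names used in the test queries later in this post) and that each holds a collection of user or user group IDs. The function name is_visible is hypothetical, and the reading of the rules is inferred from the queries in the Solution section, so treat it as an illustration rather than a specification.

```python
def is_visible(acl, principals):
    """Return True if a product with this ACL is visible to the given principals.

    acl        -- dict with optional 'inclusion', 'exclusion', 'exclusivity' ID lists
                  (field names are assumptions taken from the test queries)
    principals -- the current user's ID plus the IDs of all groups the user belongs to
    """
    principals = set(principals)
    inclusion = set(acl.get("inclusion", ()))
    exclusion = set(acl.get("exclusion", ()))
    exclusivity = set(acl.get("exclusivity", ()))

    # Exclusivity: a non-empty list hides the product from everyone not on it.
    if exclusivity and not (exclusivity & principals):
        return False
    # Exclusion: any matching principal hides the product.
    if exclusion & principals:
        return False
    # Inclusion: at least one principal must be listed.
    return bool(inclusion & principals)
```

For example, a product whose ACL includes a group of the current user stays hidden as soon as its exclusivity list is non-empty and names only other users or groups.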

Solution

  1. The first step is to teach Solr how to honor inclusion. This can be done by combining the Join Query Parser and a Filter Query to join and filter a separate Solr collection with the product one:

    .. "fq": "{!join from=id to=id fromIndex=acl_core}inclusion:(user user_group1 user_group2 ..)"}}

    where user is the ID of the current user and user_group1 user_group2 .. are all the user groups the current user belongs to. Note that the joined collection (fromIndex parameter) must have a single shard and a replica on all Solr nodes where the collection you’re joining to has a replica. This can be achieved through so-called collection collocation.

  2. Exclusion can be modeled as the set difference operation (see the Solr boolean operator NOT): inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..)

  3. Exclusivity can be added on top by subtracting from the previous result all the documents where exclusivity is non-empty and contains none of the user/user group IDs, so that only documents where exclusivity is empty or includes one of the IDs remain (see an answer on Stack Overflow): inclusion:(user user_group1 user_group2 ..) NOT exclusion:(user user_group1 user_group2 ..) AND -(exclusivity:* -exclusivity:(user user_group1 user_group2 ..))
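The three steps above can be assembled programmatically. The following Python sketch only builds the query strings and does not talk to Solr. The helper names are hypothetical, the field names inclusion, exclusion and exclusivity are taken from the test queries below, and the IDs are assumed to contain no whitespace or Solr special characters (otherwise they would need escaping).

```python
def principals_clause(principals):
    """Render IDs as a Solr term group, e.g. ['u1', 'g1'] -> '(u1 g1)'."""
    return "(" + " ".join(principals) + ")"


def visibility_query(principals):
    """Combine inclusion, exclusion and exclusivity as in steps 1-3."""
    group = principals_clause(principals)
    return (
        f"inclusion:{group} NOT exclusion:{group} "
        f"AND -(exclusivity:* -exclusivity:{group})"
    )


def join_filter(principals, acl_core="acl_core"):
    """Wrap the ACL query in a join so it can be used as an fq on the product collection."""
    # Plain concatenation avoids having to escape the braces of the local params.
    return "{!join from=id to=id fromIndex=" + acl_core + "}" + visibility_query(principals)


if __name__ == "__main__":
    # The current user's ID followed by the IDs of the user's groups (made-up values).
    print(join_filter(["u27362", "g1", "g2"]))
```

The resulting string would be passed as the fq parameter of a request to the product collection, with the user ID and group IDs resolved by the calling application.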

Testing and improvement

A random collection of ACLs was generated using the following parameters:

Command Time, ms
inclusion:(u27362) 15
inclusion:(u27362) NOT exclusion:(u27362) 10
inclusion:(u27362) NOT exclusion:(u27362) AND -(exclusivity:* -exclusivity:(u27362)) 8916

Clearly ~9 s is not acceptable, and the next couple of tests show that the issue lies in how the 'empty' field is handled:

Command Time, ms
-(exclusivity:* -exclusivity:(u27362)) 6631
-(exclusivity:*) 3553

If the 'NOT everything' clause is replaced with an exact match, everything starts running fast again:

Command Time, ms
-exclusivity:(u32175) 4

So, the crux is to avoid the 'NOT everything' clause. This can be done if there is control over how the data is loaded into Solr, which is true in most cases. In that case the preprocessing logic should put a predefined term, for example None, into every document where the exclusivity field is empty. Once this is done, the requests are processed as fast as expected:

Command Time, ms
inclusion:(u52591) NOT exclusion:(u52591) AND -(exclusivity:None -exclusivity:(u52591)) 10
inclusion:(u18801 u67517 u26398 u82115 u45460) NOT exclusion:(u18801 u67517 u26398 u82115 u45460) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460)) 89
inclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) NOT exclusion:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269) AND -(exclusivity:None -exclusivity:(u18801 u67517 u26398 u82115 u45460 u29970 u92781 u50830 u41427 u4220 u60431 u27097 u81089 u93168 u80261 u15358 u85068 u62514 u29483 u40720 u79200 u46984 u30591 u70104 u56161 u70501 u41777 u16694 u1109 u6525 u91872 u66800 u63422 u46366 u28028 u30052 u92227 u8774 u93961 u90288 u64035 u13089 u24628 u33381 u71814 u40973 u84336 u47175 u78620 u34269)) 278
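The preprocessing step described above can be sketched in Python as follows. The function names and the EMPTY_MARKER constant are hypothetical, documents are assumed to be plain dicts prepared before indexing, and the marker must not collide with a real user or group ID. Note that the query built here is a variation, not the exact form timed in the table: it keeps a document when exclusivity contains either the marker or one of the user's IDs, and it uses explicit + and - prefixes instead of relying on how AND and NOT combine.

```python
EMPTY_MARKER = "None"  # assumption: no real user or group ID has this value


def mark_empty_exclusivity(doc, marker=EMPTY_MARKER):
    """Return a copy of doc whose exclusivity holds the marker when it was empty or missing."""
    prepared = dict(doc)
    if not prepared.get("exclusivity"):
        prepared["exclusivity"] = [marker]
    return prepared


def marker_query(principals, marker=EMPTY_MARKER):
    """Build an ACL query that needs no 'exclusivity:*' wildcard clause."""
    group = "(" + " ".join(principals) + ")"
    allowed = "(" + " ".join([marker, *principals]) + ")"
    # Required: listed in inclusion. Prohibited: listed in exclusion.
    # Required: exclusivity is marked empty or names one of the principals.
    return f"+inclusion:{group} -exclusion:{group} +exclusivity:{allowed}"


if __name__ == "__main__":
    print(mark_empty_exclusivity({"id": "p1", "inclusion": ["g1"]}))
    print(marker_query(["u52591"]))
```

Whether this form performs like the queries measured above would have to be checked against the actual index.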

Conclusion

The post demonstrated the basic principle of how to achieve behavior and results similar to those described in the article "Implementing Product Whitelisting/Blacklisting in SAP Commerce Cloud for Large Product and Customer Bases" published on HYBRISMART. The major difference is that the approach presented in this post doesn’t require much pre-processing to evaluate the full list of all user and user group IDs for each ACL. This should significantly reduce the size of the data stored in the ACL collection as well as the pre-processing time. The downside is limited changeability and tight coupling to the current business definition of visibility. For example, if the logic and/or the visibility model changes significantly, the power of Solr set operations might not be enough to address the new business logic, and the presented approach won’t work.
