Add lazy-loading to FilterTable (#3093)

Signed-off-by: Clinton Wolfe <clintoncwolfe@gmail.com>
Clinton Wolfe 2018-06-05 17:32:52 -04:00 committed by GitHub
parent 10183aca1a
commit ca6556e0fe
7 changed files with 493 additions and 99 deletions

@ -4,15 +4,15 @@ If you just want to _use_ FilterTable, see filtertable-usage.md . Reading this
## What makes this hard?
FilterTable was created in 2016 in an attempt to consolidate the pluralization features of several resources. They each had slightly different featuresets, and were all in the wild, so FilterTable exposes some extensive side-effects to provide those features.
FilterTable was created in 2016 in an attempt to consolidate the pluralization features of several resources. They each had slightly different feature-sets, and were all in the wild, so FilterTable exposes some extensive side-effects to provide those features.
Additionaly, the ways in which the classes relate is not straightforward.
Additionally, the ways in which the classes relate is not straightforward.
## Where is the code?
The main FilterTable code is in [utils/filter.rb](https://github.com/chef/inspec/blob/master/lib/utils/filter.rb).
Also educational is the unit test for Filtertable, at test/unit/utils/filter_table_test.rb
Also educational is the unit test for Filtertable, at test/unit/utils/filter_table_test.rb . Recent work has focused on using functional tests to exercise FilterTable; see test/unit/mocks/profiles/filter_table and test/functional/filter_table_test.rb .
The file utils/filter_array.rb appears to be unrelated.
@ -80,15 +80,15 @@ This is currently simply an alias for `register_custom_property`. See it for de
Legacy name (alias): `add`
This method has very complex behavior, ans should likely be split into several use cases. `register_custom_property` requires a symbol (which will be used as a method name _to be added to the resource class_), then also accepts a block and/or additional args. These things - name, block, and opts - are packed into a simple struct called a CustomPropertyType. The name stored in the struct will be `opts[:field]` if provided, and the method name if not.
This method has very complex behavior, and should likely be split into several use cases. `register_custom_property` requires a symbol (which will be used as a method name _to be added to the resource class_), then also accepts a block and/or additional args. These things - name, block, and opts - are packed into a simple Struct called a CustomPropertyType. The name stored in the Struct will be `opts[:field]` if provided, and the method name if not.
The CustomPropertyType struct is then appended to the Hash `@custom_properties`, keyed on the method name provided. `self` is then returned for method chaining.
The CustomPropertyType Struct is then appended to the Hash `@custom_properties`, keyed on the method name provided. `self` is then returned for method chaining.
The implementation of the custom property method is generated by `create_custom_property_body`, and varies based on whether a block was provided to `register_custom_property`.
#### Behavior when a block is provided
This behavior is implemented by lines 298-304.
This behavior is implemented by lines 388-394.
If a block is provided, it is turned into a Lambda and used as the method body.
@ -117,7 +117,7 @@ If you provide _both_ a block and opts, only the block is used, and the options
#### Behavior when no block is provided
If you do not provide a block, you _must_ provide a `:field` option (though that does no appear to be enforced). The behavior is to define a method with the name provided, that has a conditional return type. The method body is defined in lines 306-319.
If you do not provide a block, you _must_ provide a `:field` option (though that does not appear to be enforced). The behavior is to define a method with the name provided, that has a conditional return type. The method body is defined in lines 396-413.
If called without arguments, it returns an array of the values in the raw data for that column.
```
@ -142,9 +142,9 @@ You can provide options to `register_custom_property` / `add`, after the desired
##### field
This is the most common option. It selects an implementation in which the desired method will be defined such that it returns an array of the row values using the specified key. In other words, this acts as a "column fetcher", like in SQL: "SELECT some_column FROM some_table"
This is the most common option, and is mandatory if a block is not provided. It selects an implementation in which the desired method will be defined such that it returns an array of the row values using the specified key. In other words, this acts as a "column fetcher", like in SQL: "SELECT some_column FROM some_table"
Internally, (line 236-241), a Struct type is created to repressent a row of raw data. The struct's attribute list is taken from the `field` options passed to `register_custom_property` / `add`. This new type is strored as `row_eval_context_type`. It is used as the evaluation context for block-mode `where` calls.
Internally, (line 269-276), a Struct type is created to represent a row of raw data. The struct's attribute list is taken from the `field` options passed to `register_custom_property` / `add`. This new type is stored as `row_eval_context_type`. It is used as the evaluation context for block-mode `where` calls.
* No checking is performed to see if the field name is actually a column in the raw data (the raw data hasn't been fetched yet, so we can't check).
* You can't have two `register_custom_property` / `add` calls that reference the same field, because the Struct would see that as a duplicate attribute.
@ -153,10 +153,16 @@ POSSIBLE BUG: We could deduplicate the field names when defining the Struct, thu
##### style
The `style` option is intended to effect post-processing of the return value from the generated method. To date there is only one recognized value, `:simple`, which flattens, uniq's, and compact's the array value of the property. This is implemented on line 336.
The `style` option is intended to effect post-processing of the return value from the generated method. To date there is only one recognized value, `:simple`, which `flatten`s, `uniq`s, and `compact`s the array value of the property. This is implemented on line 406.
No other values for `:style` have been seen.
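As a rough illustration of the `:simple` post-processing described above, here is the same chain applied in plain Ruby. This is not the FilterTable code itself, and the sample column values are made up.

```ruby
# Hypothetical column values: nested arrays, duplicates, and nils.
column_values = [['a', 'b'], ['b'], nil, ['c', nil]]

# The post-processing described for style: :simple.
# flatten removes nesting, uniq removes duplicates, compact removes nils.
simple_values = column_values.flatten.uniq.compact
```

The order of `uniq` and `compact` does not change the result here; both orders leave one copy of each non-nil value.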
##### lazy
This option implements column-wise lazy loading. The value of the option is expected to be a lambda that takes three arguments: row (a Hash representing a row of the raw data), condition (a sought value to filter for), and table (a reference to the FilterTable::Table subclass, which may be used for context).
See the usage guide for details on the usage of the lazy mechanism; this document will examine the internals.
### install_filter_methods_on_resource
Legacy name (alias): connect
@ -175,51 +181,57 @@ Note that 'connect' does not refer to Connectors.
So, `install_filter_methods_on_resource`/`connect`'s job is to actually install everything.
#### Re-pack the "connectors"
#### Defines a special Struct type to support block-mode where
First, on lines 188-192, the list of custom properties ("connectors", registered using the `register_custom_property` / `add` method) are repacked into an array of hashes of two elements - the desired method name and the lambda that will be used as the method body. The lambda is created by the private method `create_custom_property_body`; see `register_custom_property` for discussion about how the implementtation behaves.
#### Defines a special Struct type to represent rows in the table
At lines 212-217, a new Struct type `row_eval_context_type` is defined, with attributes for each of the known table fields. The motivation for this struct type is to implement the block-mode behavior of `where`. Because each struct represents a row, and it has the attributes (accessors) for the fields, block-mode `where` is implemented by instance-evaling against each row as a struct.
At lines 270-276, a new Struct type `row_eval_context_type` is defined, with attributes for each of the known table fields. The motivation for this struct type is to implement the block-mode behavior of `where`. Because each struct represents a row, and it has the attributes (accessors) for the fields, block-mode `where` is implemented by `instance_eval`ing against each row as a struct.
Additionally, an instance variable, `@criteria_string` is defined, with an accessor. `to_s` is implemented, using `@criteria_string`, or `super` if not defined. I guess we then rely on the `Struct` class to stringify.
`@criteria_string` is a trace - a string indicating the filter criteria used to create the table. I found no location where this per-row trace data was used.
Table fields are determined by listing the `field_name`s of the Connectors.
Additionally, an instance variable is set up to refer to the filter table later. This is required for lazy-loading columns.
Table fields are determined by listing the `field_name`s of the CustomProperties.
BUG: this means that any `register_custom_property` / `add` call that uses a block but not options will end up with an attribute in the row Struct. Thus, `filter.add(:exists?) { ... }` results in a row Struct that includes an attribute named `exists?` which may be undesired. This attribute will never have a value, because when the structs are instantiated, the block for the field is not called.
POSSIBLE MISFEATURE: Defining a Struct for rows means that people who use `entries` (or other data accessors) interact with something unusual. The simplest possible thing would be an Array of Hashes. There is likely something relying on this...
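The row-as-Struct idea in this section can be sketched in isolation. This is a simplified stand-in, not the code in filter.rb: the field names, the `row_context_type` variable, and the sample data are all assumptions made for illustration.

```ruby
# Hypothetical field names; in FilterTable these come from the registered
# custom properties.
row_context_type = Struct.new(:id, :color)

raw_data = [
  { id: 1, color: :red },
  { id: 2, color: :blue },
]

# A block-mode filter. Inside instance_eval, the bare name `color`
# resolves to the Struct accessor for the current row.
criteria = proc { color == :red }

matching = raw_data.select do |row|
  context = row_context_type.new(row[:id], row[:color])
  context.instance_eval(&criteria)
end
```

Because the block is evaluated with the Struct as `self`, only names that were declared as Struct attributes are available to it, which is why the field list matters.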
#### Re-pack the "connectors"
On lines 278-280, the list of custom properties ("connectors", registered using the `register_custom_property` / `add` method) are repacked into an array of hashes of two elements - the desired method name and the lambda that will be used as the method body. The lambda is created by the private method `create_custom_property_body`; see `register_custom_property` for discussion about how the implementation behaves.
#### Subclass FilterTable::Table into an anonymous class
At line 248, create the local var `table_class`, which refers to an anonymous class that subclasses FilterTable::Table. The class is opened and two groups of methods are defined.
At line 284, create the local var `table_class`, which refers to an anonymous class that subclasses FilterTable::Table. The class is opened and two groups of methods are defined.
Lines 269-280 install the "custom_property" methods, using the names and lambdas determined on line 244.
Lines 286-288 install the "custom_property" methods, using the names and lambdas determined on line 279.
Line 255-260 define a method, `create_eval_context_for_row`. This is used when executing a block-mode `where`; see line 118.
Lines 290-292 allow the Table subclass to introspect on the CustomProperties by slipping a reference to it in the class body, forming a closure.
Line 295-301 define a method, `create_eval_context_for_row`. This is used when executing a block-mode `where`; see line 120.
#### Setup the row context struct for lazy loading
If you have a lazy field named `color` and it has not yet been populated, we need to trigger it to populate the first time it is read. If a block-mode where is used (`my_resource.where { color == :red }`), then we have to intercept the Struct's default `getter method`, and call the lazy column's lambda.
Lines 304-327 do exactly that, by defining methods on the Struct subclass we're using for context. We continue to rely on the default Struct setter (`[]=`) and getter at the end (`[]`).
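A minimal sketch of intercepting a Struct getter for a lazy field follows. It is an assumption-laden illustration rather than the actual implementation: the `raw_row` and `fetcher` accessors, the `color_fetcher` lambda, and the field names are hypothetical.

```ruby
# Hypothetical lazy fetcher for a :color column; it records which rows it saw.
fetched_ids = []
color_fetcher = lambda do |row, _condition, _table|
  fetched_ids << row[:id]
  row[:color] = :red
end

row_context_type = Struct.new(:id, :color) do
  # Extra state the getter needs: the raw row Hash and the fetcher to call.
  attr_accessor :raw_row, :fetcher

  # Replace the stock getter for the lazy field. On first read, run the
  # fetcher, then fall back to the default Struct setter and getter.
  def color
    unless raw_row.key?(:color)
      fetcher.call(raw_row, nil, nil)
      self[:color] = raw_row[:color]
    end
    self[:color]
  end
end

context = row_context_type.new(1, nil)
context.raw_row = { id: 1 }
context.fetcher = color_fetcher

# Block-mode style access: the bare name `color` hits the custom getter.
first_read = context.instance_eval { color == :red }
second_read = context.instance_eval { color == :red }
```

The second read does not call the fetcher again, because the raw row already carries a `:color` key by then.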
#### Install methods on the resource
Lines 269-280 install the "filter_methods" and "custom properties" methods onto the resource that you are authoring.
Lines 335-346 install the "filter_methods" and "custom properties" methods onto the resource that you are authoring.
Line 269-270 collects the names of the methods to define - by agglomerating the names of the "filter_methods" and "custom properties" methods. They are treated the same.
Line 335-336 collects the names of the methods to define - by agglomerating the names of the "filter_methods" and "custom properties" methods. They are treated the same.
Line 271 uses `send` with a block to call `define_method` on the resource class that you're authoring. Using a block with `send` is undocumented, but is treated as an implicit argument (per stackoverflow) , so the end result is that the block is used as the body for the new method being defined.
Line 337 uses `send` with a block to call `define_method` on the resource class that you're authoring. Using a block with `send` is undocumented, but is treated as an implicit argument (per StackOverflow) , so the end result is that the block is used as the body for the new method being defined.
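The `send` plus block pattern can be tried on its own. In this sketch `resource_class` and `greeting` are hypothetical names; the only point is that the block given to `send` ends up as the body of the new method.

```ruby
# Hypothetical stand-in for a resource class.
resource_class = Class.new

# The block passed to send is forwarded to define_method, so it becomes
# the body of the new instance method.
resource_class.send(:define_method, :greeting) do |name|
  "hello, #{name}"
end

message = resource_class.new.greeting('world')
```

On Ruby 2.5 and later `define_method` is public, so `send` is only needed on older interpreters.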
The method body is wrapped in an exception-catching facility that catches skipped or failed resource exceptions and wraps them in a specialized exception catcher class. TBD: understand this better.
Line 274 constructs an instance of the anonymous FilterTable::Table subclass defined at 248. It passes three args:
Line 340 constructs an instance of the anonymous FilterTable::Table subclass defined at 284. It passes three args:
1. `self`. A reference to the resource instance.
2. The return value of calling the data fetcher method (that is an array of hashes, the raw data).
3. The string ' with', which is probably informing the criteria stringification. The extra space is intentional, as it follows the resource name: 'my_things with color == :red' might be a result.
On line 275, we then immediately call a method on that "FilterTable::Table subclass instance". The method name is the same as the one we're defining on the resource - but we're calling it on the Table. Recall we defined all the "custom_property" methods on the Table subclass at line 250-252. The method gets called with any args or block passed, and since it's the last thing, it provides the return value.
VERY WORRISOME THING: So, the Table subclass has methods for the "custom_properties" (for example, `thing_ids` or `exist?`. What about the "filter_methods" - `where` and `entries`? Are those in the FilterTable::Table class, or method_missing'd?
On line 341, we then immediately call a method on that "FilterTable::Table subclass instance". The method name is the same as the one we're defining on the resource - but we're calling it on the Table. Recall we defined all the "custom_property" methods on the Table subclass at line 286-288. The method gets called with any args or block passed, and since it's the last thing, it provides the return value.
## What is its behavior? (FilterTable::Table)
@ -259,12 +271,13 @@ We know from the above exploration of `install_filter_methods_on_resource` / `co
### FilterTable::Table constructor and internals
Factory calls the FilterTable::Table constructor at 87-92 with three args. Table stores them into instance vars:
* @resource_instance - this was passed in as `self` from line 274
Factory calls the FilterTable::Table constructor at 87-93 with three args. Table stores them into instance vars:
* @resource_instance - this was passed in as `self` from line 340
* @raw_data - an array of hashes
* @criteria_string - This looks to be stringification trace data; the string ' with' was passed in by Factory.
* @populated_lazy_columns - a hash, by lazy field name, with boolean values. This is set true if `populate_lazy_field` is called on a field.
All three get exposed via `attr_reader`s.
The first three get exposed via `attr_reader`s.
### `entries` behavior
@ -272,21 +285,21 @@ From usage, I expect entries to return a structure that resembles an array of ha
#### A new method `entries` is defined on the resource class
That is performed by Factory#connect line 271.
That is performed by Factory#connect line 337.
#### It delegates to FilterTable::Table#entries
This is a real method defined in filter.rb line 153.
This is a real method defined in filter.rb line 155.
It loops over the provided raw data (@raw_data) and builds an array, calling `create_eval_context_for_row` (see Factory 231-236) on each row; also appending a stringification trace to each entry. The array is returned.
It loops over the provided raw data (@raw_data) and builds an array, calling `create_eval_context_for_row` (see Factory lines 295-301) on each row; also appending a stringification trace to each entry. The array is returned.
#### `entries` conclusion
Not Surprising: It does behave as expected - an array of hashlike structs representing the table. I don't know why it adds in the per-row stringification data - I've never seen that used.
Not Surprising: It does behave as expected - an array of Hash-like structs representing the table. I don't know why it adds in the per-row stringification data - I've never seen that used.
Surprising: this is a real method with a concrete implementation. That means that you can't call `filter.add_accessor` with arbitrary method names - `:entries` means something very specific.
Surpising: I would not recommend this method be used for data access; instead I would recommend using `raw_data`.
Surprising: I would not recommend this method be used for data access; instead I would recommend using `raw_data`.
### `where` behavior
@ -296,11 +309,11 @@ So, what happens when you call `register_filter_method(:where)` and then call `r
#### A new method `where` is defined on the resource class
That is performed by Factory#connect line 271.
That is performed by Factory#connect line 337.
#### It delegates to FilterTable::Table#where
Like `entries`, this is a real implemented method on FilterTable::Table, at line 97.
Like `entries`, this is a real implemented method on FilterTable::Table, at line 98.
The method accepts all params as the local var `conditions` which defaults to an empty Hash. A block, if any, is also explicitly assigned the name `block`.
@ -310,26 +323,26 @@ MISFEATURE: The first guard clause simply returns the Table if `conditions` is n
The second guard clause is a sensible degenerate case - return the existing Table if there are no conditions and no block. So `thing.where` is OK.
Line 102 initializes a local var, `new_criteria_string`, which again is a stringification tracker.
Line 103 initializes a local var, `new_criteria_string`, which again is a stringification tracker.
Line 103 initializes a var to track the `filtered_raw_data`.
Line 104 initializes a var to track the `filtered_raw_data`.
Lines 107-110 loop over the provided Hash `conditions`. It repeatedly downfilters `filtered_raw_data` by calling the private method `filter_raw_data` on it. `filter_raw_data` does some syntactic sugaring for common types, Ints and Floats and Regexp matching. Additionally, the 107-110 loop builds up the stringification tracker, `new_criteria_string`, by stringifying the field name and target value.
Lines 108-113 loop over the provided Hash `conditions`. If the requested field is lazy, it requests that it be populated (note that `populate_lazy_field` is idempotent - it won't fetch a field twice). Next, it repeatedly down-filters `filtered_raw_data` by calling the private method `filter_raw_data` on it. `filter_raw_data` does some syntactic sugaring for common types, Integers and Floats and Regexp matching. Additionally, the line 108-113 loop builds up the stringification tracker, `new_criteria_string`, by stringifying the field name and target value.
Line 116-133 begins work if a filtration block has been provided. At this point, `filtered_raw_data` has been initialized with the raw data, and (if method params were provided) has also been filtered down.
Line 118-135 begins work if a filtration block has been provided. At this point, `filtered_raw_data` has been initialized with the raw data, and (if method params were provided) has also been filtered down.
Line 118 filters the rows of the raw data using an interesting approach. Each row is inflated to a Struct using `create_eval_context_for_row` (see line 255). Then the provided block is `instance_eval`'d against the Struct. Because the Struct was defined with attributes (that is, accessor methods) for each declared field name (from FilterTable::Factory#register_custom_property), you can use field names in the block, and each row-as-struct will be able to respond.
Line 120 filters the rows of the raw data using an interesting approach. Each row is inflated to a Struct using `create_eval_context_for_row` (see line 295). Then the provided block is `instance_eval`'d against the Struct. Because the Struct was defined with attributes (that is, accessor methods) for each declared field name (from FilterTable::Factory#register_custom_property), you can use field names in the block, and each row-as-struct will be able to respond. If the field happened to be lazy, we'll call our custom getter from lines 304-327.
_That just explained a major spooky side-effect for me._
Lines 120-132 do something with stringification tracing. TODO.
Lines 125-134 do something with stringification tracing. TODO.
Finally, at line 135, the FilterTable::Table anonymous subclass is again used to construct a new instance, passing on the resource reference, the newly filtered raw data table, and the newly adjusted stringificatioon tracer.
Finally, at line 137, the FilterTable::Table anonymous subclass is again used to construct a new instance, passing on the resource reference, the newly filtered raw data table, and the newly adjusted stringification tracer.
That new Table instance is returned, and thus `where` allows you to chain.
#### `where` conclusion
Unsurprising: How where works with method params.
Surprising: How where works in block mode, instance_eval'ing against each row-as-Struct.
Surprising: How where works in block mode, `instance_eval`'ing against each row-as-Struct.
Surprising: You can use method-mode and block-mode together if you want.

@ -379,6 +379,90 @@ You could use this to do something fairly complicated.
However, the resource instance won't know about the filtration, so I'm not sure what good this does. Chances are, someone is doing something horrid using this feature in the wild.
## Lazy Loading
### What is Lazy Loading
In some cases, the raw data may require multiple actions to populate. For example, if you wanted a list of processes, and their open files, you might need to call 'ps' once, then 'lsof' one or more times. That would become slow, and so you would only want to do it if you knew it was going to be used.
Lazy loaded columns are absent in the raw data, until they are accessed (either by method-where, block-where, or a list property). When they are accessed, a user-provided Lambda is called, which populates one or more columns. FilterTable remembers which lazy columns have been populated, and will not call the lambda again.
### Declaring a lazy field
You declare a field to be lazy by providing an option, `lazy`, whose value is the lambda to be called.
You can use the 'stabby lambda' syntax:
```ruby
filter_table_config.register_column(
:open_files,
field: :files,
lazy: ->(r, c, t) { r[:files] = lookup_files_for_pid(r[:pid]) },
)
```
You can also refer to a *class* method. You cannot use an instance method, because FilterTable binds to the resource class, not the resource instance.
```ruby
def self.populate_lsof(row, criteria, table)
row[:files] = ...
end
filter_table_config.register_column(
:open_files,
field: :files,
lazy: method(:populate_lsof),
)
```
### Arguments to the fetcher lambda
The lambda will be provided three arguments:
1. `row`. This is a Hash, the current row of the raw_data. You will likely need to examine this to find an ID value or other field that will act as a search key for your fetch. You are expected to add one or more entries to this hash, as a result of your fetch.
2. `condition`. In some cases, a condition (desired value) is provided; the semantics of this are up to you.
3. `table`. A reference to the FilterTable. You can use this to access other context - including the entire raw data (`table.raw_data`) or the resource instance (`table.resource_instance`).
### Clobbering
Lazy-loading will not clobber an existing value in raw data. For example:
```ruby
# Your raw data table:
[
{ id: 1 },
{ id: 2, color: :blue },
{ id: 3 },
]
# On lazy load, set all rows to color red
filter_table_config.register_column(
:colors,
field: :color,
lazy: ->(r, c, t) { r[:color] = :red },
)
# Trigger a fetch
my_resource.colors # => [:red, :blue, :red]
# Raw data now:
[
{ id: 1, color: :red },
{ id: 2, color: :blue },
{ id: 3, color: :red },
]
```
Note that not only was the `:color` value `:blue` not overwritten, but the fetcher lambda was called only twice (for the rows with ids 1 and 3).
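The no-clobber behavior can be reproduced with a simplified stand-in for the population step. The `populate` lambda and the `populated` hash below are assumptions for illustration; they mimic the skip-if-present loop, not the real FilterTable internals.

```ruby
raw_data = [
  { id: 1 },
  { id: 2, color: :blue },
  { id: 3 },
]

call_count = 0
fetcher = lambda do |row, _condition, _table|
  call_count += 1
  row[:color] = :red
end

populated = {}

# Simplified population step: never run twice for the same field, and
# skip any row that already has a value for it.
populate = lambda do |field_name|
  next if populated[field_name]

  raw_data.each do |row|
    next if row.key?(field_name)

    fetcher.call(row, nil, nil)
  end
  populated[field_name] = true
end

populate.call(:color)
populate.call(:color) # second trigger is a no-op
colors = raw_data.map { |row| row[:color] }
```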
### Can I set multiple columns at once?
Yes. If your fetching action provides you with data to populate multiple columns, you are free to set any columns you wish in the `row`.
You can even have multiple lazy columns share an implementation; the first one to be called will populate all the columns that share that implementation, and if any of the others are later triggered, the no-clobber effect will kick in, and the fetcher will not be called again.
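As a sketch of a shared fetcher, assume two lazy fields, `:files` and `:file_count`, backed by one lambda. The field names, the fake file paths, and the `populate` helper are all hypothetical; the helper only imitates the skip-if-present rule described above.

```ruby
fetch_count = 0

# One fetcher fills in both columns for a row.
shared_fetcher = lambda do |row, _condition, _table|
  fetch_count += 1
  row[:files] = ["/tmp/#{row[:pid]}.log"]
  row[:file_count] = row[:files].length
end

raw_data = [{ pid: 10 }, { pid: 20 }]

# Simplified population step: skip rows that already have the field.
populate = lambda do |field_name|
  raw_data.each do |row|
    next if row.key?(field_name)

    shared_fetcher.call(row, nil, nil)
  end
end

populate.call(:files)      # runs the fetcher once per row
populate.call(:file_count) # every row already has :file_count, so no calls
```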
### Can I set multiple rows at once?
Yes. Using `table.raw_data`, you could perform a column-at-once population. After the fetcher was called for the first row, all other rows would already be populated, so the fetcher would not be called again due to the no-clobber effect.
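A column-at-once fetcher might look like the following sketch. The `fake_table_type` Struct stands in for the real table (only `raw_data` is modeled), and `bulk_lookup` is a hypothetical helper that returns a value for every id in a single call.

```ruby
# Hypothetical stand-in for the table; only raw_data is needed here.
fake_table_type = Struct.new(:raw_data)

bulk_lookups = 0
# Hypothetical bulk lookup: one call returns a color for every id.
bulk_lookup = lambda do |ids|
  bulk_lookups += 1
  ids.map { |id| [id, id.odd? ? :red : :blue] }.to_h
end

# The fetcher ignores the single row it was handed and fills every row.
fetcher = lambda do |_row, _condition, table|
  colors_by_id = bulk_lookup.call(table.raw_data.map { |r| r[:id] })
  table.raw_data.each { |r| r[:color] = colors_by_id[r[:id]] }
end

table = fake_table_type.new([{ id: 1 }, { id: 2 }, { id: 3 }])

# Same skip-if-present loop as described above: after the first call,
# every row already has :color, so the fetcher is not called again.
table.raw_data.each do |row|
  next if row.key?(:color)

  fetcher.call(row, nil, table)
end

colors = table.raw_data.map { |r| r[:color] }
```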
## Gotchas and Surprises
### Methods defined with `register_column` will change their return type based on their call pattern

@ -23,27 +23,73 @@ class AwsIamUsers < Inspec.resource(1)
include AwsPluralResourceMixin
def self.lazy_get_login_profile(row, _criterion, table)
backend = BackendFactory.create(table.resource.inspec_runner)
begin
_login_profile = backend.get_login_profile(user_name: row[:user_name])
row[:has_console_password] = true
rescue Aws::IAM::Errors::NoSuchEntity
row[:has_console_password] = false
end
row[:has_console_password?] = row[:has_console_password]
end
def self.lazy_list_mfa_devices(row, _criterion, table)
backend = BackendFactory.create(table.resource.inspec_runner)
begin
aws_mfa_devices = backend.list_mfa_devices(user_name: row[:user_name])
row[:has_mfa_enabled] = !aws_mfa_devices.mfa_devices.empty?
rescue Aws::IAM::Errors::NoSuchEntity
row[:has_mfa_enabled] = false
end
row[:has_mfa_enabled?] = row[:has_mfa_enabled]
end
def self.lazy_list_user_policies(row, _criterion, table)
backend = BackendFactory.create(table.resource.inspec_runner)
row[:inline_policy_names] = backend.list_user_policies(user_name: row[:user_name]).policy_names
row[:has_inline_policies] = !row[:inline_policy_names].empty?
row[:has_inline_policies?] = row[:has_inline_policies]
end
def self.lazy_list_attached_policies(row, _criterion, table)
backend = BackendFactory.create(table.resource.inspec_runner)
attached_policies = backend.list_attached_user_policies(user_name: row[:user_name]).attached_policies
row[:has_attached_policies] = !attached_policies.empty?
row[:has_attached_policies?] = row[:has_attached_policies]
row[:attached_policy_names] = attached_policies.map { |p| p[:policy_name] }
row[:attached_policy_arns] = attached_policies.map { |p| p[:policy_arn] }
end
filter = FilterTable.create
filter.add_accessor(:where)
.add_accessor(:entries)
.add(:exists?) { |x| !x.entries.empty? }
.add(:has_mfa_enabled?, field: :has_mfa_enabled)
.add(:has_console_password?, field: :has_console_password)
.add(:has_inline_policies?, field: :has_inline_policies)
.add(:has_attached_policies?, field: :has_attached_policies)
# Summary methods
filter.add(:exists?) { |table| !table.params.empty? }
.add(:count) { |table| table.params.count }
# These are included on the initial fetch
filter.add(:usernames, field: :user_name)
.add(:username) { |res| res.entries.map { |row| row[:user_name] } } # We should deprecate this; plural resources get plural properties
.add(:password_ever_used?, field: :password_ever_used?)
.add(:password_never_used?, field: :password_never_used?)
.add(:password_last_used_days_ago, field: :password_last_used_days_ago)
.add(:usernames, field: :user_name)
.add(:username) { |res| res.entries.map { |row| row[:user_name] } } # We should deprecate this; plural resources get plural properties
# Next three are needed to declare fields for use by the de-duped set
filter.add(:dupe_inline_policy_names, field: :inline_policy_names_source)
.add(:dupe_attached_policy_names, field: :attached_policy_names_source)
.add(:dupe_attached_policy_arns, field: :attached_policy_arns_source)
# These three are now able to access the above three in .entries
filter.add(:inline_policy_names) { |obj| obj.dupe_inline_policy_names.flatten.uniq }
.add(:attached_policy_names) { |obj| obj.dupe_attached_policy_names.flatten.uniq }
.add(:attached_policy_arns) { |obj| obj.dupe_attached_policy_arns.flatten.uniq }
# Remaining properties / criteria are handled lazily, grouped by fetcher
filter.add(:has_console_password?, field: :has_console_password?, lazy: method(:lazy_get_login_profile))
.add(:has_console_password, field: :has_console_password, lazy: method(:lazy_get_login_profile))
filter.add(:has_mfa_enabled?, field: :has_mfa_enabled?, lazy: method(:lazy_list_mfa_devices))
.add(:has_mfa_enabled, field: :has_mfa_enabled, lazy: method(:lazy_list_mfa_devices))
filter.add(:has_inline_policies?, field: :has_inline_policies?, lazy: method(:lazy_list_user_policies))
.add(:has_inline_policies, field: :has_inline_policies, lazy: method(:lazy_list_user_policies))
.add(:inline_policy_names, field: :inline_policy_names, style: :simple, lazy: method(:lazy_list_user_policies))
filter.add(:has_attached_policies?, field: :has_attached_policies?, lazy: method(:lazy_list_attached_policies))
.add(:has_attached_policies, field: :has_attached_policies, lazy: method(:lazy_list_attached_policies))
.add(:attached_policy_names, field: :attached_policy_names, style: :simple, lazy: method(:lazy_list_attached_policies))
.add(:attached_policy_arns, field: :attached_policy_arns, style: :simple, lazy: method(:lazy_list_attached_policies))
filter.connect(self, :table)
def validate_params(raw_params)
@ -66,45 +112,17 @@ class AwsIamUsers < Inspec.resource(1)
table
end
def fetch_from_api # rubocop: disable Metrics/AbcSize
def fetch_from_api
backend = BackendFactory.create(inspec_runner)
@table = fetch_from_api_paginated(backend)
# TODO: lazy columns - https://github.com/chef/inspec-aws/issues/100
@table.each do |user|
# Some of these throw exceptions to indicate empty results;
# others return empty arrays
begin
_login_profile = backend.get_login_profile(user_name: user[:user_name])
user[:has_console_password] = true
rescue Aws::IAM::Errors::NoSuchEntity
user[:has_console_password] = false
end
user[:has_console_password?] = user[:has_console_password]
begin
aws_mfa_devices = backend.list_mfa_devices(user_name: user[:user_name])
user[:has_mfa_enabled] = !aws_mfa_devices.mfa_devices.empty?
rescue Aws::IAM::Errors::NoSuchEntity
user[:has_mfa_enabled] = false
end
user[:has_mfa_enabled?] = user[:has_mfa_enabled]
user[:inline_policy_names_source] = backend.list_user_policies(user_name: user[:user_name]).policy_names
user[:has_inline_policies] = !user[:inline_policy_names_source].empty?
user[:has_inline_policies?] = user[:has_inline_policies]
attached_policies = backend.list_attached_user_policies(user_name: user[:user_name]).attached_policies
user[:has_attached_policies] = !attached_policies.empty?
user[:has_attached_policies?] = user[:has_attached_policies]
user[:attached_policy_names_source] = attached_policies.map { |p| p[:policy_name] }
user[:attached_policy_arns_source] = attached_policies.map { |p| p[:policy_arn] }
password_last_used = user[:password_last_used]
user[:password_ever_used?] = !password_last_used.nil?
user[:password_never_used?] = password_last_used.nil?
next unless user[:password_ever_used?]
user[:password_last_used_days_ago] = ((Time.now - password_last_used) / (24*60*60)).to_i
if user[:password_ever_used?]
user[:password_last_used_days_ago] = ((Time.now - password_last_used) / (24*60*60)).to_i
end
end
@table
end
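
The hunks above replace the eager per-user API calls with `lazy:` callbacks such as `lazy_list_attached_policies`, whose body is not shown here. As a rough sketch only, assuming the `(row, criterion, table)` callback arguments that `populate_lazy_field` passes, such a callback might look like the following. `FakeBackend`, `AttachedPolicy`, `build_attached_policy_loader` and the sample ARN are hypothetical names invented for illustration; they are not part of inspec-aws.

```ruby
# Hypothetical stand-ins for the AWS client and its response objects.
AttachedPolicy = Struct.new(:policy_name, :policy_arn)

class FakeBackend
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Returns canned data and counts how often it is called.
  def list_attached_user_policies(user_name:)
    @calls += 1
    return [] unless user_name == 'alice'
    [AttachedPolicy.new('example-policy', 'arn:aws:iam::123456789012:policy/example-policy')]
  end
end

# Builds a callback with the (row, criterion, table) shape assumed above.
# It fills all three attached-policy fields from a single API call.
def build_attached_policy_loader(backend)
  lambda do |row, _criterion, _table|
    policies = backend.list_attached_user_policies(user_name: row[:user_name])
    row[:has_attached_policies] = !policies.empty?
    row[:attached_policy_names] = policies.map(&:policy_name)
    row[:attached_policy_arns] = policies.map(&:policy_arn)
  end
end

backend = FakeBackend.new
loader = build_attached_policy_loader(backend)
rows = [{ user_name: 'alice' }, { user_name: 'bob' }]
rows.each { |row| loader.call(row, nil, nil) }
```

Filling the sibling fields in one call fits the table code in this commit: `populate_lazy_field` skips any row that already has the requested key, so a later request for a sibling column would not need to hit the API again.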


@ -89,6 +89,7 @@ module FilterTable
@raw_data = raw_data
@raw_data = [] if @raw_data.nil?
@criteria_string = criteria_string
@populated_lazy_columns = {}
end
# Filter the raw data based on criteria (as method params) or by evaling a
@ -106,6 +107,7 @@ module FilterTable
# against the raw data. Criteria are assumed to be hash keys.
conditions.each do |raw_field_name, desired_value|
raise(ArgumentError, "'#{raw_field_name}' is not a recognized criterion - expected one of #{list_fields.join(', ')}") unless field?(raw_field_name)
populate_lazy_field(raw_field_name, desired_value) if is_field_lazy?(raw_field_name)
new_criteria_string += " #{raw_field_name} == #{desired_value.inspect}"
filtered_raw_data = filter_raw_data(filtered_raw_data, raw_field_name, desired_value)
end
@ -173,7 +175,7 @@ module FilterTable
# Currently we only know about a field if it is present in at least one row of the raw data.
# If we have no rows in the raw data, assume all fields are acceptable (and rely on failing to match on value, nil)
return true if raw_data.empty?
list_fields.include?(proposed_field)
list_fields.include?(proposed_field) || is_field_lazy?(proposed_field)
end
def to_s
@ -182,6 +184,38 @@ module FilterTable
alias inspect to_s
def populate_lazy_field(field_name, criterion)
return unless is_field_lazy?(field_name)
return if field_populated?(field_name)
raw_data.each do |row|
next if row.key?(field_name) # skip row if pre-existing data is present
callback_for_lazy_field(field_name).call(row, criterion, self)
end
mark_lazy_field_populated(field_name)
end
def is_field_lazy?(sought_field_name)
custom_properties_schema.values.any? do |property_struct|
sought_field_name == property_struct.field_name && \
property_struct.opts[:lazy]
end
end
def callback_for_lazy_field(field_name)
return unless is_field_lazy?(field_name)
custom_properties_schema.values.find do |property_struct|
property_struct.field_name == field_name
end.opts[:lazy]
end
def field_populated?(field_name)
@populated_lazy_columns[field_name]
end
def mark_lazy_field_populated(field_name)
@populated_lazy_columns[field_name] = true
end
private
def matches_float(x, y)
@ -229,12 +263,13 @@ module FilterTable
@resource = nil # TODO: this variable is never initialized
end
def install_filter_methods_on_resource(resource_class, raw_data_fetcher_method_name)
def install_filter_methods_on_resource(resource_class, raw_data_fetcher_method_name) # rubocop: disable Metrics/AbcSize, Metrics/MethodLength
struct_fields = @custom_properties.values.map(&:field_name)
# A context in which you can access the fields as accessors
row_eval_context_type = Struct.new(*struct_fields.map(&:to_sym)) do
attr_accessor :criteria_string
attr_accessor :filter_table
def to_s
@criteria_string || super
end
@ -245,22 +280,53 @@ module FilterTable
end
# Define the filter table subclass
custom_properties = @custom_properties # We need a local var, not an instance var, for a closure below
table_class = Class.new(Table) {
# Install each custom property onto the FilterTable subclass
properties_to_define.each do |property_info|
define_method property_info[:method_name], &property_info[:method_body]
end
define_method :custom_properties_schema do
custom_properties
end
# Install a method that can wrap all the fields into a context with accessors
define_method :create_eval_context_for_row do |row_as_hash, criteria_string = ''|
return row_eval_context_type.new if row_as_hash.nil?
res = row_eval_context_type.new(*struct_fields.map { |field| row_as_hash[field] })
res.criteria_string = criteria_string
res
context = row_eval_context_type.new(*struct_fields.map { |field| row_as_hash[field] })
context.criteria_string = criteria_string
context.filter_table = self
context
end
}
# Define all access methods with the parent resource_class
# Now that the table class is defined and the row eval context struct is defined,
# extend the row eval context struct to support triggering population of lazy fields
# in where blocks. To do that, we'll need a reference to the table (which
# knows which fields are populated, and how to populate them) and we'll need to
# override the getter method for each lazy field, so it will trigger
# population if needed. Keep in mind we don't have to adjust the constructor
# args of the row struct; also the Struct class will already have provided
# a setter for each field.
@custom_properties.values.each do |property_info|
next unless property_info.opts[:lazy]
field_name = property_info.field_name.to_sym
row_eval_context_type.send(:define_method, field_name) do
unless filter_table.field_populated?(field_name)
filter_table.populate_lazy_field(field_name, NoCriteriaProvided) # No access to criteria here
# OK, the underlying raw data has the value in the first row
# (because we would trigger population only on the first row)
# We could just return the value, but we need to set it on this Struct in case it is referenced multiple times
# in the where block.
self[field_name] = filter_table.raw_data[0][field_name]
end
# Now return the value using the Struct getter, whether newly populated or not
self[field_name]
end
end
# Define all access methods with the parent resource
# These methods will be configured to return an `ExceptionCatcher` object
# that will always return the original exception, but only when called
# upon. This will allow method chains in `describe` statements to pass the
@ -332,6 +398,10 @@ module FilterTable
lambda do |filter_criteria_value = NoCriteriaProvided, &cond_block|
if filter_criteria_value == NoCriteriaProvided && !block_given?
# No second-order block given. Just return an array of the values in the selected column.
result = where(nil)
if custom_property_struct.opts[:lazy]
result.populate_lazy_field(custom_property_struct.field_name, filter_criteria_value)
end
result = where(nil).get_column_values(custom_property_struct.field_name) # TODO: the where(nil). is likely unneeded
result = result.flatten.uniq.compact if custom_property_struct.opts[:style] == :simple
result


@ -50,8 +50,66 @@ describe '2943 inspec exec for filter table profile, method mode for `where' do
end
end
describe '2370 lazy_load for filter table' do
include FunctionalHelper
it 'positive tests should pass' do
controls = [
'2370_where_block',
'2370_where_block_only_referenced',
'2370_where_method',
'2370_where_method_only_referenced',
'2370_populate_once',
'2370_no_side_populate',
'2370_no_clobber',
'2370_list_property',
'2370_list_property_filter_method',
'2370_list_property_filter_block',
'2370_no_rows',
]
cmd = 'exec ' + File.join(profile_path, 'filter_table')
cmd += ' --reporter json --no-create-lockfile'
cmd += ' --controls ' + controls.join(' ')
cmd = inspec(cmd)
# RSpec keeps issuing a deprecation count to stdout; I can't seem to disable it.
output = cmd.stdout.split("\n").reject {|line| line =~ /deprecation/}.join("\n")
data = JSON.parse(output)
failed_controls = data['profiles'][0]['controls'].select { |ctl| ctl['results'][0]['status'] == 'failed' }
control_hash = {}
failed_controls.each do |ctl|
control_hash[ctl['id']] = ctl['results'][0]['message']
end
control_hash.must_be_empty
cmd.exit_status.must_equal 0
end
it 'negative tests should fail but not abort' do
controls = [
'2370_proc_handle_exception',
]
cmd = inspec('exec ' + File.join(profile_path, 'filter_table') + ' --reporter json --no-create-lockfile' + ' --controls ' + controls.join(' '))
data = JSON.parse(cmd.stdout)
failed_controls = data['profiles'][0]['controls'].select { |ctl| ctl['results'][0]['status'] == 'failed' }
control_hash = {}
failed_controls.each do |ctl|
control_hash[ctl['id']] = ctl['results'][0]['message']
end
controls.each do |expected_control|
control_hash.keys.must_include(expected_control)
end
cmd.stderr.must_equal ''
cmd.exit_status.must_equal 100
end
end
describe '2929 exceptions in block-mode where' do
include FunctionalHelper
it 'positive tests should pass' do
controls = [
'2929_exception_in_where',


@ -0,0 +1,127 @@
title 'Verify lazy loading columns works correctly - issue 2370'
fresh_data = ->() do
[
{ row_id: 0, color: :red }.dup,
{ row_id: 1, color: :blue, lazy_2: :pre_existing }.dup,
{ row_id: 2, color: :green }.dup,
]
end
# Fixture notes:
# lazy_1 populates with a constant symbol
# lazy_2 populates with a constant symbol but encounters a collision
# lazy_3 increments on each call
# lazy_4 throws an exception on call
control '2370_where_block' do
desc 'When we call where as a block, lazy columns should load if referenced'
describe lazy_loader(fresh_data.call).where { lazy_1 == :lazy_1_loaded } do
its('count') { should cmp 3 }
its('lazy_1s.first') { should cmp :lazy_1_loaded }
end
describe lazy_loader(fresh_data.call).where { lazy_3 == 1 } do
its('count') { should cmp 1 }
its('lazy_3s.first') { should cmp 1 }
its('resource.lazy_3_call_count') { should == 3 }
end
end
control '2370_where_block_only_referenced' do
desc 'When we call where as a block, lazy columns should not load unless referenced'
describe lazy_loader(fresh_data.call).where { color == :red } do
[ :lazy_1, :lazy_2, :lazy_3, :lazy_4 ].each do |lazy_field|
its('raw_data.first.keys') { should_not include lazy_field }
end
end
end
control '2370_where_method' do
desc 'When we call where as a method, lazy columns should load if referenced'
describe lazy_loader(fresh_data.call).where(lazy_1: :lazy_1_loaded) do
its('count') { should cmp 3 }
its('lazy_1s.first') { should cmp :lazy_1_loaded }
end
describe lazy_loader(fresh_data.call).where(lazy_3: 1) do
its('count') { should cmp 1 }
its('lazy_3s.first') { should cmp 1 }
its('resource.lazy_3_call_count') { should == 3 }
end
end
control '2370_where_method_only_referenced' do
desc 'When we call where as a method, lazy columns should not load unless referenced'
describe lazy_loader(fresh_data.call).where(color: :red) do
[ :lazy_1, :lazy_2, :lazy_3, :lazy_4 ].each do |lazy_field|
its('params.first.keys') { should_not include lazy_field }
end
end
end
control '2370_populate_once' do
desc 'When we have already triggered a populate, the proc should not be called again'
describe lazy_loader(fresh_data.call).where { lazy_3.kind_of? Integer }.where { lazy_3.kind_of? Integer } do
its('count') { should cmp 3 }
its('lazy_3s.first') { should == 1 }
its('resource.lazy_3_call_count') { should == 3 }
end
end
control '2370_no_side_populate' do
desc 'When we trigger a populate on one column, it should not trigger a populate on another column.'
describe lazy_loader(fresh_data.call).where( lazy_1: :lazy_1_loaded ) do
[ :lazy_2, :lazy_3, :lazy_4 ].each do |lazy_field|
its('params.first.keys') { should_not include lazy_field }
end
end
end
control '2370_no_clobber' do
desc 'When we trigger a populate, it should not clobber existing values in the table.'
describe lazy_loader(fresh_data.call).lazy_2s do
it { should include :lazy_2_loaded }
it { should include :pre_existing }
end
end
control '2370_list_property' do
desc 'When we call a list property on a lazy column, we should get the list'
describe lazy_loader(fresh_data.call).lazy_1s do
its('count') { should cmp 3 }
it { should include :lazy_1_loaded }
end
end
control '2370_list_property_filter_method' do
desc 'When we call a list property on a lazy column with a filter value, we should get a filtered table'
describe lazy_loader(fresh_data.call).lazy_3s(2) do
its('count') { should cmp 1 }
its('lazy_3s.first') { should cmp 2 }
end
end
control '2370_list_property_filter_block' do
desc 'When we call a list property on a lazy column with a filter block, we should get a filtered table'
describe lazy_loader(fresh_data.call).lazy_3s(2) { lazy_3 == 2 } do
its('count') { should cmp 1 }
its('lazy_3s.first') { should cmp 2 }
end
end
control '2370_no_rows' do
desc 'When the data has no rows, the lazy populator should not get called'
describe lazy_loader([]).where { lazy_3 } do
its('resource.lazy_3_call_count') { should be_zero }
end
end
control '2370_proc_handle_exception' do
desc 'An exception in a Proc should not derail the run'
# TODO read exception
describe lazy_loader(fresh_data.call).lazy_4s do
its('count') { should cmp 0 }
end
end
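
The block-mode controls above depend on the row context struct triggering population when a lazy field's getter is called inside `where { ... }`. The following is a simplified sketch of that getter override using a plain `Struct`. `TinyTable` and `RowContext` are hypothetical stand-ins, not the FilterTable classes, and the loader simply numbers the rows in the way the `lazy_3` fixture does.

```ruby
# Hypothetical table that knows how to fill one counter-style lazy column.
class TinyTable
  attr_reader :raw_data, :load_count

  def initialize(raw_data)
    @raw_data = raw_data
    @populated = {}
    @load_count = 0
  end

  def field_populated?(field_name)
    @populated[field_name]
  end

  def populate_lazy_field(field_name)
    return if field_populated?(field_name)
    raw_data.each do |row|
      next if row.key?(field_name) # do not clobber existing values
      @load_count += 1
      row[field_name] = @load_count
    end
    @populated[field_name] = true
  end
end

# Row context whose lazy_3 getter triggers population on first use.
RowContext = Struct.new(:color, :lazy_3) do
  attr_accessor :filter_table

  def lazy_3
    unless filter_table.field_populated?(:lazy_3)
      filter_table.populate_lazy_field(:lazy_3)
      # This context was built before population (it is the first row),
      # so copy the freshly loaded value into the struct.
      self[:lazy_3] = filter_table.raw_data[0][:lazy_3]
    end
    self[:lazy_3]
  end
end

table = TinyTable.new([{ color: :red }, { color: :blue }, { color: :green }])
matches = table.raw_data.select do |row|
  # Contexts are built one row at a time, just before the block is evaluated.
  context = RowContext.new(row[:color], row[:lazy_3])
  context.filter_table = table
  context.instance_eval { lazy_3 == 2 }
end
```

Only the first row's context can be stale, because contexts for later rows are built after the column has been filled. That is why copying from `raw_data[0]` is enough in this sketch.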


@ -0,0 +1,24 @@
class LazyLoader < Inspec.resource(1)
name 'lazy_loader'
attr_reader :plain_data
attr_accessor :lazy_3_call_count
def initialize(provided_data)
@plain_data = provided_data
@lazy_3_call_count = 0
end
filter_table_generator = FilterTable.create
filter_table_generator.add_accessor(:where)
filter_table_generator.add_accessor(:entries)
filter_table_generator.add(:exists?) { |table| !table.entries.empty? }
filter_table_generator.add(:count) { |table| table.params.count }
filter_table_generator.add(:ids, field: :id)
filter_table_generator.add(:colors, field: :color)
filter_table_generator.add(:lazy_1s, field: :lazy_1, lazy: ->(r,c,t) { r[:lazy_1] = :lazy_1_loaded } )
filter_table_generator.add(:lazy_2s, field: :lazy_2, lazy: ->(r,c,t) { r[:lazy_2] = :lazy_2_loaded } )
filter_table_generator.add(:lazy_3s, field: :lazy_3, lazy: ->(r,c,t) { r[:lazy_3] = t.resource.lazy_3_call_count += 1 } )
filter_table_generator.add(:lazy_4s, field: :lazy_4, lazy: ->(r,c,t) { 1 / 0 } )
filter_table_generator.connect(self, :plain_data)
end
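
The populate-once and no-clobber behaviour that this fixture exercises can be sketched without InSpec. This is a minimal approximation, assuming the same `(row, criterion, table)` callback arguments as the fixture lambdas. `MiniTable` is a hypothetical stand-in that keeps its lazy callbacks in a plain hash instead of the custom properties schema.

```ruby
# Hypothetical, simplified table: lazy_fields maps a field name to a callable.
class MiniTable
  attr_reader :raw_data

  def initialize(raw_data, lazy_fields)
    @raw_data = raw_data
    @lazy_fields = lazy_fields
    @populated_lazy_columns = {}
  end

  def field_populated?(field_name)
    @populated_lazy_columns[field_name]
  end

  def populate_lazy_field(field_name, criterion = nil)
    return unless @lazy_fields.key?(field_name)
    return if field_populated?(field_name)
    raw_data.each do |row|
      next if row.key?(field_name) # skip rows with pre-existing data
      @lazy_fields[field_name].call(row, criterion, self)
    end
    @populated_lazy_columns[field_name] = true
  end
end

call_count = 0
loader = lambda do |row, _criterion, _table|
  call_count += 1
  row[:lazy_2] = :lazy_2_loaded
end

table = MiniTable.new(
  [{ row_id: 0 }, { row_id: 1, lazy_2: :pre_existing }, { row_id: 2 }],
  { lazy_2: loader }
)

table.populate_lazy_field(:lazy_2)
table.populate_lazy_field(:lazy_2) # second call returns early
lazy_2_values = table.raw_data.map { |row| row[:lazy_2] }
```

In this sketch the callback runs only for the two rows that lack `:lazy_2`, and the second `populate_lazy_field` call does nothing because the column is already marked as populated.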