# Internals of FilterTable
If you just want to _use_ FilterTable, see `filtertable-usage.md`. Reading this may make you more confused, not less.
## What makes this hard?
FilterTable was created in 2016 in an attempt to consolidate the pluralization features of several resources. They each had slightly different feature sets and were all in the wild, so FilterTable exposes some extensive side effects to provide those features.
Additionally, the way the classes relate to one another is not straightforward.
## Where is the code?
The main FilterTable code is in [inspec/utils/filter.rb](https://github.com/chef/inspec/blob/master/lib/inspec/utils/filter.rb).
Also educational is the unit test for FilterTable, at `test/unit/utils/filter_table_test.rb`. Recent work has focused on using functional tests to exercise FilterTable; see `test/fixtures/profiles/filter_table` and `test/functional/filter_table_test.rb`.
The file `inspec/utils/filter_array.rb` appears to be unrelated.
## What are the classes involved?
### FilterTable::Factory
This class is responsible for defining the FilterTable. It provides the methods that the resource author uses to configure it.
FilterTable::Factory initializes three instance variables:
```
@filter_methods = []
@custom_properties = {}
@resource = nil # This appears to be unused
```
### FilterTable::Table
This class holds the actual innards of the implementation. The Factory's goal is to configure a Table subclass and attach it to the resource you are authoring. The table is a container for the raw data your resource provides, and it performs the filtering.
### FilterTable::ExceptionCatcher
TODO
## What are the major entry points? (FilterTable::Factory)
A resource class using FilterTable typically will call a sequence similar to this, in the class body:
```
filter = FilterTable.create
.register_column(:thing_ids, field: :thing_id)
.install_filter_methods_on_resource(self, :table)
```
Legacy code might look like this:
```
filter = FilterTable.create
filter.add_accessor(:entries)
.add(:exists?) { |x| !x.entries.empty? }
.add(:count) { |x| x.entries.count }
.add(:thing_ids, field: :thing_id)
.connect(self, :table)
```
### create
Returns a new FilterTable::Factory instance, pre-populated with default implementations of `where`, `raw_data`, and `entries` (using `register_filter_method`), `count` (using `register_custom_property`), and `exist?` (using `register_custom_matcher`).
### register\_filter\_method
Legacy name (alias): `add_accessor`
This simply pushes the provided method name onto the `@filter_methods` instance variable array. See "filter_method" behavior section below for what this does.
After adding the method name to the array, it returns `self` - the FilterTable::Factory instance - so that method chaining will work.
### register\_column
Legacy name (alias): `add`
This is currently simply an alias for `register_custom_property`. See it for details. By calling it with a distinctive name, we'll be able to add functionality in the future (especially around introspection).
### register\_custom\_matcher
Legacy name (alias): `add`
This is currently simply an alias for `register_custom_property`. See it for details. By calling it with a distinctive name, we'll be able to add functionality in the future (especially around introspection).
### register\_custom\_property
Legacy name (alias): `add`
This method has very complex behavior and should likely be split into several use cases. `register_custom_property` requires a symbol (which will be used as a method name _to be added to the resource class_), and also accepts a block and/or additional options. These things (name, block, and opts) are packed into a simple Struct called a CustomPropertyType. The name stored in the Struct will be `opts[:field]` if provided, and the method name if not.
The CustomPropertyType Struct is then stored in the Hash `@custom_properties`, keyed on the method name provided. `self` is then returned for method chaining.
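To make the registration step concrete, here is a minimal sketch in plain Ruby. `MiniFactory` is a hypothetical, simplified stand-in (not the code in filter.rb); only the method names and the idea of a `CustomPropertyType` Struct come from the description above. It shows that registering merely records names and returns `self` so calls can be chained:

```ruby
# Hypothetical, simplified stand-in for FilterTable::Factory's registration API.
class MiniFactory
  # field_name is opts[:field] if given, otherwise the method name.
  CustomPropertyType = Struct.new(:field_name, :block, :opts)

  attr_reader :filter_methods, :custom_properties

  def initialize
    @filter_methods = []
    @custom_properties = {}
  end

  # Only records the name; no method is defined on any resource yet.
  def register_filter_method(method_name)
    @filter_methods.push(method_name)
    self # returning the factory enables method chaining
  end

  # Packs name, block, and opts into a struct keyed on the method name.
  def register_custom_property(method_name, opts = {}, &block)
    field_name = opts[:field] || method_name
    @custom_properties[method_name] = CustomPropertyType.new(field_name, block, opts)
    self
  end
  alias register_column register_custom_property
end

factory = MiniFactory.new
  .register_filter_method(:entries)
  .register_column(:thing_ids, field: :thing_id)
  .register_custom_property(:exist?) { |table| !table.raw_data.empty? }
```

Nothing in `factory` can filter anything yet; it is only a record of what should be installed later.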
The implementation of the custom property method is generated by `create_custom_property_body`, and varies based on whether a block was provided to `register_custom_property`.
#### Behavior when a block is provided
This behavior is implemented by lines 398-404.
If a block is provided, it is turned into a Lambda and used as the method body.
The block will be provided two arguments (though most users only use the first):
1. The FilterTable::Table instance that wraps the raw data.
2. An optional value used as an additional opportunity to filter.
For example, this is common in legacy code:
```
filter.add(:exists?) { |x| !x.entries.empty? }
```
Here, `x` is the Table instance, which exposes the `entries` method (which returns an array, one entry for each raw data row).
You could also implement a more sophisticated property that uses the second argument. Semantically, such a property should re-filter the table based on the candidate value and return the new table.
```
filter.add(:smaller_than) { |table, threshold| table.where { some_field <= threshold } }
```
```
things.smaller_than(12)
```
If you provide _both_ a block and opts, only the block is used, and the options are ignored.
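The two-argument calling convention can be sketched as follows. This is an illustration under assumptions, not the code of `create_custom_property_body`: the hypothetical `wrap_property_block` helper wraps the stored block in a lambda that always passes `(table, value)`, and a plain array of hashes stands in for the Table. Because a block captured with `&block` is an ordinary (non-lambda) `Proc`, a one-parameter block such as `{ |x| ... }` silently ignores the second argument, which is why most users can get away with using only the first:

```ruby
# Hypothetical helper: wrap a stored block as a (table, value) method body.
def wrap_property_block(block)
  ->(table, value = nil) { block.call(table, value) }
end

# Capture a block as a Proc, the way a &block parameter does.
def capture(&block)
  block
end

# Stand-in for a table: an array of row hashes.
table = [{ size: 3 }, { size: 15 }, { size: 8 }]

exists_body = wrap_property_block(capture { |x| !x.empty? })
smaller_than_body = wrap_property_block(
  capture { |t, threshold| t.select { |row| row[:size] <= threshold } }
)

exists_body.call(table)           # => true (the second argument is ignored)
smaller_than_body.call(table, 10) # => [{ size: 3 }, { size: 8 }]
```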
#### Behavior when no block is provided
If you do not provide a block, you _must_ provide a `:field` option (though that does not appear to be enforced). The behavior is to define a method with the name provided that has a conditional return type. The method body is defined in lines 306-423.
If called without arguments, it returns an array of the values in the raw data for that column.
```
things.thing_ids => [1,2,3,4]
```
If called with an argument, it instead calls `where` passing the name of the field and the argument, effectively filtering.
```
things.thing_ids(2) => FilterTable::Table that only contains a row where thing_id = 2
```
If called with a block, it passes the block to where.
```
things.thing_ids { some_code } => Same as things.where { some_code }
```
POSSIBLE BUG: I think this case is broken; it certainly seems ill-advised.
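A sketch of the three-way dispatch described above. `MiniTable` and `column_property_body` are hypothetical simplifications; in particular, this `where` hands each row hash to the block instead of `instance_eval`ing against a row Struct:

```ruby
# Hypothetical stand-in for FilterTable::Table, holding an array of row hashes.
class MiniTable
  attr_reader :raw_data

  def initialize(raw_data)
    @raw_data = raw_data
  end

  # Simplified where: equality on hash conditions, then an optional block
  # that receives each row hash (the real block mode uses instance_eval).
  def where(conditions = {}, &block)
    rows = @raw_data.select { |row| conditions.all? { |field, value| row[field] == value } }
    rows = rows.select(&block) if block
    MiniTable.new(rows)
  end
end

# Sketch of the body generated for a property registered with field: and no block.
def column_property_body(field)
  lambda do |table, condition = nil, &block|
    if condition.nil? && block.nil?
      table.raw_data.map { |row| row[field] } # column fetcher
    elsif condition.nil?
      table.where(&block)                     # block handed straight to where
    else
      table.where(field => condition)         # filter on this column
    end
  end
end

things = MiniTable.new([{ thing_id: 1 }, { thing_id: 2 }, { thing_id: 3 }, { thing_id: 4 }])
thing_ids = column_property_body(:thing_id)

thing_ids.call(things)                                       # => [1, 2, 3, 4]
thing_ids.call(things, 2).raw_data                           # => [{ thing_id: 2 }]
thing_ids.call(things) { |row| row[:thing_id] > 2 }.raw_data # => [{ thing_id: 3 }, { thing_id: 4 }]
```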
#### Known Options
You can provide options to `register_custom_property` / `add`, after the desired method name.
##### field
This is the most common option, and it is mandatory if a block is not provided. It selects an implementation in which the method is defined to return an array of the row values for the specified key. In other words, it acts as a "column fetcher", like `SELECT some_column FROM some_table` in SQL.
Internally (lines 271-278), a Struct type is created to represent a row of raw data. The Struct's attribute list is taken from the `field` options passed to `register_custom_property` / `add`. This new type is stored as `row_eval_context_type` and is used as the evaluation context for block-mode `where` calls.
* No checking is performed to see if the field name is actually a column in the raw data (the raw data hasn't been fetched yet, so we can't check).
* You can't have two `register_custom_property` / `add` calls that reference the same field, because the Struct would see that as a duplicate attribute.
POSSIBLE BUG: We could deduplicate the field names when defining the Struct, thus allowing multiple properties to use the same field.
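The duplicate-attribute problem comes straight from Ruby's `Struct.new`, which raises `ArgumentError` for a repeated member name. A sketch of the suggested fix, using a hypothetical list of collected field options:

```ruby
# Field options collected from several register_custom_property calls;
# two hypothetical properties both use field: :name.
field_names = %i{id name name color}

duplicate_rejected =
  begin
    Struct.new(*field_names)
    false
  rescue ArgumentError
    true # Struct.new refuses a repeated member name
  end

# Deduplicating first lets both properties share one row attribute.
RowType = Struct.new(*field_names.uniq)
row = RowType.new(1, "Dani", "blue")
row.name # => "Dani"
```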
##### style
The `style` option selects post-processing of the generated method's return value. To date there is only one recognized value, `:simple`, which `flatten`s, `uniq`s, and `compact`s the property's array value. This is implemented on line 416.
No other values for `:style` have been seen.
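In plain Ruby, `:simple` post-processing amounts to the following chain (the sample values are invented for illustration):

```ruby
# Hypothetical per-row values gathered for one column; some rows hold arrays or nil.
values = [["a", nil], ["a", "b"], nil]

# style: :simple post-processing, in the order described above.
simple = values.flatten.uniq.compact
p simple # prints ["a", "b"]
```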
##### lazy
This option implements column-wise lazy loading. The value of the option is expected to be a lambda that accepts three arguments: row (a Hash representing a row of the raw data), condition (a sought value to filter for), and table (a reference to the FilterTable::Table subclass, which may be used for context).
See the usage guide for details on the usage of the lazy mechanism; this document will examine the internals.
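On the internals side, column-wise lazy loading can be sketched like this. `LazyTable` is hypothetical, and the sketch assumes that the lazy lambda fills in `row[field]` as a side effect; the per-field flag plays the role of `@populated_lazy_columns` and makes population idempotent:

```ruby
# Hypothetical table with lazily populated columns.
class LazyTable
  attr_reader :raw_data, :fetch_count

  def initialize(raw_data, lazy_columns)
    @raw_data = raw_data
    @lazy_columns = lazy_columns # field name => lambda(row, condition, table)
    @populated_lazy_columns = {} # field name => true once populated
    @fetch_count = 0
  end

  def populate_lazy_field(field, condition = nil)
    return unless @lazy_columns.key?(field)
    return if @populated_lazy_columns[field] # idempotent: never fetch twice

    @raw_data.each { |row| @lazy_columns[field].call(row, condition, self) }
    @fetch_count += 1
    @populated_lazy_columns[field] = true
  end
end

# The lambda receives row, condition, and table, and fills in the row's value.
color_loader = lambda do |row, _condition, _table|
  row[:color] = row[:id].odd? ? "blue" : "red" # stand-in for an expensive lookup
end

table = LazyTable.new([{ id: 1 }, { id: 2 }], { color: color_loader })
table.populate_lazy_field(:color)
table.populate_lazy_field(:color) # no-op the second time
```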
### install_filter_methods_on_resource
Legacy name (alias): `connect`
This method is called like this:
```
filter.install_filter_methods_on_resource(self, :data_fetching_method_name)
```
`filter` is an instance of FilterTable::Factory. `self` is a reference to the resource class you are authoring. `data_fetching_method_name` is a symbol, the name of a method that will return the actual data to be processed by the FilterTable - as an array of hashes.
Note that 'connect' does not refer to Connectors.
`register_custom_property` and `register_filter_method` did nothing other than register the names of methods that we'd like to have added to the resource class. No filtering ability is present, nor are the methods defined, at this point.
So, `install_filter_methods_on_resource`/`connect`'s job is to actually install everything.
#### Defines a special Struct type to support block-mode where
At lines 270-278, a new Struct type `row_eval_context_type` is defined, with attributes for each of the known table fields. We're careful to filter out any fields that have a block implementation, because they cannot be accessed as a row value, and so should not be present on the row context. The motivation for this struct type is to implement the block-mode behavior of `where`. Because each struct represents a row, and it has the attributes (accessors) for the fields, block-mode `where` is implemented by `instance_eval`ing against each row as a struct.
Additionally, an instance variable, `@criteria_string`, is defined with an accessor. `to_s` is implemented to return `@criteria_string`, or `super` if it is not set; in that case I assume we rely on the `Struct` class to stringify.
`@criteria_string` is a trace - a string indicating the filter criteria used to create the table. I found no location where this per-row trace data was used.
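A sketch of the row context type with the `@criteria_string` accessor and the `to_s` fallback to `super`. The fixed field list is a hypothetical stand-in for the list derived from the registered custom properties:

```ruby
# Hypothetical field list; one attribute per known table field.
RowContext = Struct.new(:id, :name, :color) do
  attr_accessor :criteria_string

  # Prefer the trace string; otherwise fall back to Struct#to_s.
  def to_s
    @criteria_string || super
  end
end

plain = RowContext.new(1, "Dani", "blue")
traced = RowContext.new(2, "Mike", "red")
traced.criteria_string = 'my_things with color == "red"'

puts plain.to_s  # Struct's default "#<struct ...>" rendering
puts traced.to_s # my_things with color == "red"
```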
Additionally, an instance variable is set up to refer back to the filter table later. This is required for lazy-loading columns.
Table fields are determined by listing the `field_name`s of the CustomProperties.
BUG: this means that any `register_custom_property` / `add` call that uses a block but not options will end up with an attribute in the row Struct. Thus, `filter.add(:exists?) { ... }` results in a row Struct that includes an attribute named `exists?` which may be undesired. This attribute will never have a value, because when the structs are instantiated, the block for the field is not called.
#### Re-pack the "connectors"
On lines 280-282, the list of custom properties ("connectors", registered using the `register_custom_property` / `add` method) is repacked into an array of two-element hashes: the desired method name and the lambda that will be used as the method body. The lambda is created by the private method `create_custom_property_body`; see `register_custom_property` for discussion about how the implementation behaves.
#### Subclass FilterTable::Table into an anonymous class
Line 286 creates the local variable `table_class`, which refers to an anonymous class that subclasses FilterTable::Table. The class is opened and two groups of methods are defined.
Lines 288-290 install the "custom_property" methods, using the names and lambdas determined on line 281.
Lines 292-294 allow the Table subclass to introspect on the CustomProperties by slipping a reference to it in the class body, forming a closure.
Lines 296-303 define a method, `create_eval_context_for_row`, which is used when executing a block-mode `where`; see line 120.
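The anonymous-subclass step can be sketched with `Class.new`. `BaseTable` and the property lambdas are hypothetical simplifications; the point is that blocks passed to `define_method` close over local variables from the enclosing scope, which is how the anonymous class reaches data that is never stored in a constant or instance variable:

```ruby
# Hypothetical base class standing in for FilterTable::Table.
class BaseTable
  attr_reader :raw_data

  def initialize(raw_data)
    @raw_data = raw_data
  end
end

# Repacked "connectors": method name plus the lambda used as its body.
custom_properties = {
  count: ->(table) { table.raw_data.count },
  names: ->(table) { table.raw_data.map { |row| row[:name] } },
}
methods_to_install = custom_properties.map { |name, body| { method_name: name, method_body: body } }

table_class = Class.new(BaseTable) do
  # Install one method per custom property; each body receives the table.
  methods_to_install.each do |m|
    define_method(m[:method_name]) { |*args, &block| m[:method_body].call(self, *args, &block) }
  end

  # Closure: expose the property definitions to instances for introspection.
  define_method(:custom_properties) { custom_properties }
end

t = table_class.new([{ name: "Dani" }, { name: "Mike" }])
t.count # => 2
t.names # => ["Dani", "Mike"]
```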
#### Setup the row context struct for lazy loading
If you have a lazy field named `color` that has not yet been populated, we need to trigger population the first time it is read. If a block-mode `where` is used (`my_resource.where { color == :red }`), we have to intercept the Struct's default getter method and call the lazy column's lambda.
Lines 306-329 do exactly that, by defining methods on the Struct subclass we're using for context. We continue to rely on the default Struct setter (`[]=`) and getter at the end (`[]`).
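A sketch of that interception, with hypothetical `LazyRowContext` and `LazyAwareTable` types standing in for the real row Struct and Table. The overridden `color` getter triggers population on first read, then writes and reads through the default `[]=` and `[]`:

```ruby
# Hypothetical row context: one attribute per field, plus back-references.
LazyRowContext = Struct.new(:id, :color) do
  attr_accessor :table, :source_row

  # Intercept the default getter for the lazy :color field.
  def color
    if self[:color].nil? && table
      table.populate_lazy_field(:color)
      self[:color] = source_row[:color] # default Struct setter
    end
    self[:color]                        # default Struct getter
  end
end

# Hypothetical table that can populate :color for all rows, once.
class LazyAwareTable
  attr_reader :populate_calls

  def initialize(rows)
    @rows = rows
    @populated = false
    @populate_calls = 0
  end

  def populate_lazy_field(field)
    return if @populated

    @rows.each { |row| row[field] = row[:id].odd? ? :blue : :red }
    @populated = true
    @populate_calls += 1
  end

  # Block-mode where: instance_eval the block against each row's context.
  def where(&block)
    @rows.select do |row|
      ctx = LazyRowContext.new(*LazyRowContext.members.map { |m| row[m] })
      ctx.table = self
      ctx.source_row = row
      ctx.instance_eval(&block)
    end
  end
end

table = LazyAwareTable.new([{ id: 1 }, { id: 2 }])
table.where { color == :red } # => [{ id: 2, color: :red }]
```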
#### Install methods on the resource
Lines 337-348 install the "filter_methods" and "custom properties" methods onto the resource that you are authoring.
Lines 337-338 collect the names of the methods to define by combining the names of the "filter_methods" and "custom properties" methods; both kinds are treated the same.
Line 339 uses `send` with a block to call `define_method` on the resource class that you're authoring (`send` is needed because `define_method` was private before Ruby 2.5). A block passed to `send` is forwarded to the invoked method just as with a direct call, so the block becomes the body of the new method being defined.
The method body is wrapped in an exception-catching facility that catches skipped or failed resource exceptions and wraps them in a specialized exception catcher class. TBD: understand this better.
Line 342 constructs an instance of the anonymous FilterTable::Table subclass defined at line 284. It passes three args:
1. `self`. A reference to the resource instance.
2. The return value of calling the data fetcher method (that is an array of hashes, the raw data).
3. The string `' with'`, which probably feeds the criteria stringification. The leading space is intentional, since it follows the resource name: `'my_things with color == :red'` might be a result.
On line 343, we then immediately call a method on that "FilterTable::Table subclass instance". The method name is the same as the one we're defining on the resource, but we're calling it on the Table. Recall that we defined all the "custom_property" methods on the Table subclass at lines 288-290. The method gets called with any args or block passed, and since it's the last expression, it provides the return value.
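The whole installation step can be sketched as follows, under simplifying assumptions: `DemoTable`, `ThingsResource`, and `install_filter_methods` are hypothetical, and the exception-catching wrapper is left out. Each installed method builds a fresh table from the data-fetching method and forwards the call, with its args and block, to the same-named method on the table:

```ruby
# Hypothetical stand-in for the anonymous FilterTable::Table subclass.
class DemoTable
  attr_reader :resource_instance, :raw_data, :criteria_string

  def initialize(resource_instance, raw_data, criteria_string)
    @resource_instance = resource_instance
    @raw_data = raw_data
    @criteria_string = criteria_string
  end

  def entries
    raw_data
  end

  def count
    raw_data.count
  end
end

# Sketch of install_filter_methods_on_resource / connect.
def install_filter_methods(resource_class, data_method, table_class, method_names)
  method_names.each do |method_name|
    # send forwards this block to define_method, which uses it as the method body.
    resource_class.send(:define_method, method_name) do |*args, &block|
      table = table_class.new(self, send(data_method), " with")
      table.send(method_name, *args, &block)
    end
  end
end

class ThingsResource
  def fetch_data
    [{ id: 1 }, { id: 2 }]
  end
end

install_filter_methods(ThingsResource, :fetch_data, DemoTable, %i{entries count})
ThingsResource.new.count   # => 2
ThingsResource.new.entries # => [{ id: 1 }, { id: 2 }]
```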
## What is its behavior? (FilterTable::Table)
Assume that your resource has a method, `fetch_data`, which returns a fixed array:
```
[
{ id: 1, name: 'Dani', color: 'blue' },
{ id: 2, name: 'Mike', color: 'red' },
{ id: 3, name: 'Erika', color: 'green' },
]
```
Assume that you then perform this sequence in your resource class body:
```
filter = FilterTable.create
filter.register_filter_method(:entries)
filter.register_filter_method(:where)
filter.register_custom_property(:exists?) { |x| !x.entries.empty? }
filter.register_custom_property(:names, field: :name)
filter.install_filter_methods_on_resource(self, :fetch_data)
```
Legacy code equivalent:
```
filter = FilterTable.create
filter.add_accessor(:entries)
filter.add_accessor(:where)
filter.add(:exists?) { |x| !x.entries.empty? }
filter.add(:names, field: :name)
filter.connect(self, :fetch_data)
```
We know from the above exploration of `install_filter_methods_on_resource` / `connect` that we now have several new methods on the resource class, all of which delegate to the FilterTable::Table implementation.
### FilterTable::Table constructor and internals
Factory calls the FilterTable::Table constructor (lines 87-93) with three args. Table stores them in instance variables and initializes a fourth:
* `@resource_instance` - this was passed in as `self` from line 342
* `@raw_data` - an array of hashes
* `@criteria_string` - this looks to be stringification trace data; the string `' with'` was passed in by Factory.
* `@populated_lazy_columns` - a hash, by lazy field name, with boolean values. An entry is set to true when `populate_lazy_field` is called on that field.
The first three get exposed via `attr_reader`s.
### `entries` behavior
From usage, I expect `entries` to return a structure that resembles an array of hashes representing the (filtered) data.
#### A new method `entries` is defined on the resource class
That is performed by Factory#connect line 339.
#### It delegates to FilterTable::Table#entries
This is a real method defined in filter.rb line 155.
It loops over the provided raw data (@raw_data) and builds an array, calling `create_eval_context_for_row` (see Factory lines 297-303) on each row; also appending a stringification trace to each entry. The array is returned.
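A sketch of `entries` under similar simplifications. `EntriesTable` and `EntryRow` are hypothetical, and the per-row trace format is purely illustrative:

```ruby
# Hypothetical row context type for the two known fields.
EntryRow = Struct.new(:id, :name) do
  attr_accessor :criteria_string
end

# Hypothetical table holding raw rows and a trace string.
class EntriesTable
  def initialize(raw_data, criteria_string)
    @raw_data = raw_data
    @criteria_string = criteria_string
  end

  # Builds one Struct per raw row, tagging each with a trace string.
  def entries
    @raw_data.map do |row|
      entry = EntryRow.new(*EntryRow.members.map { |m| row[m] })
      entry.criteria_string = "#{@criteria_string} one entry" # illustrative format
      entry
    end
  end
end

table = EntriesTable.new([{ id: 1, name: "Dani" }, { id: 2, name: "Mike" }], "things with")
table.entries.map(&:name) # => ["Dani", "Mike"]
table.entries.first[:id]  # => 1, since Struct also supports Hash-like [] access
```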
#### `entries` conclusion
Not surprising: it does behave as expected, returning an array of Hash-like structs representing the table. I don't know why it adds the per-row stringification data; I've never seen that used.
Surprising: this is a real method with a concrete implementation. That means that you can't call `filter.add_accessor` with arbitrary method names - `:entries` means something very specific.
Surprising: I would not recommend this method be used for data access; instead I would recommend using `raw_data`.
### `where` behavior
From usage, I expect this to take either method params or a block (both of which are magical), perform filtering, and return some object that contains only the filtered rows.
So, what happens when you call `register_filter_method(:where)` and then call `resource.where`?
#### A new method `where` is defined on the resource class
That is performed by Factory#connect line 339.
#### It delegates to FilterTable::Table#where
Like `entries`, this is a real implemented method on FilterTable::Table, at line 98.
The method accepts all params as the local var `conditions` which defaults to an empty Hash. A block, if any, is also explicitly assigned the name `block`.
The implementation opens with two guard clauses, both of which will return `self` (which is the FilterTable::Table subclass instance).
MISFEATURE: The first guard clause simply returns the Table if `conditions` is not a Hash, which would happen if someone called it like `thing.where(:apples)`. That misuse is silently ignored; I think we should probably raise a ResourceFailed exception or something similar.
The second guard clause is a sensible degenerate case - return the existing Table if there are no conditions and no block. So `thing.where` is OK.
Line 103 initializes a local var, `new_criteria_string`, which again is a stringification tracker.
Line 104 initializes a var to track the `filtered_raw_data`.
Lines 108-113 loop over the provided Hash `conditions`. If the requested field is lazy, the loop requests that it be populated (note that `populate_lazy_field` is idempotent: it won't fetch a field twice). Next, it repeatedly down-filters `filtered_raw_data` by calling the private method `filter_raw_data` on it. `filter_raw_data` adds some syntactic sugar for common target types: Integers, Floats, and Regexp matching. The loop also builds up the stringification tracker, `new_criteria_string`, by stringifying the field name and target value.
Lines 118-135 only do their work if a filtration block has been provided. At this point, `filtered_raw_data` has been initialized with the raw data and (if method params were provided) has already been filtered down.
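The hash-mode loop can be sketched as below. The comparison rules in the hypothetical `matches?` helper are an assumption about what such sugar might look like (Regexp targets match the cell's string form; numeric targets also match their string representation), not a transcription of `filter_raw_data`, and lazy population is omitted. The trace format is likewise illustrative:

```ruby
# Hypothetical comparison with a little sugar for common target types.
def matches?(cell, target)
  case target
  when Regexp  then target.match?(cell.to_s)
  when Numeric then cell == target || cell.to_s == target.to_s
  else cell == target
  end
end

# Sketch of the hash-mode loop: down-filter once per condition, building a trace.
def where_hash(raw_data, criteria_string, conditions)
  filtered_raw_data = raw_data
  clauses = []
  conditions.each do |field, target|
    filtered_raw_data = filtered_raw_data.select { |row| matches?(row[field], target) }
    clauses << "#{field} == #{target.inspect}"
  end
  [filtered_raw_data, "#{criteria_string} #{clauses.join(' and ')}"]
end

rows = [
  { id: "1", name: "Dani", color: "blue" },
  { id: "2", name: "Mike", color: "red" },
  { id: "3", name: "Erika", color: "green" },
]
data, trace = where_hash(rows, "my_things with", { name: /^[DE]/, id: 3 })
# data  => [{ id: "3", name: "Erika", color: "green" }]
# trace => "my_things with name == /^[DE]/ and id == 3"
```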
Line 120 filters the rows of the raw data using an interesting approach. Each row is inflated to a Struct using `create_eval_context_for_row` (see line 297). Then the provided block is `instance_eval`'d against the Struct. Because the Struct was defined with attributes (that is, accessor methods) for each declared field name (from FilterTable::Factory#register_custom_property), you can use field names in the block, and each row-as-struct will be able to respond. If the field happened to be lazy, we'll call our custom getter from lines 306-329.
_That just explained a major spooky side-effect for me._
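Stripped down, block-mode filtering is just `instance_eval` against a Struct built from each row. In this sketch the `Row` type and the top-level `where` method are hypothetical, and laziness and tracing are omitted. Method-mode conditions are applied first, and local variables from the caller remain visible inside the block because `instance_eval` keeps the block's closure:

```ruby
# Hypothetical row context type with one attribute per declared field.
Row = Struct.new(:id, :name, :color)

def where(raw_data, conditions = {}, &block)
  # Method mode first: plain equality on each requested field.
  filtered = raw_data.select { |row| conditions.all? { |field, value| row[field] == value } }
  return filtered unless block

  # Block mode: evaluate the block with each row-as-struct as self.
  filtered.select { |row| Row.new(*Row.members.map { |m| row[m] }).instance_eval(&block) }
end

data = [
  { id: 1, name: "Dani", color: "blue" },
  { id: 2, name: "Mike", color: "red" },
  { id: 3, name: "Erika", color: "green" },
]

threshold = 2
where(data) { id >= threshold }                         # rows with ids 2 and 3
where(data, { color: "green" }) { name.start_with?("E") } # only Erika's row
```

The second call shows method mode and block mode used together.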
Lines 125-134 do something with stringification tracing. TODO.
Finally, at line 137, the FilterTable::Table anonymous subclass is again used to construct a new instance, passing on the resource reference, the newly filtered raw data table, and the newly adjusted stringification tracer.
That new Table instance is returned, and thus `where` allows you to chain.
#### `where` conclusion
Unsurprising: how `where` works with method params.
Surprising: how `where` works in block mode, `instance_eval`'ing against each row-as-Struct.
Surprising: You can use method-mode and block-mode together if you want.