mirror of
https://github.com/nushell/nushell
synced 2024-12-26 04:53:09 +00:00
Add a datepart expression for dfr to be used with dfr with-column (#9285)
# Description Today the only way to extract date parts from a dfr series is the dfr get-* set of commands. These create a new dataframe with just the datepart in it, which is almost entirely useless. As far as I can tell there's no way to append it as a series in the original dataframe. In discussion with fdncred on Discord we decided the best route was to add an expression for modifying columns created in dfr with-column. These are the way you manipulate series within a data frame. I'd like feedback on this approach - I think it's a fair way to handle things. An example to test it would be: ```[[ record_time]; [ (date now)]] | dfr into-df | dfr with-column [ ((dfr col record_time) | dfr datepart nanosecond | dfr as "ns" ), (dfr col record_time | dfr datepart second | dfr as "s"), (dfr col record_time | dfr datepart minute | dfr as "m"), (dfr col record_time | dfr datepart hour | dfr as "h") ]``` I'm also proposing we deprecate the dfr get-* commands. I've not been able to figure out any meaningful way they could ever be useful, and this approach makes more sense by attaching the extracted date part to the row in the original dataframe as a new column. <!-- Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes. Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience. --> # User-Facing Changes add in dfr datepart as an expression <!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. --> # Tests + Formatting Need to add some better assertive tests. I'm also not sure how to properly write the test_dataframe at the bottom, but will revisit as part of this PR. Wanted to get feedback early. <!-- Don't forget to add tests that cover your changes. Make sure you've run and fixed any issues with these commands: - `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes) - `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect -A clippy::result_large_err` to check that you're using the standard code style - `cargo test --workspace` to check that all tests pass - `cargo run -- crates/nu-std/tests/run.nu` to run the tests for the standard library > **Note** > from `nushell` you can also use the `toolkit` as follows > ```bash > use toolkit.nu # or use an `env_change` hook to activate it automatically > toolkit check pr > ``` --> # After Submitting <!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. --> --------- Co-authored-by: Robert Waugh <robert@waugh.io>
This commit is contained in:
parent
560c2e63ac
commit
b67b6f7fc5
7 changed files with 236 additions and 49 deletions
|
@ -29,8 +29,8 @@ fn map_sql_polars_datatype(data_type: &SQLDataType) -> Result<DataType> {
|
|||
SQLDataType::Boolean => DataType::Boolean,
|
||||
SQLDataType::Date => DataType::Date,
|
||||
SQLDataType::Time(_, _) => DataType::Time,
|
||||
SQLDataType::Timestamp(_, _) => DataType::Datetime(TimeUnit::Milliseconds, None),
|
||||
SQLDataType::Interval => DataType::Duration(TimeUnit::Milliseconds),
|
||||
SQLDataType::Timestamp(_, _) => DataType::Datetime(TimeUnit::Microseconds, None),
|
||||
SQLDataType::Interval => DataType::Duration(TimeUnit::Microseconds),
|
||||
SQLDataType::Array(inner_type) => match inner_type {
|
||||
Some(inner_type) => DataType::List(Box::new(map_sql_polars_datatype(inner_type)?)),
|
||||
None => {
|
||||
|
|
162
crates/nu-cmd-dataframe/src/dataframe/expressions/datepart.rs
Normal file
162
crates/nu-cmd-dataframe/src/dataframe/expressions/datepart.rs
Normal file
|
@ -0,0 +1,162 @@
|
|||
use super::super::values::NuExpression;
|
||||
|
||||
use crate::dataframe::values::{Column, NuDataFrame};
|
||||
use chrono::{DateTime, FixedOffset};
|
||||
use nu_engine::CallExt;
|
||||
use nu_protocol::{
|
||||
ast::Call,
|
||||
engine::{Command, EngineState, Stack},
|
||||
Category, Example, PipelineData, ShellError, Signature, Span, Spanned, SyntaxShape, Type,
|
||||
Value,
|
||||
};
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct ExprDatePart;
|
||||
|
||||
impl Command for ExprDatePart {
|
||||
fn name(&self) -> &str {
|
||||
"dfr datepart"
|
||||
}
|
||||
|
||||
fn usage(&self) -> &str {
|
||||
"Creates an expression for capturing the specified datepart in a column."
|
||||
}
|
||||
|
||||
fn signature(&self) -> Signature {
|
||||
Signature::build(self.name())
|
||||
.required(
|
||||
"Datepart name",
|
||||
SyntaxShape::String,
|
||||
"Part of the date to capture. Possible values are year, quarter, month, week, weekday, day, hour, minute, second, millisecond, microsecond, nanosecond",
|
||||
)
|
||||
.input_type(Type::Custom("expression".into()))
|
||||
.output_type(Type::Custom("expression".into()))
|
||||
.category(Category::Custom("expression".into()))
|
||||
}
|
||||
|
||||
fn examples(&self) -> Vec<Example> {
|
||||
let dt = DateTime::<FixedOffset>::parse_from_str(
|
||||
"2021-12-30T01:02:03.123456789 +0000",
|
||||
"%Y-%m-%dT%H:%M:%S.%9f %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test");
|
||||
vec![
|
||||
Example {
|
||||
description: "Creates an expression to capture the year date part",
|
||||
example: r#"[["2021-12-30T01:02:03.123456789"]] | dfr into-df | dfr as-datetime "%Y-%m-%dT%H:%M:%S.%9f" | dfr with-column [(dfr col datetime | dfr datepart year | dfr as datetime_year )]"#,
|
||||
result: Some(
|
||||
NuDataFrame::try_from_columns(vec![
|
||||
Column::new("datetime".to_string(), vec![Value::test_date(dt)]),
|
||||
Column::new("datetime_year".to_string(), vec![Value::test_int(2021)]),
|
||||
])
|
||||
.expect("simple df for test should not fail")
|
||||
.into_value(Span::test_data()),
|
||||
),
|
||||
},
|
||||
Example {
|
||||
description: "Creates an expression to capture multiple date parts",
|
||||
example: r#"[["2021-12-30T01:02:03.123456789"]] | dfr into-df | dfr as-datetime "%Y-%m-%dT%H:%M:%S.%9f" |
|
||||
dfr with-column [ (dfr col datetime | dfr datepart year | dfr as datetime_year ),
|
||||
(dfr col datetime | dfr datepart month | dfr as datetime_month ),
|
||||
(dfr col datetime | dfr datepart day | dfr as datetime_day ),
|
||||
(dfr col datetime | dfr datepart hour | dfr as datetime_hour ),
|
||||
(dfr col datetime | dfr datepart minute | dfr as datetime_minute ),
|
||||
(dfr col datetime | dfr datepart second | dfr as datetime_second ),
|
||||
(dfr col datetime | dfr datepart nanosecond | dfr as datetime_ns ) ]"#,
|
||||
result: Some(
|
||||
NuDataFrame::try_from_columns(vec![
|
||||
Column::new("datetime".to_string(), vec![Value::test_date(dt)]),
|
||||
Column::new("datetime_year".to_string(), vec![Value::test_int(2021)]),
|
||||
Column::new("datetime_month".to_string(), vec![Value::test_int(12)]),
|
||||
Column::new("datetime_day".to_string(), vec![Value::test_int(30)]),
|
||||
Column::new("datetime_hour".to_string(), vec![Value::test_int(1)]),
|
||||
Column::new("datetime_minute".to_string(), vec![Value::test_int(2)]),
|
||||
Column::new("datetime_second".to_string(), vec![Value::test_int(3)]),
|
||||
Column::new("datetime_ns".to_string(), vec![Value::test_int(123456789)]),
|
||||
])
|
||||
.expect("simple df for test should not fail")
|
||||
.into_value(Span::test_data()),
|
||||
),
|
||||
},
|
||||
]
|
||||
}
|
||||
|
||||
fn search_terms(&self) -> Vec<&str> {
|
||||
vec![
|
||||
"year",
|
||||
"month",
|
||||
"week",
|
||||
"weekday",
|
||||
"quarter",
|
||||
"day",
|
||||
"hour",
|
||||
"minute",
|
||||
"second",
|
||||
"millisecond",
|
||||
"microsecond",
|
||||
"nanosecond",
|
||||
]
|
||||
}
|
||||
|
||||
fn run(
|
||||
&self,
|
||||
engine_state: &EngineState,
|
||||
stack: &mut Stack,
|
||||
call: &Call,
|
||||
input: PipelineData,
|
||||
) -> Result<PipelineData, ShellError> {
|
||||
let part: Spanned<String> = call.req(engine_state, stack, 0)?;
|
||||
|
||||
let expr = NuExpression::try_from_pipeline(input, call.head)?;
|
||||
let expr_dt = expr.into_polars().dt();
|
||||
let expr = match part.item.as_str() {
|
||||
"year" => expr_dt.year(),
|
||||
"quarter" => expr_dt.quarter(),
|
||||
"month" => expr_dt.month(),
|
||||
"week" => expr_dt.week(),
|
||||
"day" => expr_dt.day(),
|
||||
"hour" => expr_dt.hour(),
|
||||
"minute" => expr_dt.minute(),
|
||||
"second" => expr_dt.second(),
|
||||
"millisecond" => expr_dt.millisecond(),
|
||||
"microsecond" => expr_dt.microsecond(),
|
||||
"nanosecond" => expr_dt.nanosecond(),
|
||||
_ => {
|
||||
return Err(ShellError::UnsupportedInput(
|
||||
format!("{} is not a valid datepart, expected one of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond", part.item),
|
||||
"value originates from here".to_string(),
|
||||
call.head,
|
||||
part.span,
|
||||
));
|
||||
}
|
||||
}.into();
|
||||
|
||||
Ok(PipelineData::Value(
|
||||
NuExpression::into_value(expr, call.head),
|
||||
None,
|
||||
))
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod test {
|
||||
use super::super::super::test_dataframe::test_dataframe;
|
||||
use super::*;
|
||||
use crate::dataframe::eager::WithColumn;
|
||||
use crate::dataframe::expressions::ExprAlias;
|
||||
use crate::dataframe::expressions::ExprAsNu;
|
||||
use crate::dataframe::expressions::ExprCol;
|
||||
use crate::dataframe::series::AsDateTime;
|
||||
|
||||
#[test]
|
||||
fn test_examples() {
|
||||
test_dataframe(vec![
|
||||
Box::new(ExprDatePart {}),
|
||||
Box::new(ExprCol {}),
|
||||
Box::new(ExprAsNu {}),
|
||||
Box::new(AsDateTime {}),
|
||||
Box::new(WithColumn {}),
|
||||
Box::new(ExprAlias {}),
|
||||
])
|
||||
}
|
||||
}
|
|
@ -3,6 +3,7 @@ mod arg_where;
|
|||
mod as_nu;
|
||||
mod col;
|
||||
mod concat_str;
|
||||
mod datepart;
|
||||
mod expressions_macro;
|
||||
mod is_in;
|
||||
mod lit;
|
||||
|
@ -17,6 +18,7 @@ use crate::dataframe::expressions::arg_where::ExprArgWhere;
|
|||
use crate::dataframe::expressions::as_nu::ExprAsNu;
|
||||
pub(super) use crate::dataframe::expressions::col::ExprCol;
|
||||
pub(super) use crate::dataframe::expressions::concat_str::ExprConcatStr;
|
||||
pub(crate) use crate::dataframe::expressions::datepart::ExprDatePart;
|
||||
pub(crate) use crate::dataframe::expressions::expressions_macro::*;
|
||||
pub(super) use crate::dataframe::expressions::is_in::ExprIsIn;
|
||||
pub(super) use crate::dataframe::expressions::lit::ExprLit;
|
||||
|
@ -64,6 +66,7 @@ pub fn add_expressions(working_set: &mut StateWorkingSet) {
|
|||
ExprMean,
|
||||
ExprMedian,
|
||||
ExprStd,
|
||||
ExprVar
|
||||
ExprVar,
|
||||
ExprDatePart
|
||||
);
|
||||
}
|
||||
|
|
|
@ -46,35 +46,66 @@ impl Command for AsDateTime {
|
|||
}
|
||||
|
||||
fn examples(&self) -> Vec<Example> {
|
||||
vec![Example {
|
||||
description: "Converts string to datetime",
|
||||
example: r#"["2021-12-30 00:00:00" "2021-12-31 00:00:00"] | dfr into-df | dfr as-datetime "%Y-%m-%d %H:%M:%S""#,
|
||||
result: Some(
|
||||
NuDataFrame::try_from_columns(vec![Column::new(
|
||||
"datetime".to_string(),
|
||||
vec![
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-30 00:00:00 +0000",
|
||||
"%Y-%m-%d %H:%M:%S %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-31 00:00:00 +0000",
|
||||
"%Y-%m-%d %H:%M:%S %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
],
|
||||
)])
|
||||
.expect("simple df for test should not fail")
|
||||
.into_value(Span::test_data()),
|
||||
),
|
||||
}]
|
||||
vec![
|
||||
Example {
|
||||
description: "Converts string to datetime",
|
||||
example: r#"["2021-12-30 00:00:00" "2021-12-31 00:00:00"] | dfr into-df | dfr as-datetime "%Y-%m-%d %H:%M:%S""#,
|
||||
result: Some(
|
||||
NuDataFrame::try_from_columns(vec![Column::new(
|
||||
"datetime".to_string(),
|
||||
vec![
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-30 00:00:00 +0000",
|
||||
"%Y-%m-%d %H:%M:%S %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-31 00:00:00 +0000",
|
||||
"%Y-%m-%d %H:%M:%S %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
],
|
||||
)])
|
||||
.expect("simple df for test should not fail")
|
||||
.into_value(Span::test_data()),
|
||||
),
|
||||
},
|
||||
Example {
|
||||
description: "Converts string to datetime with high resolutions",
|
||||
example: r#"["2021-12-30 00:00:00.123456789" "2021-12-31 00:00:00.123456789"] | dfr into-df | dfr as-datetime "%Y-%m-%d %H:%M:%S.%9f""#,
|
||||
result: Some(
|
||||
NuDataFrame::try_from_columns(vec![Column::new(
|
||||
"datetime".to_string(),
|
||||
vec![
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-30 00:00:00.123456789 +0000",
|
||||
"%Y-%m-%d %H:%M:%S.%9f %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
Value::Date {
|
||||
val: DateTime::parse_from_str(
|
||||
"2021-12-31 00:00:00.123456789 +0000",
|
||||
"%Y-%m-%d %H:%M:%S.%9f %z",
|
||||
)
|
||||
.expect("date calculation should not fail in test"),
|
||||
span: Span::test_data(),
|
||||
},
|
||||
],
|
||||
)])
|
||||
.expect("simple df for test should not fail")
|
||||
.into_value(Span::test_data()),
|
||||
),
|
||||
},
|
||||
]
|
||||
}
|
||||
|
||||
fn run(
|
||||
|
@ -110,11 +141,11 @@ fn command(
|
|||
})?;
|
||||
|
||||
let res = if not_exact {
|
||||
casted.as_datetime_not_exact(Some(format.as_str()), TimeUnit::Milliseconds, None)
|
||||
casted.as_datetime_not_exact(Some(format.as_str()), TimeUnit::Nanoseconds, None)
|
||||
} else {
|
||||
casted.as_datetime(
|
||||
Some(format.as_str()),
|
||||
TimeUnit::Milliseconds,
|
||||
TimeUnit::Nanoseconds,
|
||||
false,
|
||||
false,
|
||||
true,
|
||||
|
|
|
@ -6,7 +6,7 @@ use nu_protocol::{
|
|||
use num::Zero;
|
||||
use polars::prelude::{
|
||||
BooleanType, ChunkCompare, ChunkedArray, DataType, Float64Type, Int64Type, IntoSeries,
|
||||
NumOpsDispatchChecked, PolarsError, Series, TimeUnit, Utf8NameSpaceImpl,
|
||||
NumOpsDispatchChecked, PolarsError, Series, Utf8NameSpaceImpl,
|
||||
};
|
||||
use std::ops::{Add, BitAnd, BitOr, Div, Mul, Sub};
|
||||
|
||||
|
@ -580,10 +580,7 @@ where
|
|||
F: Fn(&ChunkedArray<Int64Type>, i64) -> ChunkedArray<BooleanType>,
|
||||
{
|
||||
match series.dtype() {
|
||||
DataType::UInt32
|
||||
| DataType::Int32
|
||||
| DataType::UInt64
|
||||
| DataType::Datetime(TimeUnit::Milliseconds, _) => {
|
||||
DataType::UInt32 | DataType::Int32 | DataType::UInt64 | DataType::Datetime(_, _) => {
|
||||
let to_i64 = series.cast(&DataType::Int64);
|
||||
|
||||
match to_i64 {
|
||||
|
|
|
@ -749,7 +749,7 @@ pub fn from_parsed_columns(column_values: ColumnMap) -> Result<NuDataFrame, Shel
|
|||
InputType::Date => {
|
||||
let it = column.values.iter().map(|v| {
|
||||
if let Value::Date { val, .. } = &v {
|
||||
Some(val.timestamp_millis())
|
||||
Some(val.timestamp_nanos())
|
||||
} else {
|
||||
None
|
||||
}
|
||||
|
@ -757,7 +757,7 @@ pub fn from_parsed_columns(column_values: ColumnMap) -> Result<NuDataFrame, Shel
|
|||
|
||||
let res: DatetimeChunked =
|
||||
ChunkedArray::<Int64Type>::from_iter_options(&name, it)
|
||||
.into_datetime(TimeUnit::Milliseconds, None);
|
||||
.into_datetime(TimeUnit::Nanoseconds, None);
|
||||
|
||||
df_series.push(res.into_series())
|
||||
}
|
||||
|
|
|
@ -1,6 +1,5 @@
|
|||
mod custom_value;
|
||||
|
||||
use core::fmt;
|
||||
use nu_protocol::{PipelineData, ShellError, Span, Value};
|
||||
use polars::prelude::{col, AggExpr, Expr, Literal};
|
||||
use serde::{Deserialize, Deserializer, Serialize, Serializer};
|
||||
|
@ -8,7 +7,7 @@ use serde::{Deserialize, Deserializer, Serialize, Serializer};
|
|||
// Polars Expression wrapper for Nushell operations
|
||||
// Object is behind and Option to allow easy implementation of
|
||||
// the Deserialize trait
|
||||
#[derive(Default, Clone)]
|
||||
#[derive(Default, Clone, Debug)]
|
||||
pub struct NuExpression(Option<Expr>);
|
||||
|
||||
// Mocked serialization of the LazyFrame object
|
||||
|
@ -31,12 +30,6 @@ impl<'de> Deserialize<'de> for NuExpression {
|
|||
}
|
||||
}
|
||||
|
||||
impl fmt::Debug for NuExpression {
|
||||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
|
||||
write!(f, "NuExpression")
|
||||
}
|
||||
}
|
||||
|
||||
// Referenced access to the real LazyFrame
|
||||
impl AsRef<Expr> for NuExpression {
|
||||
fn as_ref(&self) -> &polars::prelude::Expr {
|
||||
|
@ -132,6 +125,7 @@ impl NuExpression {
|
|||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
// Enum to represent the parsing of the expressions from Value
|
||||
enum ExtractedExpr {
|
||||
Single(Expr),
|
||||
|
|
Loading…
Reference in a new issue