Feature Statistics
Custom feature types: the feature_stat() method computes summary statistics for a feature.
- Enables you to create custom summary statistics based on the feature type.
- Works on a Pandas Series: series.ads.feature_stat()
- Works on a Pandas DataFrame: df.ads.feature_stat()
The first feature type with a feature_stat() method defined in the inheritance chain is used.
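The lookup rule above can be sketched in plain Python. This is only an illustration under stated assumptions: the classes CreditCard, String and Object and the helper first_with_method are hypothetical stand-ins, and the chain is modelled as a simple ordered list. It does not reproduce how ADS implements the dispatch.

```python
# Stand-in classes only: this imitates the lookup rule, not the ADS internals.
class CreditCard:
    """Most specific type; defines no feature_stat of its own."""


class String:
    @staticmethod
    def feature_stat(values):
        return {"count": len(values), "unique": len(set(values))}


class Object:
    @staticmethod
    def feature_stat(values):
        return {"count": len(values)}


def first_with_method(chain, method_name):
    """Return the first type in the chain that defines method_name itself."""
    for feature_type in chain:
        if method_name in vars(feature_type):
            return feature_type
    return None


# Hypothetical chain, ordered from most specific to most general.
chain = [CreditCard, String, Object]
owner = first_with_method(chain, "feature_stat")
stats = owner.feature_stat(["4111", "4111", "5500"])
print(owner.__name__, stats)
```

Here CreditCard is skipped because it has no feature_stat() of its own, so the statistics come from String, the next type in the chain.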
Custom Feature Type Statistics
class JobFunction(FeatureType):
    @staticmethod
    def feature_stat(series: pd.Series) -> pd.DataFrame:
        result = dict()
        job_function = ['Product Management', 'Software Developer', 'Software Manager', 'Admin', 'TPM']
        for label in job_function:
            result[label] = len(series[series == label])
        return pd.DataFrame.from_dict(result, orient='index', columns=[series.name])
- Create a Python class that inherits from the FeatureType class.
- Define feature_stat() as a static method by adding the @staticmethod decorator.
- The method takes a Pandas Series and returns a Pandas DataFrame. The DataFrame has two columns, Metric and Value. The Metric column is a string that names what is being measured, and the Value column holds the value of that metric, which must be a continuous value.
- In the example above, every row is a job function, and every value is the number of times that job function occurred in the data set.
Feature Plots
With feature_plot(), you can customize a summary plot for each feature type, so that the plot best represents the data you are looking at.
Since a feature can have multiple inheritance, the inheritance chain is used to determine which feature_plot() method is dispatched.
- series.ads.feature_plot(): called on a Pandas Series, it returns a univariate plot of your data.
- df.ads.feature_plot(): called on a Pandas DataFrame, it returns a list of plots, one for each feature in the DataFrame. In a notebook session, it displays all the plots.
Custom Feature Plot
To create a custom feature plot, create a custom feature type class and define the feature_plot() method.
The method must take a Pandas Series and return a matplotlib.axes.Axes object.
class CustomCreditCard(FeatureType):
    @staticmethod
    def feature_plot(x: pd.Series) -> plt.Axes:
        card_types = x.apply(assign_issuer)
        df = card_types.value_counts().to_frame()
        if len(df.index):
            ax = sns.barplot(x=df.index, y=list(df.iloc[:, 0]), color='#76A2A0')
            ax.set(xlabel="Issuing Financial Institution")
            ax.set(ylabel="Count")
            return ax
The example creates a Python class that inherits from the FeatureType class. The declaration is the reserved word class followed by the name of the class, in this case CustomCreditCard. Putting FeatureType in parentheses makes it a custom feature type.
Feature Type Warnings
Check the state or condition of your data, for example that there are no missing values.
Ensure that the data meets quality standards.
Code the checks on the data once, then repeat the process each time a new data set is used.
- Run on a Pandas Series: series.ads.warning()
- Run on a Pandas DataFrame: df.ads.warning()
Creating Warnings
When we created feature statistics and feature plots, we did this by creating a FeatureType class and then defining a method.
The warning system does not work this way. It is meant to be more dynamic and reusable.
Warnings are registered with the feature type object at runtime. This allows you to create a warning and then reuse that code in many feature types. It also allows you to remove the warnings that don’t apply to your specific data set.
Warnings consist of a name and a warning handler pair.
- A warning handler returns a specifically formatted dataframe.
- Warning handlers can report on multiple warnings.
Each feature type can have multiple name and warning handler pairs.
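The idea of registering name and handler pairs at runtime can be sketched with a small stand-in registry. Everything here is hypothetical (the WarningRegistry class, its register, unregister and run methods, and the list-of-tuples return format); it only illustrates why runtime registration makes handlers reusable and removable, and is not the ADS implementation.

```python
class WarningRegistry:
    """Hypothetical stand-in for the warning store attached to one feature type."""

    def __init__(self):
        self._handlers = {}  # warning name -> handler function

    def register(self, name, handler):
        self._handlers[name] = handler

    def unregister(self, name):
        del self._handlers[name]

    def run(self, values):
        # Call every registered handler and collect the rows they report.
        rows = []
        for handler in self._handlers.values():
            rows.extend(handler(values))
        return rows


def missing_values_handler(values):
    missing = sum(1 for v in values if v is None)
    return [("missing", missing)] if missing else []


def zeros_handler(values):
    zeros = sum(1 for v in values if v == 0)
    return [("zeros", zeros)] if zeros else []


# The same handler is reused by two different feature types.
credit_card = WarningRegistry()
integer = WarningRegistry()
credit_card.register("missing_values", missing_values_handler)
integer.register("missing_values", missing_values_handler)
integer.register("zeros", zeros_handler)

data = [4, 0, None, 7]
before = integer.run(data)
integer.unregister("zeros")  # drop a warning that does not apply to this data set
after = integer.run(data)
```

Because the handlers are plain functions held in a registry, nothing in a class definition has to change when a warning is added, shared or removed.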
Defining Warning Handler and Registering Warnings
There are three steps to creating a warning:
- Create a warning handler: a warning handler is a Python function, not a method. It doesn’t belong to any class.
- Get the FeatureType object: call the feature_type_manager.feature_type_object() method and give it the name of the feature type. The returned feature_type_object represents the feature type. Specifically, it is an object that represents the FeatureType class.
- Register the handler with the FeatureType object: on the feature_type_object, call the warning.register() method, passing in the name of the warning and the handler.
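As a minimal sketch of the first step, the handler below is a plain function that reports missing values. The four-column layout (Warning, Message, Metric, Value) is an assumption about the format ADS expects and should be checked against the ADS documentation; the handler name and message text are made up for illustration. Steps two and three are shown only as comments, following the calls described above.

```python
import pandas as pd

# Assumed output layout; verify against the ADS documentation before relying on it.
WARNING_COLUMNS = ["Warning", "Message", "Metric", "Value"]


def missing_values_handler(x: pd.Series) -> pd.DataFrame:
    """Plain function (not a method) that reports missing values in a Series."""
    missing = int(x.isna().sum())
    if missing == 0:
        # Nothing to report: return an empty frame with the same columns.
        return pd.DataFrame(columns=WARNING_COLUMNS)
    return pd.DataFrame(
        [["missing", f"{missing} missing value(s)", "count", missing]],
        columns=WARNING_COLUMNS,
    )


report = missing_values_handler(pd.Series([1.0, None, 3.0]))
print(report)

# Steps 2 and 3 with ADS would then look roughly like this (not run here):
# CreditCard = feature_type_manager.feature_type_object('credit_card')
# CreditCard.warning.register(name='missing_values', handler=missing_values_handler)
```

Returning an empty DataFrame with the same columns when there is nothing to report keeps the output shape predictable for whatever combines the results.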
Feature Type Validators
- Ensure that all data is valid: feature type validators are a way of performing this validation. ADS provides built-in validators for its feature types, but the idea is for you to create validators for your custom feature types.
- Are defined at the feature-type level: you define functions that are applied to the features.
- Are a set of is_<something> methods, where something is generally the name of the feature type: for example, is_credit_card could be called to ensure that the data is a valid credit card number.
- Support multiple validators for any feature type: you may want a validator called is_credit_card, but you may also want is_visa and is_mastercard, which determine whether the card is actually a Visa card or a Mastercard.
- Validate each observation individually.
- Return a Boolean Pandas Series the same length as the data set.
- Are inherited from all the feature types in the inheritance chain.
- Are executed by using the name of the validator: series.ads.validator.is_credit_card()
- Are listed by calling validator_registered() on:
  - a Pandas Series: series.ads.validator_registered()
  - a Pandas DataFrame: df.ads.validator_registered()
Create Validators
1. Create a handler
def is_visa_card_handler(data: pd.Series, *args, **kwargs) -> pd.Series:
    # _pattern_string is a verbose regular expression defined elsewhere.
    PATTERN = re.compile(_pattern_string, re.VERBOSE)
    def _is_credit_card(x):
        # x is a single observation, not a whole Series.
        return (
            not pd.isnull(x)
            and PATTERN.match(str(x)) is not None
        )
    # Validate each observation and return a Boolean Series.
    return data.apply(_is_credit_card)
2. Get a feature type object
3. Register the handler
CreditCard = feature_type_manager.feature_type_object('credit_card')
CreditCard.validator.register(name='is_visa_card',
                              handler=is_visa_card_handler)
Types of Feature Type Validators
Default: the handler that is called when no other handler can process a request.
series.ads.validator.is_credit_card()
Open Value: the handler requires specific arguments, but the values of those arguments are not constrained.
CreditCard.validator.register(
    name='is_credit_card',
    condition=("card_type",),
    handler=is_any_card_handler)
series.ads.validator.is_credit_card(card_type='Visa')
series.ads.validator.is_credit_card(card_type='Mastercard')
Closed Value: the handler requires specific arguments, and those arguments must have specific values.
CreditCard.validator.register(
    name='is_credit_card',
    condition={"card_type": "Amex"},
    handler=is_amex_handler)
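How the three kinds of condition could select a handler can be sketched with a stand-in dispatcher. This is an illustration under assumptions, not the ADS implementation: the ValidatorRegistry class, the matching order (closed value first, then open value, then default), the prefix table and the handler bodies are all hypothetical, and the handlers work on plain lists of strings instead of Pandas Series.

```python
# Simplified, illustrative issuer prefixes (not a complete or authoritative table).
PREFIXES = {
    "Visa": ("4",),
    "Mastercard": ("51", "52", "53", "54", "55"),
    "Amex": ("34", "37"),
}


def is_digits_handler(values, **kwargs):
    return [v.isdigit() for v in values]


def is_any_card_handler(values, card_type, **kwargs):
    return [v.startswith(PREFIXES[card_type]) for v in values]


def is_amex_handler(values, **kwargs):
    return [v.startswith(PREFIXES["Amex"]) and len(v) == 15 for v in values]


class ValidatorRegistry:
    """Hypothetical stand-in that imitates condition-based handler selection."""

    def __init__(self):
        self._entries = []  # list of (condition, handler)

    def register(self, handler, condition=None):
        self._entries.append((condition, handler))

    def __call__(self, values, **kwargs):
        # Closed value: argument names and values must both match.
        for condition, handler in self._entries:
            if isinstance(condition, dict) and condition == kwargs:
                return handler(values, **kwargs)
        # Open value: only the argument names must match.
        for condition, handler in self._entries:
            if isinstance(condition, tuple) and set(condition) == set(kwargs):
                return handler(values, **kwargs)
        # Default: used when no other handler can process the request.
        for condition, handler in self._entries:
            if condition is None:
                return handler(values, **kwargs)
        raise TypeError("no handler matches the given arguments")


is_credit_card = ValidatorRegistry()
is_credit_card.register(is_digits_handler)
is_credit_card.register(is_any_card_handler, condition=("card_type",))
is_credit_card.register(is_amex_handler, condition={"card_type": "Amex"})

cards = ["4111111111111111", "378282246310005", "37123"]
default_result = is_credit_card(cards)
visa_result = is_credit_card(cards, card_type="Visa")
amex_result = is_credit_card(cards, card_type="Amex")
```

In this sketch the call with card_type="Amex" reaches is_amex_handler and not the open-value handler, because the closed-value condition is checked first. That precedence is a design choice of the sketch.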
Creating a Custom Feature Type
https://blogs.oracle.com/ai-and-datascience/post/how-to-create-custom-feature-types-for-exploratory-data-analysis
Define a class that inherits from FeatureType.
Register the feature type with feature_type_manager.