
User:LI AR/Books/Cracking the DataScience Interview: Difference between revisions

From Wikipedia, the free encyclopedia
Liar666

Revision as of 16:02, 28 February 2017


Cracking the DataScience Interview

Basic Stuff To Know

Generic pages
Glossaire_de_l'exploration_de_données (French-language glossary of data mining)
Big_data
  • Inspired by books such as:
    • "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II"
    • "120 real data science interview questions"
Tips
Bias–variance_tradeoff
Correlation_does_not_imply_causation
Competitions
Datasets
IDEs
Data Manipulation
Maths (Stats / Algebra)
  • Inspiration for this section: https://github.com/soulmachine/machine-learning-cheat-sheet
Glossary_of_probability_and_statistics
Curse_of_dimensionality
Mode_(statistics)
Variance
Covariance
Entropy_in_thermodynamics_and_information_theory
Expected_value
Likelihood_function
Cumulative_distribution_function
Probability_mass_function
Probability_density_function
Pareto_efficiency
Tensor_product
Taxicab_geometry
Norm_(mathematics)#Euclidean_norm
Lp_space
Norm_(mathematics)
Determinant
Trace_(linear_algebra)
Eigenvalues_and_eigenvectors
Convolution
Hadamard_product_(matrices)
Kernel_(statistics)
Radial_basis_function
Logit
Latent_variable
Inference
Statistical_inference
Inductive_reasoning
Deduction_and_induction
Discrete_uniform_distribution
Normal_distribution
Bernoulli_distribution
Binomial_distribution
Poisson_distribution
Chi-squared_distribution
Log-normal_distribution
Weibull_distribution
Gamma_distribution
Beta_distribution
Hypergeometric_distribution
  • Neural Nets
Softmax_function
Sigmoid_function
Hyperbolic_function#Tanh
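The three activations in the Neural Nets list are closely related. Below is a minimal standard-library sketch (the function names are illustrative, not taken from any library). It shows a sigmoid arranged so that exp() never overflows, a softmax using the usual max-subtraction trick, and tanh written through the identity tanh(x) = 2·sigmoid(2x) - 1.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x), arranged so exp() never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in (0, 1)
    return z / (1.0 + z)


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tanh_via_sigmoid(x: float) -> float:
    """tanh(x) = 2 * sigmoid(2x) - 1: tanh is a rescaled, zero-centred sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])  # → [0.659, 0.242, 0.099]
```

Subtracting the maximum score leaves the softmax output unchanged, because the factor e^-m cancels in the ratio. It does, however, keep exp() from overflowing on large inputs.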
Evaluation
Mean_absolute_percentage_error
Mean_absolute_scaled_error
Symmetric_mean_absolute_percentage_error
Regression-kriging
Information_gain_ratio
Kullback–Leibler_divergence
Gini_coefficient
Akaike_information_criterion
Bayesian_information_criterion
Precision_and_recall
Sensitivity_and_specificity
Receiver_operating_characteristic
Receiver_operating_characteristic#Area_under_the_curve
Cross-validation_(statistics)
Errors_and_residuals
  • If the residuals are consistently > 0 (or consistently < 0) over some range of the training set, the model has failed to capture something in the data, or the wrong type of model is being used (e.g. linear regression on parabolic data; DataSkeptic/Heteroskedasticity)
Heteroscedasticity
  • Clustering
Dunn_index
Rand_index
Jaccard_index
Silhouette_(clustering)
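The residual rule of thumb under Errors_and_residuals can be checked mechanically. This minimal sketch assumes ordinary least squares on a single feature. It fits a straight line to parabolic data (the "wrong type of model" case) and then reads off the sign of each residual in x order. The helper names and the longest-run measure are illustrative choices; a runs test on the residual signs is a more formal alternative.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def residual_signs(xs: List[float], ys: List[float], a: float, b: float) -> str:
    """One character per point: '+' if y - (a + b*x) > 0, otherwise '-'."""
    return "".join("+" if y - (a + b * x) > 0 else "-" for x, y in zip(xs, ys))


def longest_same_sign_run(signs: str) -> int:
    """Length of the longest block of consecutive residuals sharing one sign."""
    best = run = 1
    for prev, cur in zip(signs, signs[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


# Parabolic data fitted with a straight line: the wrong type of model.
xs = [float(x) for x in range(-5, 6)]
ys = [x * x for x in xs]
a, b = fit_line(xs, ys)
signs = residual_signs(xs, ys, a, b)
print(signs)                         # → ++-------++
print(longest_same_sign_run(signs))  # → 7
```

The fitted line is too low at both ends and too high in the middle. As a result, the residuals come in long same-sign blocks instead of alternating at random, which is the symptom described above.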
Working with Text
Tf–idf
Okapi_BM25
Sentiment_analysis
Named-entity_recognition
Conditional_random_field
Latent_Dirichlet_allocation
Apache_Lucene
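For tf-idf, here is a minimal plain-Python sketch. It assumes whitespace tokenization, raw term counts as tf, and the unsmoothed idf = log(N / df). This is only one of several weighting variants: scikit-learn's TfidfVectorizer, for instance, smooths the idf by default, so absolute numbers differ between tools.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight = raw count of the term in the document * log(N / df)."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
print(round(w[0]["cat"], 3), round(w[0]["the"], 3))  # → 1.099 0.811
```

A term that occurs in every document gets weight 0. A term unique to one document gets the full log(N) factor per occurrence.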
Visualization
Data_visualization
Exploratory_data_analysis
Statistical_graphics
Visual_perception
Heat_map
Misleading_graph
Pareto_chart
Feature/Attribute Selection / Dimensionality Reduction
Principal_component_analysis
Independent_component_analysis
Singular_value_decomposition
T-distributed_stochastic_neighbor_embedding
Autoencoder
Deep_learning#Stacked_(de-noising)_auto-encoders
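Principal_component_analysis comes down to finding the leading eigenvectors of the covariance matrix, or equivalently the right singular vectors of the centred data. This minimal standard-library sketch assumes only the first component is needed. It uses power iteration in place of a full eigendecomposition or SVD. Power iteration converges when the largest eigenvalue is strictly larger than the next one and the start vector is not orthogonal to the top eigenvector. The data must also not be constant.

```python
import math
import random
from typing import List

Vector = List[float]


def _means(rows: List[Vector]) -> Vector:
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance_matrix(rows: List[Vector]) -> List[Vector]:
    """Sample covariance matrix (dividing by n - 1) of the column-centred data."""
    n, d = len(rows), len(rows[0])
    mu = _means(rows)
    centred = [[r[j] - mu[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]


def top_principal_component(rows: List[Vector], iters: int = 200, seed: int = 0) -> Vector:
    """First principal axis (unit vector) via power iteration on the covariance matrix."""
    cov = covariance_matrix(rows)
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]  # w = cov @ v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows: List[Vector], v: Vector) -> List[float]:
    """Score of each centred row on the axis v: the data reduced to one dimension."""
    mu = _means(rows)
    return [sum((r[j] - mu[j]) * v[j] for j in range(len(v))) for r in rows]


rows = [[float(x), 2.0 * x] for x in range(-3, 4)]  # points on the line y = 2x
v = top_principal_component(rows)
print([round(c, 3) for c in v])  # → [0.447, 0.894]
```

Eigenvectors are only defined up to sign, so another implementation may return the same axis negated.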
Statistical tests
  • Evaluating a hypothesis
Statistical_power
Statistical_hypothesis_testing
P-value
Student's_t-test
Type_I_and_type_II_errors
  • Detecting abrupt changes in time series
Structural_break
Chow_test
Kruskal–Wallis_one-way_analysis_of_variance
F-test
F-statistics
Pairwise_summation
CUSUM
Chaos
Lyapunov_exponent
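CUSUM, listed under detecting abrupt changes in time series, accumulates deviations from a target mean and raises an alarm once the running sum crosses a threshold. This is a minimal sketch of the two-sided tabular form, assuming the in-control mean is known. The allowance k and the threshold h are tuning parameters; a common choice is k equal to half the shift one wants to detect. The values below are only illustrative.

```python
from typing import List, Optional


def cusum_alarm(xs: List[float], target_mean: float, k: float, h: float) -> Optional[int]:
    """Two-sided tabular CUSUM: index of the first alarm, or None if no alarm.

    k is the allowance (slack per observation) and h the decision threshold,
    both in the same units as the observations.
    """
    s_hi = 0.0  # cumulative evidence that the mean has moved up
    s_lo = 0.0  # cumulative evidence that the mean has moved down
    for i, x in enumerate(xs):
        s_hi = max(0.0, s_hi + (x - target_mean) - k)
        s_lo = max(0.0, s_lo - (x - target_mean) - k)
        if s_hi > h or s_lo > h:
            return i
    return None


# Small wobble around 0 for 10 points, then the mean jumps to about 1.
series = [0.1, -0.1] * 5 + [1.1, 0.9] * 5
print(cusum_alarm(series, target_mean=0.0, k=0.25, h=2.0))  # → 12
```

The mean shifts at index 10 and the alarm fires two points later. This detection delay is the price paid for the allowance k, which keeps the small pre-change wobble from triggering false alarms.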
Techniques
Statistical_classification
Cluster_analysis
Regression_analysis
Linear_regression
Logistic_regression
Association_rule_learning
Survival_analysis
Monte_Carlo_method
Multinomial_logistic_regression
Lasso_(statistics)
Expectation–maximization_algorithm
Hidden_Markov_model
Viterbi_algorithm
Latent_semantic_analysis
Evolutionary_algorithm
Genetic_algorithm
Voronoi_diagram
Local_outlier_factor
Ordered_weighted_averaging_aggregation_operator
  • Neural Nets
Boltzmann_machine
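Viterbi_algorithm uses dynamic programming to find the single most likely hidden-state path of a hidden Markov model. This minimal sketch assumes the HMM is given as plain dictionaries of start, transition and emission probabilities; the two-state model below uses illustrative numbers. It works with log-probabilities so that long sequences do not underflow.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Return (probability, path) of the most likely hidden-state sequence."""
    # score[s]: log-probability of the best path ending in state s at the current step
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []  # one dict per step: best predecessor of each state
    for o in obs[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + _log(trans_p[r][s]))
            new_score[s] = score[prev] + _log(trans_p[prev][s]) + _log(emit_p[s][o])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Pick the best final state, then follow the back-pointers to recover the path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return math.exp(score[last]), path


# Toy two-state model; the numbers are illustrative.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3}, "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, round(prob, 5))  # → ['Healthy', 'Healthy', 'Fever'] 0.01512
```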
Ensemble Techniques
Ensemble_learning
Bootstrap_aggregating
Boosting_(machine_learning)
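Bootstrap_aggregating (bagging) trains each base model on a bootstrap sample (n draws with replacement from the n training points) and averages the models' predictions. This minimal sketch assumes one-feature regression and uses a hand-written decision stump as the base learner. Both are illustrative choices; full decision trees are the more usual base learner, and bagging pays off most with such unstable learners.

```python
import random
from statistics import mean
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y)
Model = Callable[[float], float]


def fit_stump(data: List[Point]) -> Model:
    """Regression stump: one threshold on x, predicting the mean y on each side.

    The threshold with the smallest summed squared error is kept.
    """
    xs = sorted({x for x, _ in data})
    best: Optional[Tuple[float, float, float, float]] = None  # (sse, t, left_mean, right_mean)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # only one distinct x in the sample: predict the overall mean
        overall = mean(y for _, y in data)
        return lambda x: overall
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def bagging(data: List[Point], n_models: int = 25, seed: int = 0) -> Model:
    """Fit n_models stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # n draws with replacement
        models.append(fit_stump(sample))
    return lambda x: mean(m(x) for m in models)


# Noise-free step function: y = 0 for x < 5, y = 10 for x >= 5.
data = [(float(x), 0.0 if x < 5 else 10.0) for x in range(10)]
predict = bagging(data)
print(predict(0.0), predict(9.0))
```

Each stump places its threshold somewhere between the largest x with y = 0 and the smallest x with y = 10 that made it into its bootstrap sample. Averaging the stumps therefore mostly reduces variance in that region, while predictions far from the step stay close to 0 and 10.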
Experimentation framework
Coding / Exposing an API to the rest of the application
Microservices
BigData
Star_schema
OLAP_cube
Solid-state_drive
MongoDB
  • Map-Reduce framework
Apache_Hadoop https://hadoop.apache.org/
  • Scraping / ingestion
Apache_Flume http://flume.apache.org/
  • Storage
Apache_Hadoop#HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Apache_HBase http://hbase.apache.org/
Apache_Hive https://hive.apache.org/
  • Transfers - to/from relational databases
Sqoop http://sqoop.apache.org/
  • Transfers - serialization/streaming
Apache_Avro http://avro.apache.org/
Apache_Kafka https://kafka.apache.org/
  • Processing - in memory
Apache_Spark https://spark.apache.org/
  • Admin
Apache_ZooKeeper http://zookeeper.apache.org/
Apache_Cassandra https://cassandra.apache.org
Ambari http://ambari.apache.org/
Apache_Oozie http://oozie.apache.org/
  • Programming
Pig_(programming_tool) https://pig.apache.org/
  • ML
Apache_Mahout http://mahout.apache.org/
Apache_SystemML http://systemml.apache.org/
  • Working with text
Elasticsearch https://www.elastic.co/
  • Working with text - Data Viz
Kibana https://www.elastic.co/products/kibana
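The Map-Reduce model behind Apache_Hadoop can be illustrated without a cluster. Below is a minimal single-process sketch of the classic word count, assuming an in-memory list of lines. The mapper emits (word, 1) pairs, a shuffle step groups the values by key, and the reducer sums each group. This is not Hadoop's API. In a real job the framework runs many mappers and reducers in parallel and partitions the keys across the reducers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["the quick brown fox", "The lazy dog", "the end"])
print(counts["the"], counts["fox"])  # → 3 1
```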
Resources
News/Blogs/RSS
Podcasts
MOOCs