User:LI AR/Books/Cracking the DataScience Interview: Difference between revisions

From Wikipedia, the free encyclopedia
Liar666 (talk | contribs)
* http://www.learnpython.org/
* https://www.codecademy.com/learn/python
* http://www.dataschool.io/how-to-get-better-at-data-science/
* http://simplystatistics.org/2015/03/17/data-science-done-well-looks-easy-and-that-is-a-big-problem-for-data-scientists/


; News/Blogs/RSS

Revision as of 16:04, 28 February 2017


Cracking the DataScience Interview

Basic Stuff To Know

Generic pages
Glossaire_de_l'exploration_de_données
Big_data
  • Inspired by books such as:
    • "A collection of Data Science Interview Questions Solved in Python and Spark vol I & II"
    • "120 real data science interview questions"
Tips
Bias–variance_tradeoff
Correlation_does_not_imply_causation
Competitions
Datasets
IDEs
Data Manipulation
Maths (Stats / Algebra)
Glossary_of_probability_and_statistics
Curse_of_dimensionality
Mode_(statistics)
Variance
Covariance
Entropy_in_thermodynamics_and_information_theory
Expected_value
Likelihood_function
Cumulative_distribution_function
Probability_mass_function
Probability_density_function
Pareto_efficiency
Tensor_product
Taxicab_geometry
Norm_(mathematics)#Euclidean_norm
Lp_space
Norm_(mathematics)
Determinant
Trace_(linear_algebra)
Eigenvalues_and_eigenvectors
Convolution
Hadamard_product_(matrices)
Kernel_(statistics)
Radial_basis_function
Logit
Latent_variable
Inference
Statistical_inference
Inductive_reasoning
Deduction_and_induction
Discrete_uniform_distribution
Normal_distribution
Bernoulli_distribution
Binomial_distribution
Poisson_distribution
Chi-squared_distribution
Log-normal_distribution
Weibull_distribution
Gamma_distribution
Beta_distribution
Hypergeometric_distribution
  • Neural Nets
Softmax_function
Sigmoid_function
Hyperbolic_function#Tanh
Evaluation
Mean_absolute_percentage_error
Mean_absolute_scaled_error
Symmetric_mean_absolute_percentage_error
Regression-kriging
Information_gain_ratio
Kullback–Leibler_divergence
Gini_coefficient
Akaike_information_criterion
Bayesian_information_criterion
Precision_and_recall
Sensitivity_and_specificity
Receiver_operating_characteristic
Receiver_operating_characteristic#Area_under_the_curve
Cross-validation_(statistics)
Errors_and_residuals
  • If the residual is consistently > 0 or < 0 over a range of the training set, the model has failed to capture some structure in the data, or the wrong type of model is being used (e.g. linear regression on parabolic data; see DataSkeptic/Heteroskedasticity)
Heteroscedasticity
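The residual check noted under Errors_and_residuals can be sketched in a few lines. This is a minimal illustration, not a standard diagnostic: the helper names (`fit_line`, `residual_signs`, `longest_run`) and the toy data are assumptions made for the example. It fits a straight line to parabolic data and shows that the residual signs come in long runs instead of alternating at random.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def residual_signs(xs, ys):
    """Sign of each residual (observed minus fitted), in input order."""
    a, b = fit_line(xs, ys)
    return ["+" if y - (a * x + b) > 0 else "-" for x, y in zip(xs, ys)]


def longest_run(signs):
    """Length of the longest stretch of identical consecutive signs."""
    best = current = 0
    previous = None
    for s in signs:
        current = current + 1 if s == previous else 1
        best = max(best, current)
        previous = s
    return best


# Toy data (hypothetical): a parabola, deliberately fitted with a line.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

signs = residual_signs(xs, ys)
print("".join(signs))      # prints ++-------++
print(longest_run(signs))  # prints 7
```

With a well-specified model and noisy data the signs would be expected to switch often; a run covering most of the range, as here, suggests the model family is wrong for the data.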
  • Clustering
Dunn_index
Rand_index
Jaccard_index
Silhouette_(clustering)
Working with Text
Tf–idf
Okapi_BM25
Sentiment_analysis
Named-entity_recognition
Conditional_random_field
Latent_Dirichlet_allocation
Apache_Lucene
Visualization
Data_visualization
Exploratory_data_analysis
Statistical_graphics
Visual_perception
Heat_map
Misleading_graph
Pareto_chart
Feature/Attribute Selection / Dimensionality Reduction
Principal_component_analysis
Independent_component_analysis
Singular_value_decomposition
T-distributed_stochastic_neighbor_embedding
Autoencoder
Deep_learning#Stacked_.28de-noising.29_auto-encoders
Statistical tests
  • Evaluating a hypothesis
Statistical_power
Statistical_hypothesis_testing
P-value
Student's_t-test
Type_I_and_type_II_errors
  • Detecting abrupt changes in time series
Structural_break
Chow_test
Kruskal–Wallis_one-way_analysis_of_variance
F-test
F-statistics
Pairwise_summation
CUSUM
Chaos
Lyapunov_exponent
Techniques
Statistical_classification
Cluster_analysis
Regression_analysis
Linear_regression
Logistic_regression
Association_rule_learning
Survival_analysis
Monte_Carlo_method
Multinomial_logistic_regression
Lasso_(statistics)
Expectation–maximization_algorithm
Hidden_Markov_Models
Viterbi_algorithm
Latent_semantic_analysis
Evolutionary_algorithm
Genetic_algorithm
Voronoi_diagram
Local_outlier_factor
Ordered_weighted_averaging_aggregation_operator
  • Neural Nets
Boltzmann_machine
Ensemble Techniques
Ensemble_learning
Bootstrap_aggregating
Boosting_(machine_learning)
Experimentation framework
Coding / Exposing API to the rest of the application
Microservices
BigData
Star_schema
OLAP_cube
Solid-state_drive
MongoDB
  • Map-Reduce framework
Apache_Hadoop https://hadoop.apache.org/
  • Scraping
Apache_Flume http://flume.apache.org/
  • Storage
Apache_Hadoop#HDFS https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
Apache_HBase http://hbase.apache.org/
Apache_Hive https://hive.apache.org/
  • Transfers - to/from RelationalDB
Sqoop http://sqoop.apache.org/
  • Transfers - serialization/streaming
Apache_Avro http://avro.apache.org/
Apache_Kafka https://kafka.apache.org/
  • Storage - In memory
Apache_Spark https://spark.apache.org/
  • Admin
Apache_ZooKeeper http://zookeeper.apache.org/
Apache_Cassandra https://cassandra.apache.org
Ambari http://ambari.apache.org/
Apache_Oozie http://oozie.apache.org/
  • Programming
Pig_(programming_tool) https://pig.apache.org/
  • ML
Apache_Mahout http://mahout.apache.org/
Apache_SystemML http://systemml.apache.org/
  • Working with text
Elasticsearch https://www.elastic.co/
  • Working with text - Data Viz
Kibana https://www.elastic.co/products/kibana
Resources
News/Blogs/RSS
Podcasts
MOOCs