Logistic Regression using Mahout | sjsubigdata



  • Logistic regression is a model for predicting the probability that an event occurs. It uses one or more predictor variables, which may be either numerical or categorical.
  • It is used to predict a binary response, i.e., the outcome of a categorical dependent variable (a class label), from one or more predictor variables (features).
  • Logistic regression is a standard industry workhorse that underlies many production systems for fraud detection, advertising quality, and ad targeting.
  • Mahout’s implementation of logistic regression uses the stochastic gradient descent (SGD) algorithm.
  • The algorithm is sequential (nonparallel), but it is fast. Because it uses SGD rather than iteratively reweighted least squares (IRLS), the model can be updated incrementally, which matters for some applications. A parallel variant of SGD (Parallel Stochastic Gradient Descent) exists, but Mahout does not use it.
  • When working with large data sets, SGD uses a constant amount of memory regardless of the size of the input.
  • Mahout includes a command-line example program for logistic regression.
  • In production, logistic regression is usually not run from the command line but integrated more tightly into an existing data flow: the model is used as part of an existing process, and code is written to call the Mahout libraries directly.
  • Mahout’s SGD implementation of logistic regression provides the following command-line programs:
o   cat : Print a file or resource as the logistic regression models would see it
o   runlogistic : Run a logistic regression model against CSV data
o   trainlogistic : Train a logistic regression using stochastic gradient descent.
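To make the first two points concrete: a logistic regression model computes a weighted sum of the predictors and passes it through the logistic (sigmoid) function to obtain a probability. The sketch below is plain Python, not Mahout code; the predictor names (`amount`, `channel`), the category list, and the weights are hypothetical and chosen only for illustration. The categorical predictor is one-hot encoded so that it can enter the weighted sum alongside the numeric one.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a probability in (0, 1).
    Split into two branches to avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def encode(amount: float, channel: str, channels: list) -> list:
    """Build a feature vector: a bias term, one numeric predictor, and a
    one-hot encoding of one categorical predictor (hypothetical features)."""
    return [1.0, amount] + [1.0 if channel == c else 0.0 for c in channels]

def predict_proba(weights: list, x: list) -> float:
    """P(y = 1 | x) = sigmoid(w . x)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

CHANNELS = ["web", "phone", "store"]
# Hypothetical weights: bias, amount, then one weight per channel.
WEIGHTS = [-2.0, 0.5, 1.0, 0.0, -1.0]

x = encode(4.0, "web", CHANNELS)
p = predict_proba(WEIGHTS, x)
label = 1 if p >= 0.5 else 0  # threshold the probability to get a class label
```

For these weights, w · x = -2.0 + 0.5 × 4.0 + 1.0 = 1.0, so the predicted probability is sigmoid(1.0) ≈ 0.731 and the example is assigned class 1.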

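The points about SGD (incremental updates, constant memory) can be illustrated with a minimal plain-Python sketch. This is not Mahout's implementation: the class name `OnlineLogistic`, the fixed learning rate, and the synthetic data stream are all hypothetical, and the sketch omits the regularization and learning-rate schedules a production implementation would typically add. Each call to `train` takes one gradient step on the log-likelihood of a single example, so the only state kept is one weight per feature, no matter how many examples are streamed through.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class OnlineLogistic:
    """Hypothetical minimal online logistic regression trained by plain SGD.
    Memory use is one weight per feature, independent of the number of
    training examples seen."""

    def __init__(self, num_features: int, rate: float = 0.1):
        self.w = [0.0] * num_features
        self.rate = rate

    def predict(self, x):
        """Probability that the label is 1 for feature vector x."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def train(self, x, y):
        """One SGD step for a single example with label y in {0, 1}:
        w <- w + rate * (y - p) * x."""
        err = y - self.predict(x)
        for i, xi in enumerate(x):
            self.w[i] += self.rate * err * xi

def stream(n, seed=0):
    """Hypothetical synthetic stream: feature vector [bias, v] with v uniform
    in [0, 1); the label is 1 when v > 0.5."""
    rng = random.Random(seed)
    for _ in range(n):
        v = rng.random()
        yield [1.0, v], 1 if v > 0.5 else 0

model = OnlineLogistic(num_features=2, rate=0.5)
for x, y in stream(5000):  # examples are consumed one at a time, never stored
    model.train(x, y)

# New data can be folded in later without retraining from scratch.
for x, y in stream(1000, seed=1):
    model.train(x, y)
```

By contrast, a batch method such as IRLS recomputes the fit over the whole data set on each iteration, which is why it does not naturally support this kind of one-example-at-a-time update.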
Read full article from Logistic Regression using Mahout | sjsubigdata
