George Opritescu

Developer from somewhere

setting chef environment for zsh with chef-dk

Run this from your terminal:

eval "$(chef shell-init zsh)"

scala spark - No TypeTag available

Received the error No TypeTag available when trying to package a Spark SQL app. It turns out it was caused by having a case class defined inside a method. Moving it outside of the method fixes the compilation.
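A minimal sketch of the layout that compiles (the names here are made up); if Record were declared inside main instead, toDF would fail with the No TypeTag available error:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// defined at the top level, so the compiler can generate a TypeTag for it
case class Record(id: Int, name: String)

object TypeTagExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TypeTagExample"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // works because Record lives outside the method
    sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).toDF().show()
  }
}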


scala.runtime.VolatileObjectRef.zero() when submitting job to spark

➜  SparkyOne ~/Tools/spark-1.4.1/bin/spark-submit --class Crocodil target/scala-2.11/sparkyone_2.11-1.0.jar
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.VolatileObjectRef.zero()Lscala/runtime/VolatileObjectRef;
        at Crocodil$.main(Crocodil.scala)
        at Crocodil.main(Crocodil.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)

I received the above error when doing a submit. It turns out my Spark 1.4.1 was built with Scala 2.10.4, while my Scala compiler was 2.11.8. Changing to this in my build.sbt:

scalaVersion := "2.10.4"

followed by a repackage and a resubmit to Spark, fixed the issue.
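For reference, a minimal build.sbt along these lines (the dependency is illustrative, assuming a Spark SQL app) would look something like:

name := "sparkyone"

version := "1.0"

// has to match the Scala version the Spark distribution was built with
scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.4.1" % "provided"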


find port where hdfs is listening

hdfs getconf -confKey fs.default.name
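On newer Hadoop versions fs.default.name is deprecated in favour of fs.defaultFS, so the same information is available with:

hdfs getconf -confKey fs.defaultFS

Either way, the value comes back as a URI such as hdfs://quickstart.cloudera:8020, with the port at the end.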

view logs for a hadoop jar execution

Assuming you started a jar with

hadoop jar your_jar some_args

you might see something similar to this when the job starts:

16/04/24 07:33:01 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1461491409387_0019/

Once the job finishes, you can view its logs by issuing this command:

yarn logs -applicationId application_1461491409387_0019

I found this helpful for viewing log statements that I added in the jar.
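If you no longer have the tracking URL around, yarn can list recent applications together with their ids, for example:

yarn application -list -appStates FINISHED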


splitting a big file into multiple smaller ones

Used the following to split the Cloudera quickstart zip into smaller pieces:

split --bytes 300M --numeric-suffixes --suffix-length=3 cloudera-quickstart-vm-5.5.0-0-virtualbox.zip cloud.
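The fixed-length numeric suffixes keep the pieces sorting in order, so they can later be glued back together with cat:

cat cloud.* > cloudera-quickstart-vm-5.5.0-0-virtualbox.zip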

vagrant private network and /etc/udev/rules.d/70-persistent-net.rules

Encountered the following error when trying to set up private networking in Vagrant:

The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

ARPCHECK=no /sbin/ifup eth1 2> /dev/null

Stdout from the command:

Device eth1 does not seem to be present, delaying initialization.


Stderr from the command:

Additionally, the call to private_network had to be changed to this:

box.vm.network "private_network", ip: "whatever_ip", :auto_config => false

The issue is discussed here, and this post helped me fix it. Basically, you have to check the contents of the file /etc/udev/rules.d/70-persistent-net.rules. The safest bet seems to be to delete that file, so that Vagrant can assign device names without MAC address issues. Thanks a lot, ablecoder!
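For reference, removing it inside the guest is just the following; the file should be regenerated on the next boot:

sudo rm /etc/udev/rules.d/70-persistent-net.rules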


decompressing gzip http response with node-red

Add zlib to functionGlobalContext in settings.js, if you don’t already have it there:

functionGlobalContext: {
   zlib: require('zlib'),
   ...
}

Then create a new function node with the following code:

// msg.payload holds the gzip-compressed response body
var buffer = new Buffer(msg.payload);
msg.payload = global.get('zlib').gunzipSync(buffer).toString();
return msg;

This won’t yield the best performance, but it works in my case. Most likely, it could be done better by using the callback version of gunzip, and then using node.send to pass the msg to the next node.
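A sketch of that asynchronous variant, reusing the same zlib entry from settings.js, could look like this:

var zlib = global.get('zlib');
var buffer = new Buffer(msg.payload);

zlib.gunzip(buffer, function (err, result) {
    if (err) {
        // report the failure against the original message
        node.error(err, msg);
        return;
    }
    msg.payload = result.toString();
    node.send(msg);
});

// nothing is returned synchronously; the message is sent from the callback
return null;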


Parse data from another agent using WebsiteAgent in huginn

For one use case, I had to use a PostAgent, which returned HTML, and I wanted to parse the events it created with a WebsiteAgent. Here’s the code that made it happen:

PostAgent:

{
  "post_url": "URL to which you're posting goes here",
  "expected_receive_period_in_days": "1",
  "content_type": "form",
  "method": "post",
  "payload": {
    "param1": 1,
    "param2": "2"
  },
  "headers": {},
  "emit_events": "true"
}

The events created by the previous agent will be in the following format:

{
  "body": "<html>...</html>",
  "headers": {
    "Date": "Fri, 01 Apr 2016 06:02:40 GMT",
    "Pragma": "no-cache",
    "Expires": "Thu, 01 Jan 1970 00:00:00 GMT",
    // other headers cut for brevity
  },
  "status": 200
}

And here’s the WebsiteAgent that parses the created events (make sure to set the previous agent as its source):

{
  "expected_update_period_in_days": "2",
  "data_from_event": "",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "status": {
      "css": "table tbody td:nth-child(3)",
      "value": ".//text()"
    }
  }
}

activate ssh-agent automatically when opening bash, in windows

Taken from here. I added the following to my .bashrc (on the Windows system). You’re likely to find it in the following location:

C:\Users\YourUserName\.bashrc

env=~/.ssh/agent.env

# load a previously saved agent environment, if there is one
agent_load_env () { test -f "$env" && . "$env" >| /dev/null ; }

# start a fresh agent and save its environment for future shells
agent_start () {
    (umask 077; ssh-agent >| "$env")
    . "$env" >| /dev/null ; }

# agent_run_state: 0=agent running w/ key; 1=agent w/o key; 2= agent not running
agent_run_state=$(ssh-add -l >| /dev/null 2>&1; echo $?)

if [ ! "$SSH_AUTH_SOCK" ] || [ $agent_run_state = 2 ]; then
    agent_load_env
    agent_start
    ssh-add
elif [ "$SSH_AUTH_SOCK" ] && [ $agent_run_state = 1 ]; then
    ssh-add
fi

unset env