
Testing Salesforce Managed Packages (2GP) using Bitbucket Pipelines & SFDX scratch orgs

This post assumes you have access to a Salesforce Dev Hub.

I’ve been working on a Second-Generation Managed Package (2GP), and one recurring issue is that the package needs to support both Person Accounts and the standard business accounts and contacts model. On top of that, it also needs to support orgs with Multiple Currencies enabled as well as standard single-currency orgs. That means the same Apex code might work in one environment but not in another.

One great benefit of SFDX is that you can easily spin up a new scratch org that supports Person Accounts, deploy your code, run the tests and make sure no errors are thrown. However, it is easy to work on your code for a few hours (or days!) in a standard business accounts and contacts environment before noticing that your package is now broken in Person Accounts environments. That’s where a good Continuous Integration (CI) solution helps remedy those problems. If you are unfamiliar with the concept, you might want to spend some time on Trailhead’s Continuous Integration Using Salesforce DX.

I am using Bitbucket for source control, so I’ll break down the steps needed to set up Bitbucket Pipelines with Salesforce DX scratch orgs using the JWT authorization flow.

  1. Create a private key and a self-signed digital certificate. You will end up with a server.crt file and a server.key file.
  2. Create a Connected App in your Salesforce Dev Hub (the same one you use to create scratch orgs for development) and assign your new server.crt certificate to it. Salesforce will provide you with a new Consumer Key.
  3. From your own machine, authenticate to your Dev Hub with the JWT-based flow using your Consumer Key, your server.key file and the username you use to log in to your Dev Hub. Important: first log out with sfdx force:auth:logout -u DevHub before connecting to your Dev Hub again with the JWT flow. The JWT authentication command should look like this:
sfdx force:auth:jwt:grant --clientid CONSUMERKEY --jwtkeyfile PATH/TO/server.key --username DEVHUBUSERNAMEEMAIL --setdefaultdevhubusername --setalias DevHub
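For the curious, here is a minimal Python sketch of roughly what the JWT flow does behind that command. It uses only the standard library and builds the two JWT parts Salesforce expects: an RS256 header and the iss/sub/aud/exp claims. It also builds the form body that gets posted to the /services/oauth2/token endpoint. The helper names are hypothetical. The RSA-SHA256 signature with server.key is left to a caller-supplied function, because the standard library cannot sign. sfdx does all of this for you.

```python
import base64
import json
import time
import urllib.parse
from typing import Callable, Optional

# Production login server; sandboxes use https://test.salesforce.com
LOGIN_URL = "https://login.salesforce.com"
TOKEN_URL = LOGIN_URL + "/services/oauth2/token"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require (RFC 7519)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwt_signing_input(consumer_key: str, username: str, now: Optional[int] = None) -> str:
    """Return '<header>.<claims>' for the JWT bearer flow (hypothetical helper)."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256"}
    claims = {
        "iss": consumer_key,  # Consumer Key of the Connected App
        "sub": username,      # Dev Hub username
        "aud": LOGIN_URL,     # login server that validates the token
        "exp": now + 180,     # short-lived: expires in 3 minutes
    }
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())


def token_request_body(signing_input: str, sign: Callable[[bytes], bytes]) -> str:
    """Form body to POST to TOKEN_URL.

    `sign` must return an RSA-SHA256 signature made with server.key, for example
    through a crypto library or `openssl dgst -sha256 -sign server.key`.
    """
    signature = sign(signing_input.encode("ascii"))
    assertion = signing_input + "." + b64url(signature)
    return urllib.parse.urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": assertion,
    })
```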

Now that you have confirmed you can connect to your Dev Hub through the Connected App, it’s time to set up your Bitbucket pipeline. Conveniently, Salesforce provides a ready-made sample Salesforce package on Bitbucket that you can use to try out Bitbucket Pipelines before configuring them in your own existing package.

  1. You don’t want to store server.key, which gives direct access to your Dev Hub, in source control (you could, but it is safer to avoid it). Instead, encrypt it and provide the values needed to decrypt it at runtime through Bitbucket repository variables. First, generate a key and an initialization vector (IV). You will need both to encrypt your key file and to decrypt it later:

openssl enc -aes-256-cbc -k <passphrase here> -P -md sha1 -nosalt

That command gives you a key and an iv value (write them down somewhere safe).
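Where do those two values come from? With `-md sha1 -nosalt`, OpenSSL derives them from the passphrase using its legacy EVP_BytesToKey scheme. It hashes repeatedly and concatenates the digests until it has 32 bytes for the AES-256 key plus 16 bytes for the IV. The Python sketch below reproduces that derivation for illustration only, and the function name is my own. Recent OpenSSL versions may also print a warning that this key derivation is deprecated.

```python
import hashlib
from typing import Tuple


def evp_bytes_to_key(passphrase: bytes, key_len: int = 32, iv_len: int = 16,
                     salt: bytes = b"") -> Tuple[bytes, bytes]:
    """Mimic OpenSSL's legacy EVP_BytesToKey with SHA-1 and a single iteration.

    D1 = SHA1(passphrase + salt), Di = SHA1(D(i-1) + passphrase + salt);
    the digests are concatenated and split into the key and the IV.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.sha1(block + passphrase + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


if __name__ == "__main__":
    key, iv = evp_bytes_to_key(b"my passphrase")  # -nosalt: empty salt
    # Printed in the same style as `openssl enc ... -P`
    print("key=" + key.hex().upper())
    print("iv =" + iv.hex().upper())
```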

  2. Encrypt your server.key file using the key and iv values:
openssl enc -nosalt -aes-256-cbc -in server.key -out server.key.enc -base64 -K <key from above> -iv <iv from above>

You now have a server.key.enc file, which is what you will commit to your repository (remember not to store the server.key file in source control).

  3. In your Bitbucket repository, go to Settings and then Repository variables, and add these four variables:
      • Should be the key value you generated
      • Should be the iv value you generated
      • The consumer key from your Salesforce Connected App on your Dev Hub
      • The username you use to connect to the Dev Hub
  4. Copy the bitbucket-pipelines.yml file to the root directory of your Salesforce managed package project and customize it. This is the configuration file for your Bitbucket pipeline: it is where the scratch orgs are created, tests are run, package versions are created, and so on.
    1. Set the value of PACKAGENAME to the ID of your package. You can find it in your sfdx-project.json or by running sfdx force:package:list (it is the value that starts with 0Ho).
    2. Update the Decrypt server key step so that it points to the location of your server.key.enc file in your repository. Also make sure the --out argument (where the decrypted key will be written) matches the key file used as input by the Authorize Dev Hub step.
  5. The last step in the bitbucket-pipelines.yml file (Run unit tests on scratch org) runs all the tests in the scratch org where the package was installed. However, because I’m working on a managed package with a namespace, I had to change this line to list the tests I want to run explicitly with the --tests argument. Otherwise, the tests do not run.
  6. The bitbucket-pipelines.yml file is configured to run on every commit to any branch, which you might want to change. Read the documentation on how to configure your pipeline (I changed mine to run only on a nightly schedule).
  7. After you commit your files (bitbucket-pipelines.yml and server.key.enc), go to your repository and click Settings. Under Pipeline Settings, click Enable Pipelines.
  8. Your next commit should trigger the pipeline! To confirm that it completed successfully, open the Pipelines section of your repository.

If you run into an issue with the pipeline in your project and can’t figure out the problem, I’d recommend creating a new repository from the SFDX Bitbucket sample package. Then create a sample unlocked package and practice on it first.


Offline Solution for OpenStreetMap (OSM)

I’ve recently become involved with XSCE (School Server Community Edition) on their “Internet in a Box” project, to make OpenStreetMap (OSM) maps available offline. Some of their deployments in remote schools around the world do not have consistent internet access. The idea is to download and store a set of knowledge resources (Wikipedia, Khan Academy videos, OSM maps, etc.) on a server. That server then provides those resources offline to laptops connected to the internal network.

Here are the constraints that need to be considered:

  • The laptops used to view the maps are very underpowered. They are often XO laptops from the One Laptop per Child (OLPC) project.
  • The server, while not as underpowered as the laptops, is typically quite limited as well in disk space, RAM and CPU.
  • The server handles other tasks besides serving maps, so map serving can’t use all of the available hardware.
  • Server specs vary from one deployment to another (what they have in common is that they all run the XSCE software).
  • Deployments’ needs are rarely the same: they can be in any region of the world, and each might want a different level of map detail for different countries.
  • The server is typically configured by a volunteer who has internet access, before it is deployed to a remote location. While volunteers do have IT knowledge, the setup needs to be simple enough.
  • The maps don’t need to be updated every week, but they need to be relatively recent. If the server gets internet access once in a while, it needs to be able to update the maps fairly easily.

The chosen solution is shown in that architecture diagram.

Since the server specs are limited, the map tiles need to be pre-rendered before they reach the XSCE Internet in a Box server. They cannot be rendered on the fly with the native OSM stack, which uses a PostgreSQL database with PostGIS. That would require too many resources and a separate database for each deployment.

The pre-rendered tiles are stored in an MBTiles file. MBTiles is a format created by Mapbox that efficiently stores millions of tiles in a SQLite database, which is itself a single file. It is efficient because it avoids storing duplicate tiles, which are frequent over large areas of water. It also simplifies deployment: you only have to move a few files around instead of potentially copying millions of PNG tiles stored directly on disk.
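To make the deduplication point concrete: readers of an MBTiles file only need a `metadata` table and a `tiles` table (or view) with zoom_level, tile_column, tile_row and tile_data columns. A writer can avoid duplicates by storing each distinct image once and mapping tile coordinates to it. The Python/sqlite3 sketch below shows one such layout. The `map` and `images` table names follow the approach I understand tools such as MBUtil to take, but treat those details as an assumption.

```python
import hashlib
import sqlite3


def create_mbtiles(conn: sqlite3.Connection) -> None:
    """Create a deduplicated MBTiles layout: each distinct image is stored once."""
    conn.executescript("""
        CREATE TABLE metadata (name TEXT, value TEXT);
        CREATE TABLE images (tile_id TEXT PRIMARY KEY, tile_data BLOB);
        CREATE TABLE map (zoom_level INTEGER, tile_column INTEGER,
                          tile_row INTEGER, tile_id TEXT);
        CREATE UNIQUE INDEX map_index ON map (zoom_level, tile_column, tile_row);
        -- Readers only see the standard `tiles` view
        CREATE VIEW tiles AS
            SELECT map.zoom_level AS zoom_level, map.tile_column AS tile_column,
                   map.tile_row AS tile_row, images.tile_data AS tile_data
            FROM map JOIN images ON images.tile_id = map.tile_id;
    """)


def add_tile(conn: sqlite3.Connection, z: int, x: int, y: int, data: bytes) -> None:
    """Store a tile, reusing the existing image row when the bytes are identical."""
    tile_id = hashlib.md5(data).hexdigest()  # content hash as the image key
    conn.execute("INSERT OR IGNORE INTO images (tile_id, tile_data) VALUES (?, ?)",
                 (tile_id, data))
    conn.execute("INSERT OR REPLACE INTO map VALUES (?, ?, ?, ?)", (z, x, y, tile_id))
```

Three identical water tiles end up as a single row in `images`, while the `tiles` view still returns every coordinate.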

To save precious disk space, there will be a global planet OSM MBTiles file that stops at zoom level 10 (roughly city level). Each country will also be available for download as a separate pre-rendered MBTiles file covering zoom levels 11 to 15. For example, a deployment in Nepal could download the planet MBTiles file to get a map of the whole world. It could then download only the high-zoom file for Nepal, which allows zooming down to street level. Downloading the whole world up to zoom level 15 would require well over 1 TB of disk space, which we can’t handle. This is why we only want high zoom levels for the countries a deployment needs, based on how much disk space it has to spare.
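The 1 TB figure is easy to sanity-check. A whole-world slippy map has 4^z tiles at zoom z, so zoom levels 0 to 10 total about 1.4 million tiles, while 0 to 15 total about 1.43 billion. Even at an assumed average of 1 KB per tile, the latter is over 1.4 TB. The Python sketch below does that arithmetic. It also counts the tiles in a longitude/latitude bounding box using the standard OSM tile formulas, which is a quick way to estimate the size of a country file before rendering it.

```python
import math
from typing import Tuple


def world_tiles(min_zoom: int, max_zoom: int) -> int:
    """Total tiles for the whole world: 2^z x 2^z = 4^z tiles at each zoom z."""
    return sum(4 ** z for z in range(min_zoom, max_zoom + 1))


def lonlat_to_tile(lon: float, lat: float, z: int) -> Tuple[int, int]:
    """Standard OSM (XYZ) tile index for a coordinate, clamped to the valid range."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bbox_tiles(west: float, south: float, east: float, north: float,
               min_zoom: int, max_zoom: int) -> int:
    """Count the tiles covering a lon/lat bounding box over a range of zooms."""
    total = 0
    for z in range(min_zoom, max_zoom + 1):
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left (y grows southward)
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right
        total += (x1 - x0 + 1) * (y1 - y0 + 1)
    return total


if __name__ == "__main__":
    print(world_tiles(0, 10))  # planet file
    print(world_tiles(0, 15))  # whole world down to street level
```

For example, calling `bbox_tiles(west, south, east, north, 11, 15)` with a country’s bounding box gives the number of tiles a zoom 11-15 country file would contain, before deduplication.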

There are a few options for serving MBTiles from a web server, such as TileStream (Node.js) and TileStache (Python). I chose TileStache because it supports composite layers, which allow serving multiple MBTiles files at the same time. TileStream only serves one MBTiles file at a time, so we would have to merge multiple MBTiles files together. That is possible, but it complicates deployment and makes it harder to add or remove specific countries later on. TileStache can serve tiles through WSGI, CGI and mod_python with Apache. XSCE also happens to already run several Python tools and uses WSGI for another one, so the integration was easier.
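To illustrate what a composite layer does, here is a small Python WSGI sketch of the idea. This is not TileStache’s code or configuration. It tries a stack of MBTiles files in order and returns the first match. A country file listed before the planet file therefore provides the high zoom levels, and everything else falls back to the planet file. MBTiles stores rows in TMS order, so the y value from an XYZ URL must be flipped. The file names and port are hypothetical.

```python
import sqlite3
from typing import List, Optional
from wsgiref.simple_server import make_server


def read_tile(path: str, z: int, x: int, y: int) -> Optional[bytes]:
    """Read one tile from an MBTiles file; MBTiles rows use TMS order, so flip XYZ y."""
    tms_row = (2 ** z - 1) - y
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)  # read-only, fails if missing
    try:
        row = conn.execute(
            "SELECT tile_data FROM tiles WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
            (z, x, tms_row),
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


def composite_tile(stack: List[str], z: int, x: int, y: int) -> Optional[bytes]:
    """Return the tile from the first MBTiles file in the stack that has it."""
    for path in stack:
        data = read_tile(path, z, x, y)
        if data is not None:
            return data
    return None


def make_app(stack: List[str]):
    """Minimal WSGI app answering /z/x/y.png from a stack of MBTiles files."""
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "").strip("/")
        if path.endswith(".png"):
            path = path[:-len(".png")]
        try:
            z, x, y = (int(part) for part in path.split("/"))
            data = composite_tile(stack, z, x, y)
        except ValueError:  # wrong number of parts or non-numeric parts
            data = None
        if data is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"tile not found"]
        start_response("200 OK", [("Content-Type", "image/png")])
        return [data]
    return app


if __name__ == "__main__":
    # Hypothetical files: the country file comes first so its tiles win.
    app = make_app(["nepal-z11-15.mbtiles", "planet-z0-10.mbtiles"])
    make_server("0.0.0.0", 8000, app).serve_forever()
```

Leaflet could then point at it with a URL template such as http://<server>:8000/{z}/{x}/{y}.png.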

Then all you need is a simple HTML page that loads Leaflet as a client-side JavaScript library and is configured to query the TileStache tile server on the local network.

This solution is entirely based on raster tiles rather than vector tiles. Vector tiles offer significant savings in disk usage, but they require much more CPU power to render on the client, as well as a recent browser. That is not possible with the type of hardware we have (XO laptops).

The big remaining questions are: where are those tiles rendered, where are they stored, and how can the XSCE server download them on demand? That is a topic for a future blog post!


Merging multiple MBTiles together

I’ve started using TileStream as a Node.js server to serve pre-rendered tiles saved in an .mbtiles file. This server can then be used as a tile layer by Leaflet to build a front-end HTML page that shows a map.

As I said in a previous blog post, MBTiles is a format created by Mapbox that stores millions of tiles in a SQLite database. This is useful if you want to build an OpenStreetMap solution that stores tiles offline or on your own server.

There are many alternatives to TileStream, such as TileStache (in Python). One benefit of TileStache is that it supports composite layers, which let you serve multiple MBTiles files at the same time on the same map. For example, you might have one MBTiles file containing the map tiles of the whole OSM planet for zoom levels 0 to 10. You might then have another MBTiles file for a specific country with zoom levels 11 to 15. Combining both gives a seamless experience: someone can zoom up to level 10 anywhere in the world and then up to level 15 in that specific country.

While TileStream does not support this, you can instead merge the two MBTiles files into a single file, which TileStream can then serve.

The MBUtil project provides a bash patch script that does exactly that. Merging is as easy as running the script with the “source” and “destination” files as arguments (the destination MBTiles becomes the merged file). Example:

./ Nepal013.mbtiles Nepal-1415.mbtiles

This script can also be used to update an existing, larger MBTiles file with a newer MBTiles file (one that might contain newer tiles for a specific region).
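For reference, here is a rough Python equivalent of this merge using the standard sqlite3 module. It attaches the source file to the destination and copies its tiles with INSERT OR REPLACE, so tiles already in the destination are overwritten by the source. It assumes both files use a plain `tiles` table, not a deduplicated layout where `tiles` is a view. It also assumes the destination has a unique index on (zoom_level, tile_column, tile_row). Without that index the replace cannot detect existing tiles, and duplicates would be added. The function name is my own.

```python
import sqlite3


def merge_mbtiles(source: str, destination: str) -> None:
    """Copy every tile from source into destination, overwriting matching tiles."""
    conn = sqlite3.connect(destination)
    try:
        conn.execute("ATTACH DATABASE ? AS src", (source,))
        conn.execute(
            "INSERT OR REPLACE INTO tiles (zoom_level, tile_column, tile_row, tile_data) "
            "SELECT zoom_level, tile_column, tile_row, tile_data FROM src.tiles"
        )
        conn.commit()                       # must commit before detaching
        conn.execute("DETACH DATABASE src")
    finally:
        conn.close()
```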

While this script merges the two sets of tiles, it does not update the metadata of the MBTiles file. For example, suppose my destination file contained the Nepal region at zoom levels 14-15 and I merged zoom levels 0-13 into it. The metadata in the destination file would still say that minzoom and maxzoom are 14 and 15. I downloaded DB Browser for SQLite (Mac/Windows/Linux), opened my merged MBTiles file and went to the metadata table, where it’s easy to change minzoom to 0. (This step might not be required, depending on how strict your MBTiles implementation is, but it is good practice to have your metadata match the actual data in the MBTiles file.)
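If you prefer to script that metadata fix instead of using a GUI, here is a Python sketch. It reads the actual minimum and maximum zoom levels from the `tiles` table and writes them to the `minzoom` and `maxzoom` rows of the `metadata` table, adding those rows if they are missing. It assumes the standard MBTiles table names.

```python
import sqlite3
from typing import Tuple


def sync_zoom_metadata(path: str) -> Tuple[int, int]:
    """Write the real min/max zoom levels found in `tiles` into `metadata`."""
    conn = sqlite3.connect(path)
    try:
        min_z, max_z = conn.execute(
            "SELECT MIN(zoom_level), MAX(zoom_level) FROM tiles").fetchone()
        if min_z is None:
            raise ValueError("the tiles table is empty")
        for name, value in (("minzoom", min_z), ("maxzoom", max_z)):
            cur = conn.execute("UPDATE metadata SET value = ? WHERE name = ?",
                               (str(value), name))
            if cur.rowcount == 0:  # row missing: add it
                conn.execute("INSERT INTO metadata (name, value) VALUES (?, ?)",
                             (name, str(value)))
        conn.commit()
    finally:
        conn.close()
    return min_z, max_z
```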


After those steps, TileStream was able to serve the single merged MBTiles file across all zoom levels.


Running Maperitive on MacOS X

I was looking for a way to run Maperitive, a program that converts OpenStreetMap (OSM) data files in .pbf and .bz2 format (which you can find on Geofabrik) to the MBTiles file format. MBTiles is a format created by Mapbox that stores millions of tiles in a SQLite database. This is useful if you want to build an OSM solution that stores tiles offline.

Maperitive is a .NET application that should typically work with Mono on Linux and Mac, and it works natively on Windows. However, the author does not provide official support for Mac (there are installation instructions for Linux).

I first downloaded Maperitive (2.3.33 at the time of this writing).

I then installed Mono with Homebrew and tried to launch the maperitive.exe executable located in the Maperitive directory:
mono maperitive.exe
However, I got this error:
Unhandled Exception:

System.TypeInitializationException: An exception was thrown by the type initializer for System.Drawing.KnownColors ---> System.TypeInitializationException: An exception was thrown by the type initializer for System.Drawing.GDIPlus ---> System.DllNotFoundException: libgdiplus.dylib

Luckily, I found a recent solution in a GitHub issue.

First, I had to install brew cask:

brew install caskroom/cask/brew-cask

Then I installed mono-mdk with brew cask:

brew cask install mono-mdk

Then I opened the installer to install the Mono MDK on OS X:

open /opt/homebrew-cask/Caskroom/mono-mdk/4.0.2/MonoFramework-MDK-4.0.2.macos10.xamarin.x86.pkg

(At the time of this writing, this was Mono 4.0.2; adjust the path in the previous command accordingly.)

After that, all I had to do was type:

env PATH=/Library/Frameworks/Mono.framework/Commands:$PATH mono Maperitive.exe

And Maperitive launched!

I didn’t end up using Maperitive (it does everything in RAM and does not scale well with a big OSM file), but I thought I would share the solution here.


How to Run a Non-Profit or Social Enterprise Website For Cheap

I originally wrote this post for the blog of ChooseSocial.PH, an online directory of social enterprises in the Philippines. It’s a project I’ve been working on for the last few months, and this post summarizes the technical choices I have made.

When we started building ChooseSocial.PH, we wanted a website that would be reliable and fast while also being affordable. We want ChooseSocial.PH to become the ultimate resource for anyone who wants to learn about social enterprises in the Philippines. We also want it to eventually become profitable, though we admit we’re far from having a sound business model at this point. In the meantime, this post covers the technical choices and services we use to keep our operating costs to a minimum.

First of all, we have to recognize that building a website and the applications that power it is rarely cheap. I have a technical background and decided to spend countless hours on the technical implementation myself, which allowed us to build our platform and website with a minimal financial investment. An NGO or social enterprise without in-house IT talent would need to invest in the initial implementation of its platform. That said, this article is about keeping your website online cheaply after it’s built.

One-time costs
There were some initial investments we had to make on the graphic design side. These are the one-time costs that won’t come up again until your next redesign. We hired someone to create our logo (tip: get Daisy Munoz, she’s amazing), and we bought a website theme ($25 on ThemeForest) and stock pictures ($65 on Stocksy, they have great unique shots).

While it may be tempting to choose only the free or least expensive themes and stock photos, we would caution against this. Most people or small businesses will be doing the same, and it really wouldn’t be ideal to see elements of your website popping up everywhere.

Costs you can’t avoid – The domain name
You will always need a domain name, and if you want to host and run your own website you will have to pay for it every year. A .com typically costs around $15, but in our case we needed a .PH domain name for the Philippines (+/- $33 on

This is the part that gets more technical and will depend on your own technology stack. Keep in mind that our technical choices are for a moderately popular website that could support somewhere in the range of 10 000 to 50 000 visits a month. Some of the services we list don’t recommend using their free plan in production for commercial use. However, we have found their stability and speed sufficient for our needs.

We run both our website and our back-end application in a single Node.js application. We use the back end to enter the information and research we gather on each social enterprise featured on the website. We use the Heroku free plan to host it. For our MongoDB database, we use the sandbox plan on MongoLab, which offers 500 MB for free. A free cron job set up locally backs up our database daily. Our internal search is powered by Azure Search, which is also free for fewer than 10 000 records.
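As an illustration of the daily backup, here is a Python sketch that a cron job could run. It dumps the database to a date-stamped, gzipped archive with mongodump and lists the oldest archives beyond a retention count for deletion. The backup folder, the 14-day retention, the MONGO_URI environment variable and the file naming are all hypothetical choices. It also assumes a mongodump version that supports the --uri, --archive and --gzip options.

```python
import datetime
import os
import subprocess
from pathlib import Path
from typing import List

BACKUP_DIR = Path("backups")  # hypothetical backup folder
KEEP = 14                     # hypothetical retention: keep the 14 most recent dumps


def backup_name(day: datetime.date) -> str:
    """Date-stamped file name; ISO dates sort chronologically as plain strings."""
    return f"mongodb-{day.isoformat()}.archive.gz"


def run_backup(mongo_uri: str, day: datetime.date) -> Path:
    """Dump the database to one gzipped archive (requires mongodump on the PATH)."""
    BACKUP_DIR.mkdir(exist_ok=True)
    target = BACKUP_DIR / backup_name(day)
    subprocess.run(
        ["mongodump", f"--uri={mongo_uri}", f"--archive={target}", "--gzip"],
        check=True,
    )
    return target


def to_delete(names: List[str], keep: int = KEEP) -> List[str]:
    """Return the oldest backup names beyond the `keep` most recent ones."""
    dumps = sorted(n for n in names if n.startswith("mongodb-") and n.endswith(".archive.gz"))
    return dumps[:max(len(dumps) - keep, 0)]


if __name__ == "__main__":
    run_backup(os.environ["MONGO_URI"], datetime.date.today())
    for name in to_delete([p.name for p in BACKUP_DIR.iterdir()]):
        (BACKUP_DIR / name).unlink()
```

A crontab entry such as `0 3 * * * python3 /path/to/backup.py` would run it every night at 3 am.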

To host the images on our website, we use Amazon S3 (AWS). The cost varies with the bandwidth used and is typically less than $1 a month. One way to keep S3 bandwidth costs low is to cache its content with CloudFlare. CloudFlare provides an amazing service that caches all the static content of your website (images, JavaScript, CSS) and delivers it from a CDN server near your visitors. It also protects your site against bots and DDoS attacks. All of this is free.

Email services
To let people email us through the Contact Us box on our website, we use SendGrid, which allows 400 emails a day for free. People can also sign up for our newsletter through MailChimp, which supports up to 2000 subscribers for free. To have an email address associated with our domain name ( we use Zoho Mail, which lets you create 10 free email addresses with the Lite plan.

We built a map on our Explore page that lets people explore social enterprises based on their location in the Philippines. For various reasons we wanted to avoid Google Maps, so the map uses the open, community-driven OpenStreetMap. To render the map on a page, you need a server that provides the tiles. You can get free tiles from MapQuest, which is what we used for a while. However, we weren’t very satisfied with the visual quality of those maps (the colors are not great and the resolution is poor), so we switched to Mapbox. Their free plan includes 50,000 views a month, which goes a long way. If you need more, you can get a discount on their paid plan by mentioning that you are a non-profit.

To ensure the stability of our website, we use various tools that monitor its performance and alert us to potential problems. We use the free plan at UptimeRobot to monitor the website and email us if it goes down. We use the free New Relic Stark plan on Heroku to analyze and measure the speed of the website. Finally, we use Raygun’s free test plan on Heroku to email us when a user encounters an application error while browsing the site.

In conclusion
All of these services have allowed us to successfully launch ChooseSocial.PH and accomplish the technical goals we had set: reliability, speed and affordability. Feel free to share in the comments any other affordable services that could help those in the NGO, social enterprise and startup community!


Resources for elasticsearch

elasticsearch is an open-source search technology that recently reached its stable 1.0 version after years of development. It has a great API and works very well. Here’s a list of resources to help you learn about it:

Official resources

elasticsearch – the definitive guide
A new book written by the elasticsearch team. It will eventually be available as an ebook and in print, and it is also available for free online. It is a work in progress, and since it’s open source, you can submit issues on GitHub.

official documentation
Where you can find the full documentation for the product

Non-official resources

exploring elasticsearch
A free ebook with tutorials to help you get started with elasticsearch. It was written before the 1.0 version, so keep in mind it might not be entirely up to date, but it is definitely a good introduction.

Learn and play with Elasticsearch | Found
Lots of technical articles about elasticsearch by a company that offers elasticsearch as a hosted cloud service. Very high quality.

Updated on December 8th 2014


Install Oracle Endeca Commerce Tools and Frameworks v11 on Linux

If you are trying to install Tools and Frameworks under the new Oracle Endeca Commerce v11 with Linux, you might have noticed that the installer just got more complicated. We used to be able to install Tools and Frameworks by moving an unzipped directory to the right location and that would be all.

Now we need to run an installer located at:


If you are connecting to a Linux box through an SSH console, you might have gotten this error:

Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-03-04_02-06-11PM. Please wait ...
DISPLAY not set. Please set the DISPLAY and try again.
Depending on the Unix Shell, you can use one of the following commands as examples to set the DISPLAY environment variable:
- For csh: % setenv DISPLAY
- For sh, ksh and bash: $ DISPLAY=; export DISPLAY
Use the following command to see what shell is being used:
echo $SHELL
Use the following command to view the current DISPLAY environment variable setting:
echo $DISPLAY
- Make sure that client users are authorized to connect to the X Server.
To enable client users to access the X Server, open an xterm, dtterm or xconsole as the user that started the session and type the following command:
% xhost +
To test that the DISPLAY environment variable is set correctly, run a X11 based program that comes with the native operating system such as 'xclock':
% <full path to xclock.. see below>
If you are not able to run xclock successfully, please refer to your PC-X Server or OS vendor for further assistance.
Typical path for xclock: /usr/X11R6/bin/xclock

The installer is trying to open a graphical user interface through the X Window System, which isn’t possible from a text-based terminal.

From there, you could use the silentInstaller (see the documentation for more details), but it offers fewer options and I personally had issues running it. Instead, here are the steps to run the installer over X11 so you can interact with its user interface.

If you are on Mac, download and install XQuartz, an open-source version of the X.Org X Window System (on OS X versions before 10.8, X11 is bundled with the OS).

For other operating systems, you will have to find your own version of X.Org X11 and install it.

Then connect to the Linux server where Endeca will be installed using the -X parameter, which enables X11 forwarding to your own machine:


ssh -X username@hostname

Then run the installer:


The installer will automatically launch XQuartz (or the X11 server on your operating system) on your machine, and you’ll be able to install Endeca Tools and Frameworks on the Linux box.

Tools and Framework installer for Oracle Endeca Commerce


How I found a way to install node.js & npm on CentOS 5

I was having a lot of difficulty installing node.js / npm on a server running CentOS 5. The node.js documentation says to add the EPEL repository and then run “yum install npm”. However, no matter how much I tried, this package seemed unavailable on CentOS 5. Compiling from source was an option, but I really did not want to install all the development tools on a machine that is essentially just a testing server. All I needed node.js for was running tests with the mocha test framework.

After much research, I finally found an answer on Server Fault (a Stack Exchange site). It suggests using nave, a virtual environment for node that installs npm and node.js just for the local user. This turned out to be exactly what I wanted, since it would also simplify moving my testing environment to another server if needed.

Commands in the console:
chmod +x
./ install stable
./ use stable

(You could replace stable with a specific node version number)

And that’s all! node and npm are now available to this specific user, with no need to compile node from source on CentOS.


How-to Install Oracle Endeca Integrator on Mac OS X

Oracle Endeca Information Discovery is shipping with the ETL tool “Endeca Information Discovery Integrator”. This is in fact a white label name for the open-source tool CloverETL with some added features.

Since Clover works on Mac, I wondered whether I could make CloverETL work on Mac while staying compatible with the existing Endeca tools. Most of the time during Endeca development, we use Endeca Integrator Server to load the data into Endeca Server. These programs typically run on powerful servers, so the only tool you really need locally is Integrator. You use it to modify the Endeca loading graphs and upload them to Endeca Integrator Server. Since I use a Mac for all my development, it only makes sense to run Integrator locally on Mac and let the other tools run on the servers.

These instructions are for Endeca OEID 2.3 and assume someone has already installed it on Windows. I believe the instructions are very similar for Endeca OEID 3.0 and on Linux (with different paths).

To get started, download CloverETL Designer Free Trial 3.2.1 for Mac

Since you already own a Clover license by buying OEID, I assume it is safe to transfer your license from the Windows version of Clover to the Mac version, but do it at your own risk!
After installing Integrator on Windows:

Find license.dat in


When you launch CloverETL Designer on Mac, copy paste the content of the license.dat file into the box that asks you to enter your license file.

Move every folder whose name starts with com.endeca. from the Windows Integrator installation at this path:


to Mac

CloverETL Designer Application -> Right click Show Package Contents -> Contents -> MacOs -> plugins

Relaunch CloverETL on Mac and go to Clover Preferences.

Click Browse
Drag and drop the plugins folder into the window

Go choose


Click Import

Click Apply
Copy all the files in



CloverETL Designer Application -> Right click Show Package Contents -> Contents->MacOs -> plugins -> com.cloveretl.gui_3.2.1 -> icons

Relaunch Clover.

That is all! You now have CloverETL with all the extra Endeca graph components and also with the ability to export and run your graphs on Integrator Server. You can start editing your graphs in your existing Endeca project.


Learn how to access The Sentinel Project open data with our APIs

(This is a blog post that I wrote for The Sentinel Project blog, also available here:

At the Sentinel Project, we are big advocates of making the data we create openly available to everybody who wants access to it. Making our data available allows any member of the public to learn from it, create data visualizations, gain new insights or create mashups with other openly available data sets.

In this blog post, you will learn how to easily access and manipulate the two main flows of data that we are making available.

The software that makes it very easy to get started is OpenRefine (formerly called Google Refine). This data manipulation tool was originally created by Google, which later abandoned it and handed it over to the open-source community. Go ahead, install it and run it; it runs on Windows, Linux and Mac.

Before running it, you need to decide which data you will use and build the URL needed to access it. The two main data streams of The Sentinel Project are available in JSON format through a URL-based API.


Threatwiki is our genocide risk tracking and visualization platform, which helps monitor communities at risk of genocide around the world (more details about the tool in the launch article). The data is a list of events found and researched by our research analysts. Each event is chosen because it indicates a threat to the community and fits into our Stages of Genocide Model. We previously used this data set to create a visualization of the persecution of the Baha’i community in Iran.

There are 3 kinds of data:

  • Datapoints: this is the main type of data. Those datapoints contain the events themselves, which can be further sorted by description, genocide stage, location, tags, event date, etc.
    • To get all the datapoints of the Iran Situation of Concerns (same API url used to build our visualization. Notice it’s under the format /api/datapoint/soc/Name_of_situation_of_concern),%20Islamic%20Republic%20of
    • All the datapoints under the Genocide stage Extermination
  • Situations of Concern (SOC): the countries or regions we are currently gathering data on
    • If you want a list of all the situations of concern:
  • Tags: each datapoint is tagged to simplify filtering
    • If you want a list of all the tags used to classify datapoints in the Myanmar situation of concern:
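If you would rather script the download than paste URLs into a tool, note that the datapoint URL follows the /api/datapoint/soc/Name_of_situation_of_concern pattern above, with the name percent-encoded. The Python sketch below builds that URL and fetches the JSON. BASE_URL is a hypothetical placeholder for the Threatwiki host, and keeping the comma unescaped is my assumption based on the Iran example.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with the Threatwiki host.
BASE_URL = "https://threatwiki.example.org"


def soc_datapoints_url(base_url: str, soc_name: str) -> str:
    """Build <base>/api/datapoint/soc/<Name_of_situation_of_concern>.

    Spaces become %20; the comma is left as-is (an assumption).
    """
    name = urllib.parse.quote(soc_name, safe=",")
    return f"{base_url.rstrip('/')}/api/datapoint/soc/{name}"


def fetch_json(url: str):
    """Download and decode a JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    datapoints = fetch_json(soc_datapoints_url(BASE_URL, "Myanmar"))
    print(json.dumps(datapoints, indent=2)[:500])  # preview the start of the response
```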

A full list of all the available URLs is on the GitHub project page:


Hatebase, launched in March, is the world’s largest online database of hate speech. On top of being a catalog of hate speech terms, it also tracks the usage of hate speech. Sightings are either submitted manually by our users or found automatically by a bot that scans geo-located tweets containing hate speech terms. All this data is also available for free.

In order to query the API you first need to

Once this is done, Hatebase has a page with the instructions to query the Hatebase API.

For your convenience, here are a few examples using the two main data sets of Hatebase (keep in mind you are limited to 100 queries a day on Hatebase):

  • Vocabulary, which includes the hate speech words that are contained in the database
      • All the terms in French about ethnicity
    • All the vocabulary (if there is more than a 1000 words, you will need to use the pagination option)
  • Sightings, usage of the hate speech terms, either observed by our users or found on Twitter
    • All the sightings between 2013-07-01 and 2013-07-13
    • All the sightings (each page provides 1000 records, increment page number for the number of sightings you want)
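Collecting every sighting means requesting page after page while staying under the 100-queries-a-day limit. The sketch below assumes a caller-supplied `fetchPage(pageNumber)` function (hypothetical; it stands in for whatever request the Hatebase instructions describe) that resolves to an array of records, assumes pages are numbered from 1, and treats a page with fewer than 1,000 records as the last one.

```javascript
// Page size and daily quota as described above (1,000 records, 100 queries/day).
const PAGE_SIZE = 1000;
const DAILY_QUERY_LIMIT = 100;

// fetchPage is a hypothetical, caller-supplied function:
//   (pageNumber) => Promise<Array<record>>
// Pages are assumed to start at 1.
async function fetchAllPages(fetchPage, maxQueries = DAILY_QUERY_LIMIT) {
  const records = [];
  for (let page = 1; page <= maxQueries; page++) {
    const batch = await fetchPage(page);
    records.push(...batch);
    // A short (or empty) page means there is nothing left to fetch.
    if (batch.length < PAGE_SIZE) break;
  }
  return records;
}

// Usage with a fake data source: two full pages, then a partial one.
const fakeFetchPage = (page) =>
  Promise.resolve(new Array(page <= 2 ? PAGE_SIZE : 5).fill({ page }));

fetchAllPages(fakeFetchPage).then((all) => console.log(all.length)); // 2005
```

The `maxQueries` cap keeps a single run from exhausting the daily quota. If the total happens to be an exact multiple of 1,000, the loop spends one extra query on an empty page.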

Using OpenRefine

Once installed and launched, OpenRefine opens a page in your browser.

  1. Click Create Project -> Web Addresses. That’s where you enter the URL of the data you want to obtain and manipulate, from either Hatebase or Threatwiki.
  2. Choose JSON files as the parsing format
  3. In the preview, select the part of the JSON data that corresponds to a record
  4. Choose a Project Name at the top right and click Create Project
  5. Your data is displayed in a table, in a spreadsheet-style (Excel-like) format

With the Export button at the top right of the page, you can export the data to other file formats (such as Excel) that let you analyze it in other software.

You can also use OpenRefine to manipulate the data directly. There are tons of resources out there on how to use OpenRefine. You can filter the data, sort it, change the name of columns, get a list of all the values available in a column, transform the data using a set of scripts, etc.

I’ve made this short video to show you quickly the kind of manipulation you could do with OpenRefine.

I hope this blog post helped you understand how to obtain data through our API. Don’t hesitate to write to us at and let us know how you use the data!


Explore the Endeca Configuration Repository with WebDav

With Oracle Endeca Commerce, the Workbench application stores all its configuration files in a configuration repository. That includes typical Workbench configurations used by the MDEX, Experience Manager configurations such as cartridge XML files and landing pages, and also media assets accessible through the Media Banner cartridge.

While Endeca provides scripts in the control project folder to upload content to that configuration repository, such as set_media, set_editors_config, set_templates and import_site, a less-known fact is that you can connect to the repository through a WebDav client and get direct access to all its files and folders.

This is possible because the technology underneath the Endeca Configuration Repository is Apache Sling, a REST web framework built on top of Apache Jackrabbit, a Java Content Repository. If you use Windows, you can download a client such as CloudSafe to connect to it. On Mac, Cyberduck supports it (in theory Mac OS X supports the WebDav protocol natively, but for some reason I can’t establish a connection to this WebDav repository with it).

Once you are ready to connect to the WebDav server, use the URL stored in the WorkbenchConfig.xml config file for the IFCR component.

If the URL is http://localhost:8006/ifcr, add /sites/name_of_your_project at the end.
So for the reference application, it would be http://localhost:8006/ifcr/sites/Discover. The user/password is the same as the one stored in that same config section (admin/admin by default).
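If you would rather script against the repository than use a GUI client, a WebDav folder listing is an HTTP PROPFIND request. This sketch only builds the request for the /sites/name_of_your_project URL described above. It assumes the repository accepts HTTP Basic authentication with the Workbench credentials, and `buildPropfindRequest` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a WebDAV PROPFIND request listing one level of
// the Endeca configuration repository for an application.
// ifcrUrl: the IFCR URL from WorkbenchConfig.xml, e.g. http://localhost:8006/ifcr
// Assumes the server accepts HTTP Basic authentication.
function buildPropfindRequest(ifcrUrl, appName, user, password) {
  const url = `${ifcrUrl.replace(/\/+$/, "")}/sites/${encodeURIComponent(appName)}`;
  const credentials = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    url,
    init: {
      method: "PROPFIND",
      headers: {
        Authorization: `Basic ${credentials}`,
        Depth: "1", // list immediate children only, not the whole tree
      },
    },
  };
}

const listing = buildPropfindRequest("http://localhost:8006/ifcr", "Discover", "admin", "admin");
console.log(listing.url); // http://localhost:8006/ifcr/sites/Discover
// With Node.js 18+ (built-in fetch) and Workbench running locally:
// fetch(listing.url, listing.init).then((res) => res.text()).then(console.log);
```

The response to a PROPFIND is an XML multistatus document listing each child resource. Uploading a media file would similarly be an HTTP PUT to the target path, but test that on a non-production repository first.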

WebDav configuration to access Endeca Configuration Repository with CloudSafe

Once you are logged in with WebDav, you can transfer files back and forth between the configuration repository and your hard drive. I wouldn’t advise changing the XML configuration files stored in that repository unless you know exactly what you are doing, but there is a clear use case for managing the media assets (such as pictures and videos) accessible through the Media Banner cartridge in Page Manager. While Endeca provides the set_media control script to upload files, dragging, dropping and deleting files directly in the repository through a user interface is much easier and more natural than re-running that control script every time you want to change your media files. You might even consider giving your business users access to that repository so they can manage the media files they need when creating landing pages.

Here is the list of folders visible from a WebDav client when logged in to the configuration repository:

Endeca Configuration Repository on a Webdav client

Let me know in the comments if you can think of other interesting use cases for accessing the Endeca Configuration Repository through a WebDav client!

Crossfilter D3 javascript Leaflet maps

Creating a data visualization tool using D3.js, Crossfilter and Leaflet

I’ve recently completed my first javascript data visualization project at The Sentinel Project for Genocide Prevention to develop a dashboard that tracks indicators of hate crime in Iran.

You can take a look here:

Screenshot of Threatwiki Data Visualization

Initial requirement

I had a set of data coming from a JSON REST API and I wanted to show the main data in a table, display the geographical coordinates of the data on a map and be able to filter by date and tags.

Technology choices

I chose Crossfilter to filter the data, D3.js to generate and display it, and Leaflet.js to create the map itself.

Implementation details

For the map generated with Leaflet.js, I used CloudMade (which I highly recommend) to get map tiles from OpenStreetMap. On top of the map, I added a layer showing the different regions of the country; it comes from Natural Earth shapefiles, converted with TopoJSON and drawn on the map with D3. Leaflet turned out to be a great tool that I will definitely use again; I followed the D3 + Leaflet tutorial by Mike Bostock to learn how to combine the two tools.

To add the datapoints at specific locations on the map, I used Leaflet’s marker functionality. I started by adding the datapoints manually, generating the proper SVG tags with D3, but I switched to markers because they will later let me integrate the Leaflet.markercluster plugin. This plugin combines multiple closely located points on a map, which is useful when a high concentration of points in a small region makes them hard to distinguish at the normal zoom level. Mike Bostock’s other tutorial, Let’s Make a Map, also came in handy when working with D3 and maps.

Understanding D3.js took a fair amount of learning. Reading code from examples is not always enough; I had to read about the core concepts and the introduction available on their GitHub wiki. However, once you understand the concepts, you realize that it is a very powerful tool for generating all kinds of visualizations, either graphically (as SVG) or simply as text displaying a set of data.

To filter by tags, date, and other metadata, I used Crossfilter, an open-source library developed by Square. To learn about Crossfilter, I used this excellent example on their website, and I reused its code to create the bar chart that serves as a timeline. There is also a great tutorial on the Wealthfront Engineering blog that really helped me understand the concepts of the library. Crossfilter was definitely easier to grasp than D3.js. The most recent version, 1.2.0, released just a few days ago, is required: it introduced filterFunction, which gives you more control over how you filter your data.
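The timeline bar chart relies on Crossfilter grouping records into date buckets (a dimension plus a group). As a rough, library-free sketch of what such a group computes, the function below counts records per day. The `date` field and its ISO 8601 string format are assumptions, not Threatwiki’s actual schema; the `{ key, value }` shape mirrors what Crossfilter’s `group.all()` returns.

```javascript
// Hypothetical record shape: { date: "YYYY-MM-DD..." } (ISO 8601 string).
// Counts records per calendar day, the same kind of bucketing a Crossfilter
// dimension/group produces for a timeline bar chart.
function countByDay(records) {
  const counts = new Map();
  for (const record of records) {
    const day = record.date.slice(0, 10); // keep the "YYYY-MM-DD" part
    counts.set(day, (counts.get(day) || 0) + 1);
  }
  // ISO dates sort correctly as strings; return buckets in date order,
  // ready to be bound to bars with D3
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, value]) => ({ key, value }));
}

const buckets = countByDay([
  { date: "2013-05-02T10:00:00Z" },
  { date: "2013-05-01" },
  { date: "2013-05-02" },
]);
// → two buckets: 2013-05-01 (1 record), then 2013-05-02 (2 records)
```

Crossfilter does the same bucketing incrementally, which is what lets the chart update instantly when another dimension is filtered.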

For typical filtering in Crossfilter, you only pass a value to the filter method of the chosen dimension and Crossfilter does the filtering. You need a filter function when your data structure gets more complex. Let’s say a record can have one or multiple tags. If you filter only by giving the name of a tag, you will only get back the records whose sole tag is that one, and miss all the records that combine it with other tags.

So you want to implement a filterFunction like this:

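window.filter = function (tagname) {
  byTags.filterFunction(function (tags) {
    // Keep the record if any of its tags matches the requested tag name
    if (tags !== null && typeof tags !== 'undefined') {
      for (var i = 0; i < tags.length; i++) {
        if (tags[i].title === tagname) {
          return true;
        }
      }
    }
    return false;
  });
};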

Here we go through the array of tags inside the record and return true as soon as we find the matching tag, so it doesn’t matter whether the record has 1 tag or 10.
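The matching logic can be tried out on its own, without Crossfilter. This sketch extracts it into a hypothetical `hasTag` predicate and applies it with plain `Array.prototype.filter` to made-up records shaped like the ones above (an array of tag objects with a `title`).

```javascript
// Same test as the filterFunction above: does any tag in the list match?
// Null or missing tag lists never match.
function hasTag(tags, tagname) {
  if (!Array.isArray(tags)) return false;
  return tags.some((tag) => tag.title === tagname);
}

// Hypothetical sample records
const records = [
  { id: 1, tags: [{ title: "tag-a" }] },
  { id: 2, tags: [{ title: "tag-a" }, { title: "tag-b" }] },
  { id: 3, tags: [{ title: "tag-b" }] },
  { id: 4, tags: null },
];

// Record 2 is kept even though it carries two tags
const matchingIds = records.filter((r) => hasTag(r.tags, "tag-a")).map((r) => r.id);
console.log(matchingIds); // [ 1, 2 ]
```

Inside the dashboard, the same predicate would plug in as `byTags.filterFunction((tags) => hasTag(tags, tagname))`. Note that `Array.isArray` is slightly stricter than the null/undefined check in the original function.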

To learn more and get involved

If you are interested in looking at the code, our project is hosted on GitHub. You can also look directly at the visualization JavaScript file. The data for this project is freely available to anyone who wants it; drop me an email if you need information about it.

That visualization for The Sentinel Project for Genocide Prevention has been announced on their blog.

If you have some experience with data visualization, maps, GIS, or you are just a data nerd, send me an email; we have lots of interesting projects coming up!


Introduction to Infographics and Data Visualization course

In the last few months, I’ve taken an online class called “Introduction to Infographics and Data Visualization” by Alberto Cairo (Twitter: @acairo) at the Knight Center for Journalism in the Americas.

It was the second edition of the course, and I would recommend joining a future edition if you are interested in the topic. It was mostly targeted at journalists, so there was a lot of content about infographics, and many examples were visualizations we see in magazines. However, many of those concepts can still be applied in the IT world, and there was some good reading about designing dashboards. Lots of examples explain why some types of data visualization (pie charts, bubble charts on a map) rarely make any sense to readers. In any case, it was a great introduction to data visualization. I wasn’t new to the topic because of my experience with Endeca dashboards and Salesforce dashboards, but it’s great to step away from the technical aspect of those visualizations and focus on what data visualization can actually mean for users.

The class didn’t provide training on technical tools; it was really more about the theory behind the topic. The class provided a trial license for Tableau, which is a nice touch, but I haven’t spent time learning it.

I was initially planning to do all the homework for the course and apply for an official course certificate at the end of the class. However, I quickly fell behind the weekly schedule they proposed and did not have enough time to do all that. So I focused only on learning about the topic from the content they provided (YouTube videos + PDF readings) and did not do any homework. It really reminded me how little I miss university!


Oracle Endeca Commerce Resources

Updated on September 12th 2014

Since I already wrote a blog post listing resources for Oracle Endeca Information Discovery, I figured it would be good to do the same for Oracle Endeca Commerce (Experience Manager).

Official Oracle resources

OTN Technical Questions Forum
The official forum for technical questions. Use the search feature when you have an issue; you will most likely find someone who had a similar problem.

Documentation Index
Full documentation for Endeca Commerce in PDF and all related modules

Oracle Endeca Community
Official discussion and support forum, limited to people with access to Oracle Support (you will need an Oracle Support login). Fewer questions than the OTN Technical Questions forum, but still a lot of relevant information.

You can also access the Knowledge Base, which includes articles written by members of the Oracle Support team about common bugs and problems encountered by various clients.

OnDemand webcast: Oracle Endeca Commerce v11 and v11.1 release
90-minute webcast about the new features of the v11.1 release

2-hour free webcast that explains the new features in Endeca Commerce (now called Oracle Commerce v11), with 1 hour devoted to technical implementation details; very useful

Oracle Commerce v11 and v11.1 What’s new document
This PDF document lists the new features of Endeca in Oracle Commerce v11 and v11.1. The Endeca Documentation Index also includes the release notes of each application, but this document provides a more business-friendly list of features.

Oracle Commerce YouTube Channel
Not as good as the Oracle OEID YouTube channel, but you might still find some general information about Endeca Commerce on this channel.

Oracle A-Team Chronicles
Oracle has a technical team that publishes various blog posts, including some interesting Endeca articles.

Non-Official Oracle resources

A question-and-answer website that includes questions on both Endeca Commerce and Endeca Information Discovery. Its Learning Center includes a couple of articles on Endeca Commerce.

Bird’s Eye View blog
Contains some articles on the integration between Endeca and ATG

Faceted Guides
Community-written guides that include instructions on how to install Endeca and perform typical operations

Quest 4 ATG
Some blog posts on the integration of Endeca with ATG and steps to deploy an Endeca application

GroupBy Inc
Oracle Gold Partner consulting firm, full of Endeca experts (and former Endeca employees). If you need help with your Endeca implementation, contact them. (I work there!)

salesforce sublimetext

Sublime Text 2 integration with Salesforce

Ever since I started working on Salesforce development, using the official Eclipse-based Salesforce IDE on Mac has been very painful. Everything is extremely slow, and it is just not an enjoyable environment to use. Editing code directly in the Sandbox’s text editor was my quick-and-dirty way of changing code without going through the process of launching this monster of an IDE.

I was looking for some kind of plugin to do Salesforce development with Sublime Text 2. What I found was even better than I expected.

MavensMate IDE for Sublime Text on GitHub (OS X only)

Following the instructions available here, you run a few commands in the terminal to install the plugin. It adds a new menu to Sublime Text dedicated to your Salesforce APEX development.

After configuring your Salesforce username and password (with your security token appended to the password), you choose which metadata objects to load locally and you are good to go. You have access to your classes, tests, triggers, Visualforce pages, etc.

You can also run your tests, verify code coverage of your tests, and you get auto-completion on your code using the metadata loaded previously.

It’s just like using the Salesforce IDE, but much better integrated into your Sublime Text development environment!


javascript node.js

MongoDB and Node.Js Tutorial

A quick post to share an excellent tutorial on the Heroku website about using MongoDB as a database with Node.js through the Mongoose library.

It is exactly the same implementation we used to build ThreatWiki with Mongoose: quick to implement.

One thing that surprised me about this guide is that it recommends specifying the “safe” option for the connection with MongoDB. Safe mode guarantees that you are notified with an error if a write to the database fails (in MongoDB’s default implementation it is off, to increase speed).

However, as we can read on Mongoose documentation page:

By default this is set to true for all schemas which guarentees that any error that occurs will get reported back to our method callback.

In other words, Mongoose already enables safe mode by default for every save to the database, so it doesn’t seem necessary to specify it as the tutorial suggests.

bootstrap javascript

Select2 javascript library with Bootstrap

I’m currently volunteering for The Sentinel Project on a genocide risk tracking and visualization platform named ThreatWiki, built with node.js and mongoDB. Read more details about the launch on The Sentinel Project blog.

One important feature in the frontend is the ability to add tags to datapoints created in the system. As we were testing locally in our environment, the typical multi-select input box looked like a good enough way to select tags.

Typical tag list

However, after we started importing an Excel sheet containing over 500 datapoints and over 50 different tags, it became clear that this way of selecting tags wouldn’t work well for our users. I decided to add the Select2 library, which lets users search across all the tags and displays the existing tag choices below the input.

Adding it to our existing codebase took very little work; I actually wish I had done it sooner. My only concern was whether it would play nicely with our Bootstrap frontend framework. At first it didn’t, but adding “width: ‘resolve’” as an option to Select2 fixed it (see that issue on Select2’s GitHub). The result looks fantastic and is easier for our users to use.

Tag selection with Select2

Look at ThreatWiki on GitHub if you want to know more about the project!


Resources for Oracle Endeca Information Discovery (OEID)

(Updated on Nov 29th 2013)

Since Endeca was bought by Oracle, resources to help you implement your Oracle Endeca Information Discovery solution have been scattered a little bit everywhere, so here is a summary of all the resources I have found.

Official Oracle resources

Oracle OTN Discussions Forum
A publicly accessible forum where lots of questions are asked daily, with lots of great answers. Make use of the search to read the archives!

Product Documentation
The complete documentation for Information Discovery 3.0 and previous versions, available online in HTML or PDF.
There’s no search across all the documents online, so I suggest downloading the full package in PDF and using your operating system’s search to run a full search.

Oracle EID YouTube channel
Training videos for Information Discovery 3.0 and 2.3 by the Oracle team. It’s a very good introduction to the product and it covers some advanced features as well. The videos are not too long and are definitely worth a watch.

Oracle Endeca Information Discovery Wiki:
This official Oracle wiki contains great Design Patterns for Integrator (CloverETL) and some information on the Endeca Query Language (EQL)

Endeca Community Forum
This forum is only accessible if you have access to Oracle Support. The forum is supposed to be monitored by the Support team, but in my experience it is hard to get answers, and far fewer people read it than the public forum.

To access it, click on that link and login using your Oracle Support user/password. Then click Discussion.

From the Support page, you can also access the Endeca Knowledge Base

And the Information Center 

Non-Oracle resources

Endeca123 Blog
Contains great tutorials on common Endeca use cases that are not very well covered by the official documentation

Another question-and-answer forum, where Information Discovery questions are mixed with questions about the other Endeca products. Look at the Information Center for some blog articles on Endeca.

Rittman Mead Consulting Blog
Some Endeca tutorials and a great article about record-level security.

Branchbird Blog
Lots of articles on Endeca Information Discovery and eCommerce

3sixty-analytics Blogs
Some blog articles on Endeca Information Discovery

GroupBy Inc
Oracle Gold Partner consulting firm, full of Endeca experts (and former Endeca employees). If you need help with your Endeca implementation, contact them. (I work there!)

chez brochez blog
Blog from an Endeca consultant (Ryan B. Rochez), with lots of technical posts about Endeca Information Discovery 3.0

OBIEE tips and tricks
Great article on how to quickly install OEID 3.1

I will update this post if I hear of other resources!