Oracle Connection IO/Reset Error Issue in Talend

Talend Studio allows you to build your Jobs as shell scripts that can be executed outside the Studio, either in Talend Administration Center or on a command line. When you execute a built Job that contains Oracle components outside the Studio, you may intermittently run into a Connection Reset error. This article provides a workaround that allows you to avoid such errors.

Symptoms/Description

When you execute a built Job outside the Studio and the Job uses the Oracle 11g driver to connect to an Oracle database, you may get the following error (shown here as logged in a French locale; Exception d'E/S means I/O exception):

Exception in component tOracleConnection_1
java.sql.SQLRecoverableException: Exception d'E/S: Connection reset
        at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:101)
        at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:133)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:199)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:263)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:521)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:418)
        at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:508)
        at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:203)
        ……

Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:113)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        at oracle.net.ns.DataPacket.send(DataPacket.java:150)
        at oracle.net.ns.NetOutputStream.flush(NetOutputStream.java:180)
        at oracle.net.ns.NetInputStream.getNextPacket(NetInputStream.java:169)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:117)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:92)
        at oracle.net.ns.NetInputStream.read(NetInputStream.java:77)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1034)
        at oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1010)
        at oracle.jdbc.driver.T4CTTIoauthenticate.receiveOauth(T4CTTIoauthenticate.java:760)
        at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:368)
        ……

Resolution

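Use either of the following options to configure your JVM to use the urandom device. The stack trace shows the failure during logon; a likely cause is that the driver's secure random number generation blocks on /dev/random when the system is short on entropy, so the server drops the connection before the login completes. The urandom device does not block.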

Option 1

Start the JVM with the following parameter:

-Djava.security.egd=file:///dev/urandom
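
If you would rather not edit the generated launch script, the same property can be handed to the JVM through the JAVA_TOOL_OPTIONS environment variable, which HotSpot-based JVMs read at startup. A minimal sketch, assuming a POSIX shell; the script name in the last comment is hypothetical:

```shell
# Append the property to any options that are already set.
JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }-Djava.security.egd=file:///dev/urandom"
export JAVA_TOOL_OPTIONS

echo "JVM options for child processes: $JAVA_TOOL_OPTIONS"

# Then start the built Job from the same shell, for example
# (hypothetical script name): ./MyJob_run.sh
```

The variable only affects JVMs started from this shell session, so it is a convenient way to try the workaround before making a permanent change.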

Option 2

Use the urandom device globally:

  1. Open the $JAVA_HOME/jre/lib/security/java.security file in a text editor.
  2. Find the following line:
    securerandom.source=file:/dev/random
    and modify it to:
    securerandom.source=file:/dev/urandom
    or
    securerandom.source=file:///dev/urandom

    Alternatively, to avoid Java parsing issues on a Unix or Linux operating system, modify the line as follows:
    securerandom.source=file:/dev/./urandom
    or
    securerandom.source=file:/dev/../dev/urandom
  3. Save your change, exit the text editor, and run your Job.

Now the problem should be fixed.
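
The edit in Option 2 can also be scripted. The following is a minimal sketch, assuming a POSIX shell with sed and grep. It works on a scratch copy whose content is made up for illustration, so back up the real java.security file before pointing the same commands at it:

```shell
# Work on a throwaway copy; the content below is a made-up stand-in
# for the relevant line of $JAVA_HOME/jre/lib/security/java.security.
workdir=$(mktemp -d)
conf="$workdir/java.security"
printf '%s\n' '# sample content' 'securerandom.source=file:/dev/random' > "$conf"

# Keep the original as java.security.bak, then rewrite the entropy source.
cp "$conf" "$conf.bak"
sed 's|^securerandom\.source=.*|securerandom.source=file:/dev/./urandom|' "$conf.bak" > "$conf"

# Show the resulting setting.
grep '^securerandom\.source=' "$conf"
```

Writing through a backup copy rather than editing in place keeps the sketch portable across sed implementations and leaves a file to roll back to.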

ThreadPool size in Karaf

Question

How to increase the Jetty ThreadPool size in Karaf?

Answer

Add the following lines to org.ops4j.pax.web.cfg, located under runtimecontainer/etc, and restart the runtime server:

org.ops4j.pax.web.server.maxThreads=<value>
org.ops4j.pax.web.server.minThreads=<value>

The values can be verified in JConsole.

Command to find the largest directory in the current directory

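The following command lists the five largest entries under the current directory, largest first. Because of -a it counts files as well as directories, and the first line is the current directory itself; drop -a to list directories only: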

 du -a | sort -n -r | head -n 5
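
To restrict the listing to directories, a small wrapper can help. This is a minimal sketch, assuming a POSIX shell; largest_dirs is a hypothetical helper name and the scratch tree exists only to demonstrate it:

```shell
# List the N largest directories below a starting point, largest first.
# Sizes are in kilobytes (-k) so the numeric sort is well defined.
largest_dirs() {
    dir=${1:-.}
    count=${2:-5}
    du -k "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Small demonstration on a scratch tree (names are made up).
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
dd if=/dev/urandom of="$demo/big/blob" bs=1024 count=2048 2>/dev/null
dd if=/dev/urandom of="$demo/small/blob" bs=1024 count=16 2>/dev/null

largest_dirs "$demo" 3
```

Called without arguments, largest_dirs reports on the current directory and prints five lines, matching the one-liner above minus the individual files.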

2>&1 in Our Scripting

When we work with a programming or scripting language, we constantly use idioms: things that are done in a certain way, the common solution to a problem. Shell scripting is no different, and one quite common but not so well understood idiom is 2>&1, as in
ls foo > /dev/null 2>&1.
Let me explain what is going on here and why this works the way it does.

A quick introduction to I/O redirection

Simply put, redirection is the mechanism used to send the output of a command to another place. For instance, if we just cat a file, its output is printed on the screen by default:
$ cat foo.txt
foo
bar
baz
But we can redirect this output to another place. Here, for example, we are redirecting it to a file called output.txt:
$ cat foo.txt > output.txt

$ cat output.txt
foo
bar
baz
Note that in the first cat we don’t see any output on the screen. We changed the standard output (stdout) location to a file, so it doesn’t use the screen anymore.
It’s also important to know that there is another place, called standard error (stderr), where programs can send their error messages. So if we try to cat a file that doesn’t exist, like this:
$ cat nop.txt > output.txt
cat: nop.txt: No such file or directory
Even though we redirect stdout to a file, we still see the error output on the screen, because we are redirecting just the standard output, not the standard error.

And a quick introduction to file descriptors

A file descriptor is nothing more than a non-negative integer that represents an open file. If you have 100 open files, you will have 100 file descriptors for them.
The only caveat is that, in Unix systems, everything is a file. But that’s not really important now, we just need to know that there are file descriptors for the Standard Output (stdout) and Standard Error (stderr).
In plain English, it means that there are “ids” that identify these two locations, and it will always be 1 for stdout and 2 for stderr.

Putting the pieces together

Going back to our first example, when we redirected the output of cat foo.txt to output.txt, we could rewrite the command like this:
$ cat foo.txt 1> output.txt
This 1 is just the file descriptor for stdout. The syntax for redirecting is [FILE_DESCRIPTOR]>; leaving the file descriptor out is just a shortcut for 1>.
So, to redirect stderr, it should be just a matter of adding the right file descriptor in place:
# Using stderr file descriptor (2) to redirect the errors to a file
$ cat nop.txt 2> error.txt

$ cat error.txt
cat: nop.txt: No such file or directory
At this point you probably already know what the 2>&1 idiom is doing, but let’s make it official.
You use &1 to reference the value of the file descriptor 1 (stdout). So when you use 2>&1 you are basically saying “Redirect the stderr to the same place we are redirecting the stdout”. And that’s why we can do something like this to redirect both stdout and stderr to the same place:
$ cat foo.txt > output.txt 2>&1

$ cat output.txt
foo
bar
baz

$ cat nop.txt > output.txt 2>&1

$ cat output.txt
cat: nop.txt: No such file or directory
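
One detail is worth checking by hand: redirections are processed from left to right, so 2>&1 copies whatever stdout points to at that moment. A small sketch, where the helper function both and the file names are made up for the demonstration:

```shell
# A helper that writes one line to stdout and one to stderr.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp=$(mktemp -d)

# stdout goes to the file first, then stderr follows it:
# both lines land in the file.
both > "$tmp/right.txt" 2>&1

# Here stderr is pointed at the current stdout (captured into $leaked by
# the command substitution) before stdout is moved to the file.
leaked=$(both 2>&1 > "$tmp/wrong.txt")

cat "$tmp/right.txt"
cat "$tmp/wrong.txt"
echo "escaped: $leaked"
```

This is why the idiom is written as > file 2>&1 and not the other way around: with the order swapped, the error line never reaches the file.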

Recap

  • There are two places programs send output to: Standard output (stdout) and Standard Error (stderr);
  • You can redirect these outputs to a different place (like a file);
  • File descriptors are used to identify stdout (1) and stderr (2);
  • command > output is just a shortcut for command 1> output;
  • You can use &[FILE_DESCRIPTOR] to reference a file descriptor value;
  • Using 2>&1 will redirect stderr to whatever value is set to stdout (and 1>&2 will do the opposite).

Doing TAC Activities through MetaServlet API Calls

Run the Task by Task Id:

{
  "actionName": "runTask",
  "authPass": "admin",
  "authUser": "admin@company.com",
  "mode": "asynchronous",
  "taskId": 108464
}

Encode the JSON in Base64 format and append it to the MetaServlet URL as the query string:

Query: http://localhost:16701/org.talend.administrator/metaServlet?ewogICJhY3Rpb25OYW1lIjogInJ1blRhc2siLAogICJhdXRoUGFzcyI6ICJhZG1pbiIsCiAgImF1dGhVc2VyIjogImFkbWluQHR1aS5jb20iLAogICJtb2RlIjogImFzeW5jaHJvbm91cyIsCiAgInRhc2tJZCI6IDEwODQ2NAp9Cg==
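
The encoding step can be done from a shell. A minimal sketch, assuming the base64 and tr utilities are available; the JSON is a compact form of the runTask request above, and the curl call is left commented out because it needs a running Talend Administration Center:

```shell
# Request parameters copied from the runTask example.
json='{"actionName":"runTask","authPass":"admin","authUser":"admin@company.com","mode":"asynchronous","taskId":108464}'

# base64 may wrap long output, so strip newlines to keep the query on one line.
encoded=$(printf '%s' "$json" | base64 | tr -d '\n')

# Host, port and application name are the ones used in the query above.
url="http://localhost:16701/org.talend.administrator/metaServlet?$encoded"
echo "$url"

# To actually send the request:
# curl -s "$url"
```

Using printf instead of echo avoids encoding a trailing newline, so the decoded text matches the JSON exactly.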


Create a task:
{
  "actionName": "createTask",
  "active": true,
  "applyContextToChildren": false,
  "authPass": "admin",
  "authUser": "admin@company.com",
  "branch": "trunk",
  "contextName": "Default",
  "description": "task1 for extracting data from DB1",
  "execStatisticsEnabled": false,
  "addStatisticsCodeEnabled": false,
  "executionServerName": "runtime_server_st1_new",
  "jobName": "tSimpleDataFlow",
  "jobVersion": "0.1",
  "onUnknownStateJob": "WAIT",
  "pauseOnError": false,
  "projectName": "company_CDM_2",
  "regenerateJobOnChange": false,
  "taskName": "CreationThroughAPI",
  "timeout": 3600
}

Encode the JSON in Base64 format and append it to the MetaServlet URL as the query string:

http://localhost:16701/org.talend.administrator/metaServlet?ewogICJhY3Rpb25OYW1lIjogImNyZWF0ZVRhc2siLAogICJhY3RpdmUiOiB0cnVlLAogICJhcHBseUNvbnRleHRUb0NoaWxkcmVuIjogZmFsc2UsCiAgImF1dGhQYXNzIjogImFkbWluIiwKICAiYXV0aFVzZXIiOiAiYWRtaW5AdHVpLmNvbSIsCiAgImJyYW5jaCI6ICJ0cnVuayIsCiAgImNvbnRleHROYW1lIjogIkRlZmF1bHQiLAogICJkZXNjcmlwdGlvbiI6ICJ0YXNrMSBmb3IgZXh0cmFjdGluZyBkYXRhIGZyb20gREIxIiwKICAiZXhlY1N0YXRpc3RpY3NFbmFibGVkIjogZmFsc2UsCiAgImFkZFN0YXRpc3RpY3NDb2RlRW5hYmxlZCI6IGZhbHNlLAogICJleGVjdXRpb25TZXJ2ZXJOYW1lIjogInJ1bnRpbWVfc2VydmVyX3N0MV9uZXciLAogICJqb2JOYW1lIjogInRTaW1wbGVEYXRhRmxvdyIsCiAgImpvYlZlcnNpb24iOiAiMC4xIiwKICAib25Vbmtub3duU3RhdGVKb2IiOiAiV0FJVCIsCiAgInBhdXNlT25FcnJvciI6IGZhbHNlLAogICJwcm9qZWN0TmFtZSI6ICJUVUlfQ0RNXzIiLAogICJyZWdlbmVyYXRlSm9iT25DaGFuZ2UiOiBmYWxzZSwKICAidGFza05hbWUiOiAiQ3JlYXRpb25UaHJvdWdoQVBJIiwKICAidGltZW91dCI6IDM2MDAKfQ==

Talend MetaServlet API

How Jobs are handled via Talend Administration Center

Talend Jobs are scheduled from Talend Administration Center, which provides a browser management interface, and are parameterized with Context variables. But often you may prefer to have programmatic control of Jobs via an API. This post covers how to expose Talend Jobs via the Talend Administration Center API. It provides sample Jobs, some useful browser utilities, and an example of wrapping the API in a RESTful service layer using Data Services.
The Job Conductor page of Talend Administration Center is easy to use and powerful: it allows you to schedule Jobs with simple triggers, more complex Cron triggers, or file triggers. You can also launch Jobs manually from the browser. With the exception of the file trigger, the Job still runs either at a predetermined time or through explicit human intervention. Using file triggers as the mechanism for inter-process communication may require access privileges that are not allowed in a secure environment, which is why it may be preferable to invoke a Talend Job via a real API.
Jobs can also be parameterized with Context variables. System administrators can override Context variables on the Job Conductor page for additional flexibility. But when the Job runs, it always runs with the same set of pre-configured Context variables; whether these are the default or the overridden values, they cannot be changed without human intervention. It is preferable to be able to pass parameters via an API.
One option is to build Jobs as self-contained .zip files. The generated archives include launching scripts and all necessary .jar files. However, the resulting Jobs run in isolation and lack the monitoring, management, and control provided by Talend Administration Center. No centralized logging is provided, and there is no concept of Job Servers or a Job grid. Instead, these responsibilities fall on the developer. As individual solutions proliferate, the management of the broader system becomes more difficult and the maintenance tail becomes more unwieldy.
So while exported Jobs provide flexibility, they sacrifice manageability. The Talend Administration Center API provides a very simple and powerful alternative.

Talend Administration Center MetaServlet API

The Talend Administration Center MetaServlet API is an RPC-style HTTP API (not RESTful) that is very easy to use and can easily be wrapped with a RESTful interface if desired.

MetaServlet API

All MetaServlet operations are invoked via an HTTP GET request. All parameters to the operation are passed as a single, unnamed base-64 encoded parameter to the GET request.
The Talend Administration Center MetaServlet command-line tool is available in the following folder of the Talend Administration Center installation directory:
<tomcat_path>/webapps/org.talend.administrator/WEB-INF/classes/MetaServletCaller.bat on Windows
<tomcat_path>/webapps/org.talend.administrator/WEB-INF/classes/MetaServletCaller.sh on Linux
Running the MetaServletCaller with no arguments shows the top level help message:
<tomcat_path>\webapps\tac\WEB-INF\classes>MetaServletCaller.bat
usage: Missing required option: url
 -f,--format-output          format Json output
 -h,--help                   print this help message
 -json,--json-params <arg>   Required params in a Json object
 -url,--tac-url <arg>        TAC's http url
 -v,--verbose                display more informations

--tac-url

To get the full detailed help message, the Talend Administration Center service must be up and running and you must pass the --tac-url parameter.

--help all, -h all

Use --help all to get the full help, and -h all for an abbreviated version. The examples below capture the output to a text file for subsequent reference.
<tomcat_path>\webapps\org.talend.administrator\WEB-INF\classes>MetaServletCaller.bat --tac-url=http://localhost:8080/org.talend.administrator/ -help all > tac-help.txt

<tomcat_path>\webapps\org.talend.administrator\WEB-INF\classes>MetaServletCaller.bat --tac-url=http://localhost:8080/org.talend.administrator/ -h > tac-help-short.txt

runTask

Runs a task based on its ID.
<tomcat_path>\webapps\org.talend.administrator\WEB-INF\classes>MetaServletCaller.bat --tac-url=http://localhost:8080/org.talend.administrator/ -help runTask
----------------------------------------------------------
  Command: runTask
----------------------------------------------------------
Description             : Allows to run a task defined in Job conductor by its id. 
Mode can be 'asynchronous' or 'synchronous'
Requires authentication : true
Since                   : 4.2
Sample                  :
{
  "actionName": "runTask",
  "authPass": "admin",
  "authUser": "admin@company.com",
  "jvmParams": [
    "-Xmx256m",
    "-Xms64m"
  ],
  "mode": "synchronous",
  "taskId": 1
}
To run a task, you need to know its system-generated taskId. This information can be retrieved by executing the getTaskIdByName command.

getTaskIdByName

Gets the corresponding ID of the task by looking for its taskName.
<tomcat_path>\webapps\org.talend.administrator\WEB-INF\classes>MetaServletCaller.bat --tac-url=http://localhost:8080/org.talend.administrator/ -help getTaskIdByName
----------------------------------------------------------
  Command: getTaskIdByName
----------------------------------------------------------
Description             : Get task id by given taskName
Requires authentication : true
Since                   : 5.1
Sample                  :
{
  "actionName": "getTaskIdByName",
  "authPass": "admin",
  "authUser": "admin@company.com",
  "taskName": "task1"
}

Invoking the Talend Administration Center API interactively

When developers work with the MetaServlet API, it can be useful for them to interactively invoke the Talend Administration Center API.
  1. Go to the Job Conductor page of Talend Administration Center, select the arrow icon next to any of the columns and make sure the Id column is checked so that it will be displayed.
    Note that you can also retrieve the ID of the task using MetaServlet by executing the getTaskIdByName command, see Talend Administration Center MetaServlet API commands.
  2. Go to https://www.base64encode.org/ to encode the MetaServlet JSON arguments in base64.
  3. Paste the encoded result into your web browser using the following syntax:
    http://<host>:<port>/<TalendAdministrationCenter_name>/metaServlet?<base64_arguments>
    http://localhost:8080/org.talend.administrator/metaServlet?ew0KICAiYWN0aW9uTmFtZSI6ICJydW5UYXNrIiwNCiAgImF1dGhQYXNzIjogInRhZG1pbiIsDQogICJhdXRoVXNlciI6ICJ0YWRtaW5AZW9zdC5uZXQiLA0KICAibW9kZSI6ICJhc3luY2hyb25vdXMiLA0KICAidGFza0lkIjogIjgiDQp9DQo=
    Tip: Add this URL as a bookmark to simplify your access to the website in the future.
    The result of the HTTP GET request is returned as a JSON object. It includes the execRequestId, which is the handle for your new Job instance.
    { execRequestId: "1432855205979_a5zn8", executionTime: { millis: 564, seconds: 0 }, returnCode: 0 }
  4. Once the HTTP GET request is sent, monitor the progress of the execution through the Execution History page of Talend Administration Center.

Invoking the Talend Administration Center API programmatically

The JSON arguments must be base-64 encoded; once encoded, they are passed as the sole parameter to the HTTP GET request. If you are integrating with Talend, your application might be written in regular Java or in some other language; either works, since HTTP GET and base-64 are interoperable standards.
  1. If you happen to be integrating with a Java application, use the following Apache Commons Base64 class method: org.apache.commons.codec.binary.Base64.encodeBase64()
  2. If you are using Talend, use the tLibraryLoad component to add the Apache Commons library.
    You can retrieve the MetaServlet Job archive attached to the Downloads tab on the left panel of this page to invoke the Talend Administration Center API from the Talend Job.
    It uses the encodeBase64() method within a tMap prior to the tRESTClient invocation of the Talend Administration Center API operations. Three operations are invoked, and each operation is invoked within its own SubJob. Each SubJob starts by initializing the request from the Context Parameters:
    • The first invocation looks up the taskId based on the human readable Job name.
    • The second invocation uses the taskId returned from the first invocation to trigger the Job.
    • The third invocation uses the returned execRequestId handle as the argument to the getTaskExecutionStatus operation to monitor the Job status.
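
The three invocations can also be prepared outside a Talend Job. The sketch below only builds the request URLs and sends nothing; it assumes a POSIX shell with base64 and tr, and metaservlet_url is a hypothetical helper name. The execRequestId key in the third request is an assumption based on the description above, so check the help output for getTaskExecutionStatus before relying on it.

```shell
# Base URL as used in the examples above; adjust host and port to your install.
TAC_URL="http://localhost:8080/org.talend.administrator"

# Build a MetaServlet URL from a JSON request:
# the base-64 text is the whole query string.
metaservlet_url() {
    printf '%s/metaServlet?%s\n' "$TAC_URL" "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

auth='"authPass":"admin","authUser":"admin@company.com"'

# 1. Look up the task ID from the task name.
lookup_url=$(metaservlet_url "{\"actionName\":\"getTaskIdByName\",$auth,\"taskName\":\"task1\"}")

# 2. Trigger the task; the ID 1 stands in for the value returned by step 1.
run_url=$(metaservlet_url "{\"actionName\":\"runTask\",$auth,\"mode\":\"asynchronous\",\"taskId\":1}")

# 3. Poll the status; the execRequestId key name is an assumption, the value
#    is the sample handle shown earlier in this article.
status_url=$(metaservlet_url "{\"actionName\":\"getTaskExecutionStatus\",$auth,\"execRequestId\":\"1432855205979_a5zn8\"}")

printf '%s\n' "$lookup_url" "$run_url" "$status_url"
# Each URL can then be fetched with an HTTP client, for example: curl -s "$lookup_url"
```

In a real script, the taskId and execRequestId values would be read from the JSON returned by the previous call instead of being hard-coded.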