URDU SPEECH TO TEXT API REFERENCE

This section presents code snippets for using the CLE Urdu Speech to Text API with Java. The API can be used in two ways: through a RESTful web service (for audio files) and through a socket connection (for microphone streaming).

For Audio File

1. The recognition method of the CLE Speech to Text web service takes its arguments as a JSON message: the input speech encoded as a base64 string and your access token. The snippet below also sends the language code and the sampling rate. The audio file is first converted into a base64 string. The file must be in WAV format with a 16000 Hz sampling rate, a single (mono) channel and a maximum duration of one minute. The library for converting audio to base64 can be downloaded from the Link.


                    
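/* Required imports: java.io.BufferedInputStream, java.io.ByteArrayOutputStream, java.io.FileInputStream.
   Base64.encodeBase64String is assumed to come from Apache Commons Codec (org.apache.commons.codec.binary.Base64). */

/* Conversion of wav file into base64 format */
String WAV_FILE = "<<Path to your wav file>>";
ByteArrayOutputStream out = new ByteArrayOutputStream();

// try-with-resources closes the file stream even if reading fails
try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(WAV_FILE))) {
   int read;
   byte[] buff = new byte[1024];
   while ((read = in.read(buff)) != -1) { // read() returns -1 at end of file
      out.write(buff, 0, read);
   }
}
byte[] audioBytes = out.toByteArray();

String encoded = Base64.encodeBase64String(audioBytes);
/* End of conversion */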

String accessToken = "<<your access token>>";
int sRate = 16000;
String lang = "ur"; //It can be either ur (for Urdu) or en (for English)

String JSON_MSG = "{ \"file\" : \"" + encoded + "\" , \"token\" : \"" + accessToken + "\", \"lang\" : \"" + lang + "\", \"srate\" : \"" + sRate + "\"}";
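
The format constraints stated in step 1 (16000 Hz, mono, at most one minute) can be checked locally before the file is encoded and uploaded. The following is a minimal sketch using the standard javax.sound.sampled package; the class WavCheck and its methods are hypothetical helpers introduced here for illustration and are not part of the CLE API.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Hypothetical helper, not part of the CLE API: checks the documented input constraints locally.
class WavCheck {

    /** Returns null when the audio meets the constraints, otherwise a short reason. */
    static String checkFormat(AudioFormat fmt, long frameLength) {
        if (Math.round(fmt.getSampleRate()) != 16000) {
            return "sampling rate must be 16000 Hz";
        }
        if (fmt.getChannels() != 1) {
            return "audio must be mono";
        }
        // Duration in seconds = number of frames / frames per second
        double seconds = frameLength / (double) fmt.getFrameRate();
        if (seconds > 60.0) {
            return "duration must not exceed one minute";
        }
        return null;
    }

    /** Reads the header of the given audio file and applies checkFormat. */
    static String checkFile(File wav) throws Exception {
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(wav)) {
            return checkFormat(ais.getFormat(), ais.getFrameLength());
        }
    }

    public static void main(String[] args) throws Exception {
        String problem = checkFile(new File(args[0]));
        System.out.println(problem == null ? "OK" : "Rejected: " + problem);
    }
}
```

Running the check first avoids uploading a file that the service would presumably reject; how the service itself reports such a file is not described here.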
 



                    
                

2. Use a Java HTTP client to connect to the web service as shown below. The Apache HttpCore and Apache HttpClient libraries are required; they can be downloaded from the Link.

                     
String URL = "api.cle.org.pk";
String postURL = "https://" + URL + "/v1/asr";
HttpClient httpClient = HttpClientBuilder.create().build();
HttpPost post = new HttpPost(postURL);
StringEntity postingString = new StringEntity(JSON_MSG,"UTF-8");
post.setEntity(postingString);
post.setHeader("Content-type", "application/json;odata=verbose");
HttpResponse response = httpClient.execute(post);
String JSON_Response = convertStreamToString(response.getEntity().getContent());
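
If adding the Apache libraries is inconvenient, the same POST can presumably be issued with the HTTP client built into the JDK (java.net.http, Java 11 and later). This is a sketch of that swapped-in technique, not the method documented above: the endpoint and header are copied from the snippet above, the class AsrHttpSketch is hypothetical, and treating only status 200 as success is an assumption.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch only: the same POST issued with the JDK's built-in client (Java 11+).
class AsrHttpSketch {

    /** Builds the POST request; endpoint and header are taken from the Apache-based snippet. */
    static HttpRequest buildRequest(String jsonMsg) {
        return HttpRequest.newBuilder(URI.create("https://api.cle.org.pk/v1/asr"))
                .header("Content-type", "application/json;odata=verbose")
                .POST(HttpRequest.BodyPublishers.ofString(jsonMsg, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the request and returns the response body as a UTF-8 string. */
    static String send(String jsonMsg) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                buildRequest(jsonMsg),
                HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        // Assumption: any status other than 200 is treated as a failure
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }
}
```

With this variant, AsrHttpSketch.send(JSON_MSG) would return the response body directly as a string, so no separate stream-to-string conversion is needed.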

 
                    
                

3. The method returns an HTTP response message. The JSON message can be extracted from the HTTP response using the following function. The JSON message contains the status and domain of the input document, which can then be processed.

                    
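private static String convertStreamToString(InputStream is) {

        // Decode explicitly as UTF-8 (java.nio.charset.StandardCharsets); the response is assumed
        // to be UTF-8, so Urdu text is not garbled by the platform default charset
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();

        String line = null;
        try {
            // Read every line of the response body, not only the first one
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
 }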
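
On Java 9 or later, the same extraction can be written more compactly with InputStream.readAllBytes. The sketch below is an alternative under that assumption; StreamText and readUtf8 are hypothetical names, and UTF-8 is assumed as the response encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream and decodes it as UTF-8 (requires Java 9+).
class StreamText {

    /** Reads the stream to its end, closes it, and returns the content as a UTF-8 string. */
    static String readUtf8(InputStream is) {
        try (InputStream in = is) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "{\"status\":\"OK\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(sample)));
    }
}
```

Unlike a line-by-line reader, this variant keeps any line breaks in the body and reports read errors to the caller instead of printing them.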
  
                    
                

For Microphone Streaming

1. The mic stream should be captured with a sampling rate of 16000 Hz, 16-bit encoding, a mono channel, signed samples and little-endian byte order.

                     
AudioFormat format = new AudioFormat(16000, 16, 1, true, false); //Define stream format
Socket server;
InputStream inStream;
String token = "<<your access token>>";
String lang = "ur";//It can be either ur (for Urdu) or en (for English)
String accessToken = token + "/" + lang;

DataLine.Info targetInfo = new DataLine.Info(TargetDataLine.class, format); //Set format of mic stream
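
Before opening the line, it can help to confirm that the machine actually offers a capture line in this format and to know how much data the format produces. The sketch below uses only the standard javax.sound.sampled API; MicFormatSketch and bytesPerSecond are hypothetical names used for illustration.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

// Hypothetical helper: inspects the capture format before streaming starts.
class MicFormatSketch {

    /** Raw PCM bytes produced per second for the given format. */
    static int bytesPerSecond(AudioFormat format) {
        return Math.round(format.getFrameRate()) * format.getFrameSize();
    }

    public static void main(String[] args) {
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        DataLine.Info targetInfo = new DataLine.Info(TargetDataLine.class, format);

        // 16000 frames per second * 2 bytes per frame (16-bit mono)
        System.out.println("Bytes per second: " + bytesPerSecond(format));

        // Reports whether some installed mixer offers a capture line in this format
        System.out.println("Mic line supported: " + AudioSystem.isLineSupported(targetInfo));
    }
}
```

For the format above, one second of audio corresponds to 32000 bytes, which is useful when choosing the size of the buffer written to the socket.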
                    
                

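2. Connect to the websocket server (the code below opens a plain java.net.Socket) and write the captured mic stream to the socket's output stream. The first output written to the stream must be your access token, which is sent together with the language code as token + "/" + lang.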

                    
try {
    /* Connect to server */
    System.out.println("Creating Socket...");
    server = new Socket("202.142.147.156", 3000);
    OutputStream out = new DataOutputStream(server.getOutputStream());

    /* Start capturing mic stream */
    TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(targetInfo);
    targetLine.open(format);
    targetLine.start();

    int numBytesRead;
    System.out.println("Buffer Size:" + targetLine.getBufferSize());
    byte[] targetData = new byte[(targetLine.getBufferSize() * 2) / 4]; // Initialize byte array to be sent to websocket server

    // First output to websocket stream must be your access token
    // (requires java.nio.charset.StandardCharsets)
    out.write(accessToken.getBytes(StandardCharsets.UTF_8));

    /* Sending stream and getting results from server */
    while (true) {
        numBytesRead = targetLine.read(targetData, 0, targetData.length);

        if (numBytesRead == -1) {
            break;
        }

        /* Get decoded results back from server.
           A JSON string is received as the result after each second, with the format
           {"final":"<true/false>","text":"Decoded output text in utf-8","status":"OK/Failed"}
           "final" field specifies if the hypothesis is finalized or not.
           "text" field gives decoded output of utterance.
           "status" defines if the utterance is decoded successfully or not.
        */
        inStream = server.getInputStream();
        if (inStream.available() > 1) {
            byte[] r = new byte[inStream.available()];
            int n = inStream.read(r); // number of bytes actually read
            String result = new String(r, 0, n, StandardCharsets.UTF_8); // output text is UTF-8
            System.out.println(result);
        }

        /* Send mic stream to server as byte array */
        try {
            out.write(targetData, 0, numBytesRead);
        } catch (Exception e) {
            System.out.println(e);
        }
    }

} catch (Exception e) {
    System.err.println(e);
}
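
Each result string can then be split into its "final", "text" and "status" fields. The sketch below does this with a regular expression to stay within the JDK; it assumes the flat, string-valued format described in the comment above, does not unescape JSON escape sequences, and assumes one complete JSON object per string. AsrResultSketch and field are hypothetical names, and a real JSON library would be the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls string-valued fields out of a flat JSON result.
class AsrResultSketch {

    /** Returns the raw string value of the named field, or null if the field is absent. */
    static String field(String json, String name) {
        // Matches "name" : "value", allowing backslash-escaped characters inside the value
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String result = "{\"final\":\"true\",\"text\":\"sample text\",\"status\":\"OK\"}";
        // Print the text only for successfully decoded, finalized hypotheses
        if ("OK".equals(field(result, "status")) && "true".equals(field(result, "final"))) {
            System.out.println(field(result, "text"));
        }
    }
}
```

Checking "final" before using "text" lets a client show interim hypotheses differently from finalized ones, or ignore them altogether.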