I’d like to take a moment to talk about doing research. I’m currently working towards my MS degree, which entails either completing a thesis or passing a (difficult) exam. I think the thesis route is more useful overall, even though it’s much scarier, which is pretty much why I’m headed that direction. I’m theoretically doing research in multicore systems and parallel languages, although it really is difficult to come up with a testable, unique idea. Of course, if that weren’t the case, everyone would be doing it.
In any case, recently I’ve been reading a lot about PGAS languages, particularly UPC, and their associated runtimes. The main component that enables performance on distributed systems is RDMA functionality, which in the Berkeley UPC case is supplied by GASNet, a communication library that abstracts the underlying communication fabric (InfiniBand, Myrinet, Ethernet, etc.) for high-performance remote memory access. Where things get interesting is that the HPC community seems to have agreed, to some extent, that the SPMD/PGAS model is a strong candidate for future computation, most prominently because of CUDA, as NVIDIA’s graphics cards have enabled some incredible numbers on the latest TOP500 listings. Since today’s commodity chips actually derive quite a bit (more than you’d expect) from yesterday’s supercomputers, any interesting functionality enabled by the communication networks of today’s supercomputers should be applicable to a future many-core (32+) chip. The key point is that although these systems look quite different (think Jaguar vs. TILE64), the architectures are really quite similar: as the number of cores on a chip increases, the need for a sophisticated communication network, very much like those found in today’s massive supercomputers, becomes much more apparent. It is here that my research efforts are currently directed.
While all this is well and good, my problem is that I’m still lacking “the idea”. I suspect that’s because I’ve been searching for some grand unifying theory of many-core systems, as it were, instead of concentrating on finding some salient point of interest and digging from there.
I didn’t expect to be this busy with stuff this quarter, class projects and such have been taking up all my time, and I STILL haven’t gotten around to finishing the resample/background instrumentals functionality for MicDroid.
So here’s a short list of what I HAVE been working on this quarter:
1. UCLA CS 259 project: using high-level synthesis tools like AutoPilot to generate custom FPGA accelerators for medical imaging benchmarks.
2. UCLA CS 180: yes, the algorithms class I didn’t do so well in as an undergrad is back!
If you’re an undergrad majoring in Computer Science, the one class that will get you a job is your algorithms class. It’s also one of the few classes I didn’t do so well in as an undergrad, which is why I have to take it again as a graduate student. That said, it doesn’t help that it feels like a remedial course this time around. Still, the timing was incredibly useful, since there is a lot of interviewing to be done this quarter, and reviewing things like hash tables, stacks, tree traversal, sorting, and big-O notation in class was a great help.
Big-O notation is, surprisingly, by far the subject that comes up the most, since every interviewer will ask for the complexity (typically time, sometimes space as well) of every algorithm you design during an interview. Reviewing average-case complexity for things like hash tables and quicksort is also worthwhile, since those questions do come up (and did for me).
Next up is tree traversal, which is surprisingly common, most typically in the context of search, pointers, and recursion. The most common questions I’ve heard this year ask for either tree comparison or tree search. Usually this involves designing a tree structure, with nodes linked via pointers or references, followed by implementing a recursive depth-first search. The best advice I have here is to keep things simple, and be very familiar with your language of choice.
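To make that concrete, here’s a minimal sketch of the kind of thing interviewers ask for: a hand-rolled node type and a recursive depth-first search over an unordered binary tree. The names (`Node`, `contains`) are purely illustrative, not from any particular library.

```java
// A minimal binary tree node, linked via references.
class Node {
    int value;
    Node left, right;
    Node(int value) { this.value = value; }
}

class TreeSearch {
    // Recursive depth-first search: returns true if target appears
    // anywhere in the (unordered) tree. O(n) time, O(height) stack.
    static boolean contains(Node root, int target) {
        if (root == null) return false;          // fell off the tree
        if (root.value == target) return true;   // found it here
        return contains(root.left, target) || contains(root.right, target);
    }
}
```

The whole thing fits in a dozen lines, which is exactly the point: keep it simple enough to write on a whiteboard without second-guessing your language’s syntax.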
Sorting is a perennial favorite for interviews, since sorting algorithms cover a wide range of time and space complexity. But for an interview there’s no need to know every sorting algorithm in existence; pick one simple O(n²) algorithm and one O(n log n) algorithm, and that should be enough. It’s important to be able to re-implement your chosen algorithms from scratch on the fly. I’ve had a few interviews where I had to do this as part of a larger question, so make sure to practice!
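As a sketch of what “from scratch” means here, this is one reasonable pairing: insertion sort as the simple O(n²) choice and merge sort as the O(n log n) one. Class and method names are just illustrative.

```java
import java.util.Arrays;

class Sorts {
    // Simple O(n^2) pick: insertion sort, done in place.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger elements right to open a slot for key.
            while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    // O(n log n) pick: merge sort, returning a new sorted array.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) return a;
        int mid = a.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        // Merge the two sorted halves.
        int[] out = new int[a.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }
}
```

If you can write both of these cold, and explain why one is O(n²) and the other O(n log n), you’ve covered the vast majority of sorting questions.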
The last, most interesting pattern I’ve noticed in interview questions is the frequent occurrence of hash tables. In many questions, the structure of the problem ties comparison and lookup together very tightly, which is a perfect application for a hash table. A typical example is finding duplicate elements in a list, or anything that attempts to uniquely identify some subset of the data; that is exactly where the comparison-plus-lookup structure lies. When the interviewer is unsatisfied with the time complexity of your solution, I have found they are frequently looking for you to speed up some part of the algorithm with a hash table. Just be certain to make it completely clear that a hash table lookup is constant time in the average case and linear time in the worst case.
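The duplicate-finding question mentioned above can be sketched like this; the hash set turns an O(n²) pairwise comparison into an expected O(n) scan, precisely because its membership checks are average-case constant time. The `findDuplicates` name is just illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Duplicates {
    // Returns the set of values appearing more than once in items.
    // Expected O(n) time, O(n) space: one hash-set membership test per item.
    static Set<Integer> findDuplicates(List<Integer> items) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> dupes = new HashSet<>();
        for (int item : items) {
            // add() returns false when the value was already present.
            if (!seen.add(item)) dupes.add(item);
        }
        return dupes;
    }
}
```

The follow-up is almost always the complexity question: average-case O(n) overall, but worst case O(n²) if every key hashes to the same bucket, which is exactly the caveat worth volunteering before the interviewer asks.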
In my interview experience, about 90% of the questions were covered by the topics I’ve posted about above. The other 10% tended to be more in-depth questions that link a few different topics together. I’m looking forward to finishing up my interviews and finding out where I’ll be headed this summer. But for now I plan to enjoy my break, and tentatively finish up some long overdue work.
I’ve heard T-Pain himself mention that the famous I Am T-Pain app will be coming to Android, and that there will be much rejoicing. What does this mean for MicDroid? Well, despite the fact that I definitely don’t have the backing of a company to match it feature for feature, I don’t plan on closing up shop. It will, however, let me work at a more leisurely pace (as if the pace I work at isn’t leisurely enough already), because there are a few things interesting enough to keep me working on this project.
Those of you out there who actually keep up to date with my commits on GitHub will probably have noticed that MicDroid is in fact capable of background music now, albeit with some pretty serious caveats. First and foremost, in order to set a track as background music, the track must be in WAVE format, not MP3, AAC, or any of the more common compressed formats. This is primarily because Android does not expose any audio decoder (or encoder) API to developers, despite the fact that most phones have the hardware capability to do so. The current workaround is to use Lame4Android to decode the MP3 to WAVE, then set that WAVE as background music. Ideally I’d like to add liblame to MicDroid and just decode MP3s on the fly, but the MP3 codec does have licensing issues, and I’d prefer not to deal with that for now, at least not until other, more interesting problems have been solved.
The other two remaining issues I’d like to look at are echo cancellation and audio resampling. Both are necessary because phones may be recording at an arbitrary sample rate, while background music may be playing at a different one. Since MicDroid uses a simplistic Output = Mic / 2 + Music / 2 formula to mix, the PCM data fed into the mixer needs to be sampled at the same rate. Currently I’m looking to tap CCRMA’s resample library for this, which means reading and understanding the code. Sadly my background is in software, and I don’t have the DSP knowledge to make this a trivial task, so it may take longer than I hope. Second, echo cancellation, which will (hopefully) make live correction far easier to deal with, is something else I’d love to add. Personally I’m not too sure there’s a lot that can be done, due to the lack of any low latency audio API in Android, but I think it’s a direction worth looking into. The best sources for this seem to lie in VOIP software, and I’m looking into sourcing some code from Oslec and csipsimple. Again, DSP is not my strong suit, so it remains to be seen how much I can do with the code.
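For the curious, the mixing formula above is about as simple as it sounds. Here is a minimal sketch, assuming both streams are 16-bit PCM already at the same sample rate (which is exactly why resampling has to come first); the `Mixer` class and `mix` method are hypothetical names, not MicDroid’s actual code.

```java
class Mixer {
    // Equal-weight mix of two 16-bit PCM streams:
    // Output = Mic / 2 + Music / 2.
    // Halving each input before adding keeps the sum within short range,
    // so no clipping or overflow can occur.
    static short[] mix(short[] mic, short[] music) {
        int len = Math.min(mic.length, music.length);
        short[] out = new short[len];
        for (int i = 0; i < len; i++) {
            out[i] = (short) (mic[i] / 2 + music[i] / 2);
        }
        return out;
    }
}
```

The catch is that this per-sample addition is only meaningful when sample i of the mic buffer and sample i of the music buffer represent the same instant in time, which falls apart the moment the two streams were captured at different rates.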
So in summary, there are still plenty of things left to build, and plenty of directions to expand. I’m thinking that once I get basic audio mixing capabilities functioning using resampling I will release an update with instrumental support as an experimental feature. Additionally, it is my plan to do a write up about the various audio processing that MicDroid does as well.
Until then though, I have plenty of coding to do.
I’ve put an updated version of Lame4Android out on the Android Market today, and while I’m glad to say it now supports decoding MP3->WAV, it still has one small glitch: feeding in files created with Lame4Android will not encode properly. I have a feeling this is due to the first few bytes of the file not being skipped correctly, which causes the encoder to crap out immediately. I will have a fix for this soon.
Now, on to what I wanted to talk about. Any programmer, over the span of their career, will have written an API of some shape or form. It may not be good, or widely used, but they’ll have done it. This is one of those parts of programming you wish they taught everyone in school, along with variable naming and spacing, because you’ll wish everyone wrote their APIs the way you would. It’s also one of those things where everyone can tell a good API when they see one, but cannot necessarily write one themselves.
I’ve had plenty of experience (attempting) to write APIs, but the first time I truly had to worry about what I was writing was when I worked at Deluxe. At the time I was given the task of re-writing the framework on which all of the Universal Blu-ray releases were built, since the previous version was getting a bit long in the tooth and did not support some of the newer base framework functionality. Additionally, it was full of strange quirks and unnecessary wrapper functions. Here, I think, is the crux of writing a good API: exposing the right amount of magic to the client program. What I mean is that the amount of magic the API does behind the scenes to provide its functionality must be neither too little nor too much.
Too little magic and you’re left with nothing more than a thin wrapper around the underlying functionality. An example of this would be the Universal Blu-ray framework I mentioned earlier: in quite a few places, the old framework provided functions to switch audio tracks (think switching from French audio to German) that were nothing more than wrappers around the underlying base framework’s switch-audio-track functions.
On the other hand, too much magic leaves you with a very inflexible set of exposed functionality. The example I’m thinking of is the Android MediaRecorder API. While its designers expose pretty much all the functionality a developer would need to record audio, it just isn’t very configurable. You can only record in certain formats, and then only directly to a file, not to a buffer, or to the speaker, or anything else. Nor can you (not until Gingerbread, at least) apply effects to the recorded stream. To do any of the above, you as a developer must use AudioRecord, which only outputs to a buffer that you must manipulate yourself. Too much work is done by the underlying layers, all of it magical to the client program.
Now to get back to the point of everything though. Today I’d like to say the liblame API is finished (or as finished as a hobby project ever gets), and I’d like to say I did a good job in exposing enough magic to anyone who would like to use it. Please let me know if you do, I’d love to hear from you.
I’ve recently been bitten (again) by Samsung Galaxy S AudioRecord bugs, and after having to ignore them to focus on classwork for the past few weeks, it’s time to get back into it. The last MicDroid update featured improved error handling thanks to proper (sort of) use of exceptions, instead of an Android Handler routing all exceptions to error-handling code. Unfortunately, this broke Galaxy S support due to what I believe is described in these posts. It appears the Galaxy S simply locks up the AudioRecord if you call AudioRecord.startRecording() while it is improperly initialized. This behavior certainly seems consistent with the (mass of) bug reports I’ve been getting from users. I think I’m going to have to try some of the dirty hacks mentioned above to fix it.
Another project that appears not to have had too many issues with the Galaxy S series is the rather famous Sipdroid. They also have AudioRecord code that supposedly works on the Galaxy S series, using an interesting method of delaying until the next frame is ready before reading from AudioRecord again. I fully intend to try this method out too, since if it works, it means I can potentially simplify a lot of the complex recording code currently around. It could also mean fixing the horrible buffer size hacks I’m using right now and finally allowing the Galaxy S series to take advantage of live recording.
Should I get it to actually work, I definitely plan on releasing details.
Finally, the last, best hope for the Galaxy S series is the upcoming release of an official 2.2 ROM. There have been rumors that the 2.2 ROM fixes a lot of the audio bugs that plague the 2.1 Galaxy S ROM, and should an update that fixes these fully roll out, all of these horrible Galaxy S problems should be solved. Obviously this is the best solution for everyone :)
Something else I’ve been throwing around in my head has been the fact that the Galaxy S has a relatively slow internal SD card. I’m wondering if this has any effect on recording, especially if AudioRecord.read() reads audio at a rate faster than the phone can write.
It’s problems like these that really have me looking forward to Gingerbread, as its OpenSL ES and effects pipeline support can potentially go a LONG way towards resolving basic audio functionality issues. Now if only we could get Google to release the source code and Samsung to follow standards and release fully baked hardware…
Google announced Gingerbread today, and the list of new shiny things to play with is quite long. It includes a complete native API for activities, OpenGL ES, OpenSL ES, and sensors, an audio effects API, and NFC. Of particular interest to me are the OpenSL ES, audio effects, and AAC encoding APIs. It’s obvious that these platform changes are meant to push Android gaming in a serious way. Google is hoping game developers will sign on, meaning we might see quite a few more corporate-produced games. All of this will mean a lot of interesting things for the future, especially for MicDroid. Unfortunately all of these fancy new toys are available only on 2.3, meaning we will all need to start pushing for updates from our phone providers. :)
Finally, I’d like to take a minute to talk about MicDroid and updates, especially the fact that there haven’t been many in recent weeks. The most obvious reason for this is that I’m back in school, and classes are taking up a significant portion of my time. I’m hoping (again) to get more free time over the coming 3-week holiday break to put out some fixes for Galaxy S (which I cannot test easily), and (finally) finish up LAME MP3 support and background music import. Additionally, I’d like to take a moment to apologize to Galaxy S users for the latest update. I’m not certain where the failure is coming from, but my suspicion is the audio hardware initialization and wave writing code. I can’t really imagine going out to pick up a Galaxy S phone, but if any kind soul out there would like to lend some Galaxy S testing time, it would be much appreciated. Finally, I’m also going to be looking at integrating new 2.3 features, as well as hoping that Google drops the 2.3 source code soon :)
I’ve been reading articles about Partitioned Global Address Space languages for the past week or two, and I have to say, the programming model is essentially the same as CUDA’s. The PGAS languages (the most prominent of which seems to be Unified Parallel C) were originally developed for supercomputer-class machines such as the Cray T3D, SGI Origin 3000 (I love reading about these machines!), and Beowulf clusters. The main idea of these languages is to take advantage of the copious parallel resources of these machines by splitting memory into several levels local to each processor. CUDA does this by dividing the memory and cache available to the stream processors on the GPU into thread-local, block-shared, and program-global regions. These correspond closely to the private, shared, and global memory areas of Unified Parallel C, with the same thread restrictions. In both models, threads can be grouped into blocks, a logical bundling of threads meant for a particular task, with block-shared memory to match. The typical use of this block-shared memory is to allocate each thread a small slice of shared memory to write results into. As a result, to obtain maximum performance in both UPC and CUDA, intelligent layout of data in block-shared memory is paramount.
In terms of language features, UPC and CUDA differ only slightly, the primary differences being UPC’s more flexible memory layout and CUDA’s more restricted language. Underneath the language runtime, the architecture is also strikingly similar. On one hand you have machines like the Cray T3D and SGI Origin 3000, which can arguably be considered ancestors of today’s GPUs. The Crays and Origins of the past were typically massively parallel shared-memory systems: each compute board contains a few processing units, associated cache, and shared memory, as well as interconnects to many other similar compute boards. The memory structure of these machines maps onto the PGAS programming model fairly easily. Thread-private data can reside in the caches on the compute boards, while each block can be represented by a single compute board, or even a compute cabinet, with the aggregate memory used as block-shared memory. Global memory can be thought of as the control/loader machine’s memory. Interestingly, while the model described above is relatively intuitive, the truth is that due to the hardware implementation of machines like the T3D and Origin 3000, partitioned global address space language performance was frequently not as good as a traditional message-passing approach, or was even implemented on top of message passing. This was especially true of the T3D, as its specialized “shell” circuitry actually made implementing a global address space language quite difficult. Rigorous comparisons between different parallel programming models are also not as common in the literature as I would have liked. But let’s move on to CUDA for now.
CUDA, on the other hand, also relies on a control/loader machine (the PC) to launch CUDA programs, and global memory can likewise be thought of as this control machine’s memory. From there it can load up and distribute sections of work to each logical thread block and its associated shared memory. The processing elements of the GPU (NVIDIA would have you call them stream processors, I believe) are also very similar to a compute board, in that there are many of them, and each has several processors with associated cache and access to some of the global memory. (The memory model is more complex than I’m going into here, but IIRC, for some of the later GPUs, the number of processing elements is directly tied to memory bandwidth, in a NUMA architecture of sorts.) In many ways, then, CUDA is the essence of late-80s/early-90s supercomputers distilled into a single add-in board for our modern PCs. Who says supercomputer technology is useless for the average consumer?
I actually made a post on reddit about this a little bit ago, so I thought it might be a good idea to explain in more detail here.
There are an awful lot of settings available in MicDroid, and here’s a bit more explanation about what they do.
Prevent Screen Lock: Self-explanatory, screen won’t turn off while recording if this is on.
Enable Ads: Self-explanatory, disable this to not show ads.
Enable Live Correction: Enables playback of tuned sound over speakers in realtime. Requires headphones, otherwise you get really nasty feedback. Also requires that your phone be powerful enough to run it. Realistically this means anything with an ARM v7 chip, which is pretty much all phones after the Motorola Droid 1 except the MyTouch Slide, HTC Aria, and any Samsung phone not as powerful as the Galaxy S (Galaxy, Intercept, etc).
Change Sample Rate: Allows the user to change the sample rate of the recording, from 8kHz up to 44.1kHz. Higher numbers typically mean better sound, although they require a bit more processing power. Also, not all phones can record at all sample rates. If you are having trouble getting your phone to record, try changing this to 8kHz or 22kHz.
Change Buffer Size: Allows the user to change the buffer size (indirectly). This value is actually a multiplier, so the difference between an 8x value and a 16x value is VERY large. Larger buffers mean there will be less skipping if your phone can’t keep up with the recording and processing. Larger buffers also take much longer to fill than smaller ones, so if you are using Live Correction, it is HIGHLY recommended that you keep this number SMALL. Samsung Galaxy S users are recommended to bump this setting up if they experience trouble recording.
Change Key: Self-explanatory, this allows the user to change the key the program is tuning to. For purposes of this program, minor keys are the same as the relative major key. Musically this is not exactly correct, but due to the way the program processes the notes (it looks at key signature, not the order of notes), there is no difference.
Pull to Fixed Pitch: This oddly named setting controls the degree to which the pitch you hear is your own voice versus a fixed pitch (Concert A by default). Higher numbers make it sound more like a generated tone. This should generally be left at its default of 0.0.
Pitch Shift: Controls how many notes the output is shifted by. Setting negative numbers tends to make the sound deeper, positive numbers tends to make the sound squeakier. Usually you don’t need to change this setting too much, maybe a few notes negative if you find the default makes you sound rather chipmunk-like.
Correction Strength: This aptly named setting controls the strength of the auto-correction. Higher values mean the program pulls your voice more strongly toward the correct pitch. Leaving this at Full is generally fine.
Correction Smoothness: This setting controls how gradual the transition is from tuned note to tuned note. For a more natural sound (similar to what is used for vocals in most songs), set this to ‘Some’ or more. For the T-Pain/Madonna effect, this NEEDS to be set to ‘No Smoothing’. The abrupt transitions between tuned notes are what give this effect its signature robotic sound.
Formant Correction: Enables formant correction. This option unlocks the Warp Amount setting below, but does nothing by itself unless Warp Amount is changed.
Correction Warp Amount: This setting is meant to fix the chipmunk sound that frequently happens. Negative values attempt to counteract this by making the sound deeper. Positive values seem to make the sound even squeakier. I personally find that this setting is generally unneeded, as pitch-shifting tends to give a better result. Probably a better idea to use negative numbers, if at all.
Mix: This setting changes the amount of tuned (processed) sound output versus original (unprocessed) sound. Generally you want the tuned signal, as that’s why you’re running this app. :P
Reset to Default: self-explanatory, resets all settings to default.
MicDroid 0.40 has been released!
Release highlights include fixes for many force-close errors, including some more weird AsyncTask-related rotation issues. Ads have also been added to the recording library; AdMob ads can be removed via a setting in the options, as they’re purely optional. Code for instrumental support has been added, but it is not yet enabled, since it currently supports only wave files, which isn’t much use for most users. Future work primarily involves getting LAME built as a library for proper mp3 support. Hopefully this release will be relatively trouble-free and fixes most of the outstanding issues from 0.39. Thanks for your support!
I’ve been pretty busy with school (I know I keep saying it) so I haven’t had as much time to devote to Android related things recently, and for that I apologize.
My girlfriend made a giant Android cake for my birthday, and it was delicious! I bought a G2 that same day to replace my Nexus One, because I’ve been waiting for a high-spec Android slider on T-Mobile for nearly a year. Now all it needs is permanent root… Also, I’m selling my Nexus One, so if anyone out there is interested, feel free to email me about it.
The latest work available on GitHub shows that backing instrumentals are now working, although the functionality is pretty rough, and is limited to wave files only right now. The next logical step is to make the instrumental selection and playback/record functionality a little more user friendly, with a nice countdown toast (or overlay if I can find someone to do that for me), and a nice instrumental selection menu. Additionally, mp3 support via LAME will be included before I release again, since that is one of the big items people have been waiting for, and would allow music library import, mp3 saving, and MMS support (MMS requires mp3 format). This means the next MicDroid release is probably a while off. I’d like to include an ad library too, partly because I intend to get familiar with the API, and partly because ad revenue would be nice too ;) although I fully intend to leave the user an option to disable ads. Finally, if anyone wishes to help out with MicDroid development, feel free to fork at GitHub and submit patches back to me.
This is pretty much stalled. Fortunately OpenTyrian on the Android Market is at a very playable state. I should concentrate on getting mp3 support for MicDroid done first.
Classwork comes first now. I don’t intend to repeat my undergraduate academic performance.